[SPIKE-008] Workflow State Machine (Project/Sprint Lifecycle) #8

Closed
opened 2025-12-29 03:51:01 +00:00 by cardosofelipe · 1 comment

Objective

Design state machines for project and sprint lifecycle management.

State Machines Needed

  1. Project Lifecycle: Draft → Requirements → Architecture → Planning → Active → Completed → Archived
  2. Sprint Lifecycle: Planning → Active → Review → Retrospective → Closed
  3. Issue Lifecycle: Backlog → Ready → InProgress → InReview → Testing → Done
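The three lifecycles above can be expressed as explicit transition tables, so that "invalid transitions are rejected" becomes a simple lookup. This is a dependency-free sketch. It assumes each entity may only advance one step along the listed order; reversals are not modelled here. The names `LIFECYCLES`, `build_table`, `transition` and `InvalidTransition` are hypothetical and are not the API of the `transitions` library.

```python
# Ordered happy-path states, copied from the lifecycle lists above.
LIFECYCLES: dict[str, tuple[str, ...]] = {
    "project": ("Draft", "Requirements", "Architecture", "Planning",
                "Active", "Completed", "Archived"),
    "sprint": ("Planning", "Active", "Review", "Retrospective", "Closed"),
    "issue": ("Backlog", "Ready", "InProgress", "InReview", "Testing", "Done"),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the transition table."""


def build_table(states: tuple[str, ...]) -> dict[str, set[str]]:
    """Map each state to the states it may move to (next step only)."""
    table: dict[str, set[str]] = {state: set() for state in states}
    for source, dest in zip(states, states[1:]):
        table[source].add(dest)
    return table


TABLES = {name: build_table(states) for name, states in LIFECYCLES.items()}


def transition(lifecycle: str, current: str, target: str) -> str:
    """Return the new state, or raise if current -> target is not allowed."""
    allowed = TABLES[lifecycle].get(current, set())
    if target not in allowed:
        raise InvalidTransition(f"{lifecycle}: {current} -> {target} is not allowed")
    return target
```

For example, `transition("sprint", "Planning", "Active")` returns `"Active"`. Skipping ahead with `transition("sprint", "Planning", "Closed")` raises `InvalidTransition`.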

Key Questions

  1. What triggers state transitions?
  2. What validations are needed before transitions?
  3. How do we handle rollbacks/reversals?
  4. How do we enforce workflow rules (e.g., can't start a sprint without an approved backlog)?
  5. How do we track state history for audit?
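One possible set of answers to these questions can be sketched in a few lines:

- transitions are fired by named triggers issued by an actor;
- validations are guard functions;
- a reversal is just another explicit transition;
- every change is appended to a history list for audit.

This is a minimal sketch under stated assumptions. It uses the Sprint Lifecycle states from this issue. The `backlog_approved` flag, the `reopen` reversal and the names `Sprint`, `TRIGGERS`, `fire`, `TransitionRecord` and `GuardFailed` are all hypothetical, not part of the spike's design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class TransitionRecord:
    """One audit-history entry: who fired which trigger, and when."""
    trigger: str
    source: str
    dest: str
    actor: str
    at: datetime


class GuardFailed(Exception):
    """Raised when a transition's guard condition is not satisfied."""


@dataclass
class Sprint:
    state: str = "Planning"
    backlog_approved: bool = False  # hypothetical workflow-rule input
    history: list[TransitionRecord] = field(default_factory=list)


Guard = Callable[[Sprint], bool]

# trigger -> (source state, destination state, guard)
TRIGGERS: dict[str, tuple[str, str, Guard]] = {
    "start": ("Planning", "Active", lambda s: s.backlog_approved),
    "finish": ("Active", "Review", lambda s: True),
    "reopen": ("Review", "Active", lambda s: True),  # hypothetical reversal
    "begin_retro": ("Review", "Retrospective", lambda s: True),
    "close": ("Retrospective", "Closed", lambda s: True),
}


def fire(sprint: Sprint, trigger: str, actor: str) -> None:
    """Apply a trigger if the source state matches and the guard passes."""
    source, dest, guard = TRIGGERS[trigger]
    if sprint.state != source:
        raise ValueError(f"cannot {trigger!r} from state {sprint.state!r}")
    if not guard(sprint):
        raise GuardFailed(f"guard rejected {trigger!r} for this sprint")
    # Record history before mutating state so every change is auditable.
    sprint.history.append(
        TransitionRecord(trigger, source, dest, actor, datetime.now(timezone.utc))
    )
    sprint.state = dest
```

Keeping triggers, guards and destinations in one table makes the allowed moves easy to review and to render as a diagram. The `transitions` library offers comparable concepts (triggers, conditions, callbacks) with its own API.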

Research Areas

  • State machine patterns for workflow systems
  • Transition guards and side effects
  • State history tracking
  • Integration with approval system

Expected Deliverables

  • State machine definitions for all entities
  • Transition rules and guards
  • Audit logging for state changes
  • ADR documenting the approach

Acceptance Criteria

  • All entities have defined state machines
  • Invalid transitions are rejected
  • State history is tracked
  • Approvals gate critical transitions
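The criterion "approvals gate critical transitions" could be checked just before a transition fires. This is a minimal sketch. The set of critical transitions, the in-memory `ApprovalStore` and the names `CRITICAL`, `check_gate` and `ApprovalRequired` are hypothetical placeholders for whatever the real approval system provides.

```python
from dataclasses import dataclass, field

# Hypothetical policy: (entity kind, source, dest) transitions that need approval.
CRITICAL: set[tuple[str, str, str]] = {
    ("project", "Architecture", "Planning"),
    ("sprint", "Planning", "Active"),
}


class ApprovalRequired(Exception):
    """Raised when a critical transition has no recorded approval."""


@dataclass
class ApprovalStore:
    # Each entry: (entity_id, kind, source, dest, approver)
    granted: set[tuple[str, str, str, str, str]] = field(default_factory=set)

    def approve(self, entity_id: str, kind: str, source: str, dest: str,
                approver: str) -> None:
        self.granted.add((entity_id, kind, source, dest, approver))

    def is_approved(self, entity_id: str, kind: str, source: str, dest: str) -> bool:
        return any(g[:4] == (entity_id, kind, source, dest) for g in self.granted)


def check_gate(store: ApprovalStore, entity_id: str, kind: str,
               source: str, dest: str) -> None:
    """Raise unless the transition is non-critical or has been approved."""
    if (kind, source, dest) in CRITICAL and not store.is_approved(
            entity_id, kind, source, dest):
        raise ApprovalRequired(f"{kind} {entity_id}: {source} -> {dest} needs approval")
```

Approvals are keyed per entity and per transition here, so approving one sprint's start does not unlock another's.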

Labels

spike, architecture, workflow

Comment by cardosofelipe (issue author, repository owner)

SPIKE-008 Research Complete

The comprehensive spike document has been created at docs/spikes/SPIKE-008-workflow-state-machine.md.

Executive Summary

After evaluating multiple approaches (Temporal, Prefect, custom solutions), the recommendation is a hybrid architecture:

  1. transitions library for state machine logic - lightweight, Pythonic, well-tested
  2. PostgreSQL for state persistence with event sourcing for audit trail
  3. Celery for task execution (integrates with SPIKE-004)
  4. Custom workflow engine built on these primitives

Key Findings

Library Comparison:

| Library | Recommendation | Rationale |
|---------|----------------|-----------|
| transitions | Selected | Mature, flexible, hierarchical states, visualization |
| python-statemachine | Alternative | Good but less feature-rich |
| Temporal | Not recommended | Heavy infrastructure, overkill for our scale |
| Prefect | Not recommended | Designed for data pipelines, not business workflows |

State Machines Defined

The spike includes complete state machine definitions for:

  1. Sprint Workflow: Planning -> Development -> Testing -> Demo -> Retrospective -> Completed
  2. Story Workflow: Backlog -> Analysis -> Design -> Implementation -> Review -> Testing -> Done
  3. PR Workflow: Created -> Review -> Changes Requested -> Approved -> Merged
  4. Agent Task Workflow: Assigned -> In Progress -> Blocked -> Completed

Persistence Strategy

  • Event Sourcing: All transitions are recorded in the workflow_transitions table
  • State Checkpointing: For long-running workflows (hours/days)
  • SLA Monitoring: Automatic detection of stalled workflows
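The event-sourcing and SLA-monitoring ideas above can be illustrated with an append-only `workflow_transitions` table:

- the current state is derived from the latest event;
- the full history is the ordered event list;
- a workflow counts as stalled when its latest event is too old and it is not in a terminal state.

This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL. Only the table name comes from the spike. The column layout, the stall threshold and the function names are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# Column layout is an assumption; only the table name comes from the spike.
SCHEMA = """
CREATE TABLE workflow_transitions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_id TEXT NOT NULL,
    from_state  TEXT,               -- NULL for the initial event
    to_state    TEXT NOT NULL,
    occurred_at TEXT NOT NULL       -- ISO-8601 UTC timestamp
)
"""


def record(conn: sqlite3.Connection, workflow_id: str, from_state: Optional[str],
           to_state: str, at: datetime) -> None:
    """Append an event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO workflow_transitions (workflow_id, from_state, to_state, occurred_at)"
        " VALUES (?, ?, ?, ?)",
        (workflow_id, from_state, to_state, at.isoformat()),
    )


def history(conn: sqlite3.Connection, workflow_id: str) -> list:
    """Return (from_state, to_state) pairs in the order they happened."""
    rows = conn.execute(
        "SELECT from_state, to_state FROM workflow_transitions"
        " WHERE workflow_id = ? ORDER BY id",
        (workflow_id,),
    )
    return [tuple(row) for row in rows]


def current_state(conn: sqlite3.Connection, workflow_id: str) -> Optional[str]:
    """Derive the current state from the latest event."""
    row = conn.execute(
        "SELECT to_state FROM workflow_transitions"
        " WHERE workflow_id = ? ORDER BY id DESC LIMIT 1",
        (workflow_id,),
    ).fetchone()
    return row[0] if row else None


def stalled(conn: sqlite3.Connection, now: datetime, max_age: timedelta,
            terminal: set) -> list:
    """Workflows whose latest event is older than max_age and not terminal."""
    rows = conn.execute(
        "SELECT workflow_id, to_state, occurred_at FROM workflow_transitions AS t"
        " WHERE id = (SELECT MAX(id) FROM workflow_transitions"
        " WHERE workflow_id = t.workflow_id)"
    ).fetchall()
    return sorted(
        wid for wid, state, at in rows
        if state not in terminal and now - datetime.fromisoformat(at) > max_age
    )
```

Because events are only ever appended, the audit trail and the current state cannot drift apart. A separate snapshot (checkpoint) table could be added later if replaying long histories becomes expensive.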

Additional Patterns Covered

  • Retry with exponential backoff
  • Saga pattern for compensation
  • Celery integration for task execution
  • Visualization with Graphviz/Mermaid.js
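Retry with exponential backoff and the saga pattern for compensation can both be shown in-process.

- Retry waits `base * factor**attempt` seconds (capped) between failed attempts.
- The saga runs steps in order and, if one fails, runs the compensations of the already-completed steps in reverse order.

This is a plain-Python sketch, not the spike's implementation. In production the retry would more likely use Celery's built-in task retry options, and saga progress would be persisted. The function names and defaults are hypothetical.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], attempts: int = 5, base: float = 1.0,
                       factor: float = 2.0, max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping base * factor**n (capped) between failed attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base * factor ** attempt, max_delay))
    raise ValueError("attempts must be >= 1")


Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Run actions in order; on failure, undo completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.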

Implementation Roadmap

  • Phase 1 (Week 1): Foundation - models, migrations, basic engine
  • Phase 2 (Week 2): Core workflows - Story, Sprint, PR
  • Phase 3 (Week 3): Durability - retry, compensation, checkpoints
  • Phase 4 (Week 4): Visualization - diagrams, monitoring dashboard

Code Examples

The document includes complete code examples for:

  • Database models (WorkflowInstance, WorkflowTransition)
  • Workflow classes with transitions library
  • WorkflowEngine for durable execution
  • Celery task integration
  • Visualization endpoints
  • Frontend Mermaid.js component
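For the visualization endpoints and the Mermaid.js component, a server-side helper could turn a list of (source, destination) transitions into Mermaid `stateDiagram-v2` text for the frontend to render. This is a minimal sketch using the PR Workflow happy path as listed above. The function names are hypothetical. Because Mermaid state ids cannot contain spaces, states such as "Changes Requested" get a space-free id plus a readable label.

```python
def node_id(name: str) -> str:
    """Mermaid state ids cannot contain spaces, so strip them."""
    return "".join(name.split())


def to_mermaid(transitions: list[tuple[str, str]], initial: str,
               finals: set[str]) -> str:
    """Render (source, dest) pairs as a Mermaid stateDiagram-v2 definition."""
    names: list[str] = []
    for source, dest in transitions:
        for name in (source, dest):
            if name not in names:
                names.append(name)
    lines = ["stateDiagram-v2"]
    # Declare a readable label for any state whose id had to change.
    for name in names:
        if node_id(name) != name:
            lines.append(f'    state "{name}" as {node_id(name)}')
    lines.append(f"    [*] --> {node_id(initial)}")
    for source, dest in transitions:
        lines.append(f"    {node_id(source)} --> {node_id(dest)}")
    for final in sorted(finals):
        lines.append(f"    {node_id(final)} --> [*]")
    return "\n".join(lines)


PR_WORKFLOW = [
    ("Created", "Review"),
    ("Review", "Changes Requested"),
    ("Changes Requested", "Approved"),
    ("Approved", "Merged"),
]
```

Calling `to_mermaid(PR_WORKFLOW, initial="Created", finals={"Merged"})` yields diagram text that an endpoint could return as-is for the frontend to render.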

Decision

Adopt the transitions library with PostgreSQL persistence. This balances simplicity with durability while avoiding the operational complexity of dedicated workflow engines such as Temporal.


Full documentation: docs/spikes/SPIKE-008-workflow-state-machine.md

This spike is ready for review and will inform ADR-008: Workflow State Machine Architecture.
