Long-Running Workflows
Yggdrasil includes runtime helpers for workflows that must pause, survive restarts, and resume later.
These features build on top of normal graph execution. You still model the workflow as graph nodes and edges, but the runtime can persist operational state around that graph.
Use these helpers when:
- a human approval happens later
- an external callback arrives minutes or hours afterward
- a worker process may restart between steps
- multiple workers need coarse coordination over resumptions
Core Building Blocks
The long-running workflow model has two main parts:
- Pause and resume
- Checkpoints
Pause And Resume
A workflow can pause because of:
- pause_before=True
- pause_after=True
- wait_for_input="..."
- an ApprovalNode
- deterministic routing rules that pause on match
Resume with the same ExecutionContext after updating workflow state:
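# Run the workflow; the returned context holds the state at the pause.
ctx = await executor.run(approval.node_id, "Review this request.")
# Record the decision in workflow state while the run is paused.
ctx.state.data["approval"] = {"approved": True, "approver": "mgr-1"}
# Resume with the same context.
resumed = await executor.resume(approval.node_id, ctx, query="Continue.")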
When a paused workflow resumes, the trace records a resume event and execution continues with the preserved workflow state.
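The pause-and-resume flow described above can be modeled without the library. The sketch below is a minimal illustration only: ToyContext, run_until_approval, and resume are hypothetical stand-ins invented for this example and are not Yggdrasil classes or functions. It shows a run pausing because no approval is recorded, state being updated while the run waits, and a resume that records a trace entry and continues with the preserved state.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyContext:
    """Hypothetical stand-in for an execution context (not the Yggdrasil class)."""
    data: dict[str, Any] = field(default_factory=dict)
    paused: bool = False
    trace: list[str] = field(default_factory=list)


def run_until_approval(ctx: ToyContext) -> ToyContext:
    """Run the first step, then pause if no approval has been recorded."""
    ctx.trace.append("run")
    if "approval" not in ctx.data:
        ctx.paused = True
        ctx.trace.append("pause")
    return ctx


def resume(ctx: ToyContext) -> ToyContext:
    """Continue a paused run using the state written while it waited."""
    if not ctx.paused:
        raise RuntimeError("context is not paused")
    ctx.paused = False
    ctx.trace.append("resume")
    approved = ctx.data.get("approval", {}).get("approved", False)
    ctx.data["outcome"] = "approved" if approved else "rejected"
    return ctx


ctx = run_until_approval(ToyContext())
# A reviewer records the decision while the run is paused.
ctx.data["approval"] = {"approved": True, "approver": "mgr-1"}
ctx = resume(ctx)
```

The point of the sketch is that the same context object carries the state across the pause, which is why resuming in the same process needs no extra storage.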
Checkpoints
Use checkpoints when the process that paused the run might not be the process that resumes it.
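# Persist the paused context as a checkpoint node.
checkpoint = await executor.checkpoint_context(ctx)
# Later, possibly in another process: restore the context from the checkpoint.
restored = await executor.load_checkpoint(checkpoint.node_id)
# Resume the workflow with the restored context.
resumed = await executor.resume(entry.node_id, restored, query="Continue.")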
Or in one step:
resumed = await executor.resume_from_checkpoint(
    checkpoint.node_id,
    entry.node_id,
    query="Continue.",
)
Checkpoint nodes store a serialized ExecutionContext.
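To make "serialized" concrete, here is a rough sketch of a save-and-restore round trip using plain JSON. The payload layout (paused_at, state) and the helper names dump_context and load_context are assumptions made for illustration; the actual format Yggdrasil uses for checkpoint nodes is not described here and may differ.

```python
import json
from typing import Any


def dump_context(state: dict[str, Any], paused_at: str) -> str:
    """Serialize a context-like payload to a JSON string (hypothetical layout)."""
    payload = {"paused_at": paused_at, "state": state}
    return json.dumps(payload, sort_keys=True)


def load_context(blob: str) -> tuple[str, dict[str, Any]]:
    """Restore the pause location and state from a serialized payload."""
    payload = json.loads(blob)
    return payload["paused_at"], payload["state"]


# One process writes the checkpoint...
blob = dump_context(
    {"approval": {"approved": True, "approver": "mgr-1"}},
    paused_at="approval-node",
)
# ...and another process can rebuild the same state from the stored string.
paused_at, state = load_context(blob)
```

Because the checkpoint is just stored data, the process that resumes does not need to share memory with the process that paused.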
Trace And Debugging
Long-running workflows emit richer trace events than a simple request-response run.
Watch for:
- pause
- resume
- checkpoint
- approval_task
Use:
- inspect_trace(ctx) for terminal inspection
- serve_trace(ctx, store=executor.store) for the browser UI
- explain_run(ctx) for a structured summary
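When scanning a long trace by hand, it can help to count only the lifecycle events listed above. The sketch below assumes each trace event is a dict with a "type" key; that shape, the helper name lifecycle_summary, and the node_start and node_end event names in the sample are all hypothetical and not taken from the Yggdrasil trace format.

```python
from collections import Counter

# Event types named in this section.
LIFECYCLE_EVENTS = {"pause", "resume", "checkpoint", "approval_task"}


def lifecycle_summary(events: list[dict]) -> Counter:
    """Count only the long-running lifecycle events in a list of trace events."""
    return Counter(
        event["type"] for event in events if event.get("type") in LIFECYCLE_EVENTS
    )


# Hypothetical trace: a run that paused, was checkpointed, and later resumed.
sample = [
    {"type": "node_start"},
    {"type": "pause"},
    {"type": "checkpoint"},
    {"type": "resume"},
    {"type": "node_end"},
]
summary = lifecycle_summary(sample)
```

A pause with no matching resume in such a summary is a quick hint that a run is still waiting.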
Choosing The Right Level
Use the simplest thing that fits:
- use resume(...) if the same process keeps the context in memory
- add checkpoints when resumptions may happen later or elsewhere