Resume and inspect runs
Goal: continue interrupted work and inspect stored snapshots, execution state, and evaluation artifacts.
When to use this:
Use this guide when a run already exists and you want to inspect or continue it rather than starting from scratch.
Procedure
Use this flow when you want to reopen a stored run first and decide afterward whether any new execution is required.
flowchart TD
A["Persistent store"] --> B["Reopen compiled run_id"]
B --> C["Inspect execution state"]
C --> D{"Work still pending?"}
D -->|Yes| E["Resume or continue run"]
D -->|No| F["Report or inspect artifacts"]
The safe order is reopen, inspect, and only then decide whether to continue execution.
- Use a persistent store, typically SQLite.
- Reopen the run by the same compiled run_id.
- Inspect execution state before rerunning anything.
- Decide whether you want to continue the same run, stop at a stage boundary, or replay only a downstream stage.
- Use the CLI or Python helpers to examine progress and failures.
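
The reopen, inspect, decide order above can be sketched in plain Python. This is a minimal illustration only: `StoredState` and `next_action` are hypothetical stand-ins, not Themis APIs, and the real object returned by get_execution_state(...) may expose different fields.

```python
from dataclasses import dataclass


# Hypothetical stand-in for persisted execution state (an assumption,
# not the actual shape returned by get_execution_state(...)).
@dataclass
class StoredState:
    run_id: str
    pending_items: int  # work that has not completed yet
    failed_items: int   # work that ended in a failure


def next_action(state: StoredState) -> str:
    """Mirror the flowchart: resume only if work is still pending."""
    if state.pending_items > 0:
        return "resume"
    # Nothing left to execute: move on to reporting or artifact inspection.
    return "report"


print(next_action(StoredState("run-a", pending_items=3, failed_items=0)))  # resume
print(next_action(StoredState("run-b", pending_items=0, failed_items=1)))  # report
```

The point of the sketch is that the decision is made from stored state alone, before any new execution starts.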
Stage-limited execution:
- Experiment.run(..., until_stage="generate"|"reduce"|"parse"|"score"|"judge")
- themis run --config ... --until-stage generate|reduce|parse|score|judge
- Stored runs record completed_through_stage, so a generation-only run is considered complete for that stage instead of looking like an interrupted failure.
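
To illustrate why recording completed_through_stage matters, here is a small sketch that compares a stored stage against a requested one. It assumes the stages run in the order they are listed in this guide; `STAGES` and `reached_target` are hypothetical names used for illustration and are not part of Themis.

```python
# Stage order as listed in this guide (assumed to be the execution order).
STAGES = ["generate", "reduce", "parse", "score", "judge"]


def reached_target(completed_through_stage, until_stage):
    """Return True if the stored run already reached the requested stage.

    Hypothetical helper: completed_through_stage is None when no stage
    has finished yet.
    """
    if completed_through_stage is None:
        return False
    return STAGES.index(completed_through_stage) >= STAGES.index(until_stage)


# A generation-only run is complete with respect to "generate" ...
print(reached_target("generate", "generate"))  # True
# ... but still has work left if scoring is requested later.
print(reached_target("generate", "score"))     # False
```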
Existing-run behavior:
- RuntimeConfig(existing_run_policy="auto"): completed runs are reused and incomplete runs resume
- RuntimeConfig(existing_run_policy="error"): fail fast if the compiled run_id already exists
- RuntimeConfig(existing_run_policy="rerun"): clear the stored run and execute it again
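
The three policies can be read as a small decision table. The sketch below encodes that table in plain Python; `resolve_existing_run` is a hypothetical helper that does not call Themis. The behavior when no run is stored yet, and the exception type raised for "error", are assumptions made for the example.

```python
def resolve_existing_run(policy: str, run_exists: bool, run_complete: bool) -> str:
    """Map an existing-run policy to an action, following the list above."""
    if not run_exists:
        # Assumption: with nothing stored, every policy simply executes.
        return "execute"
    if policy == "error":
        # Fail fast: the compiled run_id is already present in the store.
        raise RuntimeError("compiled run_id already exists")
    if policy == "rerun":
        return "clear and execute"
    if policy == "auto":
        return "reuse" if run_complete else "resume"
    raise ValueError(f"unknown policy: {policy!r}")


print(resolve_existing_run("auto", run_exists=True, run_complete=False))  # resume
print(resolve_existing_run("rerun", run_exists=True, run_complete=True))  # clear and execute
```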
Portable stage artifacts:
- generation: export_generation_bundle(...) / import_generation_bundle(...)
- reduction: export_reduction_bundle(...) / import_reduction_bundle(...)
- parse: export_parse_bundle(...) / import_parse_bundle(...)
- score: export_score_bundle(...) / import_score_bundle(...)
- evaluation workflow executions: export_evaluation_bundle(...) / import_evaluation_bundle(...)
Imported artifacts are persisted through normal events, so resume, report, cache reuse, and replay all see the same stored state.
Variants
| Variant | Best when | Tradeoff | Related APIs / commands |
|---|---|---|---|
| Quick state summary | You need a fast operational check before digging deeper | Less detail than snapshot, state, or report views | themis quickcheck |
| Stored snapshot inspection | You want identity and provenance details for the run | Does not show per-stage execution progress by itself | get_run_snapshot(...), themis inspect snapshot |
| Explicit persisted state inspection | You need stage completion, counts, and failure state | Lower-level than a report | get_execution_state(...), themis inspect state |
| Workflow execution inspection | You need judge prompts, responses, or workflow artifacts for one case | Only applies to workflow-backed metrics | get_evaluation_execution(...), themis inspect evaluation |
| Downstream-only recompute | Upstream artifacts are good and only later stages should rerun | Requires stored artifacts and careful stage choice | Experiment.replay(stage="reduce"\|"parse"\|"score"\|"judge") |
| Report generation from the stored run | You want shareable output after inspection | Requires a persistent run state to report from | Reporter, themis report |
Expected result
You should know whether the run can be resumed, what already completed, and where failures occurred.