Reproducibility and replay
What it is: the model for reproducing a run from stored artifacts and replaying downstream stages without regenerating candidates.
When it matters: whenever generation should remain fixed but evaluation needs to move or be rerun.
What you provide: stored upstream artifacts and, for memory-backed runs, the original store instance.
What Themis provides: generation/evaluation bundles plus Experiment.replay() and Experiment.rejudge().
Use this flow when evaluation must move forward while generation stays frozen.
flowchart LR
A["Original run"] --> B["Stored generation artifacts"]
B --> C["Export or reopen store"]
C --> D["Import artifacts or reuse store"]
D --> E["Experiment.replay(...)"]
E --> F["New evaluation executions"]
Replay works because the upstream generation evidence stays fixed, so only the requested downstream stages are rerun. Experiment.rejudge() is the convenience form of Experiment.replay(stage="judge").
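The idea can be sketched with a toy model. This is not the Themis implementation or its API: `ToyStore`, `GenerationArtifact`, `run`, and `replay` are hypothetical names invented for illustration, and "generation" is simulated by upper-casing the prompt. The sketch only shows that a replay reads the frozen artifacts and reruns the downstream evaluation, without generating again.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenerationArtifact:
    """Hypothetical stand-in for a stored generation artifact."""
    candidate_id: str
    text: str


@dataclass
class ToyStore:
    """Hypothetical in-memory store; not the Themis store API."""
    generations: dict = field(default_factory=dict)
    evaluations: list = field(default_factory=list)
    generation_calls: int = 0


def replay(store, judge):
    """Rerun only the downstream stage against the frozen artifacts."""
    if not store.generations:
        raise LookupError("no stored generation artifacts to replay")
    results = {
        cid: judge(artifact.text)
        for cid, artifact in sorted(store.generations.items())
    }
    store.evaluations.append(results)
    return results


def run(store, prompts, judge):
    """Original run: generate each candidate once, then evaluate."""
    for i, prompt in enumerate(prompts):
        store.generation_calls += 1  # counts simulated generation work
        cid = f"cand-{i}"
        store.generations[cid] = GenerationArtifact(cid, prompt.upper())
    return replay(store, judge)


store = ToyStore()
first = run(store, ["alpha", "be"], judge=lambda text: len(text) > 3)
# A new judge is applied to the same stored candidates; nothing is regenerated.
second = replay(store, judge=lambda text: text.startswith("B"))
```

After both calls, `store.generation_calls` is still 2 while `store.evaluations` holds two result sets. A memory-backed store like this toy one also shows why the original store instance is needed: a fresh `ToyStore()` has no artifacts, so `replay` raises `LookupError`.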
What to inspect when it goes wrong: verify snapshot identity first, then confirm stored upstream artifacts exist, then inspect the rerun evaluation executions.
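The inspection order above can be written down as a small triage helper. Everything here is an assumption made for illustration: Themis may compute snapshot identity differently, and `snapshot_id`, `diagnose`, and the dictionary shapes are hypothetical rather than part of its API. The sketch treats identity as a hash of a canonicalised configuration and stops at the first failing check.

```python
import hashlib
import json


def snapshot_id(config: dict) -> str:
    """Hypothetical identity: SHA-256 of the canonicalised configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diagnose(expected_snapshot, config, stored_artifacts, required_ids,
             rerun_executions):
    """Return the first failing check, in inspection order, or 'ok'."""
    # 1. Snapshot identity: is this the same run definition?
    if snapshot_id(config) != expected_snapshot:
        return "snapshot identity mismatch"
    # 2. Upstream artifacts: is everything the replay needs present?
    missing = [cid for cid in required_ids if cid not in stored_artifacts]
    if missing:
        return f"missing upstream artifacts: {missing}"
    # 3. Rerun evaluation executions: did any of them fail?
    failed = [e["id"] for e in rerun_executions if e.get("status") != "ok"]
    if failed:
        return f"failed evaluation executions: {failed}"
    return "ok"


config = {"model": "m", "seed": 1}
expected = snapshot_id(config)
status = diagnose(
    expected,
    {"seed": 1, "model": "m"},  # key order does not affect the hash
    {"cand-0": "stored text"},
    ["cand-0"],
    [{"id": "e1", "status": "ok"}],
)
```

Checking identity first is the cheaper design: if the snapshot differs, missing artifacts or failed executions are symptoms rather than causes, so later checks would only add noise.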