Generation vs evaluation
What it is: the split between candidate production and the runtime-owned scoring/evaluation pipeline.
When it matters: whenever you are deciding whether logic belongs in a generator, parser, reducer, or metric.
What you provide: a generator that produces one candidate per call, plus any custom metrics or workflows you need.
What Themis provides: candidate fan-out, reduction, parsing, workflow execution, judge orchestration, persistence, and inspection.
This ownership diagram shows where user-supplied generation stops and runtime-managed evaluation begins.
flowchart LR
    A["Generator"] --> B["Candidates"]
    B --> C["Fan-out policy"]
    C --> D["Reducer"]
    D --> E["Parser"]
    E --> F["Metric / evaluation workflow"]
    F --> G["Persistence and inspection"]
Generation owns candidate creation, while Themis owns the staged pipeline that turns those candidates into persisted, inspectable results.
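The split can be sketched in plain Python. This is an illustration of the ownership boundary only, not the Themis API: every name below (`generate`, `majority_vote`, `parse_answer`, `exact_match`, `run_pipeline`, `EvaluationRecord`) is hypothetical, and the canned generator output stands in for a real model call. The generator returns one candidate per call, and everything after that point is driven by the runtime-like `run_pipeline` function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# NOTE: all names in this sketch are hypothetical; none are Themis APIs.


@dataclass
class EvaluationRecord:
    candidates: List[str]  # raw generator outputs
    reduced: str           # candidate chosen by the reducer
    parsed: str            # value extracted by the parser
    score: float           # metric result


def generate(prompt: str, attempt: int) -> str:
    """User-owned: produce exactly one candidate per call."""
    canned = ["Answer: 4", "Answer: 4", "Answer: 5"]  # stand-in for a model
    return canned[attempt % len(canned)]


def majority_vote(candidates: List[str]) -> str:
    """Reducer: keep the most frequent candidate."""
    return Counter(candidates).most_common(1)[0][0]


def parse_answer(text: str) -> str:
    """Parser: take whatever follows the first colon."""
    return text.partition(":")[2].strip()


def exact_match(parsed: str, expected: str) -> float:
    """Metric: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if parsed == expected else 0.0


def run_pipeline(
    generator: Callable[[str, int], str],
    prompt: str,
    expected: str,
    fan_out: int = 3,
) -> EvaluationRecord:
    """Runtime-owned: fan out, reduce, parse, score, and keep every stage."""
    candidates = [generator(prompt, i) for i in range(fan_out)]
    reduced = majority_vote(candidates)
    parsed = parse_answer(reduced)
    score = exact_match(parsed, expected)
    return EvaluationRecord(candidates, reduced, parsed, score)


record = run_pipeline(generate, "What is 2 + 2?", expected="4")
print(record)
```

Note that `generate` never loops, votes, parses, or scores. Keeping each stage's output on the record is what lets a wrong result be traced back to the stage that produced it.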
What to inspect when it goes wrong: inspect generation artifacts when the raw candidate itself is wrong, and evaluation executions when the candidate is fine but the scoring or judgment is wrong.
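As a rough sketch of that triage rule, assume a stored record exposes its raw candidates and that you can supply a check for whether a candidate looks acceptable. `StoredRecord`, `triage`, and the check are hypothetical names for illustration, not part of Themis.

```python
from dataclasses import dataclass
from typing import Callable, List

# NOTE: hypothetical names for illustration; not Themis APIs.


@dataclass
class StoredRecord:
    candidates: List[str]  # raw generator outputs
    score: float           # final metric result


def triage(record: StoredRecord, looks_acceptable: Callable[[str], bool]) -> str:
    """Say where to look first for a record whose result is known to be wrong."""
    if not any(looks_acceptable(c) for c in record.candidates):
        # No raw candidate was usable, so the problem is upstream of scoring.
        return "generation artifacts"
    # At least one candidate was fine, so reduction, parsing, or scoring is suspect.
    return "evaluation executions"


def contains_four(candidate: str) -> bool:
    return "4" in candidate


bad_generation = StoredRecord(candidates=["Answer: 5"], score=0.0)
bad_scoring = StoredRecord(candidates=["Answer: 4"], score=0.0)

print(triage(bad_generation, contains_four))
print(triage(bad_scoring, contains_four))
```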