Data models reference
Important payloads to inspect directly
| Name | Kind | Use when | Key constraints / notes |
| --- | --- | --- | --- |
| RunEstimate | Planning model | You want task counts plus token estimate fields such as estimated_generation_input_tokens, estimated_generation_output_tokens, estimated_judge_prompt_tokens, estimated_judge_output_tokens, estimated_total_tokens, and assumptions | Informational only; pair with your own pricing model |
| BenchmarkScoreRow | Per-case score model | You want one scored row with outcome, value, error_category, error_message, and details | Includes additive dataset_id and case_key fields so duplicate case_ids across datasets stay distinguishable |
| BenchmarkResult | Aggregated benchmark model | You want combined score_rows, metric_means, outcome_counts, and error_counts | Best for reporting and comparison |
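Because RunEstimate is informational only, turning it into a cost figure is left to the caller. The sketch below shows one way to pair the token estimate fields with your own pricing model. It is an illustration under stated assumptions, not Themis code: EstimateStandIn is a stand-in that only copies the field names listed in the table, and PriceSheet with its per-million-token prices is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimateStandIn:
    """Stand-in with the token fields named in the table; NOT the Themis class."""

    estimated_generation_input_tokens: int
    estimated_generation_output_tokens: int
    estimated_judge_prompt_tokens: int
    estimated_judge_output_tokens: int


@dataclass(frozen=True)
class PriceSheet:
    """Hypothetical prices in USD per one million tokens."""

    generation_input: float
    generation_output: float
    judge_input: float
    judge_output: float


def estimate_cost_usd(estimate: EstimateStandIn, prices: PriceSheet) -> float:
    """Multiply each token estimate by its price and sum the four parts."""
    total = (
        estimate.estimated_generation_input_tokens * prices.generation_input
        + estimate.estimated_generation_output_tokens * prices.generation_output
        + estimate.estimated_judge_prompt_tokens * prices.judge_input
        + estimate.estimated_judge_output_tokens * prices.judge_output
    )
    return total / 1_000_000


# Made-up numbers, purely for illustration.
estimate = EstimateStandIn(2_000_000, 500_000, 1_000_000, 100_000)
prices = PriceSheet(
    generation_input=1.0,
    generation_output=4.0,
    judge_input=0.5,
    judge_output=2.0,
)
cost = estimate_cost_usd(estimate, prices)
print(f"Estimated cost: ${cost:.2f}")
```

With a real run you would read the same attribute names from the RunEstimate instance instead of the stand-in, and check its assumptions field before trusting the total.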
Dataset-scoped case identity
Case-level runtime and read-model payloads now carry three related fields:
case_id: the original case identifier from the dataset
dataset_id: the dataset that contributed the case
case_key: the dataset-scoped internal identity used by execution state, resume, and projections
CaseResult, BenchmarkScoreRow, TimelineEntry, and trace records all expose dataset_id and case_key additively. New runs persist these fields directly. Older stored runs remain readable via case_key or case_id fallback, but only new runs provide full duplicate-case_id safety across datasets.
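The fallback described above can be sketched as a small lookup helper. This is an illustration only: RowStandIn is a hypothetical stand-in for any payload that carries the three fields, and the "dataset/case" shape of the example case_key values is an assumption, since the real key format is internal to Themis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowStandIn:
    """Hypothetical stand-in for a payload exposing the three identity fields."""

    case_id: str
    dataset_id: Optional[str] = None
    case_key: Optional[str] = None
    value: float = 0.0


def identity_of(row: RowStandIn) -> str:
    """Prefer the dataset-scoped case_key; fall back to case_id for older runs."""
    return row.case_key if row.case_key is not None else row.case_id


rows = [
    # Two datasets that both contain a case called "q1" (key format is assumed).
    RowStandIn("q1", dataset_id="math", case_key="math/q1", value=1.0),
    RowStandIn("q1", dataset_id="code", case_key="code/q1", value=0.0),
    # A row shaped like an older stored run: no dataset_id or case_key.
    RowStandIn("q1", value=0.5),
]

# Keying by case_id alone collapses the two new-run rows into one entry.
by_case_id = {row.case_id: row.value for row in rows[:2]}

# Keying by the resolved identity keeps every row distinguishable.
by_identity = {identity_of(row): row.value for row in rows}
```

The older-style row still resolves to an identity, but it would collide with any other legacy row that shares its case_id, which is why only new runs are fully safe against duplicates.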
Core runtime and output models:
themis.core.models
Core immutable domain models for Themis.
Case
Bases: HashableModel
One dataset case evaluated by the runtime.
ConversationTrace
Bases: HashableModel
Conversation trace captured during generation.
Dataset
Bases: HashableModel
A collection of cases evaluated together.
GenerationResult
Bases: HashableModel
The candidate artifact returned by a generator call.
Message
Bases: HashableModel
One conversation message captured as an artifact.
ParsedOutput
Bases: HashableModel
Normalized output produced by a parser before scoring.
ReducedCandidate
Bases: HashableModel
Candidate selected or synthesized by the reduction stage.
ScoreError
Bases: HashableModel
Structured score failure recorded by the runtime.
TraceStep
Bases: HashableModel
One structured step in a generation or evaluation trace.
WorkflowTrace
Bases: HashableModel
Trace emitted by a workflow-backed evaluation.
Prompt-oriented models:
themis.core.prompts
Prompt-oriented configuration models and rendering helpers.
PromptSpec
Bases: HashableModel
Generic prompt instructions and structured prompt material.
render_input
render_input(prompt_input: JSONValue) -> JSONValue
Render prompt-oriented input for provider adapters.
render_sections
render_sections() -> list[str]
Render prompt sections that can prefix a prompt body.
render_prompt_spec
render_prompt_spec(
prompt_spec: PromptSpec | None, body: str
) -> str
Render a complete prompt body with optional prompt-spec sections.
Run state, results, and bundle models:
themis.core.results
Runtime result, work-item, and resume state models.
CaseExecutionState
Bases: FrozenModel
Persisted per-case execution state derived from stored events.
CaseResult
Bases: FrozenModel
Final case-level result returned from a run.
EvaluationBundle
Bases: FrozenModel
Portable bundle of evaluation artifacts for a run.
EvaluationBundleRecord
Bases: FrozenModel
One portable evaluation execution record.
ExecutionState
Bases: FrozenModel
Persisted run state rebuilt from the run event stream.
GenerationBundle
Bases: FrozenModel
Portable bundle of generation artifacts for a run.
GenerationBundleRecord
Bases: FrozenModel
One portable generation artifact record.
GenerationWorkItem
Bases: FrozenModel
Planner output for one generation task.
ParseBundle
Bases: FrozenModel
Portable bundle of parse artifacts for a run.
ParseBundleRecord
Bases: FrozenModel
One portable parse artifact record.
ProgressSnapshot
Bases: FrozenModel
Aggregate case progress for a run.
ReductionBundle
Bases: FrozenModel
Portable bundle of reduction artifacts for a run.
ReductionBundleRecord
Bases: FrozenModel
One portable reduction artifact record.
RunEstimate
Bases: FrozenModel
Planner estimate for the work implied by a compiled run.
RunResult
Bases: FrozenModel
Final run-level result returned from execution.
RunStatus
Bases: StrEnum
User-facing run status values.
ScoreBundle
Bases: FrozenModel
Portable bundle of score artifacts for a run.
ScoreBundleRecord
Bases: FrozenModel
One portable score artifact record.
Snapshot and identity models:
themis.core.snapshot
Run snapshot models for Themis.
ComponentRefs
Bases: FrozenModel
Resolved component refs stored with the snapshot.
DatasetRef
Bases: HashableModel
Identity-bearing reference to one dataset.
RunIdentity
Bases: HashableModel
Inputs that determine the logical identity and run_id of a run.
RunProvenance
Bases: FrozenModel
Environment metadata recorded with a run but excluded from run_id.
RunSnapshot
Bases: FrozenModel
Immutable executable artifact produced by Experiment.compile().
StoredRun
Bases: FrozenModel
Snapshot plus stored events loaded back from a run store.
snapshot_from_dict
snapshot_from_dict(payload: dict[str, Any]) -> RunSnapshot
Load a stored snapshot payload and ignore any cached run_id field.
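Ignoring a cached run_id makes sense when the identifier is derived from the identity inputs, because a stored copy can then never disagree with the content it describes. The sketch below illustrates that idea with plain dictionaries. It does not reproduce the Themis implementation: the "identity" payload key, the SHA-256 hashing scheme, the 16-character truncation, and both function names are assumptions made for the example.

```python
import hashlib
import json
from typing import Any


def derive_run_id(identity: dict[str, Any]) -> str:
    """Hash a canonical JSON form of the identity inputs (assumed scheme)."""
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def load_snapshot_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Drop any cached run_id and recompute it from the identity inputs."""
    body = {key: value for key, value in payload.items() if key != "run_id"}
    return {**body, "run_id": derive_run_id(body["identity"])}


# A stored payload whose cached run_id no longer matches its identity inputs.
stored = {
    "identity": {"dataset": "math", "seed": 7},
    "run_id": "stale-or-tampered",
}
loaded = load_snapshot_payload(stored)
print(loaded["run_id"])
```

Under this design two payloads with equal identity inputs always load with the same run_id, whatever value was cached on disk.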
Projection/read-model types:
themis.core.read_models
Projection-backed read models for the Phase 4 read side.
BenchmarkResult
Bases: FrozenModel
Aggregate benchmark-style projection for a run.
BenchmarkScoreRow
Bases: FrozenModel
One score row in the benchmark projection.
ConversationTraceRecord
Bases: FrozenModel
One conversation trace record.
EvaluationTraceRecord
Bases: FrozenModel
One evaluation trace record.
GenerationTraceRecord
Bases: FrozenModel
One generation trace record.
TimelineEntry
Bases: FrozenModel
One chronological event entry in the timeline projection.
TimelineView
Bases: FrozenModel
Timeline projection for a run.
TraceView
Bases: FrozenModel
Trace-oriented projection for a run.