Data models reference

Important payloads to inspect directly

| Name | Kind | Use when | Key constraints / notes |
| --- | --- | --- | --- |
| RunEstimate | Planning model | You want task counts plus token estimate fields such as estimated_generation_input_tokens, estimated_generation_output_tokens, estimated_judge_prompt_tokens, estimated_judge_output_tokens, estimated_total_tokens, and assumptions | Informational only; pair with your own pricing model |
| BenchmarkScoreRow | Per-case score model | You want one scored row with outcome, value, error_category, error_message, and details | Includes additive dataset_id and case_key fields so duplicate case_ids across datasets stay distinguishable |
| BenchmarkResult | Aggregated benchmark model | You want combined score_rows, metric_means, outcome_counts, and error_counts | Best for reporting and comparison |
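
Because RunEstimate is informational only, turning it into a cost figure is left to the caller. The following is a minimal sketch of one way to pair the token fields with a pricing model. The `EstimateStandIn` class, the `estimate_cost_usd` helper, and the prices are all hypothetical; only the four `estimated_*` field names come from the table above, and how a real RunEstimate is obtained is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimateStandIn:
    """Hypothetical stand-in exposing the token fields named in the table."""

    estimated_generation_input_tokens: int
    estimated_generation_output_tokens: int
    estimated_judge_prompt_tokens: int
    estimated_judge_output_tokens: int


def estimate_cost_usd(estimate, input_price_per_million, output_price_per_million):
    """Apply caller-supplied per-million-token prices to the estimate."""
    # Assumption: judge prompt tokens are billed at the input rate and
    # judge output tokens at the output rate.
    input_tokens = (
        estimate.estimated_generation_input_tokens
        + estimate.estimated_judge_prompt_tokens
    )
    output_tokens = (
        estimate.estimated_generation_output_tokens
        + estimate.estimated_judge_output_tokens
    )
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Made-up numbers, purely for illustration.
estimate = EstimateStandIn(1_000_000, 200_000, 500_000, 100_000)
cost = estimate_cost_usd(
    estimate, input_price_per_million=2.0, output_price_per_million=8.0
)
print(f"estimated cost: ${cost:.2f}")
```

If generation and judging use different providers, a separate price pair per stage would be the natural extension of this sketch.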

Dataset-scoped case identity

Case-level runtime and read-model payloads now carry three related fields:

  • case_id: the original case identifier from the dataset
  • dataset_id: the dataset that contributed the case
  • case_key: the dataset-scoped internal identity used by execution state, resume, and projections

CaseResult, BenchmarkScoreRow, TimelineEntry, and trace records all expose dataset_id and case_key additively. New runs persist these fields directly. Older stored runs remain readable via case_key or case_id fallback, but only new runs provide full duplicate-case_id safety across datasets.
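
To illustrate why a dataset-scoped key matters, the sketch below groups plain dictionaries that mimic case-level payloads. It is not the library's implementation: the `make_case_key` format (`dataset_id::case_id`) and the `resolve_case_key` helper are hypothetical, and the fallback order (case_key first, then case_id) is one reading of the paragraph above.

```python
def make_case_key(dataset_id: str, case_id: str) -> str:
    # Hypothetical encoding; the real case_key format is not documented here.
    return f"{dataset_id}::{case_id}"


def resolve_case_key(record: dict) -> str:
    """Prefer the dataset-scoped key, fall back to case_id for older records."""
    key = record.get("case_key")
    if key:
        return key
    return record["case_id"]


rows = [
    # Two datasets that both contain a case named "q1".
    {"case_id": "q1", "dataset_id": "math", "case_key": make_case_key("math", "q1")},
    {"case_id": "q1", "dataset_id": "code", "case_key": make_case_key("code", "q1")},
    # An older stored row that predates dataset_id and case_key.
    {"case_id": "q1"},
]

keys = [resolve_case_key(row) for row in rows]
print(keys)
```

The first two rows stay distinguishable even though they share a case_id, while the older row can only be identified by case_id, which is why full duplicate safety applies to new runs only.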

Core runtime and output models:

themis.core.models

Core immutable domain models for Themis.

Case

Bases: HashableModel

One dataset case evaluated by the runtime.

ConversationTrace

Bases: HashableModel

Conversation trace captured during generation.

Dataset

Bases: HashableModel

A collection of cases evaluated together.

GenerationResult

Bases: HashableModel

The candidate artifact returned by a generator call.

Message

Bases: HashableModel

One conversation message captured as an artifact.

ParsedOutput

Bases: HashableModel

Normalized output produced by a parser before scoring.

ReducedCandidate

Bases: HashableModel

Candidate selected or synthesized by the reduction stage.

Score

Bases: HashableModel

Successful metric output.

ScoreError

Bases: HashableModel

Structured score failure recorded by the runtime.

TraceStep

Bases: HashableModel

One structured step in a generation or evaluation trace.

WorkflowTrace

Bases: HashableModel

Trace emitted by a workflow-backed evaluation.

Prompt-oriented models:

themis.core.prompts

Prompt-oriented configuration models and rendering helpers.

PromptSpec

Bases: HashableModel

Generic prompt instructions and structured prompt material.

render_input

render_input(prompt_input: JSONValue) -> JSONValue

Render prompt-oriented input for provider adapters.

render_sections

render_sections() -> list[str]

Render prompt sections that can prefix a prompt body.

render_prompt_spec

render_prompt_spec(
    prompt_spec: PromptSpec | None, body: str
) -> str

Render a complete prompt body with optional prompt-spec sections.
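
The relationship between render_sections and render_prompt_spec can be sketched as follows. This is an assumption-laden illustration, not the library code: `PromptSpecStandIn`, its `instructions` and `prefix` fields, and the blank-line separator are hypothetical. Only the two signatures and the "sections prefix the body" behaviour come from the reference above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptSpecStandIn:
    """Hypothetical stand-in; the real PromptSpec fields are not listed here."""

    instructions: Optional[str] = None
    prefix: Optional[str] = None

    def render_sections(self) -> list[str]:
        # Keep only the sections that are actually set.
        return [section for section in (self.instructions, self.prefix) if section]


def render_prompt_spec_sketch(
    prompt_spec: Optional[PromptSpecStandIn], body: str
) -> str:
    """Prefix the body with the spec's sections, or return the body unchanged."""
    if prompt_spec is None:
        return body
    # Assumption: sections and body are joined with a blank line.
    return "\n\n".join([*prompt_spec.render_sections(), body])


spec = PromptSpecStandIn(instructions="Answer with a single number.")
print(render_prompt_spec_sketch(spec, "What is 2 + 2?"))
```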

Run state, results, and bundle models:

themis.core.results

Runtime result, work-item, and resume state models.

CaseExecutionState

Bases: FrozenModel

Persisted per-case execution state derived from stored events.

CaseResult

Bases: FrozenModel

Final case-level result returned from a run.

EvaluationBundle

Bases: FrozenModel

Portable bundle of evaluation artifacts for a run.

EvaluationBundleRecord

Bases: FrozenModel

One portable evaluation execution record.

ExecutionState

Bases: FrozenModel

Persisted run state rebuilt from the run event stream.
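
The idea of state "rebuilt from the run event stream" can be pictured as a fold over ordered events. The sketch below is purely illustrative: the event type names (`case_started`, `case_completed`, `case_failed`), the status strings, and the dictionary shape are hypothetical and do not describe the stored event schema.

```python
from collections import Counter

# Hypothetical mapping from event type to the case status it implies.
STATUS_BY_EVENT_TYPE = {
    "case_started": "running",
    "case_completed": "completed",
    "case_failed": "failed",
}


def rebuild_case_status(events):
    """Replay events in order; the latest known event per case wins."""
    status_by_case = {}
    for event in events:
        status = STATUS_BY_EVENT_TYPE.get(event["type"])
        if status is not None:
            status_by_case[event["case_key"]] = status
    return status_by_case


events = [
    {"type": "case_started", "case_key": "math::q1"},
    {"type": "case_started", "case_key": "math::q2"},
    {"type": "case_completed", "case_key": "math::q1"},
    {"type": "case_failed", "case_key": "math::q2"},
    {"type": "case_started", "case_key": "math::q3"},
]

state = rebuild_case_status(events)
# An aggregate view in the spirit of a progress snapshot.
progress = Counter(state.values())
print(state, dict(progress))
```

Because the state is derived rather than stored separately, replaying the same events always yields the same result, which is what makes resume possible in this style of design.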

GenerationBundle

Bases: FrozenModel

Portable bundle of generation artifacts for a run.

GenerationBundleRecord

Bases: FrozenModel

One portable generation artifact record.

GenerationWorkItem

Bases: FrozenModel

Planner output for one generation task.

ParseBundle

Bases: FrozenModel

Portable bundle of parse artifacts for a run.

ParseBundleRecord

Bases: FrozenModel

One portable parse artifact record.

ProgressSnapshot

Bases: FrozenModel

Aggregate case progress for a run.

ReductionBundle

Bases: FrozenModel

Portable bundle of reduction artifacts for a run.

ReductionBundleRecord

Bases: FrozenModel

One portable reduction artifact record.

RunEstimate

Bases: FrozenModel

Planner estimate for the work implied by a compiled run.

RunResult

Bases: FrozenModel

Final run-level result returned from execution.

RunStatus

Bases: StrEnum

User-facing run status values.

ScoreBundle

Bases: FrozenModel

Portable bundle of score artifacts for a run.

ScoreBundleRecord

Bases: FrozenModel

One portable score artifact record.

Snapshot and identity models:

themis.core.snapshot

Run snapshot models for Themis.

ComponentRefs

Bases: FrozenModel

Resolved component refs stored with the snapshot.

DatasetRef

Bases: HashableModel

Identity-bearing reference to one dataset.

RunIdentity

Bases: HashableModel

Inputs that determine the logical identity and run_id of a run.

RunProvenance

Bases: FrozenModel

Environment metadata recorded with a run but excluded from run_id.
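
The split between RunIdentity and RunProvenance can be illustrated with a small hashing sketch. Everything here is an assumption made for illustration: the `derive_run_id` helper, the SHA-256 over canonical JSON, the 16-character truncation, and the dictionary fields are hypothetical and are not the scheme Themis uses. The point is only that provenance never enters the hash.

```python
import hashlib
import json


def derive_run_id(identity: dict) -> str:
    """Hash a canonical JSON encoding of the identity inputs only."""
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical identity inputs.
identity = {"datasets": ["math"], "generator": "model-a", "seed": 7}

# Environment metadata differs between machines but is kept out of the hash.
provenance_a = {"hostname": "laptop", "python": "3.12"}
provenance_b = {"hostname": "ci-runner", "python": "3.11"}

run_a = {"run_id": derive_run_id(identity), "provenance": provenance_a}
run_b = {"run_id": derive_run_id(identity), "provenance": provenance_b}

# Changing an identity input produces a different run_id.
changed_id = derive_run_id({**identity, "seed": 8})

print(run_a["run_id"] == run_b["run_id"], run_a["run_id"] != changed_id)
```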

RunSnapshot

Bases: FrozenModel

Immutable executable artifact produced by Experiment.compile().

StoredRun

Bases: FrozenModel

Snapshot plus stored events loaded back from a run store.

snapshot_from_dict

snapshot_from_dict(payload: dict[str, Any]) -> RunSnapshot

Load a stored snapshot payload and ignore any cached run_id field.

Projection/read-model types:

themis.core.read_models

Projection-backed read models for the Phase 4 read side.

BenchmarkResult

Bases: FrozenModel

Aggregate benchmark-style projection for a run.

BenchmarkScoreRow

Bases: FrozenModel

One score row in the benchmark projection.
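
As a rough picture of how per-row data relates to the aggregate fields, the sketch below derives metric_means, outcome_counts, and error_counts from a list of dictionaries shaped like score rows. It is not the projection code: the `metric` key, the outcome strings, and the rule that errored rows are excluded from means are assumptions. The `outcome`, `value`, `error_category`, `dataset_id`, and `case_key` names come from this reference.

```python
from collections import Counter, defaultdict
from statistics import fmean


def aggregate(rows):
    """Fold score rows into per-metric means and outcome/error counters."""
    values_by_metric = defaultdict(list)
    outcome_counts = Counter()
    error_counts = Counter()
    for row in rows:
        outcome_counts[row["outcome"]] += 1
        if row.get("error_category") is not None:
            # Assumption: errored rows count as errors and contribute no value.
            error_counts[row["error_category"]] += 1
        elif row.get("value") is not None:
            values_by_metric[row["metric"]].append(row["value"])
    metric_means = {
        metric: fmean(values) for metric, values in values_by_metric.items()
    }
    return {
        "metric_means": metric_means,
        "outcome_counts": dict(outcome_counts),
        "error_counts": dict(error_counts),
    }


# Hypothetical rows; "metric" and the outcome strings are illustrative only.
rows = [
    {"metric": "exact_match", "dataset_id": "math", "case_key": "math::q1",
     "outcome": "scored", "value": 1.0, "error_category": None},
    {"metric": "exact_match", "dataset_id": "math", "case_key": "math::q2",
     "outcome": "scored", "value": 0.0, "error_category": None},
    {"metric": "exact_match", "dataset_id": "code", "case_key": "code::q1",
     "outcome": "error", "value": None, "error_category": "parse_error"},
]

summary = aggregate(rows)
print(summary)
```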

ConversationTraceRecord

Bases: FrozenModel

One conversation trace record.

EvaluationTraceRecord

Bases: FrozenModel

One evaluation trace record.

GenerationTraceRecord

Bases: FrozenModel

One generation trace record.

TimelineEntry

Bases: FrozenModel

One chronological event entry in the timeline projection.

TimelineView

Bases: FrozenModel

Timeline projection for a run.

TraceView

Bases: FrozenModel

Trace-oriented projection for a run.