Data models reference¶

Important payloads to inspect directly¶

| Name | Kind | Use when | Key constraints / notes |
| --- | --- | --- | --- |
| `RunEstimate` | Planning model | You want task counts plus token estimate fields such as `estimated_generation_input_tokens`, `estimated_generation_output_tokens`, `estimated_judge_prompt_tokens`, `estimated_judge_output_tokens`, `estimated_total_tokens`, and `assumptions` | Informational only; pair with your own pricing model |
| `BenchmarkScoreRow` | Per-case score model | You want one scored row with `outcome`, `value`, `error_category`, `error_message`, and `details` | Includes additive `dataset_id` and `case_key` fields so duplicate `case_id`s across datasets stay distinguishable |
| `BenchmarkResult` | Aggregated benchmark model | You want combined `score_rows`, `metric_means`, `outcome_counts`, and `error_counts` | Best for reporting and comparison |
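Because `RunEstimate` is informational only, turning it into a budget figure is left to the caller. The following is a minimal sketch of one way to pair the token fields with a pricing model. `EstimateSketch`, `estimate_cost_usd`, and the per-million-token prices are hypothetical stand-ins, not part of Themis. Billing judge prompt tokens at the input rate is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimateSketch:
    """Stand-in carrying the token fields named in the table (not the real RunEstimate)."""

    estimated_generation_input_tokens: int
    estimated_generation_output_tokens: int
    estimated_judge_prompt_tokens: int
    estimated_judge_output_tokens: int


def estimate_cost_usd(estimate, input_price_per_million, output_price_per_million):
    """Apply hypothetical per-million-token prices to the estimate's token fields."""
    # Assumption: generation inputs and judge prompts are billed at the input rate.
    input_tokens = (
        estimate.estimated_generation_input_tokens
        + estimate.estimated_judge_prompt_tokens
    )
    # Assumption: generation and judge outputs are billed at the output rate.
    output_tokens = (
        estimate.estimated_generation_output_tokens
        + estimate.estimated_judge_output_tokens
    )
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Illustrative numbers only; real values come from a planning call.
estimate = EstimateSketch(
    estimated_generation_input_tokens=400_000,
    estimated_generation_output_tokens=100_000,
    estimated_judge_prompt_tokens=600_000,
    estimated_judge_output_tokens=50_000,
)
cost = estimate_cost_usd(
    estimate, input_price_per_million=2.0, output_price_per_million=8.0
)
```

Any object that exposes the same four attributes would work with `estimate_cost_usd`, so the helper does not depend on how the estimate was produced.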

Dataset-scoped case identity¶

Case-level runtime and read-model payloads now carry three related fields:

- `case_id`: the original case identifier from the dataset
- `dataset_id`: the dataset that contributed the case
- `case_key`: the dataset-scoped internal identity used by execution state, resume, and projections

`CaseResult`, `BenchmarkScoreRow`, `TimelineEntry`, and trace records all expose `dataset_id` and `case_key` as additive fields alongside `case_id`. New runs persist both fields directly. Older stored runs remain readable via a `case_key` or `case_id` fallback, but only new runs are fully safe against duplicate `case_id` values across datasets.
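The effect of the dataset-scoped key can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the rows are hypothetical, the `"dataset:case"` key format is invented for illustration, and `row_identity` is a hypothetical helper that mimics the fallback described above rather than Themis's own lookup logic.

```python
from collections import Counter


def row_identity(row):
    """Prefer the dataset-scoped case_key; fall back to case_id for older rows."""
    return row.get("case_key") or row["case_id"]


# Hypothetical rows from a new run: two datasets both contain case_id "q1".
# The "dataset:case" key format is illustrative only.
new_rows = [
    {"case_id": "q1", "dataset_id": "math", "case_key": "math:q1"},
    {"case_id": "q1", "dataset_id": "code", "case_key": "code:q1"},
]

# Hypothetical rows from an older stored run that predates the new fields.
old_rows = [{"case_id": "q1"}, {"case_id": "q1"}]

# Grouping by case_id alone merges the two cases into one bucket.
by_case_id = Counter(row["case_id"] for row in new_rows)

# Grouping by the scoped identity keeps new-run cases distinct,
# while old rows still resolve (but can collide).
new_identities = {row_identity(row) for row in new_rows}
old_identities = {row_identity(row) for row in old_rows}
```

Here `by_case_id["q1"]` counts both rows under one identifier, `new_identities` holds two distinct keys, and `old_identities` collapses to a single entry, which is why only new runs are fully protected.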

Core runtime and output models:

themis.core.models ¶

Core immutable domain models for Themis.

Case ¶

Bases: HashableModel

One dataset case evaluated by the runtime.

ConversationTrace ¶

Bases: HashableModel

Conversation trace captured during generation.

Dataset ¶

Bases: HashableModel

A collection of cases evaluated together.

GenerationResult ¶

Bases: HashableModel

The candidate artifact returned by a generator call.

Message ¶

Bases: HashableModel

One conversation message captured as an artifact.

ParsedOutput ¶

Bases: HashableModel

Normalized output produced by a parser before scoring.

ReducedCandidate ¶

Bases: HashableModel

Candidate selected or synthesized by the reduction stage.

Score ¶

Bases: HashableModel

Successful metric output.

ScoreError ¶

Bases: HashableModel

Structured score failure recorded by the runtime.

TraceStep ¶

Bases: HashableModel

One structured step in a generation or evaluation trace.

WorkflowTrace ¶

Bases: HashableModel

Trace emitted by a workflow-backed evaluation.

Prompt-oriented models:

themis.core.prompts ¶

Prompt-oriented configuration models and rendering helpers.

PromptSpec ¶

Bases: HashableModel

Generic prompt instructions and structured prompt material.

render_input ¶

render_input(prompt_input: JSONValue) -> JSONValue

Render prompt-oriented input for provider adapters.

render_sections ¶

render_sections() -> list[str]

Render prompt sections that can prefix a prompt body.

render_prompt_spec ¶

render_prompt_spec(
    prompt_spec: PromptSpec | None, body: str
) -> str

Render a complete prompt body with optional prompt-spec sections.
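To make the relationship between `render_sections` and `render_prompt_spec` concrete, here is a minimal sketch of the pattern the signatures suggest: sections are rendered to strings and placed before the body, and a `None` spec leaves the body untouched. `SketchPromptSpec`, its `instructions` and `prefix` fields, `render_prompt_spec_sketch`, and the blank-line separator are all assumptions made for illustration; the actual `PromptSpec` fields and joining rules are not shown in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchPromptSpec:
    """Hypothetical stand-in; the real PromptSpec fields are not shown here."""

    instructions: Optional[str] = None
    prefix: Optional[str] = None

    def render_sections(self) -> list[str]:
        # Return only the sections that are set, in a fixed order.
        return [s for s in (self.instructions, self.prefix) if s]


def render_prompt_spec_sketch(
    prompt_spec: Optional[SketchPromptSpec], body: str
) -> str:
    """Prefix the body with any rendered sections; pass it through if no spec."""
    if prompt_spec is None:
        return body
    # Assumption: sections and body are separated by a blank line.
    return "\n\n".join([*prompt_spec.render_sections(), body])


spec = SketchPromptSpec(instructions="Answer concisely.")
rendered = render_prompt_spec_sketch(spec, "What is 2 + 2?")
passthrough = render_prompt_spec_sketch(None, "What is 2 + 2?")
```

With these inputs, `rendered` is the instruction, a blank line, and then the question, while `passthrough` is the question unchanged.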

Run state, results, and bundle models:

themis.core.results ¶

Runtime result, work-item, and resume state models.

CaseExecutionState ¶

CaseResult ¶

EvaluationBundle ¶

EvaluationBundleRecord ¶

ExecutionState ¶

GenerationBundle ¶

GenerationBundleRecord ¶

GenerationWorkItem ¶

ParseBundle ¶

ParseBundleRecord ¶

ProgressSnapshot ¶

ReductionBundle ¶

ReductionBundleRecord ¶

RunEstimate ¶

RunResult ¶

RunStatus ¶

ScoreBundle ¶

ScoreBundleRecord ¶

themis.core.snapshot ¶

ComponentRefs ¶

DatasetRef ¶

RunIdentity ¶

RunProvenance ¶

RunSnapshot ¶

StoredRun ¶

snapshot_from_dict ¶

themis.core.read_models ¶

BenchmarkResult ¶

BenchmarkScoreRow ¶

ConversationTraceRecord ¶

EvaluationTraceRecord ¶

GenerationTraceRecord ¶

TimelineEntry ¶

TimelineView ¶

TraceView ¶