
Python API reference

This page is the generated entry point into the public Python API. Use the smaller reference pages in this section when you already know the category of symbol you need.

Root exports

| Name | Kind | Use when | Key constraints / notes |
| --- | --- | --- | --- |
| `__version__` | Constant | You want the installed package version | Useful for docs, debugging, and release checks |
| `Experiment` | Core class | You want the main reusable experiment authoring surface | Use for config-backed or Python-authored experiments |
| `InMemoryRunStore` | Store implementation | You want ephemeral local storage | No cross-process persistence |
| `PromptSpec` | Prompt model | You want prompt instructions, prefixes, suffixes, or prompt blocks as part of experiment identity | Shared across generation and builtin judge workflows |
| `Reporter` | Reporting API | You want exports such as JSON, Markdown, CSV, or LaTeX | Works from stored projections |
| `RunEstimate` | Data model | You want planned task counts and token estimates | Informational only; not pricing |
| `RunResult` | Data model | You want the top-level execution result returned by a run | Includes status and benchmark output |
| `RunSnapshot` | Data model | You want the compiled identity and provenance artifact | Produced by Experiment.compile() |
| `RunStatus` | Enum-like status model | You want run lifecycle state values | Useful in automation and inspection |
| `RunStore` | Storage protocol | You are typing against or implementing custom stores | Abstract interface rather than a concrete backend |
| `RuntimeConfig` | Config model | You want runtime tuning without changing logical identity | Covers concurrency, retries, and deferred execution paths |
| `SqliteRunStore` | Store implementation | You want the default persistent local store | Good default for real runs |
| `StatsEngine` | Analysis helper | You want statistical comparison utilities | Used in comparison and reporting flows |
| `evaluate` | Convenience function | You want the shortest synchronous Python path to a run | Best for simple scripts; call only when no event loop is already running |
| `evaluate_async` | Convenience function | You want the shortest async Python path to a run | Use in notebooks, async apps, and any environment with a running event loop |
| `export_evaluation_bundle` | Artifact helper | You want portable evaluation workflow artifacts | Best for judge-backed replay or handoff |
| `export_generation_bundle` | Artifact helper | You want portable generation artifacts | Good for external evaluation pipelines |
| `export_parse_bundle` | Artifact helper | You want portable parsed-output artifacts | Python-only today |
| `export_reduction_bundle` | Artifact helper | You want portable reduction-stage artifacts | Python-only today |
| `export_score_bundle` | Artifact helper | You want portable pure-score artifacts | Python-only today |
| `get_evaluation_execution` | Inspection helper | You want one stored workflow execution | Judge-backed metrics only; pass dataset_id or case_key when duplicate case_ids exist across datasets |
| `get_execution_state` | Inspection helper | You want stored progress and failure details | Best before resume or replay decisions |
| `get_run_snapshot` | Inspection helper | You want compiled identity and provenance details | Read-only lookup |
| `import_evaluation_bundle` | Artifact helper | You want to ingest external evaluation artifacts into a store | Match the bundle shape to the target run |
| `import_generation_bundle` | Artifact helper | You want to ingest generation artifacts into a store | Enables later replay without regeneration |
| `import_parse_bundle` | Artifact helper | You want to ingest parsed-output artifacts | Python-only today |
| `import_reduction_bundle` | Artifact helper | You want to ingest reduction-stage artifacts | Python-only today |
| `import_score_bundle` | Artifact helper | You want to ingest score artifacts | Python-only today |
| `quickcheck` | Inspection helper | You want a compact run summary | Smaller surface than full reporting |
| `snapshot_report` | Reporting helper | You want a concise Python report from stored snapshot data | Lighter than Reporter |
| `sqlite_store` | Store factory helper | You want a quick SQLite store constructor | Shortcut for the persistent local backend |
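The split between evaluate and evaluate_async in the table above comes down to whether an event loop is already running. The sketch below shows that check with the standard library only; fake_evaluate_async and evaluate_sync are hypothetical stand-ins, not Themis functions, because the real calls need a model, data, and metric.

```python
import asyncio


async def fake_evaluate_async(cases: list[str]) -> dict[str, int]:
    """Hypothetical stand-in for an async evaluation call (not the Themis API)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O-bound work would
    return {"cases": len(cases)}


def event_loop_is_running() -> bool:
    """Return True when called from code that is already inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def evaluate_sync(cases: list[str]) -> dict[str, int]:
    """Hypothetical sync wrapper: safe only when no loop is running (plain scripts)."""
    if event_loop_is_running():
        raise RuntimeError("an event loop is already running; await the async variant")
    return asyncio.run(fake_evaluate_async(cases))


async def notebook_style_usage() -> dict[str, int]:
    # Inside a running loop (notebooks, async apps), await the coroutine directly.
    return await fake_evaluate_async(["a", "b", "c"])
```

In a plain script, evaluate_sync(["q1", "q2"]) returns {"cases": 2}; in a notebook cell, the equivalent is to await the async variant instead of wrapping it in asyncio.run.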

Generated modules

Root package:

themis

Public package surface for Themis.

Experiment

Bases: FrozenModel

Authoring model for a Themis experiment.

An experiment owns the compile-time inputs required to build a RunSnapshot and provides sync and async helpers for running, replaying, or rejudging that snapshot.

compile

compile() -> RunSnapshot

Compile the experiment into an immutable RunSnapshot.

from_config classmethod

from_config(
    path: str | Path, *, overrides: list[str] | None = None
) -> Experiment

Load an experiment definition from YAML or TOML configuration.
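The overrides argument takes a list of strings, but this page does not document their syntax. Assuming a common dotted "path.to.key=value" convention (an assumption, not confirmed here), applying overrides to an already-loaded configuration mapping could look like this. apply_overrides and the example keys are hypothetical.

```python
import copy
import json
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of config with hypothetical "dotted.key=value" overrides applied."""
    result = copy.deepcopy(config)  # never mutate the loaded config in place
    for override in overrides:
        key, sep, raw_value = override.partition("=")
        if not sep or not key:
            raise ValueError(f"override must look like 'path.to.key=value': {override!r}")
        try:
            value: Any = json.loads(raw_value)  # numbers, booleans, lists, quoted strings
        except json.JSONDecodeError:
            value = raw_value  # fall back to a bare string such as a model name
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate tables on demand
        node[leaf] = value
    return result
```

Parsing values as JSON first keeps "5" an integer and "true" a boolean, while unquoted names still work as plain strings.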

rejudge

rejudge(
    *,
    metric_ids: list[str] | None = None,
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Re-run workflow-backed metrics from stored upstream artifacts, synchronously.

rejudge_async async

rejudge_async(
    *,
    metric_ids: list[str] | None = None,
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Re-run workflow-backed metrics from stored upstream artifacts, asynchronously.

replay

replay(
    *,
    stage: Literal["reduce", "parse", "score", "judge"],
    metric_ids: list[str] | None = None,
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Replay persisted runs from a downstream stage synchronously.

replay_async async

replay_async(
    *,
    stage: Literal["reduce", "parse", "score", "judge"],
    metric_ids: list[str] | None = None,
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Replay persisted runs from a downstream stage asynchronously.

run

run(
    *,
    until_stage: Literal[
        "generate", "reduce", "parse", "score", "judge"
    ] = "judge",
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Run the compiled snapshot synchronously.

run_async async

run_async(
    *,
    until_stage: Literal[
        "generate", "reduce", "parse", "score", "judge"
    ] = "judge",
    runtime: RuntimeConfig | None = None,
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
)

Run the compiled snapshot asynchronously.
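Both run and replay accept stage names from the same Literal. Assuming the stages execute in the order the Literal lists them (generate, reduce, parse, score, judge), which this page implies but does not state, stage selection can be sketched as follows. stages_for_run and stages_for_replay are hypothetical helpers, and the replay semantics shown (re-executing every stage from stage onward) are an assumption.

```python
from typing import Literal

Stage = Literal["generate", "reduce", "parse", "score", "judge"]

# Assumed execution order, taken from the order of the Literal values above.
STAGE_ORDER: tuple[Stage, ...] = ("generate", "reduce", "parse", "score", "judge")


def stages_for_run(until_stage: Stage = "judge") -> list[Stage]:
    """Hypothetical: stages a run would execute when stopping after until_stage."""
    return list(STAGE_ORDER[: STAGE_ORDER.index(until_stage) + 1])


def stages_for_replay(stage: Stage) -> list[Stage]:
    """Hypothetical: stages a replay would re-execute, reusing persisted upstream artifacts."""
    if stage == "generate":
        # replay's Literal omits "generate"; regenerating is a fresh run, not a replay.
        raise ValueError("replay starts downstream of generation; use run() instead")
    return list(STAGE_ORDER[STAGE_ORDER.index(stage):])
```

Under these assumptions, run(until_stage="parse") stops before any metric runs, and replay(stage="score") reuses stored generations, reductions, and parses.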

InMemoryRunStore

Bases: ProjectionRefreshingStore

Simple in-memory store used by tests and local development.

PromptSpec

Bases: HashableModel

Generic prompt instructions and structured prompt material.

render_input

render_input(prompt_input: JSONValue) -> JSONValue

Render prompt-oriented input for provider adapters.

render_sections

render_sections() -> list[str]

Render prompt sections that can prefix a prompt body.
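As a rough illustration of how rendered sections might prefix a prompt body, here is a frozen-dataclass stand-in. MiniPromptSpec, its fields, the section order, and compose_prompt are all hypothetical; PromptSpec's actual fields and rendering rules are not listed on this page. Freezing the dataclass mirrors the idea that prompt material is part of experiment identity and should not change after construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniPromptSpec:
    """Hypothetical stand-in for PromptSpec; field names are illustrative only."""

    instructions: str = ""
    prefix: str = ""
    blocks: tuple[str, ...] = ()

    def render_sections(self) -> list[str]:
        # Keep only non-empty pieces, in a fixed order, so output is deterministic.
        candidates = [self.instructions, self.prefix, *self.blocks]
        return [section for section in candidates if section.strip()]


def compose_prompt(spec: MiniPromptSpec, body: str) -> str:
    """Hypothetical: prefix a prompt body with the rendered sections, blank-line separated."""
    return "\n\n".join([*spec.render_sections(), body])
```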

Reporter

Export persisted run projections in JSON, Markdown, CSV, or LaTeX.

export_csv

export_csv(run_id: str) -> str

Export benchmark score rows as CSV.

export_json

export_json(run_id: str) -> str

Export all major persisted projections for a run as formatted JSON.

export_latex

export_latex(run_id: str) -> str

Export benchmark score rows as a compact LaTeX table.

export_markdown

export_markdown(run_id: str) -> str

Export a human-readable Markdown summary for a persisted run.

export_score_table

export_score_table(
    run_id: str,
) -> list[dict[str, JSONValue]]

Return benchmark score rows in a normalized table structure.
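export_score_table returns rows as a list of dicts, and export_csv emits benchmark score rows as CSV. A minimal standard-library sketch of that kind of conversion follows; score_rows_to_csv is hypothetical, and the column names in the usage (metric_id, mean, count) are illustrative rather than the real schema.

```python
import csv
import io
from typing import Any


def score_rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Hypothetical: serialize normalized score rows (one dict per row) as CSV text."""
    # Union of keys in first-seen order, so sparse rows still line up.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells (restval="")
    return buffer.getvalue()
```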

RunEstimate

Bases: FrozenModel

Planner estimate for the work implied by a compiled run.

RunResult

Bases: FrozenModel

Final run-level result returned from execution.

RunSnapshot

Bases: FrozenModel

Immutable executable artifact produced by Experiment.compile().

RunStatus

Bases: StrEnum

User-facing run status values.

RunStore

Bases: Protocol

Persistence contract used by Themis runtime components.

RuntimeConfig

Bases: HashableModel

Execution-time controls that do not affect snapshot identity.

SqliteRunStore

Bases: ProjectionRefreshingStore

Small SQLite-backed run store.

evaluate_async async

evaluate_async(
    *,
    model: object,
    data: Dataset
    | Sequence[Dataset]
    | Sequence[Mapping[str, Any]],
    metric: object | Sequence[object],
    parser: object | Sequence[object] | None = None,
    judge: object | Sequence[object] | None = None,
    samples: int = 1,
    reducer: object | None = None,
    storage: StorageConfig | None = None,
    runtime: RuntimeConfig | None = None,
    seeds: list[int] | None = None,
    workflow_overrides: dict[str, object] | None = None,
    judge_config: dict[str, object] | None = None,
    environment_metadata: dict[str, str] | None = None,
    themis_version: str | None = None,
    python_version: str = "3.12",
    platform: str = "unknown",
    store: RunStore | None = None,
    subscribers: list[LifecycleSubscriber] | None = None,
    tracing_provider: TracingProvider | None = None,
) -> RunResult

Compile and run a Themis experiment asynchronously through the Layer 1 API.

export_evaluation_bundle

export_evaluation_bundle(
    store: RunStore, run_id: str
) -> EvaluationBundle

Export stored evaluation artifacts into a portable bundle.

export_generation_bundle

export_generation_bundle(
    store: RunStore, run_id: str
) -> GenerationBundle

Export stored generation artifacts into a portable bundle.

export_parse_bundle

export_parse_bundle(
    store: RunStore, run_id: str
) -> ParseBundle

Export stored parse artifacts into a portable bundle.

export_reduction_bundle

export_reduction_bundle(
    store: RunStore, run_id: str
) -> ReductionBundle

Export stored reduction artifacts into a portable bundle.

export_score_bundle

export_score_bundle(
    store: RunStore, run_id: str
) -> ScoreBundle

Export stored score artifacts into a portable bundle.

get_evaluation_execution

get_evaluation_execution(
    store: RunStore,
    run_id: str,
    case_id: str,
    metric_id: str,
    *,
    dataset_id: str | None = None,
    case_key: str | None = None,
) -> EvaluationExecution | None

Return one stored workflow execution for a case and metric.

get_execution_state

get_execution_state(
    store: RunStore, run_id: str
) -> ExecutionState

Return the persisted execution state for a run.

get_run_snapshot

get_run_snapshot(
    store: RunStore, run_id: str
) -> RunSnapshot

Return the persisted snapshot for a run.

import_evaluation_bundle

import_evaluation_bundle(
    store: RunStore, bundle: EvaluationBundle
) -> None

Import evaluation artifacts from a bundle into a store.

import_generation_bundle

import_generation_bundle(
    store: RunStore, bundle: GenerationBundle
) -> None

Import generation artifacts from a bundle into a store.

import_parse_bundle

import_parse_bundle(
    store: RunStore, bundle: ParseBundle
) -> None

Import parse artifacts from a bundle into a store.

import_reduction_bundle

import_reduction_bundle(
    store: RunStore, bundle: ReductionBundle
) -> None

Import reduction artifacts from a bundle into a store.

import_score_bundle

import_score_bundle(
    store: RunStore, bundle: ScoreBundle
) -> None

Import score artifacts from a bundle into a store.
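Each export/import bundle pair moves one stage's artifacts between stores. The round trip below shows the general pattern with a plain dict as the store and JSON as the portable format. BundleRecord, the JSON layout, and the replace-on-import behavior are assumptions for illustration, not the Themis bundle schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BundleRecord:
    """Hypothetical portable record; not one of the Themis *BundleRecord models."""

    case_id: str
    payload: str


def export_bundle(store: dict[str, list[BundleRecord]], run_id: str) -> str:
    """Serialize one run's records to a JSON document that can leave the process."""
    records = [asdict(record) for record in store.get(run_id, [])]
    return json.dumps({"run_id": run_id, "records": records}, sort_keys=True)


def import_bundle(store: dict[str, list[BundleRecord]], bundle_json: str) -> None:
    """Load a bundle into another store; this sketch replaces that run's records."""
    bundle = json.loads(bundle_json)
    store[bundle["run_id"]] = [BundleRecord(**item) for item in bundle["records"]]
```

Sorting keys makes the exported text deterministic, which keeps bundles easy to diff and hash.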

snapshot_report

snapshot_report(
    snapshot: RunSnapshot,
    run_metadata: dict[str, JSONValue] | None = None,
) -> dict[str, JSONValue]

Return a JSON-serializable summary for a compiled snapshot.

sqlite_store

sqlite_store(path: str | Path) -> SqliteRunStore

Build a SQLite-backed store.

Catalog namespace:

themis.catalog

Manifest-backed catalog entry points.

builtin_component_refs

builtin_component_refs() -> dict[str, Any]

Return component references for the builtin shipped catalog entries.

get_benchmark

get_benchmark(name: str) -> BenchmarkCatalogEntry

Return structured metadata for a shipped catalog benchmark.

list_benchmark_ids

list_benchmark_ids() -> list[str]

List canonical benchmark identifiers from the shipped catalog.

list_benchmarks

list_benchmarks() -> list[BenchmarkCatalogEntry]

Return structured metadata for shipped catalog benchmarks.

list_component_ids

list_component_ids(*, kind: str | None = None) -> list[str]

List builtin component identifiers, optionally filtered by kind.

load

load(name: str) -> object

Load a builtin component or named benchmark from the shipped catalog.

run

run(
    name: str,
    *,
    model: object | None = None,
    store: RunStore | None = None,
) -> RunResult

Execute a named benchmark through the catalog convenience layer.

validate_benchmark

validate_benchmark(name: str) -> BenchmarkValidationResult

Validate that a shipped benchmark can load, materialize, and score.

Core namespace:

themis.core

Core namespace for Themis.

AfterGenerate

Bases: Protocol

Hook invoked after a generator returns a candidate.

AfterJudge

Bases: Protocol

Hook invoked after a workflow-backed metric finishes.

AfterParse

Bases: Protocol

Hook invoked after parsing completes.

AfterReduce

Bases: Protocol

Hook invoked after reduction produces a final candidate.

AfterScore

Bases: Protocol

Hook invoked after a pure metric emits a score or error.

BeforeGenerate

Bases: Protocol

Hook invoked before a generator runs.

BeforeJudge

Bases: Protocol

Hook invoked before a workflow-backed metric begins judging.

BeforeParse

Bases: Protocol

Hook invoked before parsing a reduced candidate.

BeforeReduce

Bases: Protocol

Hook invoked before reduction starts.

BeforeScore

Bases: Protocol

Hook invoked before a pure metric runs.

BenchmarkResult

Bases: FrozenModel

Aggregate benchmark-style projection for a run.

CandidateReducer

Bases: Protocol

Protocol for reducers that collapse multiple candidates into one.
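When samples is greater than 1, a reducer collapses the sampled candidates into one. The protocol's method signature is not shown on this page, so the sketch below is a plain, hypothetical function implementing one common strategy: majority voting over normalized answers, with ties broken by first appearance.

```python
from collections import Counter


def majority_vote(candidates: list[str]) -> str:
    """Hypothetical reducer: return the first candidate whose normalized form wins the vote."""
    if not candidates:
        raise ValueError("need at least one candidate")
    normalized = [c.strip().lower() for c in candidates]
    counts = Counter(normalized)
    best = max(counts.values())
    # Break ties by first appearance so the result is deterministic.
    return next(
        original for original, key in zip(candidates, normalized) if counts[key] == best
    )
```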

CandidateSelector

Bases: Protocol

Protocol for selectors that choose candidates before reduction.

Case

Bases: HashableModel

One dataset case evaluated by the runtime.

CaseResult

Bases: FrozenModel

Final case-level result returned from a run.

ComponentRefs

Bases: FrozenModel

Resolved component refs stored with the snapshot.

ConversationTrace

Bases: HashableModel

Conversation trace captured during generation.

Dataset

Bases: HashableModel

A collection of cases evaluated together.

DatasetRef

Bases: HashableModel

Identity-bearing reference to one dataset.

DefaultWorkflowRunner

Concurrent interpreter for Themis-owned evaluation workflows.

EvalScoreContext

Bases: ScoreContext

Score context extended with judge workflow configuration.

EvaluationBundle

Bases: FrozenModel

Portable bundle of evaluation artifacts for a run.

EvaluationBundleRecord

Bases: FrozenModel

One portable evaluation execution record.

EvaluationCompletedEvent

Bases: CaseRunEvent

Event emitted when a workflow-backed metric finishes.

EvaluationConfig

Bases: HashableModel

Evaluation-stage configuration for parsing, metrics, and judges.

EvaluationFailedEvent

Bases: CaseRunEvent

Event emitted when a workflow-backed metric fails.

EvaluationWorkflow

Bases: Protocol

Protocol for workflow-backed metrics driven by judge model calls.

ExecutionState

Bases: FrozenModel

Persisted run state rebuilt from the run event stream.
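ExecutionState is described as rebuilt from the run event stream. The same idea, folding an ordered event list into a per-case projection, can be sketched with hypothetical event dicts whose type strings loosely mirror the event classes on this page (generation_completed, parse_failed, and so on). The payload shape and both helpers are assumptions.

```python
def rebuild_state(events: list[dict[str, str]]) -> dict[str, tuple[str, str]]:
    """Hypothetical: fold events in order into the latest (stage, outcome) per case."""
    latest: dict[str, tuple[str, str]] = {}
    for event in events:
        # "parse_failed" -> ("parse", "failed"); rpartition splits on the last underscore.
        stage, _, outcome = event["type"].rpartition("_")
        latest[event["case_id"]] = (stage, outcome)  # later events overwrite earlier ones
    return latest


def cases_to_resume(state: dict[str, tuple[str, str]]) -> list[str]:
    """Cases whose most recent event is a failure, sorted for stable output."""
    return sorted(case for case, (_, outcome) in state.items() if outcome == "failed")
```

Because the state is derived purely from stored events, it can be recomputed at any time, which is what makes it useful before resume or replay decisions.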

Experiment

Bases: FrozenModel

Authoring model for a Themis experiment.

An experiment owns the compile-time inputs required to build a RunSnapshot and provides sync and async helpers for running, replaying, or rejudging that snapshot.

The methods of this class (compile, from_config, rejudge, rejudge_async, replay, replay_async, run, and run_async) have the same signatures and descriptions as those documented under Experiment in the root package section above.

FrozenModel

Bases: BaseModel

Base Pydantic model used by the immutable core.

GenerateContext

Bases: HashableModel

Context passed to generators for one case execution.

GenerationBundle

Bases: FrozenModel

Portable bundle of generation artifacts for a run.

GenerationBundleRecord

Bases: FrozenModel

One portable generation artifact record.

GenerationCompletedEvent

Bases: CaseRunEvent

Event emitted when candidate generation finishes for a case.

GenerationConfig

Bases: HashableModel

Generation-stage configuration for a run.

GenerationFailedEvent

Bases: CaseRunEvent

Event emitted when candidate generation fails for a case.

GenerationResult

Bases: HashableModel

The candidate artifact returned by a generator call.

GenerationWorkItem

Bases: FrozenModel

Planner output for one generation task.

Generator

Bases: Protocol

Protocol for generation components that produce candidate outputs.

HashableModel

Bases: FrozenModel

Immutable model with stable content-addressable hashing.

InMemoryRunStore

Bases: ProjectionRefreshingStore

Simple in-memory store used by tests and local development.

JudgeModel

Bases: Protocol

Protocol for judge models used inside evaluation workflows.

LLMMetric

Bases: Protocol

Protocol for metrics that judge a reduced candidate set with an LLM.

LifecycleSubscriber

Bases: BeforeGenerate, AfterGenerate, BeforeReduce, AfterReduce, BeforeParse, AfterParse, BeforeScore, AfterScore, BeforeJudge, AfterJudge, OnEvent, Protocol

Aggregate lifecycle subscriber protocol.
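LifecycleSubscriber combines one protocol per hook. The sketch below shows how typing.Protocol with runtime_checkable lets a dispatcher call only the hooks an object implements. The hook method names (before_generate, after_generate), the per-hook dispatch, and generate_with_hooks are hypothetical; the actual hook signatures are not listed here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BeforeGenerateHook(Protocol):
    """Hypothetical single-hook protocol; the real hook method names may differ."""

    def before_generate(self, case_id: str) -> None: ...


@runtime_checkable
class AfterGenerateHook(Protocol):
    """Hypothetical single-hook protocol for the post-generation hook."""

    def after_generate(self, case_id: str, output: str) -> None: ...


class RecordingSubscriber:
    """Implements both hypothetical hooks by recording calls, e.g. for tests or logging."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def before_generate(self, case_id: str) -> None:
        self.calls.append(f"before:{case_id}")

    def after_generate(self, case_id: str, output: str) -> None:
        self.calls.append(f"after:{case_id}:{output}")


def generate_with_hooks(case_id: str, subscribers: list[object]) -> str:
    # runtime_checkable isinstance checks only that the hook method exists.
    for sub in subscribers:
        if isinstance(sub, BeforeGenerateHook):
            sub.before_generate(case_id)
    output = case_id.upper()  # stand-in for a real generator call
    for sub in subscribers:
        if isinstance(sub, AfterGenerateHook):
            sub.after_generate(case_id, output)
    return output
```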

Message

Bases: HashableModel

One conversation message captured as an artifact.

OnEvent

Bases: Protocol

Hook invoked after an execution event is persisted.

ParseBundle

Bases: FrozenModel

Portable bundle of parse artifacts for a run.

ParseBundleRecord

Bases: FrozenModel

One portable parse artifact record.

ParseCompletedEvent

Bases: CaseRunEvent

Event emitted when parsing a reduced candidate succeeds.

ParseContext

Bases: HashableModel

Context passed to parsers for a reduced candidate.

ParseFailedEvent

Bases: CaseRunEvent

Event emitted when parsing a reduced candidate fails.

ParsedOutput

Bases: HashableModel

Normalized output produced by a parser before scoring.

Parser

Bases: Protocol

Protocol for parsers that normalize reduced candidate outputs.

ProgressSnapshot

Bases: FrozenModel

Aggregate case progress for a run.

PromptSpec

Bases: HashableModel

Generic prompt instructions and structured prompt material.

The render_input and render_sections methods are documented under PromptSpec in the root package section above.

PureMetric

Bases: Protocol

Protocol for deterministic metrics that score parsed outputs directly.

ReduceContext

Bases: HashableModel

Context passed to reducers choosing a final candidate.

ReducedCandidate

Bases: HashableModel

Candidate selected or synthesized by the reduction stage.

ReductionBundle

Bases: FrozenModel

Portable bundle of reduction artifacts for a run.

ReductionBundleRecord

Bases: FrozenModel

One portable reduction artifact record.

ReductionCompletedEvent

Bases: CaseRunEvent

Event emitted when candidate reduction succeeds.

ReductionFailedEvent

Bases: CaseRunEvent

Event emitted when candidate reduction fails.

Reporter

Export persisted run projections in JSON, Markdown, CSV, or LaTeX.

The export_csv, export_json, export_latex, export_markdown, and export_score_table methods are documented under Reporter in the root package section above.

RunCompletedEvent

Bases: RunEvent

Event emitted when orchestration completes successfully.

RunEstimate

Bases: FrozenModel

Planner estimate for the work implied by a compiled run.

RunEvent

Bases: HashableModel

Base event persisted for a compiled run.

RunFailedEvent

Bases: RunEvent

Event emitted when orchestration aborts with an unrecoverable error.

RunIdentity

Bases: HashableModel

Inputs that determine the logical identity and run_id of a run.

RunProvenance

Bases: FrozenModel

Environment metadata recorded with a run but excluded from run_id.
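RunIdentity feeds run_id, while RunProvenance and RuntimeConfig are recorded without affecting it. A content-addressed ID along these lines can be sketched by hashing canonical JSON of the identity inputs only. The field names, the SHA-256 choice, the truncation, and the "run-" prefix are assumptions; the real run_id derivation is not documented on this page.

```python
import hashlib
import json
from typing import Any


def content_hash(payload: dict[str, Any]) -> str:
    """Stable SHA-256 over canonical JSON (sorted keys, no insignificant whitespace)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_run_id(identity: dict[str, Any]) -> str:
    # Hypothetical format. Only identity inputs are hashed; runtime tuning
    # (concurrency, retries) and provenance (platform, versions) stay outside.
    return "run-" + content_hash(identity)[:16]
```

Canonicalizing before hashing means that key order in the input cannot change the ID, while any change to a real identity input (such as the seeds) does.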

RunResult

Bases: FrozenModel

Final run-level result returned from execution.

RunSnapshot

Bases: FrozenModel

Immutable executable artifact produced by Experiment.compile().

RunStartedEvent

Bases: RunEvent

Event emitted when orchestration starts for a run.

RunStatus

Bases: StrEnum

User-facing run status values.

RunStore

Bases: Protocol

Persistence contract used by Themis runtime components.

RuntimeConfig

Bases: HashableModel

Execution-time controls that do not affect snapshot identity.

Score

Bases: HashableModel

Successful metric output.

ScoreBundle

Bases: FrozenModel

Portable bundle of score artifacts for a run.

ScoreBundleRecord

Bases: FrozenModel

One portable score artifact record.

ScoreCompletedEvent

Bases: CaseRunEvent

Event emitted when a pure metric succeeds.

ScoreContext

Bases: HashableModel

Context passed to deterministic scoring metrics.

ScoreError

Bases: HashableModel

Structured score failure recorded by the runtime.

ScoreFailedEvent

Bases: CaseRunEvent

Event emitted when a pure metric produces an error payload.

SelectContext

Bases: HashableModel

Context passed to candidate selectors before reduction.

SelectionMetric

Bases: Protocol

Protocol for metrics that judge multiple generated candidates.

SqliteRunStore

Bases: ProjectionRefreshingStore

Small SQLite-backed run store.

StepCompletedEvent

Bases: RunEvent

Event emitted when a workflow step completes.

StepFailedEvent

Bases: RunEvent

Event emitted when a workflow step fails.

StepStartedEvent

Bases: RunEvent

Event emitted when a workflow step starts.

StorageConfig

Bases: HashableModel

Store backend configuration used for persistence.

StoredRun

Bases: FrozenModel

Snapshot plus stored events loaded back from a run store.

TimelineView

Bases: FrozenModel

Timeline projection for a run.

TraceMetric

Bases: Protocol

Protocol for metrics that score traces or conversations.

TraceStep

Bases: HashableModel

One structured step in a generation or evaluation trace.

TraceView

Bases: FrozenModel

Trace-oriented projection for a run.

TracingProvider

Bases: Protocol

Protocol for span-based tracing integrations.
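A span-based tracer records named, nested intervals around units of work. The following RecordingTracer is a hypothetical, dependency-free illustration of that pattern using contextlib.contextmanager; it does not implement the TracingProvider protocol, whose methods are not listed here.

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class RecordingTracer:
    """Hypothetical tracer that records (name, depth, seconds) for each finished span."""

    finished: list[tuple[str, int, float]] = field(default_factory=list)
    _depth: int = field(default=0, init=False, repr=False)

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        self._depth += 1
        try:
            yield
        finally:
            # Record on exit, even if the wrapped work raised, at the span's own depth.
            self._depth -= 1
            self.finished.append((name, self._depth, time.perf_counter() - start))
```

Inner spans finish first, so a run span wrapping a generate span records generate at depth 1 before run at depth 0.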

WorkflowBuildError

Bases: ValueError

Raised when a metric cannot build a valid evaluation workflow.

WorkflowRunner

Bases: Protocol

Protocol for executing evaluation workflows and returning traces.

WorkflowTrace

Bases: HashableModel

Trace emitted by a workflow-backed evaluation.

evaluate_async async

Same signature and description as evaluate_async in the root package section above.

event_from_dict

event_from_dict(payload: dict[str, Any]) -> RunEvent

Deserialize a stored event payload into the correct event model.
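event_from_dict picks the right event model from a stored payload. A common way to do that is a registry keyed by a type tag, sketched below with two hypothetical dataclasses. The "type" key, its values, and event_from_payload are assumptions, not the stored Themis event format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RunStarted:
    """Hypothetical event model."""

    run_id: str


@dataclass(frozen=True)
class RunCompleted:
    """Hypothetical event model."""

    run_id: str


# Hypothetical discriminator values; the stored Themis payload format may differ.
EVENT_TYPES: dict[str, type] = {"run_started": RunStarted, "run_completed": RunCompleted}


def event_from_payload(payload: dict[str, Any]) -> object:
    """Pick the event class from the payload's "type" tag, then build it from the rest."""
    data = dict(payload)  # copy so the caller's dict is left untouched
    tag = data.pop("type", "")
    try:
        cls = EVENT_TYPES[tag]
    except KeyError:
        raise ValueError(f"unknown event type: {tag!r}") from None
    return cls(**data)
```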

The following functions are also exported from themis.core. Their signatures and descriptions are identical to the root package entries documented above:

export_evaluation_bundle, export_generation_bundle, export_parse_bundle, export_reduction_bundle, export_score_bundle
get_evaluation_execution, get_execution_state, get_run_snapshot
import_evaluation_bundle, import_generation_bundle, import_parse_bundle, import_reduction_bundle, import_score_bundle
snapshot_report, sqlite_store

Adapters:

themis.adapters

Provider-backed generator adapters for Themis.