Public Surface

Themis now documents a single, benchmark-first public API.

Main Objects

  • ProjectSpec: shared storage, seed, and execution policy
  • BenchmarkSpec: models, slices, prompt variants, parse pipelines, and scores
  • SliceSpec: one benchmark slice with dataset config, dimensions, and allowed prompts
  • DatasetQuerySpec: subset, filters, item pinning, and sampling hints
  • PromptVariantSpec: a reusable prompt family and message template
  • ParseSpec: a named parser pipeline
  • ScoreSpec: a named scoring overlay, optionally tied to a parse pipeline
  • PluginRegistry: runtime lookup for engines, parsers, metrics, judges, and hooks
  • Orchestrator: planning, execution, export, import, resume, and progress
  • BenchmarkResult: aggregation, paired comparisons, timelines, and artifact bundles

Mental Model

BenchmarkSpec is the public authoring model. Internally, Themis compiles it to a private execution IR before planning trials. That lower layer is an implementation detail, not a second public API.

Dataset access uses the benchmark-first provider contract DatasetProvider.scan(slice_spec, query).

Use this split when deciding where logic belongs:

  • project-wide runtime policy: ProjectSpec
  • benchmark semantics: BenchmarkSpec
  • provider-specific execution: InferenceEngine
  • answer parsing: ParseSpec + extractor chain
  • scoring: ScoreSpec + metrics
  • read-side analysis: BenchmarkResult
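The parse/score split in the list above can be illustrated with a toy extractor chain: parsers run in order until one produces a value, and a scoring function consumes only the parsed output, never the raw completion. The function names below are illustrative sketches, not Themis API.

```python
import re
from typing import Callable, Optional

# A parse pipeline modeled as an ordered extractor chain: each extractor
# returns a parsed value or None, and the first success wins.
Extractor = Callable[[str], Optional[str]]

def boxed_answer(text: str) -> Optional[str]:
    # Grab the content of a \boxed{...} span, if present.
    start = text.find(r"\boxed{")
    if start == -1:
        return None
    end = text.find("}", start)
    return text[start + len(r"\boxed{"):end] if end != -1 else None

def last_number(text: str) -> Optional[str]:
    # Fallback: take the last number in the completion.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None

def run_chain(extractors: list[Extractor], text: str) -> Optional[str]:
    for extract in extractors:
        value = extract(text)
        if value is not None:
            return value
    return None

def exact_match(parsed: Optional[str], gold: str) -> float:
    # A scoring overlay sees parsed output, keeping metrics parser-agnostic.
    return 1.0 if parsed == gold else 0.0
```

This is the division of labor the list describes: the chain belongs with the parse pipeline, the metric with the scoring overlay, so either side can change independently.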

Public Imports

Use the root package for the main entry points:

from themis import (
    BenchmarkResult,
    BenchmarkSpec,
    DatasetQuerySpec,
    Orchestrator,
    ParseSpec,
    PluginRegistry,
    ProjectSpec,
    PromptMessage,
    PromptVariantSpec,
    ScoreSpec,
    SliceSpec,
    generate_config_report,
)

Use themis.specs for supporting spec models that are still public but not curated into the root package:

from themis.specs import DatasetSpec, GenerationSpec, JudgeInferenceSpec