Public Surface

Themis now documents a single, benchmark-first public API.

Main Objects

  • ProjectSpec: shared storage, seed, and execution policy
  • BenchmarkSpec: models, slices, prompt variants, parse pipelines, and scores
  • SliceSpec: one benchmark slice with dataset config, dimensions, and allowed prompts
  • DatasetQuerySpec: subset, filters, item pinning, and sampling hints
  • PromptVariantSpec: a reusable prompt family and message template
  • ParseSpec: a named parser pipeline
  • ScoreSpec: a named scoring overlay, optionally tied to a parse pipeline
  • PluginRegistry: runtime lookup for engines, parsers, metrics, judges, and hooks
  • Orchestrator: planning, execution, export, import, resume, and progress
  • BenchmarkResult: aggregation, paired comparisons, timelines, and artifact bundles

Mental Model

BenchmarkSpec is the public authoring model. Internally, Themis compiles it to a private execution IR before planning trials. That lower layer is an implementation detail, not a second public API.

Dataset access uses the benchmark-first provider contract DatasetProvider.scan(slice_spec, query).

Use this split when deciding where logic belongs:

  • project-wide runtime policy: ProjectSpec
  • benchmark semantics: BenchmarkSpec
  • provider-specific execution: InferenceEngine
  • answer parsing: ParseSpec + extractor chain
  • scoring: ScoreSpec + metrics
  • read-side analysis: BenchmarkResult
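The parse/score split in the list above can be illustrated with a toy extractor chain: parsers run in order until one produces a value, and a scoring function consumes only the parsed output, never the raw completion. The function names below are illustrative sketches, not Themis API.

```python
import re
from typing import Callable, Optional

# A parse pipeline modeled as an ordered extractor chain: each extractor
# returns a parsed value or None, and the first success wins.
Extractor = Callable[[str], Optional[str]]

def boxed_answer(text: str) -> Optional[str]:
    # Grab the content of a \boxed{...} span, if present.
    start = text.find(r"\boxed{")
    if start == -1:
        return None
    end = text.find("}", start)
    return text[start + len(r"\boxed{"):end] if end != -1 else None

def last_number(text: str) -> Optional[str]:
    # Fallback: take the last number in the completion.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None

def run_chain(extractors: list[Extractor], text: str) -> Optional[str]:
    for extract in extractors:
        value = extract(text)
        if value is not None:
            return value
    return None

def exact_match(parsed: Optional[str], gold: str) -> float:
    # A scoring overlay sees parsed output, keeping metrics parser-agnostic.
    return 1.0 if parsed == gold else 0.0
```

This is the division of labor the list describes: the chain belongs with the parse pipeline, the metric with the scoring overlay, so either side can change independently.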

Public Imports

Use the root package for the main entry points:

from themis import (
    BenchmarkResult,
    BenchmarkSpec,
    DatasetQuerySpec,
    Orchestrator,
    ParseSpec,
    PluginRegistry,
    ProjectSpec,
    PromptMessage,
    PromptVariantSpec,
    ScoreSpec,
    SliceSpec,
    generate_config_report,
)

Use themis.specs for supporting spec models that are still public but not curated into the root package:

from themis.specs import DatasetSpec, GenerationSpec, JudgeInferenceSpec