Specs¶
Benchmark Authoring¶
DatasetQuerySpec ¶
Bases: SpecBase
Declarative slice query and sampling controls for dataset providers.
Source code in themis/benchmark/query.py
BenchmarkSpec ¶
Bases: SpecBase
Top-level benchmark configuration compiled into an execution plan.
Source code in themis/benchmark/specs.py
SliceSpec ¶
Bases: SpecBase
One benchmark slice with dataset identity, queries, prompts, and scoring.
Source code in themis/benchmark/specs.py
PromptVariantSpec ¶
Bases: SpecBase
Structured prompt variant scoped to one family or benchmark workflow.
Source code in themis/benchmark/specs.py
ParseSpec ¶
Bases: SpecBase
Named parse pipeline backed by one extractor chain.
Source code in themis/benchmark/specs.py
ScoreSpec ¶
Bases: SpecBase
Named scoring pass over raw or parsed candidate outputs.
Source code in themis/benchmark/specs.py
Project and Runtime Support¶
ProjectSpec ¶
Bases: SpecBase
Shared project-level identity, storage defaults, and execution policy.
Keep this stable across related experiment runs so resume behavior and run manifests refer to the same storage and backend context.
Source code in themis/specs/experiment.py
StorageConfig module-attribute ¶
StorageConfig = Annotated[
    SqliteBlobStorageSpec | PostgresBlobStorageSpec,
    Field(discriminator="backend"),
]
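The `discriminator="backend"` annotation means the concrete storage spec is chosen from the value of a `backend` field. The following is a minimal standard-library sketch of that dispatch idea, not the library's implementation. The class names, the `path` and `dsn` fields, and the tag values `"sqlite_blob"` and `"postgres_blob"` are all assumptions made for illustration. Only the `backend` discriminator name comes from the definition above.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the two storage specs. The real classes are
# Pydantic models whose fields are not listed in this reference.
@dataclass(frozen=True)
class SqliteSketch:
    backend: str
    path: str  # assumed field


@dataclass(frozen=True)
class PostgresSketch:
    backend: str
    dsn: str  # assumed field


# Assumed tag values; the real literals may differ.
_BY_BACKEND = {"sqlite_blob": SqliteSketch, "postgres_blob": PostgresSketch}


def parse_storage(raw: dict):
    """Pick the concrete class from the 'backend' tag, as a discriminator does."""
    try:
        cls = _BY_BACKEND[raw["backend"]]
    except KeyError:
        raise ValueError(
            f"unknown or missing backend: {raw.get('backend')!r}"
        ) from None
    return cls(**raw)
```

Dispatching on a single tag field keeps validation errors specific: an unrecognized `backend` fails immediately instead of being tried against every member of the union.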
SqliteBlobStorageSpec ¶
Bases: _StorageSpecBase
SQLite event/projection store plus local filesystem blob persistence.
Source code in themis/specs/experiment.py
PostgresBlobStorageSpec ¶
Bases: _StorageSpecBase
Postgres event/projection store plus local filesystem blob persistence.
Source code in themis/specs/experiment.py
ExecutionPolicySpec ¶
Bases: SpecBase
Retry, backoff, circuit-breaker, and concurrency controls for orchestration.
These settings apply at the orchestration layer, above whatever retry behavior a provider SDK implements. Engines are still responsible for classifying provider failures into stable retryable codes.
Source code in themis/specs/experiment.py
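To illustrate how retry and backoff controls can interact with engine-classified error codes, here is a minimal sketch. Every name in it (`RetryPolicySketch`, `EngineErrorSketch`, the field names, and the codes `"rate_limited"` and `"timeout"`) is hypothetical, since the actual fields of `ExecutionPolicySpec` are not listed in this reference. Circuit-breaker and concurrency controls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicySketch:
    # All fields are assumptions, not the real ExecutionPolicySpec schema.
    max_retries: int = 3
    base_delay_s: float = 0.5
    backoff_factor: float = 2.0
    max_delay_s: float = 30.0
    retryable_codes: frozenset = frozenset({"rate_limited", "timeout"})


class EngineErrorSketch(Exception):
    """Hypothetical error an engine raises after classifying a provider failure."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def backoff_delays(policy: RetryPolicySketch) -> list:
    """Exponential delays, one per allowed retry, capped at max_delay_s."""
    return [
        min(policy.base_delay_s * policy.backoff_factor ** i, policy.max_delay_s)
        for i in range(policy.max_retries)
    ]


def run_with_retry(call, policy: RetryPolicySketch, sleep):
    """Call `call()`; retry only on codes the policy marks as retryable."""
    delays = backoff_delays(policy)
    attempt = 0
    while True:
        try:
            return call()
        except EngineErrorSketch as err:
            # Give up on non-retryable codes or once retries are exhausted.
            if err.code not in policy.retryable_codes or attempt >= len(delays):
                raise
            sleep(delays[attempt])
            attempt += 1
```

Passing `sleep` as a parameter (for example `time.sleep` in production) keeps the loop testable without real waiting.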
InferenceGridSpec ¶
Bases: SpecBase
Typed inference sweep over base params and scalar override grids.
Use this for temperature, top-p, or provider-extra sweeps while keeping unchanged parameter combinations resumable across runs.
Source code in themis/specs/experiment.py
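One way to make unchanged parameter combinations resumable is to derive a stable key from each combination and skip keys that already have stored results. The sketch below shows that idea only; it is an assumption about the general technique, not a description of how Themis computes identities. The function names are hypothetical.

```python
import hashlib
import json


def params_key(params: dict) -> str:
    """Stable key for one parameter combination (hypothetical helper).

    Sorting keys makes the key independent of dict insertion order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def pending(combos: list, completed_keys: set) -> list:
    """Return only the combinations whose key has no stored result yet."""
    return [c for c in combos if params_key(c) not in completed_keys]
```

With this scheme, adding a new temperature value to a sweep leaves the keys of existing combinations unchanged, so only the new combinations need to run.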
expand ¶
Expand the base inference params over all configured overrides.
Source code in themis/specs/experiment.py
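The expansion described here amounts to a Cartesian product of the override values applied on top of the base params. A minimal sketch with plain dicts follows. The function name `expand_grid` and the dict-based shapes are assumptions; the real method operates on typed specs and may order or deduplicate results differently.

```python
from itertools import product


def expand_grid(base: dict, overrides: dict) -> list:
    """Apply every combination of override values on top of the base params.

    `overrides` maps a parameter name to the list of values to sweep.
    Names are sorted so the output order is deterministic.
    """
    names = sorted(overrides)
    combos = []
    for values in product(*(overrides[name] for name in names)):
        combos.append({**base, **dict(zip(names, values))})
    return combos


grid = expand_grid(
    {"temperature": 0.0, "max_tokens": 256},
    {"temperature": [0.0, 0.7], "top_p": [0.9, 1.0]},
)
print(len(grid))  # 4 combinations: 2 temperatures x 2 top_p values
```

With no overrides, the product yields a single empty combination, so the result is just a one-element list containing a copy of the base params.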
InferenceParamsSpec ¶
Bases: SpecBase
Sampling and response-shape settings forwarded to inference engines.
Source code in themis/specs/experiment.py
PromptMessage ¶
Bases: BaseModel
One structured chat message in a prompt template.
Source code in themis/specs/experiment.py
ModelSpec ¶
Bases: SpecBase
Configures one inference-engine target and its provider-specific extras.
Source code in themis/specs/foundational.py
DatasetSpec ¶
Bases: SpecBase
Declarative dataset source description passed to a dataset loader.
Dataset identity is part of deterministic planning. Use revision when the upstream dataset source supports version pinning.
Source code in themis/specs/foundational.py
GenerationSpec ¶
JudgeInferenceSpec ¶
Bases: SpecBase
Optional judge-model configuration used by judge-backed metrics.
Separate metrics can carry separate judge specs, which is how one candidate can be scored by multiple judge prompts or judge models in the same run.