Config schema reference

Config file support

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| Config file format | Yes | Experiment.from_config(...) loads YAML (.yaml / .yml) and TOML (.toml) | Yes, after normalization into the compiled snapshot | Choose the format that best fits your repo conventions |
| Config values | Yes | Carry strings and JSON-like values for components, prompts, storage, and runtime settings | Yes for identity-bearing fields; no for pure runtime tuning fields | Live Python objects belong only in direct Python authoring |
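
The "JSON-like values" rule can be checked mechanically. The sketch below is not part of Themis; `is_config_safe` is a hypothetical helper and the layout of `config_values` is assumed for illustration. It treats a value as config-safe when it survives a JSON round trip unchanged, which is one reasonable reading of "JSON-like".

```python
import json


def is_config_safe(value):
    """Return True if `value` survives a JSON round trip unchanged.

    Hypothetical helper, not a Themis API.
    """
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        # json.dumps raises TypeError for live objects it cannot serialize.
        return False


# Assumed layout for illustration; only the field names come from this page.
config_values = {
    "metrics": ["builtin/exact_match"],
    "candidate_policy": {"num_samples": 3},
}

# A live Python object in place of a builtin id or import path.
live_object_values = {"generator": object()}

print(is_config_safe(config_values))
print(is_config_safe(live_object_values))
```

Strings, numbers, booleans, lists, and string-keyed mappings pass this check; a live object does not, which is why it belongs only in direct Python authoring.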

Component target syntax

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| Builtin ids such as builtin/exact_match | No | Reference shipped catalog components from config files | Yes | Best when you want stable builtins without writing import paths |
| Importable factory path such as package.module:factory | No | Reference your own component factory from config | Yes | Best when constructor logic belongs in Python |
| Importable class path such as package.module:Class | No | Reference a component type directly from config | Yes | Themis instantiates the class without constructor arguments |
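
To make the `package.module:name` form concrete, here is a minimal sketch of how such a string could be resolved with the standard library. It is an illustration only, not Themis's loader: `resolve_target` and `build_component` are hypothetical names, and builtin ids such as builtin/exact_match (which come from the shipped catalog) are not handled here.

```python
import importlib


def resolve_target(target):
    """Resolve a 'package.module:name' string to the object it names."""
    module_path, sep, attr = target.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'package.module:name', got {target!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def build_component(target):
    """Call the resolved factory or class with no arguments."""
    return resolve_target(target)()


# Standard-library stand-in for a user-defined component class.
counter = build_component("collections:Counter")
print(type(counter).__name__)
```

Because the class is called with no arguments, any setup that needs parameters has to live in a factory function, which matches the table's advice to use a factory path when constructor logic belongs in Python.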

GenerationConfig

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| generator | Yes | Chooses the candidate producer | Yes | In config, use a builtin id or import path; in Python, you may pass a live object |
| candidate_policy | No | Controls generation fan-out such as num_samples | Yes | Defaults to {} and is part of logical experiment identity |
| prompt_spec | No | Carries prompt instructions, prefixes, suffixes, and generic prompt blocks | Yes | Prompt changes invalidate generation-stage cache reuse as expected |
| PromptSpec.blocks | No | Stores arbitrary structured prompt material | Yes | Themis does not assign example-specific semantics to block contents |
| reducer | No | Chooses how multiple candidates collapse after fan-out | Yes | Pair with selectors or reducers when num_samples is greater than one |
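
The relationship between fan-out and reduction can be sketched in plain Python. Everything here is hypothetical and independent of Themis: `fake_generator`, `generate_candidates`, and `majority_vote` are illustrative names, and majority voting is just one possible reduction strategy.

```python
from collections import Counter


def fake_generator(prompt, sample_index):
    """Stand-in generator that returns a canned answer per sample."""
    answers = ["4", "5", "4"]
    return answers[sample_index % len(answers)]


def generate_candidates(generator, prompt, num_samples=1):
    """Fan out: produce `num_samples` candidates for one prompt."""
    return [generator(prompt, i) for i in range(num_samples)]


def majority_vote(candidates):
    """Reduce: collapse candidates to the most frequent answer."""
    return Counter(candidates).most_common(1)[0][0]


candidates = generate_candidates(fake_generator, "What is 2 + 2?", num_samples=3)
reduced = majority_vote(candidates)
print(candidates, reduced)
```

With num_samples equal to one there is nothing to collapse, which is why a reducer only matters once fan-out produces several candidates.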

EvaluationConfig

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| metrics | Yes | Lists the pure or workflow-backed metrics to run | Yes | Metric choice defines evaluation semantics |
| parsers | No | Normalizes reduced output into metric-ready subjects | Yes | Choose parsers that match the expected output shape |
| judge_models | No | Provides judge models for workflow-backed metrics | Yes | Omit when using only deterministic pure metrics |
| prompt_spec | No | Adds prompt instructions or blocks for builtin judge workflows | Yes | Judge prompt changes are identity-bearing |
| judge_config | No | Carries generic runtime configuration for workflow implementations | Yes | Exposed to workflows as EvalScoreContext.judge_config; use for custom workflow config that should affect runtime behavior and identity |
| workflow_overrides | No | Carries builtin-oriented prompt and rubric overrides | Yes | Exposed as EvalScoreContext.eval_workflow_config; useful for rubric text and benchmark-specific builtin judge settings |

StorageConfig

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| store | Yes | Selects the backend such as memory, sqlite, jsonl, mongodb, or postgres | Yes | Choose based on persistence and operational needs |
| parameters | No | Supplies backend-specific settings | No | Stored as provenance rather than logical run identity |
| Relative parameters.path, parameters.root, and parameters.blob_root | No | Resolves storage paths from the config file directory | No | Keeps checked-in configs portable across environments |
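
Resolving relative paths against the config file's directory, rather than the current working directory, is what keeps a checked-in config portable. The sketch below illustrates the idea under assumptions: `resolve_storage_paths` is a hypothetical helper, the example paths are made up, and `PurePosixPath` is used only so the result is the same on every platform.

```python
from pathlib import PurePosixPath

# Keys named on this page as path-like storage parameters.
PATH_KEYS = ("path", "root", "blob_root")


def resolve_storage_paths(parameters, config_file):
    """Return a copy of `parameters` with relative paths anchored at the
    directory that contains `config_file`. Hypothetical helper."""
    base = PurePosixPath(config_file).parent
    resolved = dict(parameters)
    for key in PATH_KEYS:
        value = resolved.get(key)
        if isinstance(value, str) and not PurePosixPath(value).is_absolute():
            resolved[key] = str(base / value)
    return resolved


params = {"path": "runs/store.db", "blob_root": "/var/blobs", "timeout": 30}
resolved = resolve_storage_paths(params, "/repo/configs/experiment.yaml")
print(resolved)
```

Absolute paths and non-path settings pass through untouched; only relative values for the listed keys are rewritten.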

RuntimeConfig

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| max_concurrent_tasks | No | Sets the global execution cap | No | Use for coarse operational throttling |
| stage_concurrency | No | Sets per-stage concurrency caps | No | Useful when generation and judging need different limits |
| provider_concurrency | No | Limits concurrency per provider endpoint | No | Helps share one process fairly across models or services |
| provider_rate_limits | No | Sets explicit per-provider request or token limits | No | Use when the endpoint enforces quotas or rate contracts |
| generation_retry_attempts, generation_retry_delay, generation_retry_backoff | No | Controls generation retry behavior | No | Retries transient provider failures without changing identity |
| judge_retry_attempts, judge_retry_delay, judge_retry_backoff | No | Controls judge retry behavior | No | Applies only to workflow-backed metrics |
| store_retry_attempts, store_retry_delay | No | Controls persistence retry behavior | No | Use when the store can fail transiently |
| existing_run_policy | No | Chooses duplicate-run handling with auto, error, or rerun | No | Affects execution behavior, not logical identity |
| queue_root and batch_root | No | Select manifest output roots for deferred execution | No | Used by submit, worker, and batch flows |
| Relative queue_root and batch_root | No | Resolves runtime paths from the config file directory | No | Keeps checked-in configs portable across machines |
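
This page does not spell out how the attempts, delay, and backoff settings combine. A common convention, assumed here and not confirmed for Themis, is that the attempts value counts retries after the first call and that each wait is the delay multiplied by the backoff raised to the retry index. `retry_delays` and `call_with_retries` are hypothetical names.

```python
import time


def retry_delays(attempts, delay, backoff):
    """Waits before each retry under an assumed exponential schedule."""
    return [delay * backoff ** n for n in range(attempts)]


def call_with_retries(fn, attempts, delay, backoff, sleep=time.sleep):
    """Call `fn`, retrying up to `attempts` times after a failure."""
    for wait in retry_delays(attempts, delay, backoff):
        try:
            return fn()
        except Exception:
            # A real implementation would catch only transient errors.
            sleep(wait)
    return fn()  # final try; any error propagates to the caller


slept = []
attempts_seen = []


def flaky():
    """Fail twice with a transient error, then succeed."""
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record waits instead of sleeping so the example runs instantly.
result = call_with_retries(flaky, attempts=3, delay=0.5, backoff=2.0, sleep=slept.append)
print(result, slept)
```

Under these assumptions, tuning the three values changes only how long a run waits on transient failures, which is consistent with the table marking them as not affecting run_id.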

Overrides

| Field | Required | Purpose | Affects run_id | Notes |
| --- | --- | --- | --- | --- |
| Experiment.from_config(path, overrides=[...]) | No | Applies OmegaConf dotlist overrides before normalization and component loading | Depends on the fields you override | Useful for environment-specific paths or small execution changes |
| Override usage | No | Lets one checked-in config serve multiple environments or execution shapes | Depends on the fields you override | Prefer this over forking a config file for small changes |
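
A dotlist override is a string of the form `dotted.key=value`. Themis delegates the parsing to OmegaConf; the sketch below is a simplified standard-library stand-in that shows the shape of the operation, not OmegaConf's actual behavior. `apply_dotlist` is a hypothetical helper, the top-level section names `runtime` and `storage` are assumptions, and values are parsed as JSON where possible and otherwise kept as strings.

```python
import json
from copy import deepcopy


def apply_dotlist(config, overrides):
    """Return a copy of `config` with 'a.b.c=value' overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        try:
            node[leaf] = json.loads(raw)  # numbers, booleans, lists
        except json.JSONDecodeError:
            node[leaf] = raw  # plain strings such as paths
    return result


base = {
    "runtime": {"max_concurrent_tasks": 4},
    "storage": {"store": "sqlite", "parameters": {"path": "runs/store.db"}},
}
merged = apply_dotlist(
    base,
    ["runtime.max_concurrent_tasks=8", "storage.parameters.path=runs/local.db"],
)
print(merged)
```

Both overrides in this example touch fields the tables above mark as not affecting run_id, so under those rules the same logical run would be addressed from a different environment.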

See "Identity vs provenance" when deciding whether a config change should create a new logical run.
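
The split between identity and provenance can be pictured as hashing only the identity-bearing part of a config. The sketch below is not Themis's run_id algorithm: `identity_view` and `fingerprint` are hypothetical, the section names are assumed, and the choice of fields simply follows the "Affects run_id" columns on this page (generation, evaluation, and the store backend, but not storage parameters or runtime settings).

```python
import hashlib
import json


def identity_view(config):
    """Keep only the fields this page marks as affecting run_id."""
    return {
        "generation": config.get("generation", {}),
        "evaluation": config.get("evaluation", {}),
        "storage": {"store": config.get("storage", {}).get("store")},
    }


def fingerprint(config):
    """Hash a canonical JSON form of the identity-bearing fields."""
    canonical = json.dumps(identity_view(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


base = {
    "generation": {
        "generator": "package.module:factory",
        "candidate_policy": {"num_samples": 1},
    },
    "evaluation": {"metrics": ["builtin/exact_match"]},
    "storage": {"store": "sqlite", "parameters": {"path": "runs/store.db"}},
    "runtime": {"max_concurrent_tasks": 4},
}

# Runtime tuning only: the fingerprint is expected to stay the same.
tuned = {**base, "runtime": {"max_concurrent_tasks": 16}}

# Fan-out change: the fingerprint is expected to differ.
resampled = {
    **base,
    "generation": {**base["generation"], "candidate_policy": {"num_samples": 5}},
}

print(fingerprint(base) == fingerprint(tuned))
print(fingerprint(base) == fingerprint(resampled))
```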