Tune runtime controls
Goal: adjust concurrency, provider limits, retry behavior, and duplicate-run policy safely.
When to use this:
Use this guide when the experiment definition is correct but execution behavior needs operational tuning.
Procedure
Configure RuntimeConfig to change execution-time behavior:
- max_concurrent_tasks
- stage_concurrency
- provider_concurrency
- provider_rate_limits
- generation_retry_attempts, generation_retry_delay, generation_retry_backoff
- judge_retry_attempts, judge_retry_delay, judge_retry_backoff
- store_retry_attempts
- store_retry_delay
- existing_run_policy
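As a rough illustration of how these fields group together, the sketch below uses a stand-in dataclass rather than the real Themis class. Only the field names come from the list above; the types, default values, mapping shapes, and the provider name are assumptions, so check the RuntimeConfig reference before copying any of it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RuntimeConfigSketch:
    """Stand-in that mirrors the documented field names; NOT the Themis class."""

    # All types and defaults below are illustrative assumptions.
    max_concurrent_tasks: int = 8
    stage_concurrency: Dict[str, int] = field(default_factory=dict)
    provider_concurrency: Dict[str, int] = field(default_factory=dict)
    provider_rate_limits: Dict[str, int] = field(default_factory=dict)
    generation_retry_attempts: int = 3
    generation_retry_delay: float = 1.0
    generation_retry_backoff: float = 2.0
    judge_retry_attempts: int = 3
    judge_retry_delay: float = 1.0
    judge_retry_backoff: float = 2.0
    store_retry_attempts: int = 3
    store_retry_delay: float = 0.5
    # Valid policy values are defined by Themis; left unset here on purpose.
    existing_run_policy: Optional[str] = None


# A conservative setup: low global concurrency, tight caps for one
# hypothetical provider, and more generation retries than the default.
conservative = RuntimeConfigSketch(
    max_concurrent_tasks=4,
    provider_concurrency={"example-provider": 2},
    provider_rate_limits={"example-provider": 60},
    generation_retry_attempts=5,
)
```

Grouping the knobs this way makes it easier to see which ones bound total work, which ones bound a single provider, and which ones only affect recovery after a failure.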
Provider-backed models are treated as endpoints. Use provider_concurrency and provider_rate_limits to keep one process fair across multiple endpoint-backed models or benchmarks without changing the experiment identity.
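To make the idea of a per-provider cap concrete, here is a minimal sketch of the general technique using one asyncio semaphore per provider. It is not how Themis implements provider_concurrency; the function name, the provider names, and the fake call are all hypothetical.

```python
import asyncio


async def run_with_provider_caps(calls, provider_concurrency):
    """Run (provider, coroutine_factory) pairs under per-provider caps.

    Returns the results in call order plus the peak number of calls that
    were in flight at once for each provider.
    """
    semaphores = {
        name: asyncio.Semaphore(limit)
        for name, limit in provider_concurrency.items()
    }
    in_flight = {name: 0 for name in provider_concurrency}
    peak = dict(in_flight)

    async def guarded(provider, factory):
        # Each call waits for a free slot on its own provider only, so a
        # busy provider does not block calls bound for another one.
        async with semaphores[provider]:
            in_flight[provider] += 1
            peak[provider] = max(peak[provider], in_flight[provider])
            try:
                return await factory()
            finally:
                in_flight[provider] -= 1

    results = await asyncio.gather(*(guarded(p, f) for p, f in calls))
    return results, peak


async def fake_call():
    # Stand-in for a provider request.
    await asyncio.sleep(0.01)
    return "ok"


calls = [("provider-a", fake_call)] * 5 + [("provider-b", fake_call)] * 3
results, peak = asyncio.run(
    run_with_provider_caps(calls, {"provider-a": 2, "provider-b": 1})
)
```

Because the caps sit in the execution layer and not in the experiment definition, changing them alters how fast work is issued but not what work is defined.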
Retry behavior:
- generation retries classify explicit retryable errors, timeouts, connection failures, 429 rate limits, and 5xx provider failures
- judge retries use the same classification and preserve retry history, including retry_after_s hints when available
- retry metadata is persisted on generation artifacts and workflow failures so later inspection can distinguish a hard failure from a transient recovery
- retry is transient same-stage recovery; it is different from resume, replay, or existing_run_policy
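The classification described above can be sketched roughly as follows. This is an illustration, not the Themis implementation: the function names, the `retryable` attribute, the exponential reading of the backoff setting, and the way a retry_after_s hint is combined with the computed delay are all assumptions.

```python
from typing import Optional

# Built-in exception types standing in for timeouts and connection failures.
RETRYABLE_EXCEPTIONS = (TimeoutError, ConnectionError)


def is_retryable(error: Exception, status_code: Optional[int] = None) -> bool:
    """Return True for the error classes the guide lists as retryable."""
    # Explicitly retryable errors (hypothetical marker attribute).
    if getattr(error, "retryable", False):
        return True
    # Timeouts and connection failures.
    if isinstance(error, RETRYABLE_EXCEPTIONS):
        return True
    # 429 rate limits and 5xx provider failures.
    if status_code is not None:
        return status_code == 429 or 500 <= status_code <= 599
    return False


def next_delay(
    attempt: int,
    base_delay: float,
    backoff: float,
    retry_after_s: Optional[float] = None,
) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    Assumes exponential backoff, and that a provider hint can lengthen
    the wait but never shorten it.
    """
    computed = base_delay * (backoff ** (attempt - 1))
    if retry_after_s is not None:
        return max(computed, retry_after_s)
    return computed
```

With a delay of 1.0 and a backoff of 2.0, this reading gives waits of 1, 2, and 4 seconds for the first three retries, unless a longer retry_after_s hint is present.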
Estimate behavior:
- themis estimate --config ... now returns task counts and token-level estimates
- generation estimates report input and assumed output tokens
- judge estimates report estimated prompt and assumed output tokens
- Themis does not price those tokens; use the estimate JSON as input to an external cost model
- the estimate payload includes estimated_total_tokens plus its assumptions, so external pricing can remain versioned outside Themis
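An external cost model that consumes the estimate might look like the sketch below. Only estimated_total_tokens is a field name taken from this guide; the other payload keys, the price table, its version label, and the prices themselves are placeholders, so adapt them to the actual estimate JSON and to your provider's published rates.

```python
# Hypothetical, externally versioned price table (placeholder numbers).
PRICE_TABLE_VERSION = "example-v1"
PRICES_PER_MILLION_TOKENS = {"input": 1.00, "output": 3.00}


def price_estimate(estimate: dict) -> dict:
    """Attach a cost figure to a token estimate.

    The keys "input_tokens" and "assumed_output_tokens" are assumed names,
    not the documented payload schema.
    """
    input_tokens = estimate["input_tokens"]
    output_tokens = estimate["assumed_output_tokens"]
    cost = (
        input_tokens * PRICES_PER_MILLION_TOKENS["input"]
        + output_tokens * PRICES_PER_MILLION_TOKENS["output"]
    ) / 1_000_000
    return {
        "price_table_version": PRICE_TABLE_VERSION,
        "estimated_total_tokens": estimate["estimated_total_tokens"],
        "estimated_cost": cost,
    }


# Hand-written sample shaped like the assumed payload, not real output.
sample_estimate = {
    "input_tokens": 200_000,
    "assumed_output_tokens": 50_000,
    "estimated_total_tokens": 250_000,
}
priced = price_estimate(sample_estimate)
```

Recording the price table version next to each cost figure keeps old estimates comparable after prices change, which is the reason for holding pricing outside Themis.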
Variants
| Variant | Best when | Tradeoff | Related APIs / commands |
|---|---|---|---|
| Conservative provider rollout | A provider has strict quotas or unstable limits and you want safety first | Lower throughput | provider_concurrency, provider_rate_limits, retry settings |
| Throughput-oriented local runs | Local hardware or permissive endpoints can handle more parallel work | Higher pressure on stores, providers, and error handling | max_concurrent_tasks, stage_concurrency, provider_concurrency |
Expected result
You should be able to alter execution behavior without changing run_id.