First Experiment(...)
What you will build
You will define generation, evaluation, storage, and seeds explicitly, compile them into a RunSnapshot, and execute the run.
Prerequisites
- comfort with the first evaluate(...) tutorial
- base Themis install
Steps
- Create an explicit Experiment(...).
- Call compile() to inspect the stable run_id.
- Run the experiment with an explicit RuntimeConfig.
from __future__ import annotations

from themis import Experiment, RuntimeConfig
from themis.core.config import EvaluationConfig, GenerationConfig, StorageConfig
from themis.core.models import Case, Dataset


def run_example() -> dict[str, object]:
    """Compile and run an explicit Experiment definition."""
    experiment = Experiment(
        generation=GenerationConfig(
            generator="builtin/demo_generator",
            candidate_policy={"num_samples": 1},
            reducer="builtin/majority_vote",
        ),
        evaluation=EvaluationConfig(
            metrics=["builtin/exact_match"],
            parsers=["builtin/json_identity"],
        ),
        storage=StorageConfig(store="memory"),
        datasets=[
            Dataset(
                dataset_id="sample",
                cases=[
                    Case(
                        case_id="case-1",
                        input={"question": "2+2"},
                        expected_output={"answer": "4"},
                    )
                ],
            )
        ],
        seeds=[7],
    )
    snapshot = experiment.compile()
    result = experiment.run(runtime=RuntimeConfig(max_concurrent_tasks=4))
    return {"run_id": snapshot.run_id, "status": result.status.value}


if __name__ == "__main__":
    print(run_example())
Expected results
Expected result:
- Experiment.compile() gives you a concrete RunSnapshot
- Experiment.run() finishes with completed
- you now own generation, evaluation, storage, and seed configuration explicitly
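One way to check these results in a script is to validate the dictionary that run_example() returns. The sketch below is self-contained and does not import Themis. The helper name check_summary is hypothetical, and it assumes that run_id is a non-empty string, which this page does not state.

```python
from __future__ import annotations


def check_summary(summary: dict[str, object]) -> None:
    """Raise ValueError if the summary does not look like a finished run.

    Hypothetical helper: assumes run_id is a non-empty string.
    """
    run_id = summary.get("run_id")
    if not isinstance(run_id, str) or not run_id:
        raise ValueError(f"run_id should be a non-empty string, got {run_id!r}")
    status = summary.get("status")
    if status != "completed":
        raise ValueError(f"expected status 'completed', got {status!r}")


# Placeholder summary for illustration; a real run_id will look different.
check_summary({"run_id": "example-run-id", "status": "completed"})
```

In the tutorial script you would pass the return value of run_example() instead of the placeholder dictionary.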
Common failure points
- confusing runtime execution controls with identity-bearing inputs
- expecting compile() to execute work
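The first failure point is easier to see with a toy model. The sketch below is not Themis code and makes no claim about how Themis derives run_id. It assumes, purely for illustration, that an identifier is a hash of the identity-bearing definition and that runtime controls only affect scheduling. The names ToyRuntime, toy_run_id, and toy_run are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRuntime:
    """Hypothetical stand-in for execution controls such as concurrency."""

    max_concurrent_tasks: int = 1


def toy_run_id(definition: dict[str, object]) -> str:
    """Hash a canonical JSON form of the identity-bearing inputs only."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def toy_run(definition: dict[str, object], runtime: ToyRuntime) -> dict[str, object]:
    """Pretend to execute; the runtime is recorded but never feeds the id."""
    return {
        "run_id": toy_run_id(definition),
        "workers": runtime.max_concurrent_tasks,
    }


definition: dict[str, object] = {
    "generator": "builtin/demo_generator",
    "metrics": ["builtin/exact_match"],
    "seeds": [7],
}

# Same definition, different runtime controls: the id is unchanged.
slow = toy_run(definition, ToyRuntime(max_concurrent_tasks=1))
fast = toy_run(definition, ToyRuntime(max_concurrent_tasks=4))
print(slow["run_id"] == fast["run_id"])

# Changing an identity-bearing input (the seed) produces a different id.
reseeded = toy_run_id({**definition, "seeds": [8]})
print(reseeded != slow["run_id"])
```

In this toy model, computing the id does no work beyond hashing, which mirrors the second point: compiling describes a run, and only running executes it.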