Evolve a Benchmark¶

The benchmark-first model still supports incremental evolution against the same storage root.

Typical changes:

add a model
add a prompt variant
add a parse pipeline
add a score overlay
narrow or widen a dataset query

Worked example: examples/09_experiment_evolution.py

The baseline run uses one model and one prompt. The expanded run adds another model and another prompt variant while reusing the same project storage root.