Evolve a Benchmark

The benchmark-first model still supports incremental evolution: you can change a benchmark's definition and run it again against the same storage root.

Typical changes:

  • add a model
  • add a prompt variant
  • add a parse pipeline
  • add a score overlay
  • narrow or widen a dataset query

Worked example: examples/09_experiment_evolution.py

The baseline run uses one model and one prompt. The expanded run adds another model and another prompt variant while reusing the same project storage root.
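
The baseline-then-expanded pattern can be illustrated with a minimal, library-independent sketch. It is not the code in examples/09_experiment_evolution.py and does not use this project's API. The names `run_benchmark`, `cell_key`, `model-a`, and `prompt-v1` are hypothetical, and the sketch assumes that each (model, prompt) combination is stored under a stable key and skipped if it is already present, which this page does not confirm.

```python
import itertools
import json
import tempfile
from pathlib import Path


def cell_key(model: str, prompt: str) -> str:
    """Stable identifier for one (model, prompt) combination (hypothetical scheme)."""
    return f"{model}__{prompt}"


def run_benchmark(storage_root: Path, models: list[str], prompts: list[str]) -> list[str]:
    """Run every (model, prompt) cell not already stored; return the new keys."""
    storage_root.mkdir(parents=True, exist_ok=True)
    newly_run = []
    for model, prompt in itertools.product(models, prompts):
        key = cell_key(model, prompt)
        result_file = storage_root / f"{key}.json"
        if result_file.exists():
            continue  # assumed behavior: work finished in an earlier run is kept
        # Placeholder for real inference, parsing, and scoring.
        result_file.write_text(json.dumps({"model": model, "prompt": prompt}))
        newly_run.append(key)
    return newly_run


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project_store"

    # Baseline run: one model, one prompt.
    baseline = run_benchmark(root, ["model-a"], ["prompt-v1"])

    # Expanded run: one more model and one more prompt variant, same storage root.
    expanded = run_benchmark(root, ["model-a", "model-b"], ["prompt-v1", "prompt-v2"])

    stored = sorted(p.stem for p in root.glob("*.json"))

print("baseline ran:", baseline)
print("expanded ran:", expanded)
print("stored cells:", stored)
```

Under these assumptions the expanded run executes only the three combinations that the baseline did not cover, and the storage root ends up holding all four. Keying results by their inputs is one common way to make this kind of evolution cheap; the actual storage layout may differ.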