FAQ

Why is the public API benchmark-first now?

Because serious eval authors need first-class slices, prompt variants, parse pipelines, semantic dimensions, and benchmark-native reporting.
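To make those concepts concrete, here is a minimal sketch of what a benchmark-first spec could look like. All class and field names below are hypothetical illustrations of the listed concepts (slices, prompt variants, semantic dimensions); the real BenchmarkSpec fields may differ.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration only; not the library's real types.
@dataclass
class SliceSpec:
    slice_id: str
    filter_expr: str          # e.g. "split == 'dev'"

@dataclass
class PromptVariant:
    prompt_variant_id: str
    template: str

@dataclass
class BenchmarkSpec:
    name: str
    slices: list[SliceSpec] = field(default_factory=list)
    prompt_variants: list[PromptVariant] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)  # semantic dimensions

spec = BenchmarkSpec(
    name="medqa",
    slices=[SliceSpec("dev", "split == 'dev'")],
    prompt_variants=[PromptVariant("cot", "Think step by step: {question}")],
    dimensions=["reasoning_depth"],
)
print(spec.name, len(spec.slices), spec.dimensions)
```

The point of the benchmark-first surface is that these concepts are declared up front rather than reconstructed from ad-hoc run metadata.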

Why does BenchmarkSpec compile to something private?

Planning and execution still run on a lower-level IR, but that layer is an implementation detail. The public contract is the benchmark surface.
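The public-spec-over-private-IR split can be pictured with a toy lowering step. Everything here is illustrative, assuming nothing about the real IR beyond what the answer states: the private type carries a leading underscore to signal it is not part of the contract.

```python
from dataclasses import dataclass

# Illustrative only: a public spec lowering to a private plan.
@dataclass
class BenchmarkSpec:
    name: str
    slices: tuple[str, ...]

@dataclass
class _ExecutionPlan:
    # Leading underscore: an implementation detail, not a public type.
    tasks: list[str]

def compile_spec(spec: BenchmarkSpec) -> _ExecutionPlan:
    # Toy lowering: one task per slice; the real IR is richer.
    return _ExecutionPlan(tasks=[f"{spec.name}:{s}" for s in spec.slices])

plan = compile_spec(BenchmarkSpec("medqa", ("dev", "test")))
print(plan.tasks)
```

Because only BenchmarkSpec is public, the plan representation can change between releases without breaking callers.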

What replaced the old dataset loader contract?

The old loader contract is gone; implement DatasetProvider.scan(slice_spec, query) instead.
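As a sketch of what implementing that contract might look like: the FAQ gives only the method name and argument names, so the parameter types, return type, and the toy matching semantics below are all assumptions.

```python
from typing import Iterable, Iterator, Protocol

class DatasetProvider(Protocol):
    # Hypothetical signature; only the names scan/slice_spec/query
    # come from the docs.
    def scan(self, slice_spec: str, query: str) -> Iterable[dict]: ...

class InMemoryProvider:
    def __init__(self, rows: list[dict]):
        self.rows = rows

    def scan(self, slice_spec: str, query: str) -> Iterator[dict]:
        # Toy semantics: slice_spec selects a split, query is a substring match.
        for row in self.rows:
            if row.get("split") == slice_spec and query in row.get("text", ""):
                yield row

provider = InMemoryProvider([
    {"split": "dev", "text": "chest pain"},
    {"split": "test", "text": "chest pain"},
])
print(list(provider.scan("dev", "pain")))
```

Structural typing via Protocol means any object with a compatible scan method satisfies the contract without inheriting from a base class.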

What should I do with examples/medical_reasoning_eval?

Treat it as a handoff and acceptance reference. It was intentionally not rewritten during the benchmark-first overhaul.

How do I inspect results without importing Python?

Use themis-quickcheck against the SQLite database.
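Since themis-quickcheck's flags are not documented here, the snippet below shows the equivalent direct inspection with the stock sqlite3 CLI instead. The database path and the results table name and columns are hypothetical; list the real tables with `.tables` first.

```shell
# Build a stand-in results database (hypothetical schema).
db="$(mktemp -d)/results.sqlite"
sqlite3 "$db" "CREATE TABLE results (benchmark TEXT, slice_id TEXT, score REAL);"
sqlite3 "$db" "INSERT INTO results VALUES ('medqa', 'dev', 0.81);"

# Inspect it directly -- no Python import needed.
sqlite3 "$db" "SELECT benchmark, slice_id, score FROM results;"
```

Anything sqlite3 can read, themis-quickcheck operates on the same file.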

How do I group results by benchmark semantics?

Use BenchmarkResult.aggregate(...) and include slice_id, prompt_variant_id, or dimension keys in group_by.
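To illustrate the grouping semantics, here is a toy re-implementation of that kind of aggregation. The real BenchmarkResult.aggregate signature is not shown in this FAQ; only the group_by keys (slice_id, prompt_variant_id, dimension keys) come from the answer above, and the mean-score reduction is an assumption.

```python
from collections import defaultdict
from statistics import mean

def aggregate(rows: list[dict], group_by: list[str]) -> dict[tuple, float]:
    # Group rows by the requested keys, then average scores per group.
    groups: dict[tuple, list[float]] = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in group_by)
        groups[key].append(row["score"])
    return {key: mean(scores) for key, scores in groups.items()}

rows = [
    {"slice_id": "dev", "prompt_variant_id": "cot", "score": 0.8},
    {"slice_id": "dev", "prompt_variant_id": "cot", "score": 0.6},
    {"slice_id": "test", "prompt_variant_id": "cot", "score": 0.9},
]
print(aggregate(rows, ["slice_id"]))  # one mean score per slice_id
```

Passing several keys in group_by (e.g. both slice_id and prompt_variant_id) yields one row per key combination, which is what makes the grouping benchmark-semantic rather than run-oriented.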