First `evaluate(...)`

What you will build

You will run a single deterministic evaluation from Python using the built-in generation, parsing, and scoring components.

Prerequisites

- A base Themis install
- No provider extras required
- Basic familiarity with running a Python script

Steps

1. Read the example below.
2. Run it as a standalone script, or import `run_example()` and call it.
3. Inspect the returned `run_id` and `status`.

```python
from __future__ import annotations

from themis import evaluate
from themis.core.models import Case, Dataset


def run_example() -> dict[str, object]:
    """Run the smallest end-to-end evaluation through the Layer 1 API."""

    result = evaluate(
        model="builtin/demo_generator",
        data=[
            Dataset(
                dataset_id="sample",
                cases=[
                    Case(
                        case_id="case-1",
                        input={"question": "2+2"},
                        expected_output={"answer": "4"},
                    )
                ],
            )
        ],
        metric="builtin/exact_match",
        parser="builtin/json_identity",
    )
    return {"run_id": result.run_id, "status": result.status.value}


if __name__ == "__main__":
    print(run_example())
```
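
Step 3 can be sketched as a small check on the returned dictionary. `summarize_run` is a hypothetical helper, not part of Themis; it assumes only the two keys that `run_example()` returns and the `completed` status that this tutorial expects. The values passed at the bottom are stand-ins, not real output.

```python
from __future__ import annotations


def summarize_run(summary: dict[str, object]) -> str:
    """Format the dict returned by run_example() and flag unexpected statuses."""
    run_id = summary["run_id"]
    status = summary["status"]
    # "completed" is the status this tutorial expects; any other value is
    # reported as-is because this sketch does not know the full status set.
    marker = "ok" if status == "completed" else "check"
    return f"[{marker}] run_id={run_id} status={status}"


if __name__ == "__main__":
    # Stand-in values; in practice, pass the result of run_example().
    print(summarize_run({"run_id": "example-run", "status": "completed"}))
```

In a real session you would call `summarize_run(run_example())` instead of passing a hand-written dictionary.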

Expected results

- `status` is `completed`.
- `run_id` is stable for the same compiled identity inputs.
- You used the shortest supported Python entry point.
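
To illustrate what a stable `run_id` means in practice, here is a minimal sketch of a content-derived identifier. This page does not describe how Themis actually compiles or hashes identity inputs, so `stable_id`, the choice of SHA-256, the fields in `inputs`, and the `hypothetical/other_metric` name are all assumptions made for illustration. The sketch only shows the general property: identical inputs yield the same identifier, and any change to them yields a different one.

```python
import hashlib
import json


def stable_id(identity: dict[str, object]) -> str:
    """Derive a deterministic identifier from JSON-serializable inputs."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative inputs only; not the fields Themis necessarily uses.
inputs = {
    "model": "builtin/demo_generator",
    "metric": "builtin/exact_match",
    "parser": "builtin/json_identity",
    "cases": [{"case_id": "case-1", "input": {"question": "2+2"}}],
}
reordered = dict(reversed(list(inputs.items())))
changed = {**inputs, "metric": "hypothetical/other_metric"}

print(stable_id(inputs) == stable_id(reordered))  # True
print(stable_id(inputs) == stable_id(changed))    # False
```

A practical way to verify the real behavior is to call `run_example()` twice and compare the two `run_id` values.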

Common failure points

- Using an expected output that differs from what the builtin demo generator returns, which makes the exact-match score fail.
- Assuming in-memory storage can be reopened from another process.
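
The first failure point is easier to see with a stand-in for exact-match scoring. The sketch below is not the implementation of `builtin/exact_match` or `builtin/json_identity`; the `exact_match` function and the sample outputs are hypothetical, and it assumes the generator output is a JSON string that is parsed and then compared for strict equality.

```python
from __future__ import annotations

import json


def exact_match(raw_output: str, expected: dict[str, object]) -> bool:
    """Parse a JSON string and compare it to the expected output for equality."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Output that is not valid JSON can never match.
        return False
    return parsed == expected


expected = {"answer": "4"}
print(exact_match('{"answer": "4"}', expected))     # True
print(exact_match('{"answer": 4}', expected))       # False: int 4 is not str "4"
print(exact_match('{"answer": "four"}', expected))  # False
```

Under strict equality, even a type difference such as `4` versus `"4"` counts as a mismatch, so `expected_output` has to mirror the generator's output exactly.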

Next steps