Capture traces and conversations¶

Goal: store trace and conversation artifacts so you can inspect or score them later.

When to use this:

Use this guide when generated final output alone is not enough to explain or evaluate model behavior.

Procedure¶

Populate GenerationResult.trace and GenerationResult.conversation inside your generator.

from __future__ import annotations

from themis import Experiment, InMemoryRunStore
from themis.core.config import EvaluationConfig, GenerationConfig, StorageConfig
from themis.core.models import Case, Dataset, GenerationResult, Message, TraceStep


class TracedGenerator:
    """Generator example that emits trace and conversation artifacts."""

    component_id = "generator/traced_example"
    version = "1.0"

    def fingerprint(self) -> str:
        return "traced-example-generator"

    async def generate(self, case: Case, ctx: object) -> GenerationResult:
        del ctx
        return GenerationResult(
            candidate_id=f"{case.case_id}-candidate",
            final_output={"answer": "4"},
            trace=[
                TraceStep(
                    step_name="reason",
                    step_type="tool",
                    input={"question": case.input},
                    output={"answer": "4"},
                )
            ],
            conversation=[
                Message(role="user", content=case.input),
                Message(role="assistant", content={"answer": "4"}),
            ],
        )


def run_example() -> dict[str, object]:
    """Run with trace-producing generation and inspect the trace view projection."""

    store = InMemoryRunStore()
    experiment = Experiment(
        generation=GenerationConfig(generator=TracedGenerator()),
        evaluation=EvaluationConfig(),
        storage=StorageConfig(store="memory"),
        datasets=[
            Dataset(
                dataset_id="sample",
                cases=[Case(case_id="case-1", input={"question": "2+2"})],
            )
        ],
    )
    result = experiment.run(store=store)
    trace_view = store.get_projection(result.run_id, "trace_view")
    generation_traces = []
    if isinstance(trace_view, dict):
        maybe_traces = trace_view.get("generation_traces")
        if isinstance(maybe_traces, list):
            generation_traces = maybe_traces
    return {
        "run_id": result.run_id,
        "status": result.status.value,
        "generation_trace_count": len(generation_traces),
    }


if __name__ == "__main__":
    print(run_example())

Trace and conversation artifacts are only useful if the user knows where to inspect them afterward; always link to the relevant inspection docs.

Variants¶

Variant	Best when	Tradeoff	Related APIs / commands
Trace only	You need execution breadcrumbs without full turn-by-turn conversation state	Less useful for prompt-review workflows	`GenerationResult.trace`
Conversation only	You need the conversational exchange for later inspection or judging	Provides less internal execution structure than a trace	`GenerationResult.conversation`
Full inspection workflows	You want both trace-level and conversation-level evidence persisted for later analysis	More storage and generator implementation work	`GenerationResult.trace`, `GenerationResult.conversation`, persistent stores

Expected result¶

The run should expose trace-oriented projections and make those artifacts available for later inspection.

Capture traces and conversations¶

Procedure¶

Variants¶

Expected result¶

Troubleshooting¶