Identity vs provenance

What it is: the split between logical run inputs and execution metadata.

When it matters: whenever a run_id changes unexpectedly or stays the same when you expected a new logical run.

What you provide: identity-bearing inputs such as dataset refs, component refs, candidate policy, judge config, workflow overrides, and seeds.

What Themis provides: provenance capture for version, platform, runtime, storage, environment metadata, and runtime-only execution wiring such as tracing or subscribers.

Use this split when you need to explain why two runs are logically the same or different.

flowchart TD
    A["Experiment inputs"] --> B["RunSnapshot.identity"]
    A --> C["RunSnapshot.provenance"]
    B --> D["dataset refs and fingerprints"]
    B --> E["component refs"]
    B --> F["candidate policy, judges, seeds"]
    C --> G["platform, version, storage, environment"]
    C --> J["subscribers, tracing backend"]
    B --> H["Changes run_id"]
    C --> I["Recorded metadata only"]

If the logical run changed, the difference should appear on the identity side; provenance explains where and how that same logical run happened. Changing a LifecycleSubscriber or TracingProvider changes runtime observation, not logical run identity.
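One way to picture the split is a run_id derived by hashing only the identity side. The sketch below is a minimal illustration under stated assumptions, not Themis's implementation: `SketchSnapshot`, its `run_id()` method, the SHA-256 hashing scheme, and all field values are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SketchSnapshot:
    """Hypothetical stand-in for a run snapshot; not the Themis class."""

    identity: dict[str, Any]
    provenance: dict[str, Any] = field(default_factory=dict)

    def run_id(self) -> str:
        # Assumption: the id is a hash of the identity side only.
        # Sorted keys keep the serialization stable across key order.
        canonical = json.dumps(self.identity, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative values only.
identity = {"dataset": "ds@v1", "seeds": [0, 1], "judge": {"model": "judge-a"}}

# Same logical run, observed on two machines with different tracing setups.
local = SketchSnapshot(identity, {"platform": "linux", "tracing": "none"})
ci = SketchSnapshot(identity, {"platform": "darwin", "tracing": "otel"})

# Different logical run: the seeds changed, the environment did not.
reseeded = SketchSnapshot({**identity, "seeds": [2, 3]}, {"platform": "linux", "tracing": "none"})

print(local.run_id() == ci.run_id())
print(local.run_id() == reseeded.run_id())
```

In this sketch, `local` and `ci` share a run_id because provenance never enters the hash, while `reseeded` gets a new one because a seed is an identity-bearing input.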

What to inspect when it goes wrong: look at RunSnapshot.identity first, since any difference there changes the run_id. If the logical run should be the same, differences should appear only in RunSnapshot.provenance.
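That inspection order can be sketched as a small comparison helper. This is a hedged illustration: it assumes each snapshot has been exported to a plain dict with `identity` and `provenance` keys, and `diff_side` and `explain` are hypothetical names, not Themis functions.

```python
from typing import Any

_MISSING = "<missing>"


def diff_side(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (a_value, b_value)} for every top-level key that differs."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, _MISSING), b.get(key, _MISSING)
        if left != right:
            changed[key] = (left, right)
    return changed


def explain(a: dict[str, Any], b: dict[str, Any]) -> tuple[str, dict[str, tuple[Any, Any]]]:
    """Check identity first; fall back to provenance only if identity matches."""
    identity_diff = diff_side(a["identity"], b["identity"])
    if identity_diff:
        return "different logical run", identity_diff
    return "same logical run", diff_side(a["provenance"], b["provenance"])


# Illustrative exported snapshots.
run_a = {"identity": {"dataset": "ds@v1", "seeds": [0]}, "provenance": {"platform": "linux"}}
run_b = {"identity": {"dataset": "ds@v1", "seeds": [0]}, "provenance": {"platform": "darwin"}}
run_c = {"identity": {"dataset": "ds@v2", "seeds": [0]}, "provenance": {"platform": "linux"}}

verdict_ab, details_ab = explain(run_a, run_b)
verdict_ac, details_ac = explain(run_a, run_c)
print(verdict_ab, details_ab)
print(verdict_ac, details_ac)
```

Reporting the identity diff before the provenance diff mirrors the guidance above: an unexpected run_id change is explained by the first result, and an expected-but-missing change usually means the edited input was not identity-bearing.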