# Reproducibility Policy

> Closes phase-00 issue #7. Pinned by the regression batteries in
> `tests/core/test_seed_policy.py`,
> `tests/core/test_deterministic_replay.py`, and
> `tests/core/test_execution_cache.py`.

macroforecast v0.1 promises that **the same recipe produces the same artifacts
bit-for-bit**, on the same machine and across machines that share the
package version + dependency lockfile. This page documents what
"reproducible" means in practice, what knobs control it, and what is
deliberately *out of scope*.

## Public API

```python
import macroforecast

# Run any recipe (inline YAML, dict, or Path).
result = macroforecast.run("recipe.yaml", output_directory="out/")

# Re-execute the stored manifest and verify per-cell sink hashes match.
replication = macroforecast.replicate("out/manifest.json")
assert replication.recipe_match
assert replication.sink_hashes_match
```

## Seed-policy modes (L0)

The L0 layer's `reproducibility_mode` axis selects one of two regimes:

| Mode | When | Seed source | Best for |
|------|------|-------------|----------|
| `seeded_reproducible` *(default)* | every run is a deterministic replay | `0_meta.leaf_config.random_seed` (default `0`) | paper replication, regression tests, multi-cell sweeps |
| `exploratory` | seed is left to whatever process state happens to be | none | one-off interactive runs where determinism doesn't matter |

`strict` and any other unknown value are rejected by the L0 schema
validator. Pass `random_seed` explicitly when you want a non-zero base.

```yaml
0_meta:
  fixed_axes:
    reproducibility_mode: seeded_reproducible
  leaf_config:
    random_seed: 42
```

`_resolve_seed(recipe_root)` returns:

* the explicit `leaf_config.random_seed` if present,
* `0` for the default `seeded_reproducible` mode,
* `None` for `exploratory` (or any other non-`seeded_reproducible` value).

## What `_apply_seed` actually seeds

A best-effort propagation that covers every RNG macroforecast or its
dependencies are likely to touch:

| Library | Call |
|---------|------|
| Python `random` module | `random.seed(seed)` |
| NumPy global state | `np.random.seed(seed % 2**32)` |
| Process env (hash-seed-sensitive iteration) | `os.environ.setdefault("PYTHONHASHSEED", str(seed))` |
| PyTorch (when installed) | `torch.manual_seed(seed)` + `torch.cuda.manual_seed_all(seed)` |

`scikit-learn` estimators receive `random_state=seed_int` from the L4
recipe params (`_build_l4_model`) -- the global numpy seed isn't enough
for sklearn because most estimators capture `random_state=None` and call
`check_random_state` once. **Pin `random_state` per estimator if you need
deterministic ensembles.**

## Cell-index seed schedule

A multi-cell sweep is *not* run with the same seed in every cell. The
sweep loop applies `base_seed + (cell_index - 1)` so:

* Cell 1 uses `random_seed`.
* Cell 2 uses `random_seed + 1`.
* ... cell N uses `random_seed + N - 1`.

This means two cells of the same recipe with different `{sweep: [...]}`
values produce *different* RNG streams (bug-catching: see
`test_distinct_cells_get_distinct_seeds`), but a re-run of the same
sweep produces *identical* streams cell-by-cell.

## Bit-exact replicate

`macroforecast.replicate(manifest_path)` reads the stored manifest, expands
the same sweep, and re-executes every cell. The returned
`ReplicationResult` carries:

* `recipe_match: bool` -- the canonicalized recipe dict round-trips
  identically (key order, sweep marker placement, etc.).
* `sink_hashes_match: bool` -- every cell's per-sink SHA-256 matches
  the original.
* `per_cell_match: dict[str, bool]` -- per-cell breakdown.

Two sinks are exempt from the strict equality check because they
legitimately encode environmental data:

* `l1_data_definition_v1` -- carries `leaf_config.cache_root` which
  depends on the local filesystem layout.
* `l8_artifacts_v1` -- records the absolute paths of exported files.

The other eight sinks (L1 regime, L2, L3 features + metadata, L4
forecasts + models + training, L5 evaluation, plus L6 / L7 / L8 outputs
when produced) are byte-equal across runs.

## Shared raw cache (`cache_root`)

Multi-cell sweeps that hit the same FRED vintage many times share the
on-disk raw cache when you pass `cache_root=`:

```python
macroforecast.run(
    "recipe.yaml",
    output_directory="out/sweep_a",
    cache_root="/var/macroforecast/raw_cache",
)
```

Resolution order (first non-None wins):

1. The explicit `cache_root=` argument.
2. `recipe['1_data']['leaf_config']['cache_root']` (recipe-level
   override).
3. `output_directory / ".raw_cache"` (auto-derived).
4. The raw loader's package default.

The effective value is recorded in `manifest.json[ "cache_root"]` and on
the `ManifestExecutionResult.cache_root` attribute, so a follow-up run
or a downstream auditor can verify exactly which cache backed the
artifacts.

## Determinism boundaries

| Boundary | Guarantee | Caveats |
|----------|-----------|---------|
| Two re-runs of the same recipe in the same Python session | byte-identical sinks (excluding `l1_data_definition_v1` + `l8_artifacts_v1` when output paths differ) | -- |
| Two re-runs in **different processes** with the same package + lockfile | byte-identical sinks (validated by `tests/core/test_v01_1_hot_patch.py::test_hash_sink_with_set_payload_is_stable_across_processes`, which rotates `PYTHONHASHSEED`) | -- |
| `compute_mode = parallel` cell loop | byte-identical sinks vs. serial run for the same cells (validated by `tests/core/test_compute_mode_parallel.py::test_parallel_matches_serial_sink_hashes`) | `l8_artifacts_v1` legitimately differs because of output paths |
| Across machines with the same package version + lockfile | numerical equality at machine epsilon | floating-point summation order across BLAS implementations can drift on the last bit |
| Across `xgboost` / `lightgbm` / `catboost` versions | best-effort | C++ trees are sensitive to library upgrades; pin via the lockfile |
| Deep-NN families (`lstm` / `gru` / `transformer`) | seeded (we call `torch.manual_seed`) but **not** guaranteed bit-exact across torch versions or CUDA driver versions | install `torch[cpu]` for tighter portability |
| Across `shap` versions or with `shap` not installed | best-effort | the L7 SHAP path falls back to a coefficient / permutation proxy when `shap` is missing; the proxy is itself deterministic |

## Worked examples

### Single-path recipe -> identical artifacts twice

```python
import macroforecast
from pathlib import Path

a = macroforecast.run("recipe.yaml", output_directory=Path("out/a"))
b = macroforecast.run("recipe.yaml", output_directory=Path("out/b"))

# Every cell's sink hashes match (excluding path-dependent l1, l8).
for left, right in zip(a.cells, b.cells):
    for sink_name in left.sink_hashes:
        if sink_name in {"l1_data_definition_v1", "l8_artifacts_v1"}:
            continue
        assert left.sink_hashes[sink_name] == right.sink_hashes[sink_name]
```

### Sweep variant ID -> distinct seed

```python
recipe = """
0_meta:
  fixed_axes: {reproducibility_mode: seeded_reproducible}
  leaf_config: {random_seed: 100}
3_feature_engineering:
  nodes:
    - {id: lag_x, type: step, op: lag, params: {n_lag: {sweep: [1, 2, 3, 4]}}, ...}
"""
result = macroforecast.run(recipe)
# Cells get seeds 100, 101, 102, 103.
```

### Replicate the manifest

```python
import macroforecast

primary = macroforecast.run("paper_recipe.yaml", output_directory="paper_out/")
replication = macroforecast.replicate("paper_out/manifest.json")
assert replication.sink_hashes_match
```

## Out of scope

* GPU determinism beyond `torch.manual_seed`. Set
  `torch.use_deterministic_algorithms(True)` and the relevant cuDNN
  flags yourself if you need bit-exact CUDA output -- that is a
  platform-specific decision.
* Reproducibility across BLAS implementations (OpenBLAS vs. MKL vs.
  Apple Accelerate). The L4 estimators are deterministic given fixed
  parameters, but floating-point reductions are not associative.
* Reproducibility across Python versions. The package targets
  `python>=3.10`; minor versions are tested in CI but cross-version
  hash equality is not guaranteed.

## Related tests

| Test file | Pins |
|-----------|------|
| `tests/core/test_seed_policy.py` | `_resolve_seed`, `_apply_seed` contract |
| `tests/core/test_deterministic_replay.py` | identical recipe twice -> identical sinks + byte-identical CSVs |
| `tests/core/test_execution_cache.py` | `cache_root` precedence + shared cache + independence |
| `tests/core/test_v01_1_hot_patch.py` | `set` hashing across `PYTHONHASHSEED` |
| `tests/core/test_compute_mode_parallel.py` | parallel run matches serial run |
| `tests/core/test_execute_recipe_dispatch.py` | str-vs-Path dispatch + deprecation |

## Related issues

* #4 -- `cache_root` parameter on `execute_recipe`
* #6 -- determinism regression battery
* #167 -- L6/L7 numerical golden tests
* #169 -- explicit dispatch in `execute_recipe`