# Example Walkthrough — minimal ridge

This walkthrough opens the smallest bundled recipe, [`examples/recipes/l4_minimal_ridge.yaml`](https://github.com/NanyeonK/macroforecast/blob/main/examples/recipes/l4_minimal_ridge.yaml), and explains every layer choice end-to-end. Use it as a **template** when writing your own replication page (see [study_1.md](study_1.md) etc.).

## Reproduce in two commands

```bash
macroforecast run examples/recipes/l4_minimal_ridge.yaml -o out/walkthrough
macroforecast replicate out/walkthrough/manifest.json
```

The first command writes per-cell artifacts and `manifest.json` to `out/walkthrough/`. The second re-runs the study from the manifest and verifies that every sink hash matches bit-for-bit.

## Layer 0 — study setup

```yaml
0_meta:
  fixed_axes: {failure_policy: fail_fast, reproducibility_mode: seeded_reproducible}
  leaf_config: {random_seed: 42}
```

- `failure_policy: fail_fast` — abort the sweep on the first cell failure (the default while developing a recipe; switch to `continue_on_failure` for large unattended sweeps).
- `reproducibility_mode: seeded_reproducible` + `random_seed: 42` — every stochastic step receives a deterministic seed derived from the L0 seed plus the cell index. This is what makes `macroforecast.replicate(manifest_path)` bit-exact.

## Layer 1 — data

```yaml
1_data:
  fixed_axes: {custom_source_policy: custom_panel_only, frequency: monthly, horizon_set: custom_list}
  leaf_config:
    target: y
    target_horizons: [1]
    custom_panel_inline:
      date: [2018-01-01, ..., 2018-12-01]
      y: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
      x1: [0.5, 1.0, 1.5, ..., 6.0]
```

- `custom_source_policy: custom_panel_only` — bypass the FRED loaders and use the inline panel. Real recipes use `dataset: fred_md` (or `fred_qd`, `fred_sd`, `fred_md+fred_sd`, `fred_qd+fred_sd`) plus `start` / `end`.
- `frequency: monthly` — the panel calendar; required when it cannot be derived from the dataset.
- `horizon_set: custom_list` + `target_horizons: [1]` — predict h=1 only.
  Multi-horizon recipes typically use `[1, 3, 6, 12]` (monthly) or `[1, 2, 4, 8]` (quarterly).

## Layer 2 — preprocessing

```yaml
2_preprocessing:
  fixed_axes:
    transform_policy: no_transform
    outlier_policy: none
    imputation_policy: none_propagate
    frame_edge_policy: keep_unbalanced
```

The minimal recipe disables every L2 stage because the inline panel is already clean. A realistic FRED-MD recipe would use:

- `transform_policy: apply_official_tcode` — apply the McCracken-Ng transformation codes (tcodes).
- `outlier_policy: mccracken_ng_iqr` + `outlier_action: flag_as_nan`.
- `imputation_policy: em_factor` — McCracken-Ng PCA-EM imputation.
- `frame_edge_policy: truncate_to_balanced`.

For FRED-SD / mixed-frequency studies, see [`mixed_frequency_representation`](../encyclopedia/l2/axes/mixed_frequency_representation.md).

## Layer 3 — feature engineering DAG

```yaml
3_feature_engineering:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: predictors}}}
    - {id: src_y, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: target}}}
    - {id: lag_x, type: step, op: lag, params: {n_lag: 1}, inputs: [src_X]}
    - {id: y_h, type: step, op: target_construction, params: {mode: point_forecast, method: direct, horizon: 1}, inputs: [src_y]}
  sinks:
    l3_features_v1: {X_final: lag_x, y_final: y_h}
    l3_metadata_v1: auto
```

- L3 is a DAG. `src_X` / `src_y` pull the predictors / target from the L2 clean panel.
- `lag_x` step: a single 1-period lag of every predictor column.
- `y_h` step: build the L3 target as a direct h=1 forecast (no lead / cumulative).
- Sinks: the DAG terminates at `l3_features_v1` (an `(X_final, y_final)` pair) plus `l3_metadata_v1` (lineage for L7 attribution).

Real recipes compose richer DAGs: `pca` reduction, `ma_increasing_order` (MARX), `scaled_pca`, `feature_selection`, etc. See the [encyclopedia L3 page](../encyclopedia/l3/index.md) for the 37 operational ops.
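The L3 DAG above can be mimicked in plain Python to see how the lag and the direct target line up row by row. This is a minimal sketch, not macroforecast's code. It assumes that `lag` shifts each predictor back `n_lag` rows, that `target_construction` with `method: direct` pairs row t with y at t + h, and that rows left incomplete by either shift are dropped before the sink. `run_dag`, `to_sink`, and the other helper names are hypothetical, and the `x1` values assume the elided part of the inline panel continues in steps of 0.5.

```python
# Hypothetical sketch of the walkthrough's L3 DAG; not macroforecast's implementation.
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def lag(series: Series, n_lag: int) -> Series:
    """Row t holds the value from t - n_lag; the first n_lag rows are missing."""
    return [None] * n_lag + series[: len(series) - n_lag]


def direct_target(series: Series, horizon: int) -> Series:
    """Direct point target: row t holds the value from t + horizon."""
    return series[horizon:] + [None] * horizon


def run_dag(nodes: Dict[str, Tuple[Callable[..., Series], Sequence[str]]],
            sources: Dict[str, Series]) -> Dict[str, Series]:
    """Evaluate every step node after its inputs (memoised depth-first traversal)."""
    cache: Dict[str, Series] = dict(sources)

    def evaluate(node_id: str) -> Series:
        if node_id not in cache:
            fn, inputs = nodes[node_id]
            cache[node_id] = fn(*(evaluate(i) for i in inputs))
        return cache[node_id]

    return {node_id: evaluate(node_id) for node_id in nodes}


def to_sink(x: Series, y: Series) -> Tuple[List[float], List[float]]:
    """Keep only rows where both the lagged feature and the target exist."""
    rows = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    return [r[0] for r in rows], [r[1] for r in rows]


# Toy panel from the recipe; x1 assumes the elided values continue in steps of 0.5.
y: Series = [float(v) for v in range(1, 13)]
x1: Series = [0.5 * i for i in range(1, 13)]

# Step nodes mirroring the YAML: id -> (operation, input node ids).
nodes = {
    "lag_x": (lambda s: lag(s, n_lag=1), ["src_X"]),
    "y_h": (lambda s: direct_target(s, horizon=1), ["src_y"]),
}
out = run_dag(nodes, {"src_X": x1, "src_y": y})
X_final, y_final = to_sink(out["lag_x"], out["y_h"])
```

In this sketch the first panel row loses its lagged feature and the last row loses its h=1 target, so 12 monthly rows shrink to 10 training pairs, starting with `(0.5, 3.0)`: the January value of `x1` paired with the March value of `y`, as seen from a February origin.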
## Layer 4 — forecasting model

```yaml
4_forecasting_model:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}}
    - {id: src_y, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: y_final}}}
    - id: fit_ridge
      type: step
      op: fit_model
      params:
        family: ridge
        alpha: 1.0
        forecast_strategy: direct
        training_start_rule: expanding
        refit_policy: every_origin
        search_algorithm: none
        min_train_size: 6
        is_benchmark: true
      inputs: [src_X, src_y]
    - {id: predict_ridge, type: step, op: predict, inputs: [fit_ridge, src_X]}
  sinks:
    l4_forecasts_v1: predict_ridge
    l4_model_artifacts_v1: fit_ridge
    l4_training_metadata_v1: auto
```

- `family: ridge` with `alpha: 1.0` — standard L2-regularised OLS. Replace with `lasso`, `elastic_net`, `ar_p`, `random_forest`, `xgboost`, `bayesian_ridge`, `bvar_minnesota`, `macroeconomic_random_forest`, `dfm_mixed_mariano_murasawa`, ...; see the [encyclopedia L4 page](../encyclopedia/l4/index.md) for all 35+ families.
- `forecast_strategy: direct` — train one model per horizon (vs. `iterated`, which recursively rolls h=1 forecasts forward).
- `training_start_rule: expanding` + `refit_policy: every_origin` — expanding-window walk-forward, refit at every out-of-sample (OOS) origin.
- `search_algorithm: none` — no hyperparameter tuning. Set it to `bayesian_optimization` or `cv_path` to enable tuning.
- `is_benchmark: true` — flags this model as the L5 / L6 reference. Required when comparing models via `compare_models([...])`.

## Layer 5 — evaluation

The minimal recipe defaults to `primary_metric: mse`. Realistic recipes add:

```yaml
5_evaluation:
  fixed_axes:
    primary_metric: mse
    point_metrics: [mse, rmse, mae]
    relative_metrics: [relative_mse, r2_oos]
    benchmark_scope: per_target_horizon
    ranking: by_relative_metric
```

For statistical inference (Diebold-Mariano (DM), model confidence set (MCS), superior predictive ability (SPA), and White's Reality Check), enable L6; see the [encyclopedia L6 page](../encyclopedia/l6/index.md).
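The L4 and L5 settings above (expanding window, a refit at every origin, `min_train_size: 6`, MSE-based point and relative metrics) can be sketched end-to-end in plain Python. This is an illustrative sketch under stated assumptions, not the package's estimator or scorer. It fits a single-predictor ridge with an unpenalised intercept in closed form, and at each origin it trains only on rows whose h-step targets are already observed. Because the minimal recipe marks the ridge itself as the benchmark, the sketch swaps in a hypothetical prevailing-mean benchmark so that `relative_mse` (taken here as model MSE divided by benchmark MSE) and `r2_oos` (taken as 1 minus that ratio) have something to compare against. The toy inputs reproduce a lag-1 / h=1 alignment of the inline panel, assuming `x1` rises in steps of 0.5.

```python
# Illustrative sketch of L4 (expanding-window ridge) + L5 (point and relative metrics).
# Not macroforecast's estimator or scorer; helper names are hypothetical.
import math
from typing import Dict, List, Tuple


def fit_ridge_1d(x: List[float], y: List[float], alpha: float) -> Tuple[float, float]:
    """Closed-form ridge with one predictor and an unpenalised intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return y_bar - beta * x_bar, beta


def walk_forward(X: List[float], y: List[float], alpha: float = 1.0,
                 min_train_size: int = 6, horizon: int = 1) -> List[Tuple[float, float, float]]:
    """Expanding window, refit at every origin.

    Row t pairs the features known at origin t with the target h steps ahead, so
    only rows [0, t - horizon + 1), whose targets are already observed, are used.
    Returns (y_true, ridge forecast, prevailing-mean benchmark) per origin.
    """
    out = []
    for t in range(min_train_size + horizon - 1, len(y)):
        end = t - horizon + 1
        intercept, beta = fit_ridge_1d(X[:end], y[:end], alpha)
        benchmark = sum(y[:end]) / end  # hypothetical benchmark: historical mean
        out.append((y[t], intercept + beta * X[t], benchmark))
    return out


def score(rows: List[Tuple[float, float, float]]) -> Dict[str, float]:
    """MSE / RMSE / MAE for the model, plus metrics relative to the benchmark."""
    n = len(rows)
    mse = sum((yt - yp) ** 2 for yt, yp, _ in rows) / n
    bench_mse = sum((yt - yb) ** 2 for yt, _, yb in rows) / n
    relative_mse = mse / bench_mse  # assumed definition: model MSE / benchmark MSE
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(yt - yp) for yt, yp, _ in rows) / n,
        "relative_mse": relative_mse,
        "r2_oos": 1.0 - relative_mse,  # assumed definition
    }


if __name__ == "__main__":
    # Toy rows: lag-1 x1 (assumed 0.5 steps) paired with the h=1 target.
    X = [0.5 * i for i in range(1, 11)]
    y = [float(v) for v in range(3, 13)]
    print(score(walk_forward(X, y)))
```

With 10 aligned rows and `min_train_size: 6`, the sketch produces 4 out-of-sample forecasts. The ridge under-predicts at every origin because `alpha: 1.0` shrinks the slope toward zero, yet it still beats the prevailing mean by a wide margin, so `relative_mse` lands below 1 and `r2_oos` above 0.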
## Output (Layer 8)

`manifest.json` records the canonical-key-ordered recipe, per-cell sink hashes, and provenance (Python / package versions, lockfile, git SHA, OS, CPU). `out/walkthrough/cell_001/` contains:

- `forecasts.csv` — y_true / y_pred per origin
- `metrics.json` — point / relative metrics
- `cell_manifest.json`
- `figures/` — only when L7 importance is enabled

## Replication contract

```bash
macroforecast replicate out/walkthrough/manifest.json
```

The check returns a `ReplicationResult` with `recipe_match=True` and `sink_hashes_match=True` when every artifact reproduces bit-for-bit. This is the package's core promise; if it fails, file an issue.

## Programmatic equivalent (`mf.Experiment`)

```python
import macroforecast as mf

# One-shot: a single model family on one dataset / target
result = mf.forecast(
    dataset="fred_md",
    target="INDPRO",
    horizons=[1, 3, 6],
    model_family="ridge",
    output_directory="out/quickstart",
)
print(result.metrics)

# Builder (multi-cell horse race)
exp = (
    mf.Experiment(dataset="fred_md", target="INDPRO", horizons=[1, 3, 6])
    .compare_models(["ridge", "lasso", "ar_p"])
)
horse_race = exp.run(output_directory="out/horse_race")
print(horse_race.ranking)
print(horse_race.replicate().sink_hashes_match)  # True when every sink reproduces
```

## How to write your own replication page

1. Copy this file's structure into `study_.md`.
2. Fill in **Paper / Year / Source** in the page header.
3. Drop the recipe YAML into `examples/recipes/replications/study_.yaml` and commit it.
4. Run the recipe and replicate it; capture the per-cell metrics in the "Expected artifacts" section.
5. Note which axes are paper-faithful and which are simplified (e.g. an approximation of the published method).

See [study_1.md](study_1.md), [study_2.md](study_2.md), [study_3.md](study_3.md), and [study_4.md](study_4.md) for the four maintainer replications (filled in as each study runs).
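Step 4 leans on the replication contract described earlier. To see what a bit-for-bit comparison boils down to, the sketch below hashes the per-cell artifacts of two runs and compares them file by file. It is hypothetical and is not macroforecast's hashing. It assumes both runs use the `cell_*/` layout shown under Output, hashes raw bytes with SHA-256, and only looks at `forecasts.csv` and `metrics.json`, because `manifest.json` records provenance (versions, OS, CPU) that can legitimately differ between machines. The package may canonicalise sinks before hashing, so `macroforecast replicate` remains the authoritative check.

```python
# Hypothetical sketch: compare raw-byte SHA-256 hashes of per-cell artifacts in two runs.
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# Assumed artifact names, taken from the cell_001/ listing under Output.
DEFAULT_ARTIFACTS = ("forecasts.csv", "metrics.json")


def artifact_hashes(run_dir: Path, names: Iterable[str] = DEFAULT_ARTIFACTS) -> Dict[str, str]:
    """Map each matching file's path (relative to run_dir) to its SHA-256 digest."""
    wanted = set(names)
    return {
        path.relative_to(run_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(run_dir.rglob("*"))
        if path.is_file() and path.name in wanted
    }


def compare_runs(original: Path, rerun: Path) -> Dict[str, bool]:
    """True per artifact when both runs produced it with identical bytes."""
    a, b = artifact_hashes(original), artifact_hashes(rerun)
    return {name: a.get(name) == b.get(name) for name in sorted(set(a) | set(b))}


# Usage (the second output directory is hypothetical):
# report = compare_runs(Path("out/walkthrough"), Path("out/walkthrough_rerun"))
# mismatches = [name for name, ok in report.items() if not ok]
```

An artifact present in only one run also reports `False`, so a missing file is treated the same way as a changed one.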