# Your First Study: Ridge With Diagnostics, Tests, Importance, And Export

This walkthrough builds a complete core layer-contract study. It uses the currently supported runtime path: custom panel data, L3 lag features, L4 linear sklearn forecasting, L5 point metrics, optional diagnostic layers, lightweight L6 tests, L7 linear importance, and L8 file export.

For the exact support boundary, see [Runtime Support Matrix](runtime_support.md).

## Study Design

Fixed design:

- Data: custom monthly panel
- Target: `y`
- Horizon: 1 month ahead
- Features: one lag of all predictors
- Model: expanding-window ridge regression
- Evaluation: MSE, RMSE, and MAE, with MSE as the primary metric
- Output: forecasts, metrics, ranking, tests, importance, diagnostics
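
To make the expanding-window design concrete, the following is a minimal sketch of how such a scheme could partition a six-month sample at horizon 1 with a minimum training size of 2. It is plain Python and independent of macroforecast: the helper name `expanding_origins` and the exact row alignment are assumptions for illustration, and the library's internal alignment may differ.

```python
from datetime import date


def expanding_origins(dates, min_train_size, horizon=1):
    """Return (train_dates, target_date) pairs for an expanding window.

    Hypothetical helper: each split trains on every date up to the origin
    and forecasts the date `horizon` steps after the last training date.
    """
    splits = []
    for end in range(min_train_size, len(dates) - horizon + 1):
        train = dates[:end]                  # window grows by one row per origin
        target = dates[end + horizon - 1]    # date being forecast
        splits.append((train, target))
    return splits


months = [date(2020, m, 1) for m in range(1, 7)]
splits = expanding_origins(months, min_train_size=2, horizon=1)

for train, target in splits:
    print(f"train {train[0]} .. {train[-1]} ({len(train)} rows) -> forecast {target}")
```

Under these assumptions the training window never drops old observations; it only grows as the origin moves forward.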

## Recipe

```yaml
1_data:
  fixed_axes:
    custom_source_policy: custom_panel_only
    frequency: monthly
    horizon_set: custom_list
  leaf_config:
    target: y
    target_horizons: [1]
    custom_panel_inline:
      date: [2020-01-01, 2020-02-01, 2020-03-01, 2020-04-01, 2020-05-01, 2020-06-01]
      y: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
      x1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
      x2: [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]

2_preprocessing:
  fixed_axes:
    transform_policy: no_transform
    outlier_policy: none
    imputation_policy: none_propagate
    frame_edge_policy: keep_unbalanced

3_feature_engineering:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: predictors}}}
    - {id: src_y, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: target}}}
    - {id: lag_x, type: step, op: lag, params: {n_lag: 1}, inputs: [src_X]}
    - {id: y_h, type: step, op: target_construction, params: {mode: point_forecast, method: direct, horizon: 1}, inputs: [src_y]}
  sinks:
    l3_features_v1: {X_final: lag_x, y_final: y_h}
    l3_metadata_v1: auto

4_forecasting_model:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}}
    - {id: src_y, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: y_final}}}
    - id: fit_ridge
      type: step
      op: fit_model
      params: {family: ridge, alpha: 1.0, min_train_size: 2, forecast_strategy: direct, training_start_rule: expanding, refit_policy: every_origin, search_algorithm: none}
      inputs: [src_X, src_y]
    - {id: predict_ridge, type: step, op: predict, inputs: [fit_ridge, src_X]}
  sinks:
    l4_forecasts_v1: predict_ridge
    l4_model_artifacts_v1: fit_ridge
    l4_training_metadata_v1: auto

5_evaluation:
  fixed_axes:
    primary_metric: mse
    point_metrics: [mse, rmse, mae]

1_5_data_summary:
  enabled: true
2_5_pre_post_preprocessing:
  enabled: true
3_5_feature_diagnostics:
  enabled: true
4_5_generator_diagnostics:
  enabled: true

6_statistical_tests:
  enabled: true
  sub_layers:
    L6_F_direction:
      enabled: true
    L6_G_residual:
      enabled: true

7_interpretation:
  enabled: true
  nodes:
    - id: src_model
      type: source
      selector: {layer_ref: l4, sink_name: l4_model_artifacts_v1, subset: {model_id: fit_ridge}}
    - id: src_X
      type: source
      selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}
    - id: linear_imp
      type: step
      op: model_native_linear_coef
      params: {model_family: ridge}
      inputs: [src_model, src_X]
  sinks:
    l7_importance_v1:
      global: linear_imp

8_output:
  fixed_axes:
    saved_objects: [forecasts, metrics, ranking, tests, importance, diagnostics_all]
  leaf_config:
    output_directory: ./macroforecast_output/first_study/
```
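
The `lag` step with `n_lag: 1` can be pictured as shifting every predictor down one row, so that the features for a given month hold the previous month's values. A minimal plain-Python sketch on the inline panel follows. It assumes a simple one-row shift in which the first row has no lagged value; `lag_column` and the `_lag1` column names are hypothetical, not the library's implementation or naming.

```python
def lag_column(values, n_lag=1):
    """Shift a column down by n_lag rows, padding the start with None."""
    if n_lag < 0:
        raise ValueError("n_lag must be non-negative")
    padded = [None] * n_lag + list(values)
    return padded[: len(values)]  # keep the original length


# Predictor columns copied from the inline panel in the recipe.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]

features = {
    "x1_lag1": lag_column(x1, n_lag=1),  # hypothetical column name
    "x2_lag1": lag_column(x2, n_lag=1),  # hypothetical column name
}

# Row indices where every lagged feature is available.
complete_rows = [
    i for i in range(len(x1))
    if all(column[i] is not None for column in features.values())
]

print(features["x1_lag1"])
print(complete_rows)
```

The sketch only shows the shift itself; how the runtime treats the first row, which has no lagged value, is governed by the recipe's preprocessing and model settings.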

## Execute

Save the recipe above as `my_study.yaml`, then run it:

```python
import macroforecast as mf

result = mf.run("my_study.yaml")
```

## Inspect Results

```python
print(result.sink("l5_evaluation_v1").metrics_table)
print(result.sink("l5_evaluation_v1").ranking_table)
print(result.sink("l6_tests_v1").direction_results)
print(result.sink("l7_importance_v1").global_importance)
print(result.sink("l8_artifacts_v1").exported_files)
```
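
The three point metrics requested in `5_evaluation` follow their standard definitions. The sketch below computes them in plain Python so the numbers in the metrics table are easier to interpret. The forecast values are made up purely for illustration and are not output from this recipe; `point_metrics` is a hypothetical helper.

```python
import math


def point_metrics(actual, forecast):
    """Compute MSE, RMSE, and MAE for paired actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    errors = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


actual = [3.0, 4.0, 5.0, 6.0]
forecast = [2.5, 4.5, 5.0, 7.0]  # illustrative numbers only, not real model output

metrics = point_metrics(actual, forecast)
print(metrics)
```

RMSE is the square root of MSE, so the two always rank models identically; MAE can rank them differently because it penalizes large errors less heavily.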

## What You Learned

- L1-L4 construct forecasts from a fixed information set.
- L5 evaluates forecast accuracy.
- L1.5-L4.5 provide non-blocking diagnostic artifacts.
- L6 adds lightweight inference artifacts when enabled.
- L7 adds importance artifacts when enabled.
- L8 writes a directory that can be inspected without rerunning the study.
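
Because L8 writes plain files, the output directory can be browsed with the standard library alone. The sketch below lists whatever was written under the recipe's `output_directory`. The helper name `list_exported_files` is hypothetical, and no particular file names or layout are assumed.

```python
from pathlib import Path


def list_exported_files(output_directory):
    """Return sorted relative paths of all files under output_directory."""
    root = Path(output_directory)
    if not root.is_dir():
        return []  # nothing exported yet, or the path is wrong
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


# Path taken from the recipe's 8_output.leaf_config.output_directory.
for name in list_exported_files("./macroforecast_output/first_study/"):
    print(name)
```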

## Real-Time Data Caveat

The recipe above uses `custom_source_policy: custom_panel_only` and inline data. When switching
to FRED data via `custom_source_policy: official_only`, macroforecast v0.9.x uses
**final-revised FRED data** (current vintage) by default. It does **not** simulate real-time
data availability.

| `vintage_policy` | Status in v0.9.x | Notes |
|---|---|---|
| `current_vintage` (default) | Operational | Downloads the latest FRED revision; not a real-time vintage |
| `real_time_alfred` | Not yet operational | Raises `NotImplementedError`; planned for v1.x |

**What this means for your study:**

- Walk-forward evaluation with `custom_source_policy: official_only` uses the data as it stands today,
  not as it stood at each forecast origin date. This is appropriate for benchmarking models on a fixed
  dataset but is **not** a real-time forecasting simulation.
- Published macro-forecasting papers typically evaluate over real-time vintages (ALFRED).
  To faithfully replicate such papers, you must supply your own vintage-specific panels via
  `custom_panel_inline` or an external CSV, one panel per origin date.
- The `data_revision_tag` field in the manifest records the FRED data-through date so you
  can detect when the upstream FRED cache was refreshed between runs.
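
Supplying one panel per origin date amounts to looking up, for each forecast origin, the latest vintage published on or before that date. A minimal sketch of that lookup in plain Python follows. The vintage dates and the `select_vintage` helper are hypothetical, and how the chosen panel is then handed to macroforecast (for example through `custom_panel_inline`) is deliberately left out.

```python
from bisect import bisect_right
from datetime import date


def select_vintage(vintage_dates, origin):
    """Return the latest vintage date on or before `origin`, or None."""
    ordered = sorted(vintage_dates)
    idx = bisect_right(ordered, origin)  # count of vintages dated <= origin
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical publication dates of three vintage snapshots.
vintages = [date(2020, 3, 15), date(2020, 4, 15), date(2020, 5, 15)]
origins = [date(2020, 3, 1), date(2020, 4, 1), date(2020, 5, 1), date(2020, 6, 1)]

chosen = {origin: select_vintage(vintages, origin) for origin in origins}
for origin, vintage in chosen.items():
    print(f"origin {origin} -> vintage {vintage}")
```

An origin with no earlier vintage maps to `None`, which signals that no real-time panel exists for that date and the origin should be skipped rather than filled with revised data.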

For more context on the real-time limitation, see
[`docs/CONVENTIONS.md`](../CONVENTIONS.md) and the
[Goulet-Coulombe (2021) replication page](../replications/goulet_coulombe_2021.md).

## Next Steps

- [Understanding Output](understanding_output.md) — every current core runtime artifact
- [Runtime Support Matrix](runtime_support.md) — what is runtime-supported today