Your First Study: Ridge With Diagnostics, Tests, Importance, And Export
This walkthrough builds a complete core layer-contract study. It uses the currently supported runtime path: custom panel data, L3 lag features, L4 linear sklearn forecasting, L5 point metrics, optional diagnostic layers, lightweight L6 tests, L7 linear importance, and L8 file export.
For the exact support boundary, see Runtime Support Matrix.
Study Design
Fixed design:
Data: custom monthly panel
Target: y
Horizon: 1 month ahead
Features: one lag of all predictors
Model: expanding-window ridge regression
Evaluation: MSE/RMSE/MAE
Output: forecasts, metrics, ranking, tests, importance, diagnostics
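The design above can be sketched in plain Python to show which rows each expanding-window origin trains on. This is only an illustration under stated assumptions: that the lag op places the value from period t-1 on row t, that the direct target at horizon 1 places the value from period t+1 on row t, and that rows with a missing feature or target are dropped. The helper names (lag, direct_target, usable_rows, expanding_splits) are hypothetical, and macroforecast's actual alignment conventions may differ.

```python
# Inline panel from the recipe (dates shortened to month labels).
dates = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]


def lag(series, n_lag=1):
    """Assumed lag semantics: row t holds the value observed at t - n_lag."""
    return [None] * n_lag + series[:-n_lag]


def direct_target(series, horizon=1):
    """Assumed direct-target semantics: row t holds the value realized at t + horizon."""
    return series[horizon:] + [None] * horizon


def usable_rows(*columns):
    """Row indices where every column has a value."""
    return [i for i in range(len(columns[0])) if all(c[i] is not None for c in columns)]


def expanding_splits(rows, min_train_size=2):
    """Yield (train_rows, test_row): fit on all earlier usable rows, forecast the next."""
    for k in range(min_train_size, len(rows)):
        yield rows[:k], rows[k]


x1_lag, x2_lag = lag(x1), lag(x2)
y_h = direct_target(y)
rows = usable_rows(x1_lag, x2_lag, y_h)
splits = list(expanding_splits(rows))

for train, test in splits:
    print(f"origin {dates[test]}: train on {[dates[i] for i in train]}")
```

Under these assumptions the six-row panel leaves four usable rows and two evaluation origins, which is why the example is useful for checking wiring but far too small for meaningful accuracy comparisons.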
Recipe
1_data:
  fixed_axes:
    custom_source_policy: custom_panel_only
    frequency: monthly
    horizon_set: custom_list
  leaf_config:
    target: y
    target_horizons: [1]
    custom_panel_inline:
      date: [2020-01-01, 2020-02-01, 2020-03-01, 2020-04-01, 2020-05-01, 2020-06-01]
      y: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
      x1: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
      x2: [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
2_preprocessing:
  fixed_axes:
    transform_policy: no_transform
    outlier_policy: none
    imputation_policy: none_propagate
    frame_edge_policy: keep_unbalanced
3_feature_engineering:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: predictors}}}
    - {id: src_y, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: target}}}
    - {id: lag_x, type: step, op: lag, params: {n_lag: 1}, inputs: [src_X]}
    - {id: y_h, type: step, op: target_construction, params: {mode: point_forecast, method: direct, horizon: 1}, inputs: [src_y]}
  sinks:
    l3_features_v1: {X_final: lag_x, y_final: y_h}
    l3_metadata_v1: auto
4_forecasting_model:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}}
    - {id: src_y, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: y_final}}}
    - id: fit_ridge
      type: step
      op: fit_model
      params: {family: ridge, alpha: 1.0, min_train_size: 2, forecast_strategy: direct, training_start_rule: expanding, refit_policy: every_origin, search_algorithm: none}
      inputs: [src_X, src_y]
    - {id: predict_ridge, type: step, op: predict, inputs: [fit_ridge, src_X]}
  sinks:
    l4_forecasts_v1: predict_ridge
    l4_model_artifacts_v1: fit_ridge
    l4_training_metadata_v1: auto
5_evaluation:
  fixed_axes:
    primary_metric: mse
    point_metrics: [mse, rmse, mae]
1_5_data_summary:
  enabled: true
2_5_pre_post_preprocessing:
  enabled: true
3_5_feature_diagnostics:
  enabled: true
4_5_generator_diagnostics:
  enabled: true
6_statistical_tests:
  enabled: true
  sub_layers:
    L6_F_direction:
      enabled: true
    L6_G_residual:
      enabled: true
7_interpretation:
  enabled: true
  nodes:
    - id: src_model
      type: source
      selector: {layer_ref: l4, sink_name: l4_model_artifacts_v1, subset: {model_id: fit_ridge}}
    - id: src_X
      type: source
      selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}
    - id: linear_imp
      type: step
      op: model_native_linear_coef
      params: {model_family: ridge}
      inputs: [src_model, src_X]
  sinks:
    l7_importance_v1:
      global: linear_imp
8_output:
  fixed_axes:
    saved_objects: [forecasts, metrics, ranking, tests, importance, diagnostics_all]
  leaf_config:
    output_directory: ./macroforecast_output/first_study/
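To make the fit_ridge node in the recipe more concrete, here is a minimal pure-Python sketch of a ridge fit with alpha 1.0 and an unpenalized intercept, which is the standard formulation that scikit-learn's Ridge also follows. It is not macroforecast's implementation. The functions solve and fit_ridge are hypothetical helpers, and the training rows assume lagged predictors paired with a one-step-ahead target, which may not match the library's exact alignment.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Center X and y, then solve (Xc'Xc + alpha * I) coef = Xc'yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [
        [sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(r[j] * t for r, t in zip(Xc, yc)) for j in range(p)]
    coef = solve(gram, rhs)
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept


# Assumed training rows: (x1, x2) lagged one month, target one month ahead.
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]]
y_train = [3.0, 4.0, 5.0, 6.0]

coef, intercept = fit_ridge(X_train, y_train, alpha=1.0)
print("coefficients:", coef)
print("intercept:", intercept)
```

The fitted coefficients are also the kind of quantity a model-native linear importance step such as model_native_linear_coef would presumably read, although how macroforecast scales or ranks them is not specified here.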
Execute
import macroforecast as mf
result = mf.run("my_study.yaml")
Inspect Results
print(result.sink("l5_evaluation_v1").metrics_table)
print(result.sink("l5_evaluation_v1").ranking_table)
print(result.sink("l6_tests_v1").direction_results)
print(result.sink("l7_importance_v1").global_importance)
print(result.sink("l8_artifacts_v1").exported_files)
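The three point metrics requested in the recipe (mse, rmse, mae) follow their standard definitions, so the metrics table can be sanity-checked by hand. The sketch below assumes you have extracted matching lists of realized values and forecasts; the function name point_metrics and the numbers are illustrative only and are not output from this study.

```python
import math


def point_metrics(actual, forecast):
    """Standard MSE, RMSE and MAE over paired actual/forecast values."""
    errors = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Illustrative values, not results produced by macroforecast.
example = point_metrics(actual=[5.0, 6.0], forecast=[4.5, 6.5])
print(example)
```

Because RMSE is the square root of MSE, ranking models by either gives the same order; MAE can rank differently when a model has a few large errors.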
What You Learned
L1-L4 construct forecasts from a fixed information set.
L5 evaluates forecast accuracy.
L1.5-L4.5 provide non-blocking diagnostic artifacts.
L6 adds lightweight inference artifacts when enabled.
L7 adds importance artifacts when enabled.
L8 writes a directory that can be inspected without rerunning the study.
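Since the exported directory can be inspected without rerunning the study, a small standard-library helper is enough to see what was written. This sketch makes no assumption about the file names or layout that macroforecast produces; list_artifacts is a hypothetical helper, and the demo uses a throwaway directory with placeholder files.

```python
import tempfile
from pathlib import Path


def list_artifacts(output_dir):
    """Return (relative_path, size_in_bytes) for every file under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []  # nothing exported yet, or the path is wrong
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


# Demo on a throwaway directory; the file names are placeholders, not
# the names macroforecast actually writes.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tables").mkdir()
    (Path(tmp) / "tables" / "placeholder_metrics.csv").write_text("metric,value\n")
    (Path(tmp) / "placeholder_manifest.json").write_text("{}")
    demo_listing = list_artifacts(tmp)

for name, size in demo_listing:
    print(f"{size:>6}  {name}")
```

For the study in this walkthrough, the same function would be called with the output_directory from the recipe, ./macroforecast_output/first_study/.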
Real-Time Data Caveat
The recipe above uses custom_source_policy: custom_panel_only and inline data. When switching
to FRED data via custom_source_policy: official_only, macroforecast v0.9.x uses
final-revised FRED data (current vintage) by default. It does not simulate real-time
data availability.
| | Status in v0.9.x | Notes |
|---|---|---|
| | Operational | Downloads the latest FRED revision; not a real-time vintage |
| | Not yet operational | Raises |
What this means for your study:
Walk-forward evaluation with custom_source_policy: official_only uses data as-of today, not as-of each forecast origin date. This is appropriate for benchmarking models on a fixed dataset but is not a real-time forecasting simulation.
Published macro-forecasting papers typically evaluate over real-time vintages (ALFRED). To faithfully replicate such papers, you must supply your own vintage-specific panels via custom_panel_inline or an external CSV, one panel per origin date.
The data_revision_tag field in the manifest records the FRED data-through date so you can detect when the upstream FRED cache was refreshed between runs.
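Building one vintage-specific panel per origin date amounts to an as-of filter over release dates. The sketch below shows that idea on a hypothetical record layout of (observation date, release date, value); the records, the layout, and the function panel_as_of are all assumptions for illustration, not a macroforecast or ALFRED API.

```python
from datetime import date

# Hypothetical vintage records: (observation_date, release_date, value).
records = [
    (date(2020, 1, 1), date(2020, 2, 15), 1.0),
    (date(2020, 1, 1), date(2020, 4, 15), 1.2),  # later revision of January
    (date(2020, 2, 1), date(2020, 3, 15), 2.0),
    (date(2020, 3, 1), date(2020, 4, 15), 3.0),
]


def panel_as_of(records, origin):
    """Latest released value per observation date, using only releases on or before origin."""
    latest = {}
    # Process releases in chronological order so later revisions overwrite earlier ones.
    for obs, released, value in sorted(records, key=lambda r: r[1]):
        if released <= origin:
            latest[obs] = value
    return dict(sorted(latest.items()))


march_panel = panel_as_of(records, date(2020, 3, 31))
april_panel = panel_as_of(records, date(2020, 4, 30))

print("as of 2020-03-31:", march_panel)
print("as of 2020-04-30:", april_panel)
```

The March panel still holds the first-release January value and has no March observation, while the April panel sees the revision. A final-revised download corresponds to applying only the latest origin to every forecast date, which is the gap the caveat above describes.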
For the real-time limitation context, see
docs/CONVENTIONS.md and the
Goulet-Coulombe (2021) replication page.
Next Steps
Understanding Output — every current core runtime artifact
Runtime Support Matrix — what is runtime-supported today