Macroeconomic Data Transformations Matter#
This page records the current replication setting for Goulet Coulombe, Leroux, Stevanovic, and Surprenant (2021), “Macroeconomic data transformations matter,” International Journal of Forecasting, 37(4), 1338-1354.
Sources checked for this page:
- IJF article DOI: https://doi.org/10.1016/j.ijforecast.2021.05.005
- arXiv working-paper page: https://arxiv.org/abs/2008.01714
- local main PDF: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/papers/10.1016j.ijforecast.2021.05.005.pdf
- local online appendix PDF: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/papers/10.1016j.ijforecast.2021.05.005_appendix.pdf
- author Code & Data page: https://philippegouletcoulombe.com/code
- author MARX note: https://philippegouletcoulombe.com/blog/ml-based-time-series-modelling-with-marx-2
- local MARX snippet: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/paper_code/coulombe_site_github_20260530/marx/MARX_cheap_code.R
Replication Status#
| Item | Status | Consequence |
|---|---|---|
| Main-text design | available | The sample, target list, horizon grid, transformation grid, learner list, and evaluation objects can be fixed. |
| Appendix model algorithms | available | Elastic Net, Adaptive Lasso, Random Forest, Boosted Trees, Linear Boosting, GA search, and Bayesian search can be mapped to package settings. |
| Full official replication code | not found in the checked public/local materials | Table-identical replication is not claimed. |
| Exact FRED-MD vintage | not stated in the checked public/local materials | Use a historical FRED-MD vintage just after the |
| MATLAB default optimizer behavior | partially stated | We can match the high-level optimizer design, but MATLAB default internal randomness/surrogate settings are not exactly portable. |
| MARX reference code | available as a small R snippet | |
| | invalid as replication evidence | The runner and diagnostic harness used |
- Replication level:
reconstructed_design, not exact_table_replication.
- Package feasibility:
all Table 2 design axes can be expressed with public macroforecast callables: official FRED-MD vintage loading, McCracken-Ng t-code preprocessing, direct-average and path-average targets, all 16 feature matrices, AR/FM/AL/EN/LB/RF/BT learner families, RMSE/relative-RMSE scoring, and DM/MCS forecast-comparison tests.
- Exact-replication blocker:
the paper and appendix do not provide a full machine-readable replication package or exact FRED-MD vintage in the checked materials.
- Vintage policy:
do not use current.csv for this paper. The official FRED-MD page provides historical vintage zip files. Because the pseudo-out-of-sample period ends in 2017M12, the first replication candidate is 2018-01.csv extracted from the historical FRED-MD vintage zip; use 2018-02.csv or 2018-03.csv only if the January vintage does not contain the needed December 2017 realized values.
Paper Design#
| Axis | Paper setting | |
|---|---|---|
| Data | monthly FRED-MD after McCracken-Ng transformations | |
| Initial estimation sample | starts at | |
| Pseudo-out-of-sample period | | test start |
| Window type | expanding | expanding train/validation/test policy |
| Targets | | pass one target at a time to |
| Horizons | 1, 3, 6, 9, 12, 24 months | |
| Target autoregressive lags | include lags of the target transformation; appendix setting | |
| Target construction | direct average growth/difference and path-average forecasts from one-period transformed targets | after official t-code preprocessing, use |
| Loss | RMSE | |
| Reference benchmark | factor model, FM | package mapping: fixed or BIC-selected |
| Accuracy tests | DM test and MCS | |
Target Variables#
The paper uses ten monthly targets:
| Paper label | Meaning in the article | FRED-MD column in 2018-01 vintage | Package alias |
|---|---|---|---|
| INDPRO | Industrial production index | INDPRO | INDPRO |
| EMP | Total nonfarm employment | PAYEMS | EMP |
| UNRATE | Unemployment rate | UNRATE | UNRATE |
| INCOME | Real personal income excluding current transfers | W875RX1 | INCOME |
| CONS | Real personal consumption expenditures | DPCERA3M086SBEA | CONS |
| RETAIL | Retail and food services sales | RETAILx | RETAIL |
| HOUST | Housing starts | HOUST | HOUST |
| M2 | M2 money stock | M2SL | M2 |
| CPI | Consumer price index | CPIAUCSL | CPI |
| PPI | Producer price index | PPICMM | PPI |
If the loaded FRED-MD vintage uses different column labels, create a target alias map before running the grid. Do not silently substitute a nearby series.
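The alias rule above can be enforced with a strict lookup that fails instead of guessing. The following is a minimal sketch in plain Python; `resolve_targets` is a hypothetical helper name and is not part of the package, and the column list is a made-up subset used only for illustration.

```python
def resolve_targets(alias_map, panel_columns):
    """Map paper labels to vintage columns, failing loudly on any missing column."""
    available = set(panel_columns)
    missing = {label: col for label, col in alias_map.items() if col not in available}
    if missing:
        # No fallback to a similarly named series: the caller must fix the map.
        raise KeyError(f"target columns missing from vintage: {missing}")
    return dict(alias_map)


# Hypothetical three-target subset of the table above.
alias_map = {"EMP": "PAYEMS", "RETAIL": "RETAILx", "CPI": "CPIAUCSL"}
columns = ["INDPRO", "PAYEMS", "RETAILx", "CPIAUCSL"]
resolved = resolve_targets(alias_map, columns)

# A vintage that lacks RETAILx raises instead of silently substituting.
try:
    resolve_targets(alias_map, ["INDPRO", "PAYEMS", "CPIAUCSL"])
    raised = False
except KeyError:
    raised = True
```

Raising on a missing column keeps any substitution an explicit, recorded decision in the alias map rather than a side effect of the loader.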
Feature Matrices#
The main text fixes sixteen feature-matrix cases. Lags of the
month-to-month change or log-change of the target are always included. Cases
with MARX and MAF also include target-derived blocks, written in the paper
as MARX_y and MAF_y; these are not the same as the plain autoregressive
target lags. Cases with Level include raw level variables and the target’s raw
level Y_t.
| Paper case | Content | Package expression |
|---|---|---|
| | factors and factor lags | |
| | factors plus lagged transformed observables | |
| | factors plus predictor MARX and | |
| | factors plus predictor MAF and | |
| | factors plus raw levels including | |
| | factors, lagged observables, MARX | |
| | factors, lagged observables, MAF | |
| | factors, lagged observables, levels | |
| | factors, lagged observables, MARX, levels | |
| | lagged transformed observables | |
| | predictor MARX plus | |
| | predictor MAF plus | |
| | lagged observables plus MARX | |
| | lagged observables plus MAF | |
| | lagged observables plus levels | |
| | lagged observables, MARX, levels | |
Default reconstruction settings:
feature_settings = {
    "lags": range(1, 13),
    "max_lag": 12,
    "n_factors": 8,
    "n_maf_components": 2,
    "fit_policy": "expanding",
    "include_current_factor": True,
    "scale_factors": True,
    "scale_marx": False,
    "scale_maf": False,
}
The MAF setting follows the article’s stated P_MAF = 12 and two MAFs per
series. The MARX setting follows the author R snippet: for each variable and lag
order l, replace the lag-l column by the mean of lags 1, ..., l.
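The MARX rule in the previous sentence can be illustrated on a single series. This is a minimal plain-Python sketch of the averaging step only, assuming lag 1 means the previous observation; `marx_columns` is a hypothetical name, and the author's R snippet and the package implementation may differ in indexing, scaling, and missing-value handling.

```python
def marx_columns(series, max_lag):
    """Return {l: column} where column[t] is the mean of series[t-1], ..., series[t-l].

    Rows that do not yet have l lags available are set to None.
    """
    out = {}
    for lag in range(1, max_lag + 1):
        column = []
        for t in range(len(series)):
            if t < lag:
                column.append(None)
            else:
                window = series[t - lag:t]  # lags 1..lag of observation t
                column.append(sum(window) / lag)
        out[lag] = column
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
marx = marx_columns(x, max_lag=3)
# marx[1] is the plain first lag; marx[3] averages the three most recent lags.
```

With lag order 1 the MARX column equals the ordinary first lag, so the transformation only changes the higher-order columns.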
Learner Settings#
| Learner | Paper setting | Closest package setting |
|---|---|---|
| AR | autoregressive benchmark; lag order selected by BIC | |
| FM | Stock-Watson-style factor model; BIC hyperparameter selection mentioned in main text | |
| Adaptive Lasso | gamma | |
| Elastic Net | 100 lambda values; 100 alpha values in | |
| Linear Boosting | component-wise L2 boosting; | |
| Random Forest | 200 trees; bootstrap sample; terminal node size greater than 5; | |
| Boosted Trees | initial mean; depth 5 splits; | |
- Important approximation:
MATLAB tree defaults and Python/sklearn tree defaults are not identical. The package setting above matches the published algorithmic intent, not every backend implementation detail.
Paper Result Anchors#
These are not package test assertions yet. They are qualitative/structural anchors that a successful reconstructed run should make visible.
| Anchor | Expected pattern from the article |
|---|---|
| MARX | material gains for real activity variables at short and medium horizons, especially with tree learners |
| MAF | weaker on average than MARX, but useful in selected longer-horizon cases |
| Factors | included in most best specifications |
| Level | useful for money and selected real/activity cases |
| Direct average vs path average | path average often helps cyclical real variables; direct average is stronger for selected nominal variables and horizons |
| RF vs BT | nonlinear tree methods dominate many best cases; RF appears more often than BT among best specifications |
For exact comparison against Table 2 or Appendix B relative RMSE tables, freeze a FRED-MD vintage and record the backend versions before treating deviations as package bugs.
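One way to record the backend versions alongside the frozen vintage is a small run record written before any comparison. The sketch below uses only the Python standard library; the record layout and the package list are assumptions for illustration, not a format the package prescribes.

```python
import json
import platform
from importlib import metadata


def backend_versions(packages):
    """Collect installed versions; packages that are not installed map to None."""
    record = {"python": platform.python_version()}
    for name in packages:
        try:
            record[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record[name] = None
    return record


run_record = {
    "fred_md_vintage": "2018-01",  # first candidate vintage named on this page
    "backends": backend_versions(["macroforecast", "scikit-learn", "numpy", "pandas"]),
}
print(json.dumps(run_record, indent=2, sort_keys=True))
```

Storing this record next to the forecast output makes it possible to tell a backend change from a package bug when two runs disagree.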
Notebook-Style Skeleton#
The current page fixes the setting. The full replication notebook should execute the same cells and then attach paper-table captures plus package outputs.
For the current package-only executed notebook report, see Macroeconomic Data Transformations Matter: Package-Only Notebook.
Cell 1: Load a Frozen FRED-MD Panel#
import macroforecast as mf
frozen_vintage = "2018-01" # first candidate after the paper's 2017M12 sample end
bundle = mf.data.load_fred_md(vintage=frozen_vintage)
raw_panel = bundle.panel
raw_metadata = bundle.metadata
Expected output:
raw_panel: pandas.DataFrame indexed by monthly date
raw_metadata: metadata dictionary attached to the data bundle
raw_metadata["artifact"]["source_url"]: official historical archive plus member CSV
Cell 2: Build the Stationary Panel and Preserve Levels#
levels = raw_panel.copy()
X = mf.preprocessing.reprocess(
    raw_panel,
    transform="official",
    outliers="iqr",
    outlier_action="flag_as_nan",
    iqr_threshold=10.0,
    impute="em_factor",
    em_n_factors=8,
    frame="keep",
)
Expected output:
X: FRED-MD-style stationary pandas.DataFrame
levels: original level panel for Level feature specifications
Cell 3: Define the Paper Grid#
target_map = {
    "INDPRO": "INDPRO",
    "EMP": "PAYEMS",
    "UNRATE": "UNRATE",
    "INCOME": "W875RX1",
    "CONS": "DPCERA3M086SBEA",
    "RETAIL": "RETAILx",
    "HOUST": "HOUST",
    "M2": "M2SL",
    "CPI": "CPIAUCSL",
    "PPI": "PPICMM",
}
targets = list(target_map.values())
horizons = [1, 3, 6, 9, 12, 24]
feature_cases = [
    "F", "F-X", "F-MARX", "F-MAF", "F-Level",
    "F-X-MARX", "F-X-MAF", "F-X-Level", "F-X-MARX-Level",
    "X", "MARX", "MAF", "X-MARX", "X-MAF", "X-Level", "X-MARX-Level",
]
models = ["ar", "far", "adaptive_lasso", "elastic_net", "glmboost", "random_forest", "gradient_boosting"]
target_policies = ["direct_average", "path_average"]
Expected output:
10 targets x 6 horizons x 16 feature cases x 5 ML learners x 2 target policies,
plus AR and FM benchmarks.
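The grid size stated above can be checked by enumerating the Cartesian product of the design axes. This is a minimal sketch that repeats the lists from Cell 3 so it runs on its own; the `ml_learners` name is introduced here only to separate the five ML learners from the AR and FM benchmarks.

```python
from itertools import product

targets = ["INDPRO", "PAYEMS", "UNRATE", "W875RX1", "DPCERA3M086SBEA",
           "RETAILx", "HOUST", "M2SL", "CPIAUCSL", "PPICMM"]
horizons = [1, 3, 6, 9, 12, 24]
feature_cases = [
    "F", "F-X", "F-MARX", "F-MAF", "F-Level",
    "F-X-MARX", "F-X-MAF", "F-X-Level", "F-X-MARX-Level",
    "X", "MARX", "MAF", "X-MARX", "X-MAF", "X-Level", "X-MARX-Level",
]
ml_learners = ["adaptive_lasso", "elastic_net", "glmboost", "random_forest", "gradient_boosting"]
target_policies = ["direct_average", "path_average"]

# Every ML design point: one tuple per (target, horizon, features, learner, policy).
grid = list(product(targets, horizons, feature_cases, ml_learners, target_policies))
n_design_points = len(grid)                      # 10 * 6 * 16 * 5 * 2
n_table2_cells = len(targets) * len(horizons)    # one best specification per target and horizon
```

The full ML grid has 9,600 design points, while Table 2 reports only the 60 target-by-horizon winners.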
Cell 3B: Encode Table 2 Best-Specification Targets#
The published Table 2 can be encoded as a target comparison dictionary. Each
entry is (target_policy, learner, feature_case).
PAPER_TABLE2 = {
    ("INDPRO", 1): ("direct_average", "random_forest", "F-X-MARX-Level"),
    ("INDPRO", 3): ("direct_average", "random_forest", "MARX"),
    ("INDPRO", 6): ("path_average", "random_forest", "MARX"),
    ("INDPRO", 9): ("path_average", "random_forest", "MARX"),
    ("INDPRO", 12): ("path_average", "random_forest", "MARX"),
    ("INDPRO", 24): ("direct_average", "random_forest", "F-Level"),
    ("EMP", 1): ("direct_average", "random_forest", "F-X-MARX-Level"),
    ("EMP", 3): ("path_average", "random_forest", "F-MARX"),
    ("EMP", 6): ("path_average", "gradient_boosting", "F-MARX"),
    ("EMP", 9): ("path_average", "gradient_boosting", "F-MARX"),
    ("EMP", 12): ("path_average", "gradient_boosting", "F-MARX"),
    ("EMP", 24): ("path_average", "gradient_boosting", "MAF"),
    ("UNRATE", 1): ("direct_average", "gradient_boosting", "F-MARX"),
    ("UNRATE", 3): ("direct_average", "random_forest", "F-X-MARX-Level"),
    ("UNRATE", 6): ("path_average", "random_forest", "F-MARX"),
    ("UNRATE", 9): ("path_average", "glmboost", "F-X-MARX-Level"),
    ("UNRATE", 12): ("path_average", "glmboost", "F-X-MARX-Level"),
    ("UNRATE", 24): ("direct_average", "gradient_boosting", "F-MAF"),
    ("INCOME", 1): ("direct_average", "random_forest", "MARX"),
    ("INCOME", 3): ("direct_average", "random_forest", "F-MARX"),
    ("INCOME", 6): ("path_average", "random_forest", "F-X-MARX"),
    ("INCOME", 9): ("path_average", "random_forest", "F-MARX"),
    ("INCOME", 12): ("path_average", "random_forest", "F-MARX"),
    ("INCOME", 24): ("path_average", "random_forest", "F-X-MARX"),
    ("CONS", 1): ("direct_average", "far", "F"),
    ("CONS", 3): ("direct_average", "random_forest", "F-Level"),
    ("CONS", 6): ("path_average", "random_forest", "F-Level"),
    ("CONS", 9): ("direct_average", "random_forest", "MAF"),
    ("CONS", 12): ("path_average", "random_forest", "F-MAF"),
    ("CONS", 24): ("path_average", "random_forest", "F-MAF"),
    ("RETAIL", 1): ("direct_average", "far", "F"),
    ("RETAIL", 3): ("path_average", "gradient_boosting", "F-X-MARX"),
    ("RETAIL", 6): ("path_average", "adaptive_lasso", "F-MARX"),
    ("RETAIL", 9): ("direct_average", "gradient_boosting", "F-X-MARX-Level"),
    ("RETAIL", 12): ("direct_average", "gradient_boosting", "F-X-Level"),
    ("RETAIL", 24): ("direct_average", "gradient_boosting", "F-X-MAF"),
    ("HOUST", 1): ("direct_average", "elastic_net", "F-Level"),
    ("HOUST", 3): ("path_average", "elastic_net", "F-Level"),
    ("HOUST", 6): ("path_average", "random_forest", "F-X-MARX"),
    ("HOUST", 9): ("direct_average", "random_forest", "F-MAF"),
    ("HOUST", 12): ("direct_average", "random_forest", "F"),
    ("HOUST", 24): ("direct_average", "random_forest", "F"),
    ("M2", 1): ("direct_average", "random_forest", "X-Level"),
    ("M2", 3): ("path_average", "adaptive_lasso", "X-Level"),
    ("M2", 6): ("path_average", "random_forest", "F-Level"),
    ("M2", 9): ("direct_average", "random_forest", "F-Level"),
    ("M2", 12): ("direct_average", "gradient_boosting", "F-Level"),
    ("M2", 24): ("path_average", "random_forest", "F-Level"),
    ("CPI", 1): ("direct_average", "adaptive_lasso", "MARX"),
    ("CPI", 3): ("direct_average", "random_forest", "F"),
    ("CPI", 6): ("direct_average", "random_forest", "F"),
    ("CPI", 9): ("direct_average", "random_forest", "F"),
    ("CPI", 12): ("direct_average", "random_forest", "F"),
    ("CPI", 24): ("path_average", "random_forest", "X"),
    ("PPI", 1): ("direct_average", "elastic_net", "F-MARX"),
    ("PPI", 3): ("direct_average", "elastic_net", "MARX"),
    ("PPI", 6): ("direct_average", "random_forest", "F"),
    ("PPI", 9): ("direct_average", "random_forest", "F"),
    ("PPI", 12): ("direct_average", "random_forest", "F"),
    ("PPI", 24): ("direct_average", "gradient_boosting", "F-Level"),
}
assert set(PAPER_TABLE2) == {(target, horizon) for target in target_map for horizon in horizons}
The learner labels are package callables:
MODEL_CALLABLES = {
    "ar": mf.models.ar,
    "far": mf.models.far,
    "adaptive_lasso": mf.models.adaptive_lasso,
    "elastic_net": mf.models.elastic_net,
    "glmboost": mf.models.glmboost,
    "random_forest": mf.models.random_forest,
    "gradient_boosting": mf.models.gradient_boosting,
}
Cell 4A: Build the Table-1 Feature Matrix Directly#
Z = mf.feature_engineering.feature_matrix(
    X,
    specification="F-X-MARX",
    level_data=levels,
    lags=range(1, 13),
    max_lag=12,
    n_factors=8,
    n_maf_components=2,
    fit_policy="expanding",
)
Expected output:
Z: pandas.DataFrame matching the paper's F-X-MARX feature family.
Z.attrs["macroforecast_feature_metadata"]: feature lineage table.
This direct call is the clearest way to inspect Table 1. For strict
walk-forward forecasting, let the runner fit feature state inside each
expanding window with a FeatureSpec step pipeline.
Cell 4B: Runner-Safe Equivalent for One Feature Case#
features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=3,
    predictors=[column for column in X.columns if column != "INDPRO"],
    lags=None,
    target_lags=range(0, 13),
    target_transform="average_value",
    target_mode="direct",
    steps=[
        mf.feature_engineering.pca_step(
            name="F_raw",
            n_components=8,
            scale=True,
            include=False,
        ),
        mf.feature_engineering.lag_step(
            name="F",
            input="F_raw",
            lags=range(0, 13),
            include=True,
        ),
        mf.feature_engineering.lag_step(
            name="X",
            input="panel",
            lags=range(1, 13),
            include=True,
        ),
        mf.feature_engineering.marx_step(
            name="MARX_X",
            input="panel",
            max_lag=12,
            include=True,
        ),
        mf.feature_engineering.marx_step(
            name="MARX_y",
            input="target_panel",
            columns=["INDPRO"],
            max_lag=12,
            include=True,
        ),
    ],
)
Expected output:
features: FeatureSpec fitted by forecasting.run() according to feature_policy.
FeatureSpec does not call feature_matrix() internally. It uses smaller
step-level callables so PCA, MAF, scaling, and custom fitted transforms can be
estimated only on the rows available to each forecast origin.
The paper’s MARX_y/MAF_y target-derived blocks use
input="target_panel"; this keeps target columns out of predictors while
still allowing the target’s observed history to enter the feature matrix.
Cell 4C: Run One Reconstructed Design Point#
window = mf.window.from_cutoffs(
    estimation_start="1960-01",
    test_start="1980-01",
    test_end="2017-12",
    mode="expanding",
    val_method="random_kfold",
    val_n_splits=5,
    val_random_state=123,
    horizon=3,
    step=1,
)
result = mf.forecasting.run(
    X,
    model="random_forest",
    target="INDPRO",
    horizon=3,
    forecast_policy="direct_average",
    features=features,
    window=window,
    params={
        "n_estimators": 200,
        "min_samples_leaf": 5,
        "max_features": 0.3333333333333333,
        "bootstrap": True,
    },
)
Expected output:
result.forecasts: one row per forecast origin/date with prediction, actual, model metadata, and window metadata
result.metadata: run-level record of data, feature, model, selection, and window choices
The max_features value uses a fraction because sklearn accepts fractional
feature subsampling. If an exact floor rule is required, resolve
max(1, p // 3) after the feature matrix has been materialized.
For learner tuning, the paper reports randomly assigned 5-fold CV, not
chronological blocked CV. mf.window.val_random_kfold(...) is therefore used
only for this replication surface. The ordinary macro-forecasting default should
remain time-aware validation such as blocked_kfold, poos, or expanding.
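The difference between randomly assigned folds and time-aware blocked folds can be seen on row positions alone. The sketch below is plain Python and only illustrates the two assignment rules; `random_kfold` and `blocked_kfold` here are hypothetical stand-ins, not the package's validation implementations.

```python
import random


def random_kfold(n_rows, n_splits, seed):
    """Shuffle row positions, then deal them into n_splits folds (ignores time order)."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    return [sorted(positions[i::n_splits]) for i in range(n_splits)]


def blocked_kfold(n_rows, n_splits):
    """Split row positions into contiguous folds that preserve time order."""
    base, extra = divmod(n_rows, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


random_folds = random_kfold(n_rows=12, n_splits=5, seed=123)
blocked_folds = blocked_kfold(n_rows=12, n_splits=5)
# Blocked folds are runs of consecutive months; random folds mix early and late months.
```

Random assignment lets a validation month sit between two training months, which is why it is kept here only to mirror the paper's stated tuning scheme.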
Cell 5: Run Table 2 Cells With Package Callables#
The following helper code runs the package-only Table 2 comparison path. It is written as plain Python so the same block can be copied into a notebook or run as a batch script.
import math
import numpy as np
import pandas as pd
import macroforecast as mf  # same import as Cell 1, repeated so the block runs as a batch script

def package_model_fit_predict(model_name, X_train, y_train, X_test):
    """Fit one Table 2 learner with macroforecast public callables only."""
    if model_name == "far":
        # FM benchmark as factor autoregression. For Table 2 cells whose feature
        # matrix is already F-only, mf.models.ols on the factor matrix is also a
        # valid fixed-k FM reconstruction.
        fit = mf.models.ols(X_train, y_train)
    elif model_name == "adaptive_lasso":
        fit = mf.models.adaptive_lasso(
            X_train,
            y_train,
            gamma=1.0,
            initial="ridge",
            initial_alpha=1.0,
            alpha=0.01,
            max_iter=20_000,
            random_state=123,
        )
    elif model_name == "elastic_net":
        fit = mf.models.elastic_net(
            X_train,
            y_train,
            alpha=0.01,
            l1_ratio=0.5,
            max_iter=20_000,
            standardize=True,
        )
    elif model_name == "glmboost":
        fit = mf.models.glmboost(
            X_train,
            y_train,
            n_iter=100,
            learning_rate=0.1,
            candidate_sampling="random",
            candidate_fraction=1 / 3,
            candidate_cap=200,
            candidate_rounding="floor",
            random_state=123,
        )
    elif model_name == "random_forest":
        p = X_train.shape[1]
        fit = mf.models.random_forest(
            X_train,
            y_train,
            n_estimators=200,
            min_samples_leaf=5,
            max_features=max(1, p // 3),
            bootstrap=True,
            random_state=123,
            n_jobs=1,
        )
    elif model_name == "gradient_boosting":
        p = X_train.shape[1]
        fit = mf.models.gradient_boosting(
            X_train,
            y_train,
            n_estimators=200,
            learning_rate=0.05,
            max_depth=5,
            max_features=max(1, p // 3),
            random_state=123,
        )
    else:
        raise ValueError(f"unsupported Table 2 learner: {model_name}")
    prediction = fit.predict(X_test)
    # Return the single forecast as a float, whether predict() gives a pandas object or an array.
    return float(prediction.iloc[0] if hasattr(prediction, "iloc") else np.asarray(prediction).reshape(-1)[0])

def direct_target_column(target_column, horizon, columns):
    name = f"{target_column}_average_value_h{horizon}"
    if name in columns:
        return name
    matches = [
        column for column in columns
        if column.startswith(target_column) and "average_value" in column and f"h{horizon}" in column
    ]
    if not matches:
        raise KeyError((target_column, horizon, "direct_average"))
    return matches[0]

def path_target_columns(target_column, horizon, columns):
    out = []
    for step in range(1, horizon + 1):
        name = f"{target_column}_value_step{step}"
        if name in columns:
            out.append(name)
        else:
            matches = [
                column for column in columns
                if column.startswith(target_column) and "value" in column and f"step{step}" in column
            ]
            if not matches:
                raise KeyError((target_column, horizon, step, "path_average"))
            out.append(matches[0])
    return out

def annual_december_origins(index, horizon):
    """Small Table 2 smoke grid. Use every month for the full paper run."""
    index = set(pd.DatetimeIndex(index))
    origins = []
    for year in range(1980, 2018):
        realized = pd.Timestamp(year=year, month=12, day=1)
        origin = realized - pd.DateOffset(months=horizon)
        if origin in index:
            origins.append(origin)
    return origins

def available_target_training_frame(frame, base_index, origin, target_step):
    """Keep only rows whose h-step or step-specific label is observable at origin."""
    base_index = pd.DatetimeIndex(base_index)
    origin_pos = int(base_index.get_indexer([origin])[0])
    if origin_pos < 0:
        raise KeyError(origin)
    row_pos = pd.Series(base_index.get_indexer(frame.index), index=frame.index)
    mask = (row_pos >= 0) & ((row_pos + int(target_step)) <= origin_pos)
    return frame.loc[mask]

def run_one_table2_cell(processed, levels, target_label, horizon, policy, model_name, feature_case):
    target_column = target_map[target_label]
    feature_set = mf.feature_engineering.build_features(
        processed,
        targets=[target_column],
        horizons=[horizon],
        predictors="all",
        target_lags=range(0, 13),
        feature_specification=feature_case,
        max_lag=12,
        n_factors=8,
        n_maf_components=2,
        feature_fit_policy="expanding",
        feature_min_train_size=240,
        level_data=levels,
        target_mode="path" if policy == "path_average" else "direct",
        target_transform="value" if policy == "path_average" else "average_value",
        drop_missing=False,
    )
    realized_direct = mf.feature_engineering.average_target(
        processed,
        targets=[target_column],
        horizons=[horizon],
        transform="value",
    )
    actual_col = direct_target_column(target_column, horizon, realized_direct.columns)
    rows = []
    for origin in annual_december_origins(feature_set.X.index, horizon):
        if origin not in realized_direct.index:
            continue
        actual = float(realized_direct.loc[origin, actual_col])
        if math.isnan(actual):
            continue
        if policy == "direct_average":
            y_col = direct_target_column(target_column, horizon, feature_set.y.columns)
            frame = pd.concat([feature_set.X, feature_set.y[[y_col]]], axis=1).dropna()
            if origin not in frame.index:
                continue
            train = available_target_training_frame(frame, feature_set.X.index, origin, horizon)
            if len(train) < 240:
                continue
            pred = package_model_fit_predict(
                model_name,
                train[feature_set.X.columns],
                train[y_col],
                frame.loc[[origin], feature_set.X.columns],
            )
        else:
            step_preds = []
            for step, y_col in enumerate(path_target_columns(target_column, horizon, feature_set.y.columns), start=1):
                frame = pd.concat([feature_set.X, feature_set.y[[y_col]]], axis=1).dropna()
                if origin not in frame.index:
                    continue
                train = available_target_training_frame(frame, feature_set.X.index, origin, step)
                if len(train) < 240:
                    continue
                step_preds.append(
                    package_model_fit_predict(
                        model_name,
                        train[feature_set.X.columns],
                        train[y_col],
                        frame.loc[[origin], feature_set.X.columns],
                    )
                )
            # Skip the origin unless every step of the path produced a forecast.
            if len(step_preds) != horizon:
                continue
            pred = float(np.mean(step_preds))
        rows.append(
            {
                "target_label": target_label,
                "target_column": target_column,
                "horizon": horizon,
                "policy": policy,
                "model": model_name,
                "feature_case": feature_case,
                "origin": origin,
                "realized_date": origin + pd.DateOffset(months=horizon),
                "prediction": pred,
                "actual": actual,
                "error": actual - pred,
            }
        )
    return pd.DataFrame(rows)

table2_forecasts = []
for (target_label, horizon), (policy, model_name, feature_case) in PAPER_TABLE2.items():
    table2_forecasts.append(
        run_one_table2_cell(
            processed=X,
            levels=levels,
            target_label=target_label,
            horizon=horizon,
            policy=policy,
            model_name=model_name,
            feature_case=feature_case,
        )
    )
table2_forecasts = pd.concat(table2_forecasts, ignore_index=True)
table2_rmse = (
    table2_forecasts.assign(squared_error=lambda d: d["error"] ** 2)
    .groupby(["target_label", "horizon", "policy", "model", "feature_case"], as_index=False)
    .agg(n=("error", "size"), rmse=("squared_error", lambda x: float(np.sqrt(x.mean()))))
)
The annual-December origin loop is intentionally light. It verifies that every
Table 2 cell is callable with the package and creates a first comparison table.
For the full paper run, replace annual_december_origins() with every monthly
origin from 1980-01 through 2017-12, and keep the same package calls.
- Long-horizon boundary:
the package window uses origin-date cutoffs. With monthly step=1, h=24 forecasts overlap month by month, but a scored forecast at origin t still needs the realized target at t + 24. For a FRED-MD vintage ending in 2017-12, the last h=24 origin that can be scored is 2015-12. Tail blocks such as calendar year 2016 or 2017 should be skipped in a scored replication run unless a later vintage supplies the required realized target dates.
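The boundary arithmetic above is a month subtraction: the last scorable origin is the vintage end minus the horizon. A minimal plain-Python sketch follows; `shift_months` and `last_scorable_origin` are hypothetical helper names used only for this illustration.

```python
def shift_months(year, month, offset):
    """Shift a (year, month) pair by offset months; offset may be negative."""
    total = year * 12 + (month - 1) + offset
    return total // 12, total % 12 + 1


def last_scorable_origin(vintage_end, horizon):
    """Latest origin t whose realized target at t + horizon is inside the vintage."""
    year, month = vintage_end
    return shift_months(year, month, -horizon)


vintage_end = (2017, 12)
boundaries = {h: last_scorable_origin(vintage_end, h) for h in [1, 3, 6, 9, 12, 24]}
# boundaries[24] is (2015, 12), matching the boundary stated above.
```

Computing the boundary per horizon makes the skipped tail explicit instead of relying on missing values to drop unscorable origins.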
Scripted package run:
uv run python scripts/replication/gcls_2021_table2_batch.py \
--out-root /path/to/table2_run \
--vintage 2018-01 \
--start-year 1980 \
--end-year 2017 \
--workers 8 \
--tuning-mode paper-small
--tuning-mode off uses fixed paper-style defaults and is the fastest pipeline
check. --tuning-mode paper-small uses the same callable path as the paper
settings but with small lambda grids and short stochastic searches; use it for
server smoke runs. --tuning-mode paper expands Elastic Net to 100 lambda
values by 100 l1_ratio values, uses random 5-fold CV, and uses the configured
Bayesian/GA search sizes for boosted trees and linear boosting. Because the
original MATLAB GA/Bayesian optimizer state is not available, exact
bit-for-bit optimizer replication is not claimed.
Cell 6: Evaluate Against the FM Benchmark#
rf_forecasts = result.forecasts
fm_forecasts = ...  # run the matching FM benchmark with the same window and target policy
relative = mf.metrics.relative_mse(
    y_true=rf_forecasts["actual"],
    y_model=rf_forecasts["prediction"],
    y_benchmark=fm_forecasts["prediction"],
)
dm = mf.tests.dm_test(
    (rf_forecasts["actual"] - rf_forecasts["prediction"]) ** 2,
    (fm_forecasts["actual"] - fm_forecasts["prediction"]) ** 2,
    horizon=3,
    input_type="loss",
)
Expected output:
relative_mse < 1 means the reconstructed model improves on FM under squared loss.
dm returns a statistic and p-value for equal predictive accuracy against the benchmark.
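For readers who want to see what these two quantities compute, the following is a minimal plain-Python sketch of a relative RMSE and a textbook Diebold-Mariano statistic with a rectangular-kernel long-run variance truncated at h - 1 and a normal reference distribution. It is an illustration under those assumptions, not the implementation behind mf.metrics.relative_mse or mf.tests.dm_test, and it applies no small-sample correction; the numbers are made up.

```python
import math


def relative_rmse(actual, model_pred, bench_pred):
    """RMSE of the model divided by RMSE of the benchmark on the same dates."""
    mse_model = sum((a - m) ** 2 for a, m in zip(actual, model_pred)) / len(actual)
    mse_bench = sum((a - b) ** 2 for a, b in zip(actual, bench_pred)) / len(actual)
    return math.sqrt(mse_model / mse_bench)


def dm_statistic(loss_model, loss_bench, horizon):
    """DM statistic on the loss differential, with autocovariances up to horizon - 1."""
    d = [lm - lb for lm, lb in zip(loss_model, loss_bench)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal p-value
    return stat, p_value


# Made-up forecasts for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
bench = [1.5, 1.4, 3.6, 3.5, 5.6, 5.3]
rel = relative_rmse(actual, model, bench)
stat, p_value = dm_statistic(
    [(a - m) ** 2 for a, m in zip(actual, model)],
    [(a - b) ** 2 for a, b in zip(actual, bench)],
    horizon=1,
)
```

A negative statistic means the model's average loss is below the benchmark's; relative RMSE is the square root of relative MSE, so both fall below 1 together.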
Observed Package Diagnostics#
This section records package runs already executed from an isolated PyPI environment. These runs are not exact Table-identical replications. They are diagnostic checks that the public package can execute the Table 2 cells and that the resulting patterns can be compared with the paper’s main anchors.
- Corrected source smoke check:
after fixing target-label availability and pre-transformed target semantics, the local source checkout was run on mf.data.load_fred_md(vintage="2018-01") with official preprocessing, target INDPRO, horizon 3, and origin 2005-12-01. With forecast_policy="direct_average", target_transform="value" was normalized to average_value; with forecast_policy="path_average", target_transform="value" kept value step targets. Both policies produced the same realized target, 0.0011626872172302665. The direct row recorded target_availability_end_pos=558 and target_availability_lag=3; the path row recorded step cutoffs {1: 560, 2: 559, 3: 558}. This is a smoke check of the corrected callable contract, not a Table 2 result.
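The reason both policies share one realized target can be shown with a toy series: the realized value is the mean of the next h one-period transformed values, while the policies differ only in how the forecast is formed. This is a minimal plain-Python sketch with made-up numbers; the function names are hypothetical and do not mirror the package API.

```python
import math


def realized_average(y, origin, horizon):
    """Mean of the one-period values at origin + 1, ..., origin + horizon."""
    window = y[origin + 1: origin + 1 + horizon]
    if len(window) < horizon:
        raise ValueError("target is not fully realized at this origin")
    return sum(window) / horizon


def path_average_forecast(step_forecasts):
    """Average of separate step-1, ..., step-h forecasts made at the same origin."""
    return sum(step_forecasts) / len(step_forecasts)


# Made-up one-period transformed target (for example, monthly log changes).
y = [0.2, -0.1, 0.4, 0.3, 0.1, 0.6]
target = realized_average(y, origin=1, horizon=3)      # mean of y[2], y[3], y[4]

direct_forecast = 0.25                                  # one model predicts the average directly
path_forecast = path_average_forecast([0.3, 0.2, 0.1])  # three step models, then averaged

direct_error = target - direct_forecast
path_error = target - path_forecast
```

Both errors are measured against the same `target`, which is what makes direct-average and path-average RMSEs comparable within a Table 2 cell.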
Execution environment:
- machine: server1
- package: macroforecast==0.9.5a1 installed from PyPI in a fresh virtual environment
- data: mf.data.load_fred_md(vintage="2018-01")
- raw panel: 708 monthly rows x 127 columns, 1959-01 through 2017-12
- preprocessing: official McCracken-Ng transformation codes with the package default FRED-MD preprocessing path
- benchmark: matching FM cell with the same target, horizon, forecast policy, diagnostic origins, and realized target construction
- Important caveat:
the main diagnostic below uses every monthly realized date from 1980-01 through 2017-12, so it is closer to the paper’s OOS calendar than the sparse diagnostics. It still departs from the paper in four ways: it uses full-sample feature fitting for factors, MARX, and MAF; it was run with macroforecast==0.9.5a1, whose direct/path training cutoff did not enforce target realization availability for h > 1; the diagnostic script built targets with average_change on an already McCracken-Ng transformed panel; and it used capped lightweight RF/BT settings with fixed hyperparameters instead of the appendix optimizer settings. Treat the numbers below as an investigation log, not replication evidence.
| Run | Origins | Table 2 cells | Forecast rows | Share beats FM | Mean relative RMSE vs FM | Runtime | Role |
|---|---|---|---|---|---|---|---|
| Ultrafast sparse OOS-lite | 1980, 1990, 2000, 2010, 2017 | 60 | 600 | 0.467 | 1.188 | 179.8s | smoke diagnostic |
| Capped sparse OOS-lite | 1980, 1983, 1986, 1989, 1992, 1995, 1998, 2001, 2004, 2007, 2010, 2013, 2016, 2017 | 60 | 1680 | 0.367 | 1.148 | 333.3s | broader diagnostic |
| Monthly capped OOS-lite | every month from 1980-01 to 2017-12 | 60 | 54720 | 0.200 | 1.149 | 7461.1s | main diagnostic |
- Verdict from the diagnostic runs:
invalid as replication evidence. The runs are useful only because they exposed two package/harness errors to fix before any serious Table 2 comparison: label-availability leakage for h-step targets and a double-transformed target construction. Re-run the grid after the corrected runner and average_value/value target semantics are released.
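The label-availability leakage named in the verdict can be illustrated with row positions. A training row at position r carries a label realized at r + h, so it is usable at an origin only if r + h does not exceed the origin position. The sketch below is plain Python with hypothetical names; it mirrors the mask used in available_target_training_frame in Cell 5 but is not the package's internal code.

```python
def trainable_rows(n_rows, origin_pos, target_step):
    """Row positions whose label at row + target_step is observed by origin_pos."""
    return [row for row in range(n_rows) if row + target_step <= origin_pos]


origin_pos = 10
horizon = 3

# Leaky rule: keeps every row before the origin, ignoring when its label is realized.
leaky = [row for row in range(20) if row < origin_pos]

# Availability-aware rule: drops rows whose h-step label lies after the origin.
safe = trainable_rows(n_rows=20, origin_pos=origin_pos, target_step=horizon)

leaked_rows = sorted(set(leaky) - set(safe))
# Rows 8 and 9 have labels realized at positions 11 and 12, after the origin.
```

For h = 1 the two rules coincide, which is consistent with the leakage affecting only h > 1.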
Target-level summary for the invalid monthly capped diagnostic:
| Target | Cells beating FM | Mean relative RMSE vs FM |
|---|---|---|
| CONS | 0/6 | 1.324 |
| CPI | 0/6 | 1.317 |
| EMP | 2/6 | 1.029 |
| HOUST | 4/6 | 0.928 |
| INCOME | 0/6 | 1.284 |
| INDPRO | 0/6 | 1.331 |
| M2 | 2/6 | 1.205 |
| PPI | 0/6 | 1.022 |
| RETAIL | 0/6 | 1.034 |
| UNRATE | 4/6 | 1.012 |
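Each summary row can be recomputed from the per-cell values in the full monthly capped diagnostic table on this page. The sketch below does this for HOUST, assuming the summary is the arithmetic mean of the rounded per-cell relative RMSEs and that a cell beats FM when its ratio is below 1; the original summary may have been computed from unrounded values.

```python
# HOUST relative RMSE vs FM at h = 1, 3, 6, 9, 12, 24, copied from the full diagnostic table.
houst_relative_rmse = [1.064, 1.032, 0.929, 0.749, 0.914, 0.882]

beats = sum(value < 1.0 for value in houst_relative_rmse)
mean_relative = sum(houst_relative_rmse) / len(houst_relative_rmse)

summary = (f"{beats}/{len(houst_relative_rmse)}", round(mean_relative, 3))
print(summary)  # ('4/6', 0.928)
```

The same two lines applied to every target reproduce the 12 of 60 cells, or 0.200, reported as the share beating FM for the monthly capped run.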
Horizon-level summary for the invalid monthly capped diagnostic:
| Horizon | Cells beating FM | Mean relative RMSE vs FM |
|---|---|---|
| 1 | 3/10 | 1.046 |
| 3 | 2/10 | 1.051 |
| 6 | 2/10 | 1.133 |
| 9 | 2/10 | 1.098 |
| 12 | 2/10 | 1.186 |
| 24 | 1/10 | 1.378 |
Full monthly capped diagnostic table:
| Target | h | Target policy | Learner | Feature case | n | Relative RMSE vs FM | Beats FM? |
|---|---|---|---|---|---|---|---|
| CONS | 1 | direct | FM | F | 456 | 1.000 | no |
| CONS | 3 | direct | RF | F-Level | 456 | 1.052 | no |
| CONS | 6 | path | RF | F-Level | 456 | 1.295 | no |
| CONS | 9 | direct | RF | MAF | 456 | 1.060 | no |
| CONS | 12 | path | RF | F-MAF | 456 | 1.493 | no |
| CONS | 24 | path | RF | F-MAF | 456 | 2.041 | no |
| CPI | 1 | direct | AL | MARX | 456 | 1.309 | no |
| CPI | 3 | direct | RF | F | 456 | 1.121 | no |
| CPI | 6 | direct | RF | F | 456 | 1.120 | no |
| CPI | 9 | direct | RF | F | 456 | 1.123 | no |
| CPI | 12 | direct | RF | F | 456 | 1.141 | no |
| CPI | 24 | path | RF | X | 456 | 2.086 | no |
| EMP | 1 | direct | RF | F-X-MARX-Level | 456 | 0.934 | yes |
| EMP | 3 | path | RF | F-MARX | 456 | 1.009 | no |
| EMP | 6 | path | BT | F-MARX | 456 | 0.933 | yes |
| EMP | 9 | path | BT | F-MARX | 456 | 1.027 | no |
| EMP | 12 | path | BT | F-MARX | 456 | 1.103 | no |
| EMP | 24 | path | BT | MAF | 456 | 1.171 | no |
| HOUST | 1 | direct | EN | F-Level | 456 | 1.064 | no |
| HOUST | 3 | path | EN | F-Level | 456 | 1.032 | no |
| HOUST | 6 | path | RF | F-X-MARX | 456 | 0.929 | yes |
| HOUST | 9 | direct | RF | F-MAF | 456 | 0.749 | yes |
| HOUST | 12 | direct | RF | F | 456 | 0.914 | yes |
| HOUST | 24 | direct | RF | F | 456 | 0.882 | yes |
| INCOME | 1 | direct | RF | MARX | 456 | 1.051 | no |
| INCOME | 3 | direct | RF | F-MARX | 456 | 1.047 | no |
| INCOME | 6 | path | RF | F-X-MARX | 456 | 1.163 | no |
| INCOME | 9 | path | RF | F-MARX | 456 | 1.295 | no |
| INCOME | 12 | path | RF | F-MARX | 456 | 1.445 | no |
| INCOME | 24 | path | RF | F-X-MARX | 456 | 1.703 | no |
| INDPRO | 1 | direct | RF | F-X-MARX-Level | 456 | 1.190 | no |
| INDPRO | 3 | direct | RF | MARX | 456 | 1.219 | no |
| INDPRO | 6 | path | RF | MARX | 456 | 1.337 | no |
| INDPRO | 9 | path | RF | MARX | 456 | 1.545 | no |
| INDPRO | 12 | path | RF | MARX | 456 | 1.658 | no |
| INDPRO | 24 | direct | RF | F-Level | 456 | 1.034 | no |
| M2 | 1 | direct | RF | X-Level | 456 | 0.961 | yes |
| M2 | 3 | path | AL | X-Level | 456 | 0.993 | yes |
| M2 | 6 | path | RF | F-Level | 456 | 1.320 | no |
| M2 | 9 | direct | RF | F-Level | 456 | 1.169 | no |
| M2 | 12 | direct | BT | F-Level | 456 | 1.103 | no |
| M2 | 24 | path | RF | F-Level | 456 | 1.686 | no |
| PPI | 1 | direct | EN | F-MARX | 456 | 1.008 | no |
| PPI | 3 | direct | EN | MARX | 456 | 1.000 | no |
| PPI | 6 | direct | RF | F | 456 | 1.002 | no |
| PPI | 9 | direct | RF | F | 456 | 1.012 | no |
| PPI | 12 | direct | RF | F | 456 | 1.021 | no |
| PPI | 24 | direct | BT | F-Level | 456 | 1.088 | no |
| RETAIL | 1 | direct | FM | F | 456 | 1.000 | no |
| RETAIL | 3 | path | BT | F-X-MARX | 456 | 1.044 | no |
| RETAIL | 6 | path | AL | F-MARX | 456 | 1.077 | no |
| RETAIL | 9 | direct | BT | F-X-MARX-Level | 456 | 1.020 | no |
| RETAIL | 12 | direct | BT | F-X-Level | 456 | 1.016 | no |
| RETAIL | 24 | direct | BT | F-X-MAF | 456 | 1.046 | no |
| UNRATE | 1 | direct | BT | F-MARX | 456 | 0.947 | yes |
| UNRATE | 3 | direct | RF | F-X-MARX-Level | 456 | 0.991 | yes |
| UNRATE | 6 | path | RF | F-MARX | 456 | 1.150 | no |
| UNRATE | 9 | path | LB | F-X-MARX-Level | 456 | 0.977 | yes |
| UNRATE | 12 | path | LB | F-X-MARX-Level | 456 | 0.965 | yes |
| UNRATE | 24 | direct | BT | F-MAF | 456 | 1.042 | no |
Why the diagnostics can differ from the paper:
Model caps: RF and BT use ten lightweight trees in the diagnostics, not the paper’s full tree counts and optimizer searches.
Feature fitting: diagnostic feature matrices use full-sample fitting to keep all early origins; the final replication should use fit-aware expanding feature state.
Hyperparameter search: the diagnostic does not run the appendix GA and Bayesian optimization loops.
Backend: the paper is MATLAB-based, while this run uses Python package backends. Tree defaults and optimizer randomness are not table-identical.
Gap Ledger#
| Gap | Current handling |
|---|---|
| Exact FRED-MD vintage | start from |
| MATLAB GA/Bayesian optimizer defaults | use the article’s ranges, random 5-fold CV, and iteration counts where possible; record backend difference and unavailable MATLAB optimizer seed/state |
| BIC lag selection for AR/FM | add a small benchmark helper or run fixed |
| Boosted-tree depth interpretation | document whether Python backend uses |
| Linear Boosting candidate sampling | implemented with explicit |
| Full monthly OOS calendar | monthly capped diagnostic now runs all realized months from |
| Fit-aware feature replication | current diagnostics use full-sample feature fitting; next replication pass should use expanding feature state inside each origin |
| Paper captures | add static captures of Table 1, Table 2, and selected Appendix B tables only after the settings page is accepted |