GCLS 2021 INDPRO Reconstructed Replication Report

This report records the corrected macroforecast run for the INDPRO cells in Table 2 of Goulet Coulombe, Leroux, Stevanovic, and Surprenant (2021), “Macroeconomic data transformations matter,” International Journal of Forecasting, 37(4), 1338-1354.

Report date: 2026-06-06

Replication level: reconstructed_design, not exact_table_replication.

Execution scope: INDPRO only, six horizons, paper Table 2 best-specification cells, compared against the matching factor-model benchmark on the same realized-target support.

Main result: The corrected INDPRO best-specification cells beat the matching FM benchmark at every horizon. Mean relative RMSE is 0.889818; median relative RMSE is 0.935358. Relative RMSE below 1 means the Table 2 best-specification cell has lower RMSE than the FM benchmark on the common support.

Status

| Item | Status |
| --- | --- |
| Best-spec INDPRO run | complete |
| FM benchmark run | complete |
| Common-support comparison | complete |
| Failed tasks | 0 |
| Active server jobs | none detected |
| Table-identical replication claim | not made |

The run should be read as corrected package evidence. It is not a claim that the numbers are identical to the paper’s Table 2, because the checked paper, appendix, local files, and public author materials do not expose a full machine-readable replication package, exact FRED-MD vintage, or exact MATLAB backend state.

Source Material

Sources used to define the replication setting:

  • IJF article DOI: https://doi.org/10.1016/j.ijforecast.2021.05.005

  • arXiv working-paper page: https://arxiv.org/abs/2008.01714

  • local main PDF: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/papers/10.1016j.ijforecast.2021.05.005.pdf

  • local appendix PDF: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/papers/10.1016j.ijforecast.2021.05.005_appendix.pdf

  • local review note: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/papers/reviews/10.1016j.ijforecast.2021.05.005-ea1152c5.md

  • author MARX snippet: /Users/nanyeon/Library/CloudStorage/SynologyDrive-second_brain/wiki/raw/paper_code/coulombe_site_github_20260530/marx/MARX_cheap_code.R

Execution Artifacts

The run was executed on server1.

| Object | Path |
| --- | --- |
| Source checkout used for run | /home/nanyeon99/project/macroforecast_gcls_replication_main_2f526bdf |
| Source checkout commit | 2f526bdf |
| Best-spec output root | /home/nanyeon99/project/macroforecast_gcls_runs/table2_indpro_full_20260605 |
| FM benchmark output root | /home/nanyeon99/project/macroforecast_gcls_runs/indpro_fm_benchmark_20260606 |
| Relative comparison output root | /home/nanyeon99/project/macroforecast_gcls_runs/indpro_relative_vs_fm_20260606 |
| Relative comparison CSV | /home/nanyeon99/project/macroforecast_gcls_runs/indpro_relative_vs_fm_20260606/indpro_relative_vs_fm.csv |

On disk, the best-spec run produced about 19 MB of output, the FM benchmark about 16 MB, and the relative comparison about 336 KB.

Data And Sample

| Axis | Setting |
| --- | --- |
| Dataset | FRED-MD |
| Vintage | 2018-01 |
| Loader | mf.data.load_fred_md(vintage="2018-01") |
| Raw panel | 708 monthly rows x 127 columns |
| Raw period | 1959-01 through 2017-12 |
| Preprocessing | official McCracken-Ng FRED-MD t-code pipeline |
| Processed panel | 706 monthly rows x 127 columns |
| Processed period | 1959-03 through 2017-12 |
| Initial estimation start | 1960-01 |
| Test calendar | monthly origins from 1980-01 through 2017-12 where realized targets are available |
| Horizons | 1, 3, 6, 9, 12, 24 months |
| Target | INDPRO |

The h-step forecast at origin t is scored only when the realized target dated t + h is available. For h=24 with a 2018-01 FRED-MD vintage ending in 2017-12, scoring therefore stops before the tail origins whose realizations would fall after the vintage endpoint. This is expected and does not indicate a skipped monthly step: origins still advance one month at a time, but an origin is scored only when its future realized target exists.
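
The scoring rule can be checked with a few lines of calendar arithmetic. The sketch below is illustrative and is not taken from the macroforecast package; the helper names `month_index` and `scored_origins` are hypothetical. It assumes origins run monthly from 1980-01 through 2017-12 and that the last observed target month is 2017-12, as stated in the table above.

```python
def month_index(year: int, month: int) -> int:
    """Map a calendar month to a running integer so months can be compared."""
    return year * 12 + (month - 1)


def scored_origins(first_origin, last_origin, last_observed, horizon):
    """Count monthly origins t whose realized target at t + horizon is observed."""
    first = month_index(*first_origin)
    last = month_index(*last_origin)
    end = month_index(*last_observed)
    # An origin is scored only if its target month is inside the vintage.
    return sum(1 for t in range(first, last + 1) if t + horizon <= end)


HORIZONS = (1, 3, 6, 9, 12, 24)
rows = {
    h: scored_origins((1980, 1), (2017, 12), (2017, 12), h)
    for h in HORIZONS
}
print(rows)  # {1: 455, 3: 453, 6: 450, 9: 447, 12: 444, 24: 432}
```

Under these assumptions the counts equal 456 minus the horizon, which matches the row counts reported in the results tables.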

Best-Specification Cells

The batch script fixes the Table 2 best-specification cell for each INDPRO horizon:

| Horizon | Target policy | Model | Feature case |
| --- | --- | --- | --- |
| 1 | direct_average | random_forest | F-X-MARX-Level |
| 3 | direct_average | random_forest | MARX |
| 6 | path_average | random_forest | MARX |
| 9 | path_average | random_forest | MARX |
| 12 | path_average | random_forest | MARX |
| 24 | direct_average | random_forest | F-Level |

Random forest was run with n_estimators=200, min_samples_leaf=5, max_features=1/3, bootstrap=True, random_state=123, and n_jobs=1. Hyperparameter tuning was off for this pass, so this is a fixed paper-style configuration rather than a full appendix optimizer replication.

Command Log

Best-spec INDPRO batch:

uv run python scripts/replication/gcls_2021_table2_batch.py \
  --out-root /home/nanyeon99/project/macroforecast_gcls_runs/table2_indpro_full_20260605 \
  --targets INDPRO \
  --workers 3 \
  --vintage 2018-01 \
  --cache-root /home/nanyeon99/project/macroforecast_replication_cache \
  --start-year 1980 \
  --end-year 2017 \
  --n-estimators 200 \
  --random-state 123 \
  --tuning-mode off \
  --skip-existing

Observed batch summary:

status: done
workers: 3
task_count: 6
finished_count: 6
failed_count: 0
elapsed: about 15.4 hours

The matching FM benchmark used the same single-cell runner with --feature-case F --model far, horizon-specific target policies matching the best-spec cell, and the same 2018-01 vintage, 1980 to 2017 calendar, and target construction.

Conceptually:

uv run python scripts/replication/gcls_2021_table2_single.py \
  --target-alias INDPRO \
  --horizon <horizon> \
  --feature-case F \
  --target-policy <matching_policy> \
  --model far \
  --vintage 2018-01 \
  --cache-root /home/nanyeon99/project/macroforecast_replication_cache \
  --out-dir /home/nanyeon99/project/macroforecast_gcls_runs/indpro_fm_benchmark_20260606/<task_slug> \
  --start-year 1980 \
  --end-year 2017 \
  --random-state 123 \
  --tuning-mode off \
  --skip-existing

The relative comparison aligns best-spec and FM forecast files by realized target date, checks that the realized targets are identical, and computes RMSE and relative MSE/RMSE on the common support.
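
A minimal sketch of that alignment step follows. It is not the package's comparison script: the function name `compare_on_common_support`, the `{target_date: (actual, prediction)}` input layout, and the toy numbers are assumptions made for illustration. The returned keys mirror the check names used in this report.

```python
import math


def rmse(errors):
    """Root mean squared error of a non-empty list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def compare_on_common_support(best_rows, fm_rows):
    """Compare two forecast sets given as {target_date: (actual, prediction)}."""
    # Keep only realized-target dates present in both runs.
    common = sorted(set(best_rows) & set(fm_rows))

    def has_nan(d):
        return any(math.isnan(v) for v in best_rows[d] + fm_rows[d])

    # Rows with a NaN actual or prediction are counted, then excluded.
    invalid = [d for d in common if has_nan(d)]
    valid = [d for d in common if not has_nan(d)]
    if not valid:
        raise ValueError("no common valid rows to compare")

    # Both runs must be scored against the same realized values.
    actual_max_abs_diff = max(abs(best_rows[d][0] - fm_rows[d][0]) for d in valid)
    best_rmse = rmse([best_rows[d][1] - best_rows[d][0] for d in valid])
    fm_rmse = rmse([fm_rows[d][1] - fm_rows[d][0] for d in valid])
    relative_rmse = best_rmse / fm_rmse
    return {
        "common_rows": len(valid),
        "invalid_rows": len(invalid),
        "actual_max_abs_diff": actual_max_abs_diff,
        "best_rmse": best_rmse,
        "fm_rmse": fm_rmse,
        "relative_mse": relative_rmse ** 2,
        "relative_rmse": relative_rmse,
        "beats_fm": relative_rmse < 1.0,
    }


# Toy inputs: only 2000-02 and 2000-03 appear in both runs.
best = {"2000-01": (0.010, 0.012), "2000-02": (0.020, 0.019), "2000-03": (-0.010, -0.008)}
fm = {"2000-02": (0.020, 0.017), "2000-03": (-0.010, -0.014), "2000-04": (0.000, 0.001)}
result = compare_on_common_support(best, fm)
print(result["common_rows"], round(result["relative_mse"], 6))
```

Keying both inputs by realized target date, rather than by forecast origin, keeps the comparison on identical support even when the two runs have different tail coverage.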

Absolute Results

| Horizon | Best-spec task | Rows | RMSE | MAE |
| --- | --- | --- | --- | --- |
| 1 | INDPRO_h1_direct_average_random_forest_F-X-MARX-Level | 455 | 0.005964 | 0.004248 |
| 3 | INDPRO_h3_direct_average_random_forest_MARX | 453 | 0.004482 | 0.003086 |
| 6 | INDPRO_h6_path_average_random_forest_MARX | 450 | 0.003937 | 0.002727 |
| 9 | INDPRO_h9_path_average_random_forest_MARX | 447 | 0.003559 | 0.002487 |
| 12 | INDPRO_h12_path_average_random_forest_MARX | 444 | 0.003328 | 0.002316 |
| 24 | INDPRO_h24_direct_average_random_forest_F-Level | 432 | 0.002407 | 0.001698 |

The row counts fall with the horizon because later origins need later realized targets. With 456 monthly origins from 1980-01 through 2017-12 and a vintage ending in 2017-12, horizon h keeps 456 - h scored rows. At h=24 that leaves 432 rows, the common support after excluding tail origins whose 24-month-ahead target is unavailable in the vintage.

Relative Results Against FM

| Horizon | Best RMSE | FM RMSE | Relative MSE | Relative RMSE | Common rows | Beats FM |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.005964 | 0.006283 | 0.900921 | 0.949169 | 455 | yes |
| 3 | 0.004482 | 0.004573 | 0.960364 | 0.979982 | 453 | yes |
| 6 | 0.003937 | 0.004159 | 0.896344 | 0.946754 | 450 | yes |
| 9 | 0.003559 | 0.003852 | 0.853704 | 0.923961 | 447 | yes |
| 12 | 0.003328 | 0.003752 | 0.786731 | 0.886979 | 444 | yes |
| 24 | 0.002407 | 0.003692 | 0.425190 | 0.652066 | 432 | yes |

Common-support checks:

actual_max_abs_diff: 0.0 for every horizon
invalid_rows: 0
nan_prediction_rows: 0
nan_actual_rows: 0

Interpretation:

  • actual_max_abs_diff=0.0 means the best-spec and FM rows use the same realized target values after alignment.

  • relative_mse < 1 means the best-spec cell has lower squared-error loss than FM.

  • relative_rmse < 1 means the same result expressed in RMSE units.

  • The largest improvement appears at h=24, where the RF F-Level direct-average cell has relative RMSE 0.652066.
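
The summary figures can be rechecked from the per-horizon values reported in this section. The snippet below is only a consistency check on the published numbers, not part of the replication pipeline. It assumes relative MSE is the square of relative RMSE, which holds when both are computed on the same rows; the variable names are illustrative.

```python
from statistics import mean, median

# (relative MSE, relative RMSE) per horizon, copied from the results above.
reported = {
    1: (0.900921, 0.949169),
    3: (0.960364, 0.979982),
    6: (0.896344, 0.946754),
    9: (0.853704, 0.923961),
    12: (0.786731, 0.886979),
    24: (0.425190, 0.652066),
}

rel_rmse = [r for _, r in reported.values()]
mean_rel = mean(rel_rmse)      # the report quotes 0.889818
median_rel = median(rel_rmse)  # the report quotes 0.935358

# Largest gap between reported relative MSE and squared relative RMSE.
max_gap = max(abs(r * r - m) for m, r in reported.values())

# Horizon with the lowest relative RMSE, i.e. the largest gain over FM.
best_h = min(reported, key=lambda h: reported[h][1])

print(round(mean_rel, 6), round(median_rel, 6), best_h)
```

Differences in the last decimal place are expected here because the inputs are already rounded to six decimals.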

What Changed Relative To The Invalid Diagnostic Runs

Earlier package diagnostics were useful for finding defects, but they are not valid replication evidence. The corrected run differs in the following material ways:

| Issue found in earlier diagnostics | Corrected behavior |
| --- | --- |
| h-step labels were allowed when the realized target date was after the forecast origin support | forecasts are scored only when t + h is available |
| average_change was applied to an already McCracken-Ng transformed target | direct-average cells use average_value; path-average cells use one-step value targets |
| target-derived paper blocks were missing | MARX_y and MAF_y can be built from input="target_panel" |
| feature materialization was too slow for repeated windows | runner now supports cached/corrected feature construction paths |
| invalid runs compared diagnostic shortcuts | corrected run compares best-spec cells against matching FM cells on identical actual support |
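
The target-policy correction can be illustrated with a toy series. This is a sketch under stated assumptions, not the package's implementation: it assumes the target series is already in transformed (growth-rate) units, that a direct-average cell is trained on the mean of the next h values, and that a path-average cell forecasts each of the h steps separately and averages those forecasts. The function names are hypothetical.

```python
def direct_average_target(y, t, h):
    """Mean of y[t+1], ..., y[t+h]: one label per origin for a direct-average cell."""
    return sum(y[t + 1 : t + h + 1]) / h


def path_average_forecast(step_forecasts):
    """Average of h separate per-step forecasts for a path-average cell."""
    return sum(step_forecasts) / len(step_forecasts)


# Toy transformed series; values are made up for illustration.
y = [0.01, 0.02, -0.01, 0.03, 0.00]
t, h = 0, 3

target = direct_average_target(y, t, h)

# If every per-step forecast were perfect, the path average would equal the
# direct-average target, so both policies are scored against the same quantity.
perfect_path = path_average_forecast(y[t + 1 : t + h + 1])

print(round(target, 6), round(perfect_path, 6))
```

Under this reading, both policies target the average of values that are already transformed, which is why applying a further change transformation to the target would have been a double transformation.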

Remaining Replication Gaps

These gaps are not package runtime failures; they are evidence boundaries for claiming exact paper-table equality.

| Gap | Consequence |
| --- | --- |
| Exact FRED-MD vintage is not stated in the checked materials | 2018-01 is the first defensible post-2017M12 candidate, but may not be the paper’s exact vintage |
| Full machine-readable replication package was not found | exact table reproduction cannot be audited line-by-line against author code |
| MATLAB tree and optimizer defaults are not exactly portable | Python/scikit-style RF/BT values can differ even under the same high-level algorithm |
| This pass uses tuning-mode=off | appendix GA/Bayesian/random-CV tuning is not yet replicated for every learner |
| Benchmark is fixed FM mapping | BIC-selected FM variants should be added if the paper’s benchmark implementation is recovered |
| This report covers INDPRO only | ten-target Table 2 completion still requires EMP, UNRATE, INCOME, CONS, RETAIL, HOUST, M2, CPI, and PPI |

Next Actions

  1. Run the same corrected pipeline for the remaining nine Table 2 targets.

  2. Add a benchmark helper that can switch between fixed FM and BIC-selected FM once the paper’s exact benchmark selection rule is pinned down.

  3. Run at least one paper-small tuning pass for Elastic Net, Adaptive Lasso, Linear Boosting, and Boosted Trees to verify the tuned-learner branch.

  4. Add paper-table capture comparison in the notebook page after the full ten-target table is available.

  5. Keep the invalid diagnostic section in the setting page as a debugging log, but do not use those values as evidence.