macroforecast.interpretation.dual#
macroforecast.interpretation.dual is the dedicated namespace for the dual
interpretation route in Goulet Coulombe, Goebel, and Klieber (2024), “Dual
Interpretation of Machine Learning Forecasts” (arXiv:2412.13076). Standard
variable-importance tools ask which predictor columns matter. Dual
interpretation asks which historical training observations matter for a
forecast.
The central identity is:
yhat_new = sum_i w_i(new) y_i
The weights w_i(new) are observation weights, also called data-portfolio
weights in the paper/code. A positive weight means the model borrows from that
historical outcome. A negative weight means the model uses that observation by
contrast. A concentrated weight vector means the forecast relies on a small
number of episodes. A high short position or high gross leverage means the
forecast is extrapolative rather than a simple local average.
Relation to Goulet Coulombe (2026), “Ordinary Least Squares as an Attention
Mechanism”: OLS-as-attention is the exact linear algebra route
X_test (X_train'X_train)^-1 X_train'. It is available through
macroforecast.interpretation.ols_attention_weights(),
ridge_attention_weights(), ols_attention_embedding(), and
ols_attention_equivalence(). This dual namespace is broader: it uses the
same historical-observation idea for ridge/OLS, kernel ridge, and random
forest data-portfolio weights, plus contribution, diagnostic, top-observation,
and group tables.
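The identity above can be checked numerically. The following is a minimal sketch using only NumPy, not the package's implementation: `linear_observation_weights` is a hypothetical helper, the data are simulated, and the ridge penalty is applied unscaled, which may differ from the package's `ridge_penalty_scale` convention.

```python
import numpy as np


def linear_observation_weights(X_train, X_test, lam=0.0):
    """Hypothetical helper: W = X_test (X'X + lam*I)^-1 X', shape (n_test, n_train)."""
    k = X_train.shape[1]
    gram = X_train.T @ X_train + lam * np.eye(k)
    # Solve the linear system rather than forming the inverse explicitly.
    return X_test @ np.linalg.solve(gram, X_train.T)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 4))
y_train = X_train @ np.array([1.0, -0.5, 0.25, 0.0]) + 0.1 * rng.normal(size=40)
X_test = rng.normal(size=(3, 4))

# Primal OLS forecast through the coefficient vector (no intercept).
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
yhat_primal = X_test @ beta

# Dual forecast: each prediction is a weighted sum of training outcomes.
W_ols = linear_observation_weights(X_train, X_test)
yhat_dual = W_ols @ y_train

# Ridge version with an unscaled penalty (an assumption of this sketch).
lam = 0.5
W_ridge = linear_observation_weights(X_train, X_test, lam=lam)
beta_ridge = np.linalg.solve(
    X_train.T @ X_train + lam * np.eye(4), X_train.T @ y_train
)

print(np.allclose(yhat_primal, yhat_dual))
print(np.allclose(X_test @ beta_ridge, W_ridge @ y_train))
```

Both checks compare the coefficient route with the observation-weight route; the weight matrix itself has one row per forecast and one column per training observation.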
Reference Sources#
| Source | Used for |
|---|---|
| Goulet Coulombe, Goebel, and Klieber (2024), “Dual Interpretation of Machine Learning Forecasts” | Paper terminology and interpretation target. |
| Goulet Coulombe (2026), “Ordinary Least Squares as an Attention Mechanism” | Exact OLS/ridge attention identity and whitened embedding interpretation. |
|  | Ridge, kernel-ridge, and random-forest observation-weight formulas. |
|  | Forecast concentration, forecast short position, forecast leverage, and forecast turnover definitions. |
|  | Original model-route inventory: OLS, RF, LGB, RR, KRR, and NN. |
Implemented now: ridge/OLS, kernel ridge, and sklearn-style random forest. Deferred routes: boosted-tree AXIL, LGB+/LGBA+ channel-specific weights, neural embedding-ridge approximation, and classification log-odds decomposition.
Public Functions#
| Function | Input | Output | Purpose |
|---|---|---|---|
| dual_interpretation | model, train features, train target, optional test features |  | Run the paper-aligned ridge/KRR/RF path and return all dual tables together. |
| dual_from_forecast_result | completed |  | Build a dual sidecar for a completed runner result. |
| observation_weights | model, | long | Compute historical observation/data-portfolio weights. |
| observation_contributions | weights and | long | Multiply observation weights by historical outcomes. |
| forecast_diagnostics | weights |  | Compute concentration, short position, leverage, gross leverage, and turnover. |
| top_observations | weights or contributions | long | Return the largest historical observations for each forecast. |
| group_observation_weights | weights/contributions and a group mapping |  | Aggregate observation weights over user-defined regimes or episodes. |
|  | result object | dict of | Expand the result for |
Backward-compatible aliases are still available:
| Alias | Preferred name |
|---|---|
|  |  |
|  |  |
|  |  |
|  |  |
Public Flow#
    import macroforecast as mf

    # model, X_train, y_train, X_test, and the date indexes are assumed to exist already.
    dual = mf.interpretation.dual.dual_interpretation(
        model,
        X_train,
        y_train,
        X_test,
        method="random_forest",
        top_n=10,
        groups={
            "gfc": gfc_train_dates,
            "covid": covid_train_dates,
        },
    )
    tables = dual.to_tables(prefix="inflation")
For completed forecast runs, attach the same result as a sidecar:
    result = mf.forecasting.run(feature_set, "ridge", window=window)
    # fit is the fitted estimator behind the forecast; the forecast table does not contain it.
    result = mf.interpretation.dual.dual_from_forecast_result(
        result,
        fit,
        X_train,
        y_train,
        X_test,
        method="ridge",
    )
    # Equivalent method form:
    result = result.with_dual(fit, X_train, y_train, X_test, method="ridge")
forecasting.run() does not compute dual interpretation automatically. The
completed forecast table does not contain the exact fitted estimator,
training-feature matrix, training target, or forecast-row feature matrix. Those
objects must be passed explicitly to avoid silent look-ahead or stale-design
errors.
For a ridge/KRR route, model can be None:
    dual = mf.interpretation.dual.dual_interpretation(
        None,
        X_train,
        y_train,
        X_test,
        method="krr",
        kernel="laplace",
        sigma=1e-4,
        lambda_=0.1,
    )
dual_interpretation#
    macroforecast.interpretation.dual.dual_interpretation(
        model,
        X_train,
        y_train,
        X_test=None,
        *,
        method="auto",
        lambda_=1e-8,
        kernel="linear",
        sigma=1.0,
        add_intercept=False,
        ridge_penalty_scale="n_train",
        normalize=False,
        center=False,
        include_base=False,
        top_n=10,
        top_sort_by="abs_weight",
        top_q=0.05,
        groups=None,
        include_contributions=True,
        include_diagnostics=True,
        include_top_observations=True,
        include_group_weights=None,
    )
Input:
| Argument | Type | Default | Meaning |
|---|---|---|---|
| model | fitted model or | required | Required for random-forest weights. Optional for ridge/KRR because weights are closed-form from |
| X_train | pandas | required | Training feature matrix. Its index becomes |
| y_train | pandas | required | Training target aligned to |
| X_test | pandas | None | Forecast-row feature matrix. If omitted, each training row is explained against the training panel. |
| method | string | "auto" |  |
| lambda_ | float | 1e-8 | Ridge/KRR regularization. |
| kernel | string | "linear" | KRR kernel: |
| sigma | float | 1.0 | Kernel bandwidth convention used by the reviewed code: |
| add_intercept | bool | False | Adds an unpenalized intercept for ridge/OLS. The paper code usually works with standardized no-intercept matrices. |
| ridge_penalty_scale | string | "n_train" | Ridge penalty convention. |
| normalize | bool | False | Re-normalize row weights to sum to one. Default is false because leverage and negative weights are meaningful diagnostics. |
| center | bool | False | Center |
| include_base | bool | False | With |
| top_n | int | 10 | Number of top observations returned per forecast row. |
| top_sort_by | string | "abs_weight" |  |
| top_q | float | 0.05 | Share of observations used in concentration. Values above |
| groups | mapping or | None | Named historical episode groups, mapping group name to training-index labels. |
| include_contributions, include_diagnostics, include_top_observations, include_group_weights | bool | varies | Include or skip contribution, diagnostic, top-observation, and group tables. |
Output: DualInterpretationResult.
| Field | Type | Meaning |
|---|---|---|
|  |  | Observation/data-portfolio weights. |
|  |  | Observation-level forecast contributions. |
|  |  | Forecast concentration, short position, leverage, gross leverage, and turnover. |
|  |  | Largest historical observations per forecast. |
|  |  | Group-level observation weights and contributions. |
|  | dict | Paper route, implemented/deferred routes, and options used. |
dual_from_forecast_result#
    macroforecast.interpretation.dual.dual_from_forecast_result(
        result,
        model,
        X_train,
        y_train,
        X_test=None,
        *,
        attach=True,
        sidecar_name="dual",
        **dual_options,
    )
Input:
| Argument | Type | Default | Meaning |
|---|---|---|---|
| result |  | required | Completed forecast runner output. |
| model | fitted model or | required | Same model argument passed to |
| X_train, y_train, X_test | pandas objects | required except | Exact design matrices used for the dual explanation. |
| attach | bool | True | If true, return a copy of |
| sidecar_name | str | "dual" | Name used in |
| **dual_options | keyword args | none | Forwarded to |
Output: with attach=True, a new ForecastResult; with attach=False, a
standalone DualInterpretationResult.
observation_weights#
    macroforecast.interpretation.dual.observation_weights(
        model,
        X_train,
        X_test=None,
        *,
        method="auto",
        lambda_=1e-8,
        kernel="linear",
        sigma=1.0,
        add_intercept=False,
        ridge_penalty_scale="n_train",
        normalize=False,
    )
Implemented routes:
| Route | Formula / logic | Notes |
|---|---|---|
| Ridge / OLS |  | Set |
| Kernel ridge |  | Kernels: |
| Random forest | For each tree, assign test and train rows to leaves; train rows in the same leaf share weight; average across trees | For sklearn forests, bootstrap sample counts are used when recoverable. |
Output columns:
| Column | Meaning |
|---|---|
|  | Forecast-row position and index. |
|  | Historical observation position and index. |
|  | Signed and absolute observation weight. |
|  | Implemented route: |
The dense matrix is attached as attrs["weight_matrix"] with shape
(n_test, n_train).
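As an illustration of a dense `(n_test, n_train)` weight matrix for the kernel-ridge route, the sketch below uses the textbook dual form `W = K(test, train) (K(train, train) + lambda I)^-1`. This page does not show the package's exact formula or bandwidth convention, so the Gaussian kernel scaling, the unscaled penalty, and both helper names are assumptions of this example.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Assumed convention: exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = (
        (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_observation_weights(X_train, X_test, lam=0.1, sigma=1.0):
    """Hypothetical helper: W = K(test, train) (K(train, train) + lam*I)^-1."""
    K = gaussian_kernel(X_train, X_train, sigma)
    K_test = gaussian_kernel(X_test, X_train, sigma)
    n = K.shape[0]
    # K + lam*I is symmetric, so solving against K_test' and transposing gives W.
    return np.linalg.solve(K + lam * np.eye(n), K_test.T).T


rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 3))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=30)
X_test = rng.normal(size=(5, 3))

W = krr_observation_weights(X_train, X_test, lam=0.1, sigma=1.0)

# Standard KRR forecast through the dual coefficients alpha.
K = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K + 0.1 * np.eye(30), y_train)
yhat_alpha = gaussian_kernel(X_test, X_train) @ alpha

print(W.shape)
print(np.allclose(W @ y_train, yhat_alpha))
```

Row sums of `W` are generally not one here, which is consistent with leaving `normalize=False` so that leverage stays visible.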
observation_contributions#
    macroforecast.interpretation.dual.observation_contributions(
        weights,
        y_train,
        *,
        center=False,
        include_base=False,
    )
Input: an observation-weight table and the aligned training target.
Output columns add:
| Column | Meaning |
|---|---|
|  | Realized historical outcome. |
|  |  |
|  |  |
|  | Sum of contributions for the forecast row. |
|  |  |
Default center=False preserves the exact identity
prediction = weights @ y_train. Centering is useful for plots but changes the
table into a base-plus-centered-contribution decomposition.
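A small worked example may help separate the two decompositions. It is a sketch on hand-picked numbers, not the package's code; in particular, defining the base as the training mean times the row's weight sum is one natural choice and an assumption here.

```python
import numpy as np

# Two forecast rows, three training observations (illustrative numbers).
W = np.array([[0.5, 0.7, -0.2],
              [0.1, 0.3, 0.6]])
y_train = np.array([2.0, 4.0, 6.0])

# Uncentered: contribution_i = w_i * y_i, and the row sum is the prediction.
contributions = W * y_train
prediction = contributions.sum(axis=1)

# Centered: contribution_i = w_i * (y_i - mean(y)), plus a base term.
y_bar = y_train.mean()
centered = W * (y_train - y_bar)
base = y_bar * W.sum(axis=1)  # assumed definition of the base

print(prediction)
print(base + centered.sum(axis=1))
```

Both printed vectors agree, so centering moves part of each forecast into the base term without changing the forecast itself.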
forecast_diagnostics#
    macroforecast.interpretation.dual.forecast_diagnostics(weights, *, top_q=0.05)
Output:
| Column | Paper/code meaning |
|---|---|
|  | Forecast concentration: sum of top absolute weights divided by total absolute weight. |
|  | Forecast short position: signed sum of negative weights. |
|  | Absolute short-side exposure. |
|  | Signed weight sum. |
|  | Sum of absolute weights. |
|  | Sum of absolute weight changes relative to the previous forecast row. |
|  | Diagnostic settings. |
Negative weights are not automatically errors. In this paper they identify contrast-based use of historical observations. The caution is economic: macroeconomic shocks are often asymmetric, so a mirror-image historical analogy may be a weak explanation even if the model uses it.
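The diagnostics described in the table can be sketched directly from a dense weight matrix. This is an illustration under assumptions, not the package's implementation: the function and key names are hypothetical, and the number of "top" observations is taken as `ceil(top_q * n_train)` with a minimum of one.

```python
import math

import numpy as np


def diagnostics_sketch(W, top_q=0.05):
    """Hypothetical helper: per-row diagnostics from an (n_test, n_train) matrix."""
    W = np.asarray(W, dtype=float)
    abs_w = np.abs(W)
    n_top = max(1, math.ceil(top_q * W.shape[1]))  # assumed rounding rule
    gross = abs_w.sum(axis=1)
    top_abs = np.sort(abs_w, axis=1)[:, -n_top:].sum(axis=1)
    short = np.where(W < 0, W, 0.0).sum(axis=1)
    # Turnover compares each row with the previous one; the first row has none.
    turnover = np.full(W.shape[0], np.nan)
    turnover[1:] = np.abs(np.diff(W, axis=0)).sum(axis=1)
    return {
        "concentration": top_abs / gross,
        "short_position": short,
        "short_abs": -short,
        "leverage": W.sum(axis=1),
        "gross_leverage": gross,
        "turnover": turnover,
    }


W = np.array([[0.6, 0.6, -0.2, 0.0],
              [0.25, 0.25, 0.25, 0.25]])
diag = diagnostics_sketch(W, top_q=0.25)
for name, values in diag.items():
    print(name, values)
```

In this example the first row has a short position of -0.2 and gross leverage of 1.4, while the second row is a plain equal-weight average with no short side.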
top_observations#
    macroforecast.interpretation.dual.top_observations(
        weights,
        *,
        y_train=None,
        n=10,
        sort_by="abs_weight",
    )
Input: observation weights or observation contributions. If y_train is
provided and the table lacks contribution, contributions are computed first.
Output: top historical observations per forecast row with a rank column.
Supported sort_by values: abs_weight, weight, contribution, and
abs_contribution.
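Ranking by absolute weight can be sketched as follows. The helper name, the tuple layout, and ranks starting at 1 are assumptions of this example rather than the package's output format.

```python
import numpy as np


def top_observations_sketch(W, n=10):
    """Hypothetical helper: (test_row, rank, train_row, weight) for the n largest |w|."""
    W = np.asarray(W, dtype=float)
    # Sort each row by descending absolute weight; a stable sort keeps ties in order.
    order = np.argsort(-np.abs(W), axis=1, kind="stable")[:, :n]
    rows = []
    for test_row, train_rows in enumerate(order):
        for rank, train_row in enumerate(train_rows, start=1):
            rows.append(
                (test_row, rank, int(train_row), float(W[test_row, train_row]))
            )
    return rows


W = np.array([[0.1, -0.7, 0.4],
              [0.3, 0.2, -0.5]])
top = top_observations_sketch(W, n=2)
for row in top:
    print(row)
```

Sorting on the absolute value keeps large negative weights in the ranking, which matters because contrast-based observations can dominate a forecast.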
group_observation_weights#
    macroforecast.interpretation.dual.group_observation_weights(
        weights,
        groups,
        *,
        y_train=None,
    )
Input:
| Argument | Meaning |
|---|---|
| weights | Observation-weight or contribution table. |
| groups | Mapping from group name to training-index labels. |
| y_train | Optional training target used to create contributions before grouping. |
Example:
    import pandas as pd

    # Each group maps a name to the training-index labels that belong to that episode.
    groups = {
        "gfc": pd.period_range("2007Q4", "2009Q2", freq="Q").to_timestamp("Q"),
        "covid": pd.period_range("2020Q1", "2021Q2", freq="Q").to_timestamp("Q"),
    }
    grouped = mf.interpretation.dual.group_observation_weights(
        dual.weights,
        groups,
        y_train=y_train,
    )
Output columns: test_row, test_index, episode_group, weight,
abs_weight, n_episodes, and, when available, contribution and
abs_contribution.
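The aggregation itself amounts to summing weights over the training labels in each group. The sketch below works on a dense weight matrix with string labels; the helper name and the output keys (including `n_obs`) are assumptions and do not reproduce the package's column names.

```python
import numpy as np


def group_weights_sketch(W, train_index, groups):
    """Hypothetical helper: signed and absolute weight per (group, forecast row)."""
    W = np.asarray(W, dtype=float)
    train_index = np.asarray(train_index)
    rows = []
    for name, labels in groups.items():
        # Boolean mask of training observations that belong to this group.
        mask = np.isin(train_index, list(labels))
        for test_row in range(W.shape[0]):
            w = W[test_row, mask]
            rows.append({
                "test_row": test_row,
                "group": name,
                "weight": float(w.sum()),
                "abs_weight": float(np.abs(w).sum()),
                "n_obs": int(mask.sum()),
            })
    return rows


train_index = ["2008Q3", "2008Q4", "2019Q4", "2020Q1", "2020Q2"]
W = np.array([[0.2, 0.3, 0.1, -0.4, 0.5]])
groups = {"gfc": ["2008Q3", "2008Q4"], "covid": ["2020Q1", "2020Q2"]}

grouped_rows = group_weights_sketch(W, train_index, groups)
for row in grouped_rows:
    print(row)
```

Reporting both the signed and the absolute sum is useful because offsetting weights inside a group (as in the "covid" group here) can hide how heavily the forecast leans on that episode.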
Output Integration#
DualInterpretationResult.to_tables(prefix="dual") returns:
| Table key | Meaning |
|---|---|
|  | Long observation-weight table. |
|  | Long contribution table, when requested. |
|  | Concentration, short-position, leverage, gross-leverage, and turnover table. |
|  | Top historical observations per forecast row. |
|  | Group-level weights/contributions, when groups are provided. |
|  | Result metadata as key/value rows. |
The output module recognizes this result directly:
    bundle = mf.output.bundle_outputs(
        forecasts=result,
        interpretation={"dual": dual},
        metadata={"study": "inflation_dual"},
    )
    manifest = mf.output.write_artifacts(
        bundle,
        "results/inflation_dual",
        layout="grouped",
    )
With layout="grouped", dual tables are written under:
interpretation/dual/
The same grouped path is used when a ForecastResult contains a dual sidecar:
    result = result.with_dual(fit, X_train, y_train, X_test, method="ridge")
    mf.output.write_artifacts(result, "results/dual_run", layout="grouped")
This keeps DualML observation-based explanations separate from SHAP, oShapley/PBSV, PDP/ICE/ALE, and other feature-based interpretation outputs.