op#


Axis op on sub-layer L7_A_importance_dag_body (layer l7).

Sub-layer#

L7_A_importance_dag_body

Axis metadata#

  • Default: 'permutation_importance'

  • Sweepable: False

  • Status: operational

Operational status summary#

  • Operational: 34 option(s)

  • Future: 6 option(s)

Options#

accumulated_local_effect – operational#

Apley & Zhu (2020) accumulated local effects – PDP alternative robust to correlation.

See accumulated_local_effect function page for full documentation + parameters + standalone usage. Standalone: mf.functions.ale_importance.

attention_weights – operational#

OLS-as-attention closed-form attention matrix (Goulet Coulombe 2026).

Goulet Coulombe (2026) ‘OLS as an Attention Mechanism’ Eq. 3 closed form: Ω = X_test · (X'_train · X_train)⁻¹ · X'_train. The (n_test, n_train) matrix encodes how strongly each test point attends to each training point under an OLS / ridge fit, identical to the representer expansion of the dual ridge solution. Output table carries one row per training observation (per-test-point weight aggregates) plus the full attention matrix and representer-identity diagnostics inline via frame.attrs.

Promoted from future to operational in Phase B-10 (paper-10 replication). Compatible with linear-family L4 models (ols / ridge / lasso / elastic_net / bayesian_ridge / huber).
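The closed form above can be checked numerically. The following is a minimal numpy sketch on synthetic data, assuming a no-intercept OLS fit; the variable names are illustrative and this is not the library's runtime code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 40, 5, 3
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n_train)

# Omega = X_test (X_train' X_train)^{-1} X_train', shape (n_test, n_train).
gram = X_train.T @ X_train
omega = X_test @ np.linalg.solve(gram, X_train.T)

# Representer identity: attention-weighted training targets equal OLS predictions.
beta_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
y_hat_primal = X_test @ beta_hat
y_hat_dual = omega @ y_train

# One aggregate per training observation (mean absolute weight across test points).
mean_abs_weight = np.abs(omega).mean(axis=0)
```

Row t of `omega` lists how much each training target contributes to the prediction for test point t, so `omega @ y_train` reproduces the primal forecast up to floating-point error.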

When to use

Linear-family attribution as a kernel-attention map; pedagogical / replication of paper-10 Coulombe (2026).

When NOT to use

Non-linear models (the closed form requires a linear estimator).

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Goulet Coulombe (2026) ‘OLS as an Attention Mechanism’, working paper – Eq. 3 closed-form attention matrix.

Related options: dual_decomposition, model_native_linear_coef, shap_linear

Last reviewed 2026-05-05 by macroforecast author.

bootstrap_jackknife – operational#

Bootstrap / jackknife confidence bands around any importance score.

Wraps another importance op and re-runs it on B stationary-bootstrap (Politis-White 2004) or jackknife resamples. Emits (score_mean, score_p2.5, score_p97.5) per feature; pair with the boxplot figure type.
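A minimal sketch of the resampling loop, assuming a stationary bootstrap with a hand-picked mean block length (the automatic block-length rule is not reproduced here) and a toy inner score. The helper names `stationary_bootstrap_indices` and `abs_ols_coef` are hypothetical and not part of macroforecast.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Stationary-bootstrap row indices: geometric block lengths with circular wrap."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < 1.0 / mean_block:
            idx[t] = rng.integers(n)        # start a new block at a random row
        else:
            idx[t] = (idx[t - 1] + 1) % n   # continue the current block
    return idx

def abs_ols_coef(X, y):
    """Toy inner importance score: absolute OLS coefficient per feature."""
    return np.abs(np.linalg.lstsq(X, y, rcond=None)[0])

rng = np.random.default_rng(1)
n, p, B = 200, 3, 300
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

scores = np.empty((B, p))
for b in range(B):
    rows = stationary_bootstrap_indices(n, mean_block=10, rng=rng)
    scores[b] = abs_ols_coef(X[rows], y[rows])

# Per-feature mean and 95% percentile band across the B resamples.
score_mean = scores.mean(axis=0)
score_p2_5, score_p97_5 = np.percentile(scores, [2.5, 97.5], axis=0)
```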

When to use

Reporting confidence-banded importance rankings.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Politis & White (2004) ‘Automatic Block-Length Selection for the Dependent Bootstrap’, Econometric Reviews 23(1): 53-70.

Related options: rolling_recompute

Last reviewed 2026-05-05 by macroforecast author.

boruta_selection – future#

(no schema description for boruta_selection)

TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full OptionDoc entry under macroforecast/scaffold/option_docs/l7.py are welcome.

bvar_pip – operational#

Posterior inclusion probabilities for BVAR / Bayesian linear models.

For each predictor j, returns P(β_j 0 | data) – the posterior probability that the variable enters the model with non-zero effect. Compatible with bvar_minnesota / bvar_normal_inverse_wishart / bayesian_ridge.

When to use

Bayesian model selection; comparing variable importance under posterior uncertainty.

When NOT to use

Frequentist models – use lasso_inclusion_frequency for an analogous stability score.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Koop & Korobilis (2010) ‘Bayesian Multivariate Time Series Methods for Empirical Macroeconomics’, Foundations and Trends in Econometrics 3(4): 267-358.

Related options: lasso_inclusion_frequency

Last reviewed 2026-05-05 by macroforecast author.

cumulative_r2_contribution – operational#

Cumulative R² gain from adding features one at a time (forward-selection ranking).

Re-fits the L4 estimator with features added in descending order of marginal contribution; each step records the cumulative OOS-R² achieved. Pair with the lineplot figure type to visualise the marginal information value of each predictor.
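The forward-selection loop can be sketched as follows, assuming an OLS stand-in for the L4 estimator, a single train/test split, and a greedy entry rule based on OOS R² against the training-mean benchmark. These are illustrative choices; the op's actual entry rule may differ.

```python
import numpy as np

def oos_r2(X_tr, y_tr, X_te, y_te, cols):
    """OOS R^2 of an OLS fit (with intercept) on `cols`, relative to the training mean."""
    A_tr = np.column_stack([np.ones(len(y_tr))] + [X_tr[:, c] for c in cols])
    A_te = np.column_stack([np.ones(len(y_te))] + [X_te[:, c] for c in cols])
    beta = np.linalg.lstsq(A_tr, y_tr, rcond=None)[0]
    sse = np.sum((y_te - A_te @ beta) ** 2)
    sst = np.sum((y_te - y_tr.mean()) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(2)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

selected, cumulative_r2 = [], []
remaining = list(range(p))
while remaining:
    # Greedy entry rule (an assumption): add the feature with the best resulting OOS R^2.
    best = max(remaining, key=lambda c: oos_r2(X_tr, y_tr, X_te, y_te, selected + [c]))
    selected.append(best)
    remaining.remove(best)
    cumulative_r2.append(oos_r2(X_tr, y_tr, X_te, y_te, selected))
```

`selected` gives the entry order and `cumulative_r2` the curve to plot; the curve flattens once the informative predictors are in.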

When to use

Quantifying how many predictors the model actually needs to reach a target R².

When NOT to use

Highly correlated features – the order is sensitive to entry rules.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Stock & Watson (2012) ‘Generalized Shrinkage Methods for Forecasting using Many Predictors’, JBES 30(4): 481-493.

Related options: lasso_inclusion_frequency, lofo

Last reviewed 2026-05-05 by macroforecast author.

deep_lift – operational#

DeepLIFT (Shrikumar 2017) – difference-from-reference attribution.

Decomposes the difference f(x) - f(x') into per-feature contributions using rescaled-difference / reveal-cancel rules for non-linear activations. Faster than integrated gradients but with less rigorous axiomatic backing.

When to use

NN attribution where integrated-gradients runtime is too high.

When NOT to use

When the completeness / sensitivity axioms matter – prefer integrated gradients.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Shrikumar, Greenside & Kundaje (2017) ‘Learning Important Features Through Propagating Activation Differences’, ICML.

Related options: integrated_gradients, gradient_shap, saliency_map, shap_deep

Last reviewed 2026-05-05 by macroforecast author.

dual_decomposition – operational#

Forecast-as-weighted-training-targets via the representer theorem (Coulombe et al. 2024); equivalently a restricted attention module (Goulet Coulombe 2026).

Goulet Coulombe / Goebel / Klieber (2024) ‘Dual Interpretation of ML Forecasts’. Surfaces each prediction as a weighted combination of historical training targets; weights recovered through the representer theorem applied to the fitted model. Atomic L7 primitive: SHAP-family ops decompose by feature contribution, whereas this op decomposes by training-row contribution, which is the natively interpretable view for small-sample, temporally ordered macro panels.

Linear families (operational v0.8.9): ridge / OLS / lasso via closed-form w(xₜ) = X(X'X + αI)⁻¹xₜ.

Tree-bagging ensembles (operational v0.9.1 dev-stage v0.9.0B-5): RandomForestRegressor / ExtraTreesRegressor via the leaf-co-occurrence kernel wⱼ(xₜ) = (1/B) Σ_b 1[j ∈ B_b] · 1[leaf_b(xₜ) == leaf_b(xⱼ)] / leaf_size_b(xⱼ) where B_b is tree b’s bootstrap subset (sklearn estimators_samples_). Reproduces forest.predict to machine precision (~4e-16). Helper _rf_leaf_cooccurrence_weights in core/runtime.py.

Output frame layout: rows = training row labels, columns = mean_weight, abs_mean_weight, max_abs_weight. Full (n_test × n_train) weight matrix attached as frame.attrs['dual_weights']; frame.attrs['method'] carries 'linear_closed_form' or 'rf_leaf_cooccurrence_kernel' for downstream renderers.

Inline portfolio diagnostics. The output artifact also carries the four portfolio metrics from the same paper (HHI = Σwⱼ², short = Σ max(0,-wⱼ), turnover = ‖wₜ - wₜ₋₁‖₁, leverage = ‖w‖₁) at frame.attrs['portfolio_metrics']. These are trivial numpy reductions on the primary dual weights and do not warrant their own L7 op (decomposition discipline).
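As a rough illustration of the linear closed form and the four portfolio reductions, the sketch below builds ridge dual weights on synthetic data (no intercept, α fixed by hand) and reduces them with plain numpy. The dictionary keys are illustrative and need not match the keys stored in frame.attrs.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, p, alpha = 50, 6, 4, 1.0
X = rng.normal(size=(n_train, p))
y = X @ rng.normal(size=p) + 0.2 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, p))

# Row t holds w(x_t) = X (X'X + alpha I)^{-1} x_t.
A = X.T @ X + alpha * np.eye(p)
W = X_test @ np.linalg.solve(A, X.T)           # (n_test, n_train)

# Dual forecast (weighted training targets) versus the primal ridge forecast.
y_hat_dual = W @ y
y_hat_primal = X_test @ np.linalg.solve(A, X.T @ y)

portfolio_metrics = {
    "hhi": np.sum(W ** 2, axis=1),
    "short": np.sum(np.maximum(0.0, -W), axis=1),
    "leverage": np.sum(np.abs(W), axis=1),
    # Turnover compares consecutive test points, so it has n_test - 1 entries.
    "turnover": np.sum(np.abs(np.diff(W, axis=0)), axis=1),
}
```

Because leverage is the sum of long and short positions, `leverage - 2 * short` recovers the net weight sum of each row, which is a quick sanity check on the weights.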

OLS-as-attention equivalence. Goulet Coulombe (2026) ‘Ordinary Least Squares as an Attention Mechanism’ (SSRN 5200864) shows that the same dual representation ŷ_test = F_test F_train' y_train (eq. 7) coincides with a restricted attention module: queries Q = X_test W, keys K = X_train W with W = U Λ^{-½}, values V = y, and the softmax replaced by the identity (eqs. 17-19). The training-row weights ωⱼᵢ = ⟨Fⱼ, Fᵢ⟩ surfaced by this op are exactly the (restricted) attention weights of that paper. Same compute, different vocabulary – no separate runtime needed.

Boosted-tree (gradient_boosting / xgboost / lightgbm) and NN extensions are deferred: residual-bagging and learned non-linear models do not admit a clean sum-of-training-targets dual representation.

When to use

Decomposing macro forecasts into training-target contributions; explaining ML predictions to econometric audiences; bridging classical OLS to transformer-attention literature; per-prediction provenance for tree ensembles.

When NOT to use

Boosted-tree / NN families (gradient_boosting, xgboost, lightgbm, mlp, lstm, etc.) – raises NotImplementedError; the residual-bagging structure does not factor into a sum-of-training-targets representation.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Goulet Coulombe, Goebel & Klieber (2024) ‘Dual Interpretation of Machine Learning Forecasts’, arXiv:2412.13076.

  • Goulet Coulombe (2026) ‘Ordinary Least Squares as an Attention Mechanism’, SSRN 5200864 – shows OLS predictions ŷ_test = F_test F_train’ y_train (eq. 7) coincide with a restricted attention module (eqs. 17-19, identity activation, tied W_Q W_K’ = (X_train’ X_train)^{-1}). The dual_decomposition op already implements the same compute via the closed-form ridge representer; no separate runtime needed.

Related options: permutation_importance, shap_kernel

Last reviewed 2026-05-05 by macroforecast author.

fevd – operational#

Forecast error variance decomposition (Sims 1980).

For a fitted VAR (var / factor_augmented_var / bvar_*), decomposes the h-step-ahead forecast error variance into shares attributable to each orthogonalised shock. Default Cholesky orthogonalisation; ordering is set by the column order of the VAR. statsmodels fevd backend.
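For intuition, the decomposition can be written out by hand for a VAR(1) with known coefficients. This is a numpy-only sketch with made-up parameter values; it does not call the statsmodels backend.

```python
import numpy as np

def cholesky_fevd(A, sigma_u, horizon):
    """shares[i, j]: share of variable i's h-step forecast error variance due to shock j."""
    k = A.shape[0]
    P = np.linalg.cholesky(sigma_u)     # lower triangular, P P' = sigma_u
    phi = np.eye(k)                     # MA coefficient Phi_0
    contrib = np.zeros((k, k))
    for _ in range(horizon):
        theta = phi @ P                 # orthogonalised MA coefficients at this lag
        contrib += theta ** 2
        phi = A @ phi                   # Phi_{s+1} = A Phi_s for a VAR(1)
    return contrib / contrib.sum(axis=1, keepdims=True)

A = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma_u = np.array([[1.0, 0.3], [0.3, 0.5]])
shares_h1 = cholesky_fevd(A, sigma_u, horizon=1)
shares_h8 = cholesky_fevd(A, sigma_u, horizon=8)
```

At horizon 1 the first-ordered variable loads only on its own shock, which is the ordering dependence noted above; at longer horizons the second shock picks up a positive share.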

When to use

Standard VAR analysis; interpreting how shocks propagate across variables.

When NOT to use

Non-VAR models – use permutation_importance instead.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Sims (1980) ‘Macroeconomics and Reality’, Econometrica 48(1): 1-48.

Related options: historical_decomposition, generalized_irf, forecast_decomposition

Last reviewed 2026-05-05 by macroforecast author.

forecast_decomposition – operational#

Decompose a single forecast into per-feature contributions.

For a single (cell, target, horizon) forecast, returns a per-feature contribution table whose entries sum to forecast - benchmark. Linear models: β_j x_j. Trees: Tree SHAP. NN: gradient SHAP. A single entry point across model families: it delegates to the appropriate family-specific op.
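For the linear case the arithmetic is short enough to show directly. The sketch assumes the benchmark is the intercept-only forecast, so that the β_j x_j terms sum to forecast minus benchmark; the feature names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
feature_names = ["ip_growth", "term_spread", "unemployment"]  # hypothetical names
X = rng.normal(size=(120, 3))
y = 0.5 + X @ np.array([1.2, -0.7, 0.3]) + 0.1 * rng.normal(size=120)

# OLS with an intercept column.
design = np.column_stack([np.ones(len(y)), X])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
intercept, beta = coef[0], coef[1:]

x_new = np.array([0.8, -1.5, 0.2])
forecast = intercept + beta @ x_new
benchmark = intercept  # assumption: the benchmark is the intercept-only forecast

# Per-feature contribution beta_j * x_j for this single forecast.
contributions = dict(zip(feature_names, beta * x_new))
```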

When to use

Reporting feature contributions for a specific forecast (e.g. ‘why is the model bullish on Q3 GDP’).

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

Related options: shap_tree, shap_linear, shap_deep

Last reviewed 2026-05-05 by macroforecast author.

friedman_h_interaction – operational#

Friedman & Popescu (2008) H-statistic for two-way feature interactions.

For feature pair (j, k), computes H²_{jk} = Σ[PD_{jk}(x_j, x_k) - PD_j(x_j) - PD_k(x_k)]² / Σ PD²_{jk}. H²_{jk} ∈ [0, 1] is the share of the joint partial-dependence variance attributable to non-additive structure.
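A brute-force sketch of the statistic, assuming partial-dependence functions are evaluated at the observed sample points and centred to mean zero before entering the ratio. The helper names are hypothetical, and the cost is quadratic in the sample size.

```python
import numpy as np

def centered_pd(f, X, cols):
    """Partial dependence on `cols`, evaluated at each sample's own values, then centred."""
    n = X.shape[0]
    pd_vals = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, cols] = X[i, cols]     # fix the chosen features, average over the rest
        pd_vals[i] = f(X_mod).mean()
    return pd_vals - pd_vals.mean()

def h_squared(f, X, j, k):
    pd_jk = centered_pd(f, X, [j, k])
    pd_j = centered_pd(f, X, [j])
    pd_k = centered_pd(f, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 3))
additive = lambda Z: Z[:, 0] + 2.0 * Z[:, 1]      # no interaction
interacting = lambda Z: Z[:, 0] * Z[:, 1]         # pure interaction

h2_additive = h_squared(additive, X, 0, 1)
h2_interacting = h_squared(interacting, X, 0, 1)
```

The additive function yields a value at floating-point zero, while the product function yields a value near one. In finite samples the empirical ratio is not guaranteed to stay strictly inside [0, 1].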

When to use

Identifying which feature pairs the model treats non-additively.

When NOT to use

Wide panels – the M² PDP grid grows expensive.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Friedman & Popescu (2008) ‘Predictive Learning via Rule Ensembles’, Annals of Applied Statistics 2(3): 916-954.

Related options: shap_interaction, partial_dependence

Last reviewed 2026-05-05 by macroforecast author.

generalized_irf – future#

Pesaran-Shin (1998) generalized impulse-response function (future, v0.9.x).

Order-invariant IRF where each shock is constructed as the multivariate-normal projection of all residuals onto the j-th canonical direction. Distinct from Cholesky orthogonalised IRFs (which use a recursive lower-triangular rotation). Future – the runtime currently raises NotImplementedError. For the Cholesky variant operational since v0.2, use orthogonalised_irf.
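Although the runtime does not implement this op yet, the Pesaran-Shin formula itself is compact: the response to a one-standard-deviation shock in residual j is Φ_s Σᵤ e_j / √σ_jj. The sketch below applies it to a VAR(1) with made-up coefficients and compares it with the Cholesky variant; all function names are hypothetical.

```python
import numpy as np

def ma_coefficients(A, horizon):
    """Phi_0..Phi_horizon for a VAR(1) y_t = A y_{t-1} + u_t."""
    phis = [np.eye(A.shape[0])]
    for _ in range(horizon):
        phis.append(A @ phis[-1])
    return np.stack(phis)

def generalized_irf(A, sigma_u, horizon):
    """girf[s, i, j]: response of variable i at horizon s to a one-s.d. shock in residual j."""
    scale = 1.0 / np.sqrt(np.diag(sigma_u))
    # Broadcasting divides column j of sigma_u by sqrt(sigma_jj).
    return ma_coefficients(A, horizon) @ (sigma_u * scale)

def cholesky_irf(A, sigma_u, horizon):
    return ma_coefficients(A, horizon) @ np.linalg.cholesky(sigma_u)

A = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma_u = np.array([[1.0, 0.3], [0.3, 0.5]])
girf = generalized_irf(A, sigma_u, horizon=6)
oirf = cholesky_irf(A, sigma_u, horizon=6)

# Reorder the variables and recompute: the generalized IRF only gets relabelled.
perm = [1, 0]
girf_perm = generalized_irf(A[np.ix_(perm, perm)], sigma_u[np.ix_(perm, perm)], horizon=6)
```

The two variants coincide for the first-ordered shock and differ for later ones, which is where the ordering assumption of the Cholesky scheme bites.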

When to use

VAR analysis where the variable ordering has no theoretical motivation – order-invariance is the desired property.

When NOT to use

When a recursive identification IS theoretically motivated – use orthogonalised_irf instead.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Pesaran & Shin (1998) ‘Generalized impulse response analysis in linear multivariate models’, Economics Letters 58(1): 17-29.

Related options: fevd, historical_decomposition, orthogonalised_irf

Last reviewed 2026-05-05 by macroforecast author.

gradient_shap – operational#

Gradient SHAP – expectation-of-gradient SHAP approximation (Lundberg-Lee 2017).

Approximates SHAP values via expected gradients at random interpolations between input and a baseline distribution. Captum-backed; requires the macroforecast[deep] extra.

When to use

Differentiable models (NN families) where exact SHAP is too expensive.

When NOT to use

Non-NN models.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.

Related options: shap_deep, integrated_gradients, saliency_map, deep_lift

Last reviewed 2026-05-05 by macroforecast author.

group_aggregate – operational#

Aggregate per-feature importance into pre-defined block sums (FRED-SD blocks, theme blocks).

Sums (or means) per-feature importance scores over groups defined by a user-supplied or built-in mapping table. v0.25 ships 8 built-in blocks: 8-group FRED-MD + 14-group FRED-QD + 50-state FRED-SD grids.

Required input for the FRED-SD us_state_choropleth figure.

When to use

FRED-MD / -QD / -SD analyses where per-series importance should roll up to thematic / geographic blocks.

When NOT to use

Custom panels lacking a meaningful grouping.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • McCracken & Ng (2016) ‘FRED-MD: A Monthly Database for Macroeconomic Research’, JBES 34(4): 574-589.

Related options: lineage_attribution, transformation_attribution

Last reviewed 2026-05-05 by macroforecast author.

historical_decomposition – operational#

Historical decomposition (Burbidge-Harrison 1985) of the realised series into structural shocks.

Reconstructs each variable’s realised path as the convolution of orthogonalised IRF coefficients (Cholesky-rotated structural form) with the time series of structural shocks recovered from the reduced-form residuals. Returns the per-shock cumulative absolute contribution to the target variable’s realised fluctuations; the row labels match the VAR variable ordering.

When to use

Telling the historical narrative – which shocks drove specific recessions / expansions.

When NOT to use

Non-VAR models.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Burbidge & Harrison (1985) ‘A historical decomposition of the great depression to determine the role of money’, JME 16(1): 45-54.

Related options: fevd, orthogonalised_irf

Last reviewed 2026-05-05 by macroforecast author.

integrated_gradients – operational#

Integrated gradients (Sundararajan 2017) – path-integral attribution.

Computes (x_j - x'_j) · ∫₀¹ ∂f(x' + α(x - x')) / ∂x_j dα for a baseline x' (default zero). Satisfies the completeness axiom (sum of attributions equals f(x) - f(x')). Captum-backed.
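The path integral is usually approximated by a Riemann sum. Here is a small numpy sketch with a hand-written gradient for a toy tanh model, assuming a zero baseline and a midpoint rule; it illustrates the formula and does not use the Captum backend.

```python
import numpy as np

w = np.array([0.8, -1.2, 0.5])

def f(x):
    """Toy differentiable model: tanh of a linear index."""
    return np.tanh(x @ w)

def grad_f(x):
    """Analytic gradient of f with respect to x."""
    return (1.0 - np.tanh(x @ w) ** 2) * w

def integrated_gradients(x, baseline, steps=200):
    alphas = (np.arange(steps) + 0.5) / steps            # midpoint rule on [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, n_features)
    grads = np.array([grad_f(point) for point in path])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 0.5, -2.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline) up to quadrature error.
completeness_gap = attributions.sum() - (f(x) - f(baseline))
```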

When to use

Axiomatically-grounded NN attribution (Sundararajan completeness + sensitivity properties).

When NOT to use

Non-NN models; pathological models where integration along the linear path is misleading.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Sundararajan, Taly & Yan (2017) ‘Axiomatic Attribution for Deep Networks’, ICML.

Related options: gradient_shap, saliency_map, deep_lift

Last reviewed 2026-05-05 by macroforecast author.

lasso_inclusion_frequency – operational#

Bootstrap inclusion frequency for Lasso-selected features (Bach 2008).

For each feature j, computes the share of B Lasso fits (on bootstrap or rolling-window resamples) for which β̂_j ≠ 0. Returns a stability score in [0, 1]. v0.25 supports sampling = bootstrap | rolling | both (via leaf_config).
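A self-contained sketch of the bootstrap variant, assuming an i.i.d. row bootstrap, a fixed penalty, and a small coordinate-descent Lasso written out in numpy so the example has no extra dependencies. The solver `lasso_cd` is illustrative and is not the estimator the op uses.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]                  # remove feature j's current fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(6)
n, p, B = 150, 5, 100
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

included = np.zeros(p)
for _ in range(B):
    rows = rng.integers(n, size=n)                      # i.i.d. bootstrap resample
    included += lasso_cd(X[rows], y[rows], lam=0.3) != 0.0
inclusion_frequency = included / B
```

Features with a true effect are selected in essentially every resample, while pure-noise features are rarely selected at this penalty level.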

When to use

Feature-selection stability audit for Lasso / Lasso-Path / Elastic Net.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Bach (2008) ‘Bolasso: model consistent Lasso estimation through the bootstrap’, ICML.

  • Meinshausen & Bühlmann (2010) ‘Stability selection’, JRSS Series B 72(4): 417-473.

Related options: model_native_linear_coef, bootstrap_jackknife

Last reviewed 2026-05-05 by macroforecast author.

lasso_path_selection – future#

(no schema description for lasso_path_selection)

TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full OptionDoc entry under macroforecast/scaffold/option_docs/l7.py are welcome.

lineage_attribution – operational#

Trace importance back through L3 feature lineage to the L1 raw source.

For each L3 feature, walks the L3.metadata column_lineage graph to identify the chain of transforms that produced it; attributes the L7 importance score back to the L1 raw column at the head of the lineage chain.

Solves the ‘PCA factors are most important; what does that mean in terms of original variables?’ problem.

When to use

Pipelines with PCA / factor / dimensionality-reduction stages where downstream importance must be traced back to raw inputs.

When NOT to use

Pipelines with only direct-input features (no L3 transforms).

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

Related options: group_aggregate, transformation_attribution

Last reviewed 2026-05-05 by macroforecast author.

lofo – operational#

Leave-one-feature-out (LOFO) refit importance.

For each predictor j, refits the L4 estimator on the panel with column j removed and reports the OOS-loss delta. More expensive than permutation importance (one extra fit per feature) but free from the permutation-and-correlation interaction.

Compatible with every L4 family; runtime scales as n_features × cost_per_fit.
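The refit loop is simple to sketch. The example assumes an OLS stand-in for the L4 estimator, a single train/test split, and MSE as the OOS loss; all three are illustrative choices.

```python
import numpy as np

def oos_mse(X_tr, y_tr, X_te, y_te):
    """Fit OLS with an intercept on the training block, score MSE on the test block."""
    A_tr = np.column_stack([np.ones(len(y_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(y_te)), X_te])
    beta = np.linalg.lstsq(A_tr, y_tr, rcond=None)[0]
    return np.mean((y_te - A_te @ beta) ** 2)

rng = np.random.default_rng(7)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

baseline_loss = oos_mse(X_tr, y_tr, X_te, y_te)

# One extra fit per feature: drop column j, refit, record the loss increase.
lofo_delta = np.array([
    oos_mse(np.delete(X_tr, j, axis=1), y_tr, np.delete(X_te, j, axis=1), y_te) - baseline_loss
    for j in range(p)
])
```

A positive delta means the model gets worse without the feature; values near zero, or slightly negative, flag features the model does not need.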

When to use

Small / medium feature panels (< 100) where N-extra fits are affordable.

When NOT to use

Wide panels (n_features > 200) – prohibitive runtime.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Lemaître, Aridas & Nogueira (2018) ‘imbalanced-learn’, JMLR 18(17): 1-5 – LOFO popularised; pre-dating refit-importance traditions in econometrics.

Related options: permutation_importance

Last reviewed 2026-05-05 by macroforecast author.

lstm_hidden_state – future#

(no schema description for lstm_hidden_state)

TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full OptionDoc entry under macroforecast/scaffold/option_docs/l7.py are welcome.

model_native_linear_coef – operational#

Standardised regression coefficients from a fitted linear model.

See model_native_linear_coef function page for full documentation + parameters + standalone usage. Standalone: mf.functions.model_native_linear_coef_importance.

model_native_tree_importance – operational#

Mean-decrease-impurity importance from a fitted tree ensemble.

See model_native_tree_importance function page for full documentation + parameters + standalone usage. Standalone: mf.functions.model_native_tree_importance.

mrf_gtvp – operational#

Macroeconomic Random Forest GTVP – per-leaf time-varying coefficients (Coulombe 2024).

Compatible only with the macroeconomic_random_forest L4 family. For each leaf and predictor j, returns the leaf-local linear coefficient β̂_{j, ℓ}; the full output is an (n_leaves × n_features) GTVP (Generalised Time-Varying Parameter) panel.

When to use

Coulombe (2024) MRF interpretation; spotting non-linearity captured by the leaf partition.

When NOT to use

Non-MRF models.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Coulombe (2024) ‘The Macroeconomic Random Forest’, Journal of Applied Econometrics 39(7): 1190-1209.

Related options: rolling_recompute, model_native_tree_importance

Last reviewed 2026-05-05 by macroforecast author.

orthogonalised_irf – operational#

Cholesky-orthogonalised impulse-response function (Sims 1980).

Standard structural-VAR IRF: residual covariance Σᵤ is Cholesky-decomposed P P’ = Σᵤ; the structural shocks P⁻¹ u_t are orthogonalised by construction. orth_irfs[s, i, j] is the response of variable i at horizon s to a unit structural shock to variable j at time 0. Order-dependent: the variable ordering in the recipe determines the recursive causal scheme imposed.

When to use

VAR analysis with a theoretically motivated recursive identification (e.g. monetary policy ordered last; supply ordered first).

When NOT to use

When the variable ordering is arbitrary – file a v0.9.x request for generalized_irf (Pesaran-Shin 1998 order-invariant variant, currently future-gated).

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Sims (1980) ‘Macroeconomics and Reality’, Econometrica 48(1): 1-48.

Related options: fevd, historical_decomposition

Last reviewed 2026-05-05 by macroforecast author.

oshapley_vi – operational#

Out-of-sample SHAP-style variable importance (Borup et al. 2022) [schema; runtime via anatomy package].

Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’. Recomputes Shapley-style feature contributions on the out-of-sample loss rather than in-sample fit, addressing the distribution-shift mismatch where in-sample SHAP misranks features that matter for OOS accuracy.

Atomic primitive – existing in-sample shap_* ops do not compose into oShapley-VI. Runtime delegates to the Borup et al. anatomy Python package as an optional dep (pip install macroforecast[anatomy]). Schema-only in v0.9.0; operational promotion lands once the anatomy integration is wired.

When to use

OOS-aware variable importance for macro forecast audits; replicating Borup et al. (2022).

When NOT to use

Before operational promotion, or when the anatomy extra is not installed.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’, SSRN 4278745.

Related options: shap_kernel, shap_tree, permutation_importance, pbsv

Last reviewed 2026-05-05 by macroforecast author.

partial_dependence – operational#

Friedman (2001) partial dependence plot.

See partial_dependence function page for full documentation + parameters + standalone usage. Standalone: mf.functions.partial_dependence_importance.

pbsv – operational#

Performance-Based Shapley Value (Borup et al. 2022) [schema; runtime via anatomy package].

OOS accuracy decomposition: Shapley-attributes the forecast performance improvement over a benchmark to each feature coalition’s contribution. Differs from oshapley_vi in decomposing the accuracy gain rather than the OOS loss; they are companion ops covering the two faces of OOS Shapley.

Runtime delegates to the anatomy package. Schema-only in v0.9.0.

When to use

Decomposing OOS forecast skill by feature; benchmark-relative interpretation studies.

When NOT to use

Before operational promotion, or when the anatomy extra is not installed.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’, SSRN 4278745.

Related options: oshapley_vi, permutation_importance

Last reviewed 2026-05-05 by macroforecast author.

permutation_importance – operational#

Breiman-Fisher-Rudin (2019) model-agnostic permutation importance.

See permutation_importance function page for full documentation + parameters + standalone usage. Standalone: mf.functions.permutation_importance.

permutation_importance_strobl – operational#

Strobl (2008) conditional permutation importance.

See permutation_importance_strobl function page for full documentation + parameters + standalone usage. Standalone: mf.functions.cond_permutation_importance.

recursive_feature_elimination – future#

(no schema description for recursive_feature_elimination)

TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full OptionDoc entry under macroforecast/scaffold/option_docs/l7.py are welcome.

rolling_recompute – operational#

Re-compute any importance score on a rolling-window basis.

Applies an inner importance op (e.g. permutation_importance) on each of K rolling-window subsamples; emits a (K × n_features) matrix tracking how importance evolves over time. Pair with the heatmap or lineplot figure type.
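A minimal sketch of the wrapper, assuming a toy inner op (absolute standardised OLS coefficients) and fixed-length windows advanced by a constant step. The function signature is hypothetical.

```python
import numpy as np

def abs_std_coef(X, y):
    """Toy inner op: absolute standardised OLS coefficient per feature."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return np.abs(np.linalg.lstsq(Xs, ys, rcond=None)[0])

def rolling_recompute(X, y, window, step, inner_op):
    """Apply `inner_op` on each rolling window; returns a (K, n_features) matrix."""
    starts = range(0, len(y) - window + 1, step)
    return np.array([inner_op(X[s:s + window], y[s:s + window]) for s in starts])

rng = np.random.default_rng(8)
n, p = 240, 3
X = rng.normal(size=(n, p))
# Feature 0 matters only in the first half, feature 1 only in the second half.
beta_path = np.where(np.arange(n)[:, None] < n // 2, [2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
y = (X * beta_path).sum(axis=1) + 0.5 * rng.normal(size=n)

importance_by_window = rolling_recompute(X, y, window=60, step=30, inner_op=abs_std_coef)
```

Each row of `importance_by_window` is one window, so the switch from feature 0 to feature 1 shows up as a handover between columns partway down the matrix.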

When to use

Detecting time-varying feature importance; structural-stability audits.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

Related options: bootstrap_jackknife, mrf_gtvp

Last reviewed 2026-05-05 by macroforecast author.

saliency_map – operational#

Saliency map (Simonyan 2014) – absolute gradient at the input.

Returns |∂f / ∂x_j| evaluated at the input. The earliest and simplest gradient-based attribution; useful as a baseline but susceptible to gradient-saturation issues that integrated gradients address.

When to use

Quick NN attribution baseline; sanity-check vs more elaborate methods.

When NOT to use

Production attribution – prefer integrated gradients or SHAP.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Simonyan, Vedaldi & Zisserman (2014) ‘Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps’, ICLR Workshops.

Related options: integrated_gradients, gradient_shap, deep_lift

Last reviewed 2026-05-05 by macroforecast author.

shap_deep – operational#

Deep SHAP – DeepLIFT-based SHAP for neural networks.

DeepLIFT (Shrikumar 2017) interpreted as Shapley-value approximation. Compatible with the mlp / lstm / gru / transformer L4 families when the macroforecast[deep] extra is installed (captum backend).

When to use

Neural-network forecasters (LSTM / GRU / Transformer / MLP).

When NOT to use

Non-NN models – use shap_tree / shap_linear / shap_kernel instead.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.

  • Shrikumar, Greenside & Kundaje (2017) ‘Learning Important Features Through Propagating Activation Differences’, ICML.

Related options: shap_tree, shap_kernel, deep_lift, gradient_shap, integrated_gradients

Last reviewed 2026-05-05 by macroforecast author.

shap_interaction – operational#

SHAP interaction values – pairwise feature-interaction Shapley.

Lundberg-Erion-Lee (2020) extension that decomposes each SHAP value into a main-effect term plus pairwise interaction terms. Available for tree ensembles via the same polynomial-time algorithm as shap_tree.

Output is an (n × M × M) tensor; pair with the heatmap figure type for visualisation.

When to use

Identifying which feature pairs drive the model’s non-additive structure.

When NOT to use

Wide feature panels – the storage cost grows quickly.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.

  • Lundberg, Erion & Lee (2020) ‘From local explanations to global understanding with explainable AI for trees’, Nature Machine Intelligence 2: 56-67.

Related options: shap_tree, friedman_h_interaction

Last reviewed 2026-05-05 by macroforecast author.

shap_kernel – operational#

Kernel SHAP – model-agnostic Shapley value approximation.

Lundberg-Lee (2017) weighted-LIME estimator that approximates Shapley values for any model via local linear regression on perturbed inputs. Slow (coalitions are sampled from the 2^M possible feature subsets) but universally applicable.
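To see where the 2^M cost comes from, the sketch below computes exact Shapley values by enumerating every coalition for a small M. This is the brute-force target that Kernel SHAP approximates, not the Kernel SHAP estimator itself. It assumes an interventional value function that averages the model over a background sample.

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley(f, x, background):
    """Exact Shapley values by enumerating all coalitions (feasible only for small M)."""
    M = len(x)

    def value(coalition):
        idx = np.array(coalition, dtype=int)
        X_mix = background.copy()
        X_mix[:, idx] = x[idx]          # fix coalition features at x, keep the rest as sampled
        return f(X_mix).mean()

    phi = np.zeros(M)
    for j in range(M):
        others = [k for k in range(M) if k != j]
        for size in range(M):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi

rng = np.random.default_rng(9)
background = rng.normal(size=(100, 4))
beta = np.array([1.0, -2.0, 0.0, 0.5])
f = lambda Z: Z @ beta                  # linear model, so the answer is known in closed form
x = np.array([1.0, 1.0, 1.0, 1.0])

phi = exact_shapley(f, x, background)
expected_linear = beta * (x - background.mean(axis=0))
```

For a linear model the values reduce to β_j (x_j - mean_j), which gives a convenient correctness check before moving to models without a closed form.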

When to use

Non-tree, non-linear, non-deep models (SVM, kNN, custom callables).

When NOT to use

Trees (use shap_tree) or linear models (use shap_linear) – both are dramatically faster.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.

Related options: shap_tree, shap_linear

Last reviewed 2026-05-05 by macroforecast author.

shap_linear – operational#

Linear SHAP – closed-form Shapley values for linear models.

See shap_linear function page for full documentation + parameters + standalone usage. Standalone: mf.functions.shap_linear_importance.

shap_tree – operational#

Tree SHAP – exact polynomial-time Shapley values for tree ensembles.

See shap_tree function page for full documentation + parameters + standalone usage. Standalone: mf.functions.shap_tree_importance.

stability_selection – future#

(no schema description for stability_selection)

TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full OptionDoc entry under macroforecast/scaffold/option_docs/l7.py are welcome.

transformation_attribution – operational#

Shapley over pipelines – decompose forecast skill across alternative L3 transforms.

Multi-cell sweep aggregator: given multiple pipelines that differ in their L3 transform choices, computes the Shapley share of each transform’s contribution to the metric improvement. v0.25 uses the Castro-Gómez-Tejada (2009) permutation-Shapley sampler when n_pipelines > 8.
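The permutation sampler can be sketched in a few lines. The example assumes a made-up value function over three hypothetical transforms, where the value of a coalition is the metric gain of the pipeline using exactly those transforms; the function and player names are illustrative only.

```python
import numpy as np

def permutation_shapley(players, value, n_permutations, rng):
    """Monte Carlo Shapley: average marginal contributions over random player orderings."""
    totals = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = rng.permutation(len(players))
        coalition = frozenset()
        prev = value(coalition)
        for idx in order:
            coalition = coalition | {players[idx]}
            cur = value(coalition)
            totals[players[idx]] += cur - prev   # marginal gain of adding this player
            prev = cur
    return {p: t / n_permutations for p, t in totals.items()}

# Hypothetical metric gains (e.g. RMSE reduction versus a no-transform pipeline).
gains = {"pca": 0.10, "lags": 0.05, "seasonal_adjust": 0.02}

def value(coalition):
    total = sum(gains[t] for t in coalition)
    if {"pca", "lags"} <= coalition:
        total += 0.04   # synergy term, shared equally between the two under Shapley
    return total

rng = np.random.default_rng(10)
players = list(gains)
shares = permutation_shapley(players, value, n_permutations=2000, rng=rng)
```

Every sampled ordering distributes the full gain of the complete pipeline, so the shares always add up to `value(frozenset(players))`; only the split of the synergy term carries sampling noise.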

When to use

Interpreting horse-race sweeps – which L3 transform delivers the win?

When NOT to use

Single-pipeline studies; sweeps with fewer than 3 alternative pipelines.

References

  • macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’

  • Castro, Gómez & Tejada (2009) ‘Polynomial calculation of the Shapley value based on sampling’, Computers & Operations Research 36(5): 1726-1730.

Related options: lineage_attribution, group_aggregate

Last reviewed 2026-05-05 by macroforecast author.