op#
Back to L7 | Browse all axes | Browse all options
Axis
opon sub-layerL7_A_importance_dag_body(layerl7).
Sub-layer#
L7_A_importance_dag_body
Axis metadata#
Default:
'permutation_importance'Sweepable: False
Status: operational
Operational status summary#
Operational: 34 option(s)
Future: 6 option(s)
Options#
accumulated_local_effect – operational#
Apley & Zhu (2020) accumulated local effects – PDP alternative robust to correlation.
See accumulated_local_effect function page for full documentation + parameters + standalone usage. Standalone: mf.functions.ale_importance.
attention_weights – operational#
OLS-as-attention closed-form attention matrix (Goulet Coulombe 2026).
Goulet Coulombe (2026) ‘OLS as an Attention Mechanism’ Eq. 3 closed form: Ω = X_test · (X'_train · X_train)⁻¹ · X'_train. The (n_test, n_train) matrix encodes how strongly each test point attends to each training point under an OLS / ridge fit, identical to the representer expansion of the dual ridge solution. Output table carries one row per training observation (per-test-point weight aggregates) plus the full attention matrix and representer-identity diagnostics inline via frame.attrs.
Promoted from future to operational in Phase B-10 (paper-10 replication). Compatible with linear-family L4 models (ols / ridge / lasso / elastic_net / bayesian_ridge / huber).
When to use
Linear-family attribution as a kernel-attention map; pedagogical / replication of paper-10 Coulombe (2026).
When NOT to use
Non-linear models (the closed form requires a linear estimator).
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Goulet Coulombe (2026) ‘OLS as an Attention Mechanism’, working paper – Eq. 3 closed-form attention matrix.
Related options: dual_decomposition, model_native_linear_coef, shap_linear
Last reviewed 2026-05-05 by macroforecast author.
bootstrap_jackknife – operational#
Bootstrap / jackknife confidence bands around any importance score.
Wraps another importance op and re-runs it on B stationary-bootstrap (Politis-White 2004) or jackknife resamples. Emits (score_mean, score_p2.5, score_p97.5) per feature; pair with the boxplot figure type.
When to use
Reporting confidence-banded importance rankings.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Politis & White (2004) ‘Automatic Block-Length Selection for the Dependent Bootstrap’, Econometric Reviews 23(1): 53-70.
Related options: rolling_recompute
Last reviewed 2026-05-05 by macroforecast author.
boruta_selection – future#
(no schema description for boruta_selection)
TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full
OptionDocentry undermacroforecast/scaffold/option_docs/l7.pyare welcome.
bvar_pip – operational#
Posterior inclusion probabilities for BVAR / Bayesian linear models.
For each predictor j, returns P(β_j ≠ 0 | data) – the posterior probability that the variable enters the model with non-zero effect. Compatible with bvar_minnesota / bvar_normal_inverse_wishart / bayesian_ridge.
When to use
Bayesian model selection; comparing variable importance under posterior uncertainty.
When NOT to use
Frequentist models – use lasso_inclusion_frequency for an analogous stability score.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Koop & Korobilis (2010) ‘Bayesian Multivariate Time Series Methods for Empirical Macroeconomics’, Foundations and Trends in Econometrics 3(4): 267-358.
Related options: lasso_inclusion_frequency
Last reviewed 2026-05-05 by macroforecast author.
cumulative_r2_contribution – operational#
Cumulative R² gain from adding features one at a time (forward-selection ranking).
Re-fits the L4 estimator with features added in descending order of marginal contribution; each step records the cumulative OOS-R² achieved. Pair with the lineplot figure type to visualise the marginal information value of each predictor.
When to use
Quantifying how many predictors the model actually needs to reach a target R².
When NOT to use
Highly correlated features – the order is sensitive to entry rules.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Stock & Watson (2012) ‘Generalized Shrinkage Methods for Forecasting using Many Predictors’, JBES 30(4): 481-493.
Related options: lasso_inclusion_frequency, lofo
Last reviewed 2026-05-05 by macroforecast author.
deep_lift – operational#
DeepLIFT (Shrikumar 2017) – difference-from-reference attribution.
Decomposes the difference f(x) - f(x') into per-feature contributions using rescaled-difference / reveal-cancel rules for non-linear activations. Faster than integrated gradients but with less rigorous axiomatic backing.
When to use
NN attribution where integrated-gradients runtime is too high.
When NOT to use
When the completeness / sensitivity axioms matter – prefer integrated gradients.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Shrikumar, Greenside & Kundaje (2017) ‘Learning Important Features Through Propagating Activation Differences’, ICML.
Related options: integrated_gradients, gradient_shap, saliency_map, shap_deep
Last reviewed 2026-05-05 by macroforecast author.
dual_decomposition – operational#
Forecast-as-weighted-training-targets via the representer theorem (Coulombe et al. 2024); equivalently a restricted attention module (Goulet Coulombe 2026).
Goulet Coulombe / Goebel / Klieber (2024) ‘Dual Interpretation of ML Forecasts’. Surfaces each prediction as a weighted combination of historical training targets; weights recovered through the representer theorem applied to the fitted model. Atomic L7 primitive: SHAP-family ops decompose by feature contribution, this op decomposes by training-row contribution, the natively interpretable view for small-sample temporally-ordered macro panels.
Linear families (operational v0.8.9): ridge / OLS / lasso via closed-form w(xₜ) = X(X'X + αI)⁻¹xₜ.
Tree-bagging ensembles (operational v0.9.1 dev-stage v0.9.0B-5): RandomForestRegressor / ExtraTreesRegressor via the leaf-co-occurrence kernel wⱼ(xₜ) = (1/B) Σ_b 1[j ∈ B_b] · 1[leaf_b(xₜ) == leaf_b(xⱼ)] / leaf_size_b(xⱼ) where B_b is tree b’s bootstrap subset (sklearn estimators_samples_). Reproduces forest.predict to machine precision (~4e-16). Helper _rf_leaf_cooccurrence_weights in core/runtime.py.
Output frame layout: rows = training row labels, columns = mean_weight, abs_mean_weight, max_abs_weight. Full (n_test × n_train) weight matrix attached as frame.attrs['dual_weights']; frame.attrs['method'] carries 'linear_closed_form' or 'rf_leaf_cooccurrence_kernel' for downstream renderers.
Inline portfolio diagnostics. The output artifact also carries the four portfolio metrics from the same paper (HHI = Σwⱼ², short = Σ max(0,-wⱼ), turnover = ‖wₜ - wₜ₋₁‖₁, leverage = ‖w‖₁) at frame.attrs['portfolio_metrics']. These are trivial numpy reductions on the primary dual weights and do not warrant their own L7 op (decomposition discipline).
OLS-as-attention equivalence. Goulet Coulombe (2026) ‘Ordinary Least Squares as an Attention Mechanism’ (SSRN 5200864) shows that the same dual representation ŷ_test = F_test F_train' y_train (eq. 7) coincides with a restricted attention module: queries Q = X_test W, keys K = X_train W with W = U Λ^{-½}, values V = y, and the softmax replaced by the identity (eqs. 17-19). The training-row weights ωⱼᵢ = ⟨Fⱼ, Fᵢ⟩ surfaced by this op are exactly the (restricted) attention weights of that paper. Same compute, different vocabulary – no separate runtime needed.
Boosted-tree (gradient_boosting / xgboost / lightgbm) and NN extensions are deferred: residual-bagging and learned non-linear models do not admit a clean sum-of-training-targets dual representation.
When to use
Decomposing macro forecasts into training-target contributions; explaining ML predictions to econometric audiences; bridging classical OLS to transformer-attention literature; per-prediction provenance for tree ensembles.
When NOT to use
Boosted-tree / NN families (gradient_boosting, xgboost, lightgbm, mlp, lstm, etc.) – raises NotImplementedError; the residual-bagging structure does not factor into a sum-of-training-targets representation.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Goulet Coulombe, Goebel & Klieber (2024) ‘Dual Interpretation of Machine Learning Forecasts’, arXiv:2412.13076.
Goulet Coulombe (2026) ‘Ordinary Least Squares as an Attention Mechanism’, SSRN 5200864 – shows OLS predictions ŷ_test = F_test F_train’ y_train (eq. 7) coincide with a restricted attention module (eqs. 17-19, identity activation, tied W_Q W_K’ = (X_train’ X_train)^{-1}). The dual_decomposition op already implements the same compute via the closed-form ridge representer; no separate runtime needed.
Related options: permutation_importance, shap_kernel
Last reviewed 2026-05-05 by macroforecast author.
fevd – operational#
Forecast error variance decomposition (Sims 1980).
For a fitted VAR (var / factor_augmented_var / bvar_*), decomposes the h-step-ahead forecast error variance into shares attributable to each orthogonalised shock. Default Cholesky orthogonalisation; ordering is set by the column order of the VAR. statsmodels fevd backend.
When to use
Standard VAR analysis; interpreting how shocks propagate across variables.
When NOT to use
Non-VAR models – use permutation_importance instead.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Sims (1980) ‘Macroeconomics and Reality’, Econometrica 48(1): 1-48.
Related options: historical_decomposition, generalized_irf, forecast_decomposition
Last reviewed 2026-05-05 by macroforecast author.
forecast_decomposition – operational#
Decompose a single forecast into per-feature contributions.
For a single (cell, target, horizon) forecast, returns a table (feature → contribution) summing to forecast - benchmark. Linear models: β_j x_j. Trees: Tree SHAP. NN: gradient SHAP. Universal entry point unified across families – delegates to the appropriate family-specific op.
When to use
Reporting feature contributions for a specific forecast (e.g. ‘why is the model bullish on Q3 GDP’).
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Related options: shap_tree, shap_linear, shap_deep
Last reviewed 2026-05-05 by macroforecast author.
friedman_h_interaction – operational#
Friedman & Popescu (2008) H-statistic for two-way feature interactions.
For feature pair (j, k), computes H²_{jk} = Σ[PD_{jk}(x_j, x_k) - PD_j(x_j) - PD_k(x_k)]² / Σ PD²_{jk}. H² ∈ [0, 1]; the share of the joint partial-dependence variance attributable to non-additive structure.
When to use
Identifying which feature pairs the model treats non-additively.
When NOT to use
Wide panels – the M² PDP grid grows expensive.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Friedman & Popescu (2008) ‘Predictive Learning via Rule Ensembles’, Annals of Applied Statistics 2(3): 916-954.
Related options: shap_interaction, partial_dependence
Last reviewed 2026-05-05 by macroforecast author.
generalized_irf – future#
Pesaran-Shin (1998) generalized impulse-response function (future, v0.9.x).
Order-invariant IRF where each shock is constructed as the multivariate-normal projection of all residuals onto the j-th canonical direction. Distinct from Cholesky orthogonalised IRFs (which use a recursive lower-triangular rotation). Future – the runtime currently raises NotImplementedError. For the Cholesky variant operational since v0.2, use orthogonalised_irf.
When to use
VAR analysis where the variable ordering has no theoretical motivation – order-invariance is the desired property.
When NOT to use
When a recursive identification IS theoretically motivated – use orthogonalised_irf instead.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Pesaran & Shin (1998) ‘Generalized impulse response analysis in linear multivariate models’, Economics Letters 58(1): 17-29.
Related options: fevd, historical_decomposition, orthogonalised_irf
Last reviewed 2026-05-05 by macroforecast author.
gradient_shap – operational#
Gradient SHAP – expectation-of-gradient SHAP approximation (Lundberg-Lee 2017).
Approximates SHAP values via expected gradients at random interpolations between input and a baseline distribution. Captum-backed; requires the macroforecast[deep] extra.
When to use
Differentiable models (NN families) where exact SHAP is too expensive.
When NOT to use
Non-NN models.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.
Related options: shap_deep, integrated_gradients, saliency_map, deep_lift
Last reviewed 2026-05-05 by macroforecast author.
group_aggregate – operational#
Aggregate per-feature importance into pre-defined block sums (FRED-SD blocks, theme blocks).
Sums (or means) per-feature importance scores over groups defined by a user-supplied or built-in mapping table. v0.25 ships 8 built-in blocks: 8-group FRED-MD + 14-group FRED-QD + 50-state FRED-SD grids.
Required input for the FRED-SD us_state_choropleth figure.
When to use
FRED-MD / -QD / -SD analyses where per-series importance should roll up to thematic / geographic blocks.
When NOT to use
Custom panels lacking a meaningful grouping.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
McCracken & Ng (2016) ‘FRED-MD: A Monthly Database for Macroeconomic Research’, JBES 34(4): 574-589.
Related options: lineage_attribution, transformation_attribution
Last reviewed 2026-05-05 by macroforecast author.
historical_decomposition – operational#
Historical decomposition (Burbidge-Harrison 1985) of the realised series into structural shocks.
Reconstructs each variable’s realised path as the convolution of orthogonalised IRF coefficients (Cholesky-rotated structural form) with the time series of structural shocks recovered from the reduced-form residuals. Returns the per-shock cumulative absolute contribution to the target variable’s realised fluctuations; the row labels match the VAR variable ordering.
When to use
Telling the historical narrative – which shocks drove specific recessions / expansions.
When NOT to use
Non-VAR models.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Burbidge & Harrison (1985) ‘A historical decomposition of the great depression to determine the role of money’, JME 16(1): 45-54.
Related options: fevd, orthogonalised_irf
Last reviewed 2026-05-05 by macroforecast author.
integrated_gradients – operational#
Integrated gradients (Sundararajan 2017) – path-integral attribution.
Computes (x_j - x'_j) · ∫₀¹ ∂f(x' + α(x - x')) / ∂x_j dα for a baseline x' (default zero). Satisfies the completeness axiom (sum of attributions equals f(x) - f(x')). Captum-backed.
When to use
Axiomatically-grounded NN attribution (Sundararajan completeness + sensitivity properties).
When NOT to use
Non-NN models; pathological models where integration along the linear path is misleading.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Sundararajan, Taly & Yan (2017) ‘Axiomatic Attribution for Deep Networks’, ICML.
Related options: gradient_shap, saliency_map, deep_lift
Last reviewed 2026-05-05 by macroforecast author.
lasso_inclusion_frequency – operational#
Bootstrap inclusion frequency for Lasso-selected features (Bach 2008).
For each feature j, computes the share of B Lasso fits (on bootstrap or rolling-window resamples) for which β̂_j ≠ 0. Returns a stability score in [0, 1]. v0.25 supports sampling = bootstrap | rolling | both (via leaf_config).
When to use
Feature-selection stability audit for Lasso / Lasso-Path / Elastic Net.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Bach (2008) ‘Bolasso: model consistent Lasso estimation through the bootstrap’, ICML.
Meinshausen & Bühlmann (2010) ‘Stability selection’, JRSS Series B 72(4): 417-473.
Related options: model_native_linear_coef, bootstrap_jackknife
Last reviewed 2026-05-05 by macroforecast author.
lasso_path_selection – future#
(no schema description for lasso_path_selection)
TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full
OptionDocentry undermacroforecast/scaffold/option_docs/l7.pyare welcome.
lineage_attribution – operational#
Trace importance back through L3 feature lineage to the L1 raw source.
For each L3 feature, walks the L3.metadata column_lineage graph to identify the chain of transforms that produced it; attributes the L7 importance score back to the L1 raw column at the head of the lineage chain.
Solves the ‘PCA factors are most important; what does that mean in terms of original variables?’ problem.
When to use
Pipelines with PCA / factor / dimensionality-reduction stages where downstream importance must be traced back to raw inputs.
When NOT to use
Pipelines with only direct-input features (no L3 transforms).
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Related options: group_aggregate, transformation_attribution
Last reviewed 2026-05-05 by macroforecast author.
lofo – operational#
Leave-one-feature-out (LOFO) refit importance.
For each predictor j, refits the L4 estimator on the panel with column j removed and reports the OOS-loss delta. More expensive than permutation importance (one extra fit per feature) but free from the permutation-and-correlation interaction.
Compatible with every L4 family; runtime scales as n_features × cost_per_fit.
When to use
Small / medium feature panels (< 100) where N-extra fits are affordable.
When NOT to use
Wide panels (n_features > 200) – prohibitive runtime.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Lemaître, Aridas & Nogueira (2018) ‘imbalanced-learn’, JMLR 18(17): 1-5 – LOFO popularised; pre-dating refit-importance traditions in econometrics.
Related options: permutation_importance
Last reviewed 2026-05-05 by macroforecast author.
model_native_linear_coef – operational#
Standardised regression coefficients from a fitted linear model.
See model_native_linear_coef function page for full documentation + parameters + standalone usage. Standalone: mf.functions.model_native_linear_coef_importance.
model_native_tree_importance – operational#
Mean-decrease-impurity importance from a fitted tree ensemble.
See model_native_tree_importance function page for full documentation + parameters + standalone usage. Standalone: mf.functions.model_native_tree_importance.
mrf_gtvp – operational#
Macroeconomic Random Forest GTVP – per-leaf time-varying coefficients (Coulombe 2024).
Compatible only with the macroeconomic_random_forest L4 family. For each leaf ℓ and predictor j, returns the leaf-local linear coefficient β̂_{j, ℓ}; the full output is an (n_leaves × n_features) GTVP (Generalised Time-Varying Parameter) panel.
When to use
Coulombe (2024) MRF interpretation; spotting non-linearity captured by the leaf partition.
When NOT to use
Non-MRF models.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Coulombe (2024) ‘The Macroeconomic Random Forest’, Journal of Applied Econometrics 39(7): 1190-1209.
Related options: rolling_recompute, model_native_tree_importance
Last reviewed 2026-05-05 by macroforecast author.
orthogonalised_irf – operational#
Cholesky-orthogonalised impulse-response function (Sims 1980).
Standard structural-VAR IRF: residual covariance Σᵤ is Cholesky-decomposed P P’ = Σᵤ; the structural shocks P⁻¹ u_t are orthogonalised by construction. orth_irfs[s, i, j] is the response of variable i at horizon s to a unit structural shock to variable j at time 0. Order-dependent: the variable ordering in the recipe determines the recursive causal scheme imposed.
When to use
VAR analysis with a theoretically motivated recursive identification (e.g. monetary policy ordered last; supply ordered first).
When NOT to use
When the variable ordering is arbitrary – file a v0.9.x request for generalized_irf (Pesaran-Shin 1998 order-invariant variant, currently future-gated).
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Sims (1980) ‘Macroeconomics and Reality’, Econometrica 48(1): 1-48.
Related options: fevd, historical_decomposition
Last reviewed 2026-05-05 by macroforecast author.
oshapley_vi – operational#
Out-of-sample SHAP-style variable importance (Borup et al. 2022) [schema; runtime via anatomy package].
Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’. Recomputes Shapley-style feature contributions on the out-of-sample loss rather than in-sample fit, addressing the distribution-shift mismatch where in-sample SHAP misranks features that matter for OOS accuracy.
Atomic primitive – existing in-sample shap_* ops do not compose into oShapley-VI. Runtime delegates to the Borup et al. anatomy Python package as an optional dep (pip install macroforecast[anatomy]). Schema-only in v0.9.0; operational promotion lands once the anatomy integration is wired.
When to use
OOS-aware variable importance for macro forecast audits; replicating Borup et al. (2022).
When NOT to use
Pre-promotion. Without the anatomy extra installed.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’, SSRN 4278745.
Related options: shap_kernel, shap_tree, permutation_importance, pbsv
Last reviewed 2026-05-05 by macroforecast author.
partial_dependence – operational#
Friedman (2001) partial dependence plot.
See partial_dependence function page for full documentation + parameters + standalone usage. Standalone: mf.functions.partial_dependence_importance.
pbsv – operational#
Performance-Based Shapley Value (Borup et al. 2022) [schema; runtime via anatomy package].
OOS accuracy decomposition: Shapley-attributes the forecast performance improvement over a benchmark to each feature coalition’s contribution. Differs from oshapley_vi in decomposing the accuracy gain rather than the OOS loss; they are companion ops covering the two faces of OOS Shapley.
Runtime delegates to anatomy package. Schema-only in v0.9.0.
When to use
Decomposing OOS forecast skill by feature; benchmark-relative interpretation studies.
When NOT to use
Pre-promotion. Without the anatomy extra installed.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Borup, Goulet Coulombe, Montes-Rojas, Schutte & Veiga (2022) ‘Anatomy of Out-of-Sample Forecasting Accuracy’, SSRN 4278745.
Related options: oshapley_vi, permutation_importance
Last reviewed 2026-05-05 by macroforecast author.
permutation_importance – operational#
Breiman-Fisher-Rudin (2019) model-agnostic permutation importance.
See permutation_importance function page for full documentation + parameters + standalone usage. Standalone: mf.functions.permutation_importance.
permutation_importance_strobl – operational#
Strobl (2008) conditional permutation importance.
See permutation_importance_strobl function page for full documentation + parameters + standalone usage. Standalone: mf.functions.cond_permutation_importance.
recursive_feature_elimination – future#
(no schema description for recursive_feature_elimination)
TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full
OptionDocentry undermacroforecast/scaffold/option_docs/l7.pyare welcome.
rolling_recompute – operational#
Re-compute any importance score on a rolling-window basis.
Applies an inner importance op (e.g. permutation_importance) on each of K rolling-window subsamples; emits a (K × n_features) matrix tracking how importance evolves over time. Pair with the heatmap or lineplot figure type.
When to use
Detecting time-varying feature importance; structural-stability audits.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Related options: bootstrap_jackknife, mrf_gtvp
Last reviewed 2026-05-05 by macroforecast author.
saliency_map – operational#
Saliency map (Simonyan 2014) – absolute gradient at the input.
Returns |∂f / ∂x_j| evaluated at the input. The earliest and simplest gradient-based attribution; useful as a baseline but susceptible to gradient-saturation issues that integrated gradients address.
When to use
Quick NN attribution baseline; sanity-check vs more elaborate methods.
When NOT to use
Production attribution – prefer integrated gradients or SHAP.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Simonyan, Vedaldi & Zisserman (2014) ‘Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps’, ICLR Workshops.
Related options: integrated_gradients, gradient_shap, deep_lift
Last reviewed 2026-05-05 by macroforecast author.
shap_deep – operational#
Deep SHAP – DeepLIFT-based SHAP for neural networks.
DeepLIFT (Shrikumar 2017) interpreted as Shapley-value approximation. Compatible with the mlp / lstm / gru / transformer L4 families when the macroforecast[deep] extra is installed (captum backend).
When to use
Neural-network forecasters (LSTM / GRU / Transformer / MLP).
When NOT to use
Non-NN models – use shap_tree / shap_linear / shap_kernel instead.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.
Shrikumar, Greenside & Kundaje (2017) ‘Learning Important Features Through Propagating Activation Differences’, ICML.
Related options: shap_tree, shap_kernel, deep_lift, gradient_shap, integrated_gradients
Last reviewed 2026-05-05 by macroforecast author.
shap_interaction – operational#
SHAP interaction values – pairwise feature-interaction Shapley.
Lundberg-Erion-Lee (2020) extension that decomposes each SHAP value into a main-effect term plus pairwise interaction terms. Available for tree ensembles via the same polynomial-time algorithm as shap_tree.
Output is an (n × M × M) tensor; pair with the heatmap figure type for visualisation.
When to use
Identifying which feature pairs drive the model’s non-additive structure.
When NOT to use
Wide feature panels – the M² storage cost grows quickly.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.
Lundberg, Erion & Lee (2020) ‘From local explanations to global understanding with explainable AI for trees’, Nature Machine Intelligence 2: 56-67.
Related options: shap_tree, friedman_h_interaction
Last reviewed 2026-05-05 by macroforecast author.
shap_kernel – operational#
Kernel SHAP – model-agnostic Shapley value approximation.
Lundberg-Lee (2017) weighted-LIME estimator that approximates Shapley values for any model via local linear regression on perturbed inputs. Slow (O(2^M) coalitions sampled) but universally applicable.
When to use
Non-tree, non-linear, non-deep models (SVM, kNN, custom callables).
When NOT to use
Trees (use shap_tree) or linear models (use shap_linear) – both are dramatically faster.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Lundberg & Lee (2017) ‘A Unified Approach to Interpreting Model Predictions’, NeurIPS 30: 4765-4774.
Related options: shap_tree, shap_linear
Last reviewed 2026-05-05 by macroforecast author.
shap_linear – operational#
Linear SHAP – closed-form Shapley values for linear models.
See shap_linear function page for full documentation + parameters + standalone usage. Standalone: mf.functions.shap_linear_importance.
shap_tree – operational#
Tree SHAP – exact polynomial-time Shapley values for tree ensembles.
See shap_tree function page for full documentation + parameters + standalone usage. Standalone: mf.functions.shap_tree_importance.
stability_selection – future#
(no schema description for stability_selection)
TBD: option doc not yet authored for this value. The encyclopedia falls back to the bare schema description above. PRs adding a full
OptionDocentry undermacroforecast/scaffold/option_docs/l7.pyare welcome.
transformation_attribution – operational#
Shapley over pipelines – decompose forecast skill across alternative L3 transforms.
Multi-cell sweep aggregator: given multiple pipelines that differ in their L3 transform choices, computes the Shapley share of each transform’s contribution to the metric improvement. v0.25 uses the Castro-Gómez-Tejada (2009) permutation-Shapley sampler when n_pipelines > 8.
When to use
Interpreting horse-race sweeps – which L3 transform delivers the win?
When NOT to use
Single-pipeline studies; sweeps with fewer than 3 alternative pipelines.
References
macroforecast design Part 3, L7: ‘every importance op produces (table, figure) pairs; the L7.B sub-layer governs export shape.’
Castro, Gómez & Tejada (2009) ‘Polynomial calculation of the Shapley value based on sampling’, Computers & Operations Research 36(5): 1726-1730.
Related options: lineage_attribution, group_aggregate
Last reviewed 2026-05-05 by macroforecast author.