macroforecast.models#

Back to reference

macroforecast.models contains direct callable model fits. Each function accepts pandas data, fits immediately, and returns a fitted result object with predict().

lasso_path is intentionally not a public model family. Use lasso() with a chosen alpha, or use get_model("lasso") with model_selection.select_params() to choose alpha from the lasso-owned search space.

Return Objects#

Model functions return ModelFit.

macroforecast.models.ModelFit(
    estimator,
    model,
    feature_names=(),
    target_name=None,
    metadata={},
    diagnostics={},
)

Attribute	Type	Description
`estimator`	object	Fitted underlying estimator.
`model`	str	Canonical model name.
`feature_names`	tuple[str, …]	Feature columns used at fit time.
`target_name`	str or `None`	Target name when available.
`metadata`	dict	Fit metadata such as `n_obs`, `alpha`, or tree budget.
`diagnostics`	dict	Model-specific fitted diagnostics that can be collected safely.

ModelFit.predict(X) returns a pandas.Series named "prediction" and keeps the index of the provided X when X is a DataFrame.

For custom fitted objects used in forecasting.run(...), predict(X_test) may return an array-like object or a pandas Series/single-column DataFrame. Array-like output is treated positionally. Pandas output must either be indexed exactly like X_test.index or use the default positional index RangeIndex(len(X_test)). Any other index raises an error rather than silently creating missing forecasts. The same DataFrame index rule applies to predict_quantiles(X_test) when a fitted object exposes quantile forecasts.

ModelFit.to_dict() returns JSON-ready fit metadata. It records the canonical model name, the underlying estimator class name, the fitted feature names, the target name, n_features, the fit metadata, and JSON-ready diagnostics. It does not serialize the fitted estimator itself.

fit = macroforecast.models.ridge(X, y, alpha=0.5)
fit.to_dict()

Output shape:

{
    "model": "ridge",
    "estimator": "sklearn.linear_model._ridge.Ridge",
    "feature_names": ["x1", "x2"],
    "target_name": "y",
    "n_features": 2,
    "metadata": {"n_obs": 120, "alpha": 0.5},
    "diagnostics": {
        "fitted_values": {"name": "fitted", "index": [...], "data": [...]},
        "residuals": {"name": "residual", "index": [...], "data": [...]},
        "metrics": {"n": 120, "mae": 0.04, "mse": 0.003, "rmse": 0.055},
        "coefficients": {"name": "coefficient", "index": ["x1", "x2"], "data": [...]},
        "selected_features": ["x1", "x2"],
    },
}

ModelFit.to_metadata() wraps the same block under {"model": ..., "fit": ...} for downstream forecasting and result records.

Volatility functions return VolatilityFit, which extends ModelFit with predict_variance(horizon=1) and conditional_volatility.

Some model families expose backend estimator classes as public symbols. These are useful when users need estimator-native attributes, custom wrappers, or type checks; the normal user entry point remains the lowercase fit function.

Estimator class	Fit function	Meaning
`MARSRegressor`	`mars(...)`	Internal MARS-style spline regressor.
`LGBPlusRegressor`	`lgb_plus(...)`	Competition-based LGB+ hybrid tree/linear boosting backend.
`LGBAPlusRegressor`	`lgba_plus(...)`	Alternating LGB^A+ hybrid tree/linear boosting backend.
`QuantileRegressionForestRegressor`	`quantile_regression_forest(...)`	Quantile forest backend.
`MacroRandomForestRegressor`	`macro_random_forest(...)`	Macro Random Forest backend.
`SupervisedPCARegressor`	`supervised_pca(...)`	Supervised PCA backend.
`SupervisedScaledPCARegressor`	`supervised_scaled_pca(...)`	Supervised scaled-PCA backend.
`GARCHEstimator`	`garch11(...)` and `egarch(...)`	ARCH/GARCH volatility backend.
`RealizedGARCHEstimator`	`realized_garch(...)`	Realized-GARCH backend.
`SupervisedAggregationRegressor`	`supervised_aggregation(...)` and wrappers	Generic assemblage/Albacore-style constrained aggregation backend.

Fit Persistence#

models owns the low-level persistence format for fitted model objects. Forecasting runners decide which fitted object should be saved and which window, selection, and parameter metadata should be attached.

saved = macroforecast.models.save_fit(
    fit,
    "trained_model/ridge/origin_0_h1_20000131.pkl",
    metadata={
        "alias": "ridge",
        "params": {"alpha": 0.1},
        "model_selection": selection_metadata,
    },
)
loaded = macroforecast.models.load_fit(saved.model_path)

Function	Input	Output	Meaning
`save_fit(fit, model_path, metadata_path=None, metadata=None)`	Fitted model object and output paths.	`SavedModel`	Writes pickle plus JSON sidecar.
`load_fit(model_path)`	Pickle path.	fitted object	Loads a saved fit.

SavedModel.to_dict() returns model_path, metadata_path, and save_error. If a custom/local model cannot be pickled, model_path is None, save_error records the failure, and the JSON sidecar is still written. The sidecar always includes the available fit.to_metadata() block when the fit exposes it.

Fit Diagnostics#

Diagnostics are collected on a best-effort basis. A model only records values that the fitted backend exposes and that can be computed without changing the fit. Missing keys mean the model does not expose that diagnostic, not that the fit failed.

Common keys:

Key	Recorded when	Meaning
`fitted_values`	Estimator exposes `predict()` on the training matrix.	In-sample fitted values indexed like the aligned target.
`residuals`	`fitted_values` is available.	Training residuals, `y - fitted`.
`metrics`	`residuals` is available.	Residual count, mean, standard deviation, MAE, MSE, and RMSE.
`coefficients`	Estimator exposes `coef_`.	Coefficients indexed by feature name when possible.
`intercept`	Estimator exposes `intercept_`.	Scalar or list intercept.
`selected_features`	Nonzero coefficients or estimator selection metadata is available.	Selected feature names.
`feature_importance`	Estimator exposes `feature_importances_`.	Tree-style importances sorted descending.
`factor_loadings`	Estimator exposes factor loadings, loadings, or components.	Factor/PCA loading matrix when available.
`component_selected_features`	Estimator exposes component-level selection metadata.	Selected source features for supervised PCA-style components.
`training_history`	Estimator records iterative training history.	Epoch loss or backend-specific training trace.
`conditional_volatility`	Volatility estimator exposes fitted conditional volatility.	In-sample conditional volatility path.
`params`	Volatility estimator exposes fitted parameter estimates.	Fitted volatility-model parameters.

Example:

fit = macroforecast.models.random_forest(X, y, n_estimators=200)
fit.diagnostics["feature_importance"].head()

Model Specs And Hyperparameter Spaces#

Model functions fit immediately. Model specs are the model-selection objects: they keep the fit callable together with model-owned defaults, tunable parameters, and preset search spaces.

model = macroforecast.models.get_model("lasso", preset="standard")
result = macroforecast.model_selection.select_params(
    model,
    X,
    y,
    window=macroforecast.window.expanding(min_train_size=120),
    metric=macroforecast.metrics.rmse,
)
fit = model(X, y, **result.best_params)

ModelSpec#

macroforecast.models.ModelSpec(
    name,
    family,
    fit_func,
    default_params={},
    parameters=(),
    search_spaces={},
    default_search_method="grid",
    default_preset="standard",
    input_kind="supervised",
    preset="standard",
    params={},
    backend="internal",
    requires_extra=None,
    requires_scaling=False,
    recommended_preprocessing=(),
)

Attribute	Type	Meaning
`name`	str	Canonical model name.
`family`	str	Model family such as `linear`, `tree`, `factor`, or `volatility`.
`fit_func`	callable	Underlying fit function.
`default_params`	dict	Model-owned default keyword arguments.
`parameters`	tuple	`ModelParameter` descriptions.
`search_spaces`	dict	Preset-specific hyperparameter candidates.
`default_search_method`	str	Search method normally used for the model.
`default_preset`	str	Default hyperparameter preset.
`input_kind`	str	Input convention: `supervised`, `target`, `panel`, or `volatility`.
`preset`	str	Active search-space preset.
`params`	dict	User-fixed model parameters.
`backend`	str	Implementation backend, for example `sklearn.svm.SVR` or `torch.nn.LSTM`.
`requires_extra`	str or `None`	Optional dependency extra required to fit the model.
`requires_scaling`	bool	Whether the model is scale-sensitive and expects explicit preprocessing.
`recommended_preprocessing`	tuple[str, …]	Short preprocessing notes attached to metadata.

ModelSpec is callable:

model = macroforecast.models.get_model("ridge", params={"alpha": 0.5})
fit = model(X, y)

ModelSpec.to_dict() returns a detailed JSON-ready specification including defaults, fixed params, parameter descriptions, and all preset search spaces. ModelSpec.to_metadata() returns the compact runner-facing block:

{
    "model": "ridge",
    "model_family": "linear",
    "model_preset": "small",
    "input_kind": "supervised",
    "backend": "sklearn.linear_model.Ridge",
    "requires_extra": None,
    "requires_scaling": False,
    "recommended_preprocessing": [],
    "default_search_method": "cv_path",
    "default_params": {"alpha": 1.0},
    "params": {"alpha": 0.5},
    "search_space": {"alpha": [0.01, 0.1, 1.0]},
}

get_model#

macroforecast.models.get_model(model, *, preset=None, params=None)

Input	Type	Meaning
`model`	str, callable, or `ModelSpec`	Model name, registered model callable, or existing spec.
`preset`	str or `None`	Search-space preset to attach.
`params`	dict or `None`	Fixed model parameters to attach.

Output	Type	Meaning
return	`ModelSpec`	Callable model spec with model-owned defaults and spaces.

custom_model#

Build a user-owned ModelSpec without registering a package model.

macroforecast.models.custom_model(
    name: str,
    fit_func,
    *,
    family: str = "custom",
    default_params: Mapping[str, object] | None = None,
    parameters: tuple[ModelParameter, ...] = (),
    search_spaces: dict[str, dict[str, tuple[object, ...]]] | None = None,
    default_search_method: str = "grid",
    default_preset: str = "standard",
    input_kind: str = "supervised",
    backend: str = "custom",
    requires_extra: str | None = None,
    requires_scaling: bool = False,
    recommended_preprocessing: tuple[str, ...] = (),
    description: str | None = None,
) -> ModelSpec

Callable Contract#

The default supervised contract is:

fit_func(X: pandas.DataFrame, y: pandas.Series, **params) -> fitted_object

The fitted object must expose:

fitted_object.predict(X_test)

predict(X_test) may return a pandas Series, a single-column DataFrame, or an array-like object with length len(X_test). Pandas output must either use X_test.index or RangeIndex(len(X_test)); any other index is rejected by forecasting.run(...).

Set input_kind when the custom model follows another convention:

`input_kind`	Fit callable receives	Use case
`"supervised"`	`fit_func(X, y, **params)`	Regression-style models.
`"target"`	`fit_func(y, **params)`	Target-only time-series models.
`"panel"`	`fit_func(panel, **params)`	Panel-input models.
`"volatility"`	`fit_func(y, X=None, **params)`	Volatility or density models.

search_spaces uses the same model-owned preset contract as registered models:

model = mf.models.custom_model(
    "mean_model",
    mean_model,
    default_params={"offset": 0.0},
    search_spaces={
        "small": {"offset": (-0.1, 0.0, 0.1)},
        "standard": {"offset": (-0.5, 0.0, 0.5)},
    },
)

result = mf.forecasting.run(
    panel,
    {"mean": model},
    window=window,
    features=features,
    preset={"mean": "small"},
)

custom_model() does not mutate the global registry. Pass the returned ModelSpec directly to forecasting.run(...), model_selection.select_params(...), or model_search_space(...).

list_model_specs#

macroforecast.models.list_model_specs(family=None)

Returns a DataFrame with one row per registered model: name, family, input_kind, backend, requires_extra, requires_scaling, recommended_preprocessing, default_search_method, default_preset, available presets, and n_tunable.

describe_model#

macroforecast.models.describe_model(model)

Returns a DataFrame with parameter-level documentation and preset search spaces.

Example:

parameter	default	tunable	small_space	standard_space
`alpha`	`1.0`	`True`	`(0.01, 0.1, 1.0)`	`(0.001, 0.01, 0.1, 1.0, 10.0)`
`max_iter`	`20000`	`False`	`None`	`None`

model_search_space#

macroforecast.models.model_search_space(model, *, preset=None)

Returns the model-owned candidate dictionary for the selected preset. MODEL_SPECS is the public registry backing get_model(...), list_model_specs(), describe_model(...), and model_search_space(...).

macroforecast.models.model_search_space("random_forest", preset="small")

Output:

{
    "n_estimators": (50, 100),
    "max_depth": (3, 5, None),
    "min_samples_leaf": (1, 3),
}

Presets#

Preset	Purpose
`small`	Fast smoke tests and short interactive checks.
`standard`	Default analysis-scale search space.
`wide`	Larger search space for more expensive runs.

Input And Output Conventions#

Input kind	Callable shape	Use case
`supervised`	`model(X, y, **params)`	Most regression, factor, and tree models. Fit-time ensembles use the same shape in `macroforecast.model_ensemble`.
`target`	`model(y, **params)`	Univariate target-only models such as `ar`.
`panel`	`model(panel, **params)`	Multivariate time-series models such as `var`.
`volatility`	`model(y, X=None, **params)`	Return and volatility models.

For supervised models, X may be a pandas DataFrame, a 2-D array, or a FeatureSet. y may be a Series, 1-D array, or one-column DataFrame. If X is a FeatureSet, y can be omitted.

All non-volatility model functions return ModelFit. Volatility functions return VolatilityFit.

Scaling Policy#

The clean model API does not silently standardize predictors for models that are traditionally scale-sensitive. Instead, those models advertise requires_scaling=True through ModelSpec, list_model_specs(), model_search_space(), and select_params() metadata. lasso(..., standardize=True) and elastic_net(..., standardize=True) are explicit opt-in replication helpers; the default remains False.

There are two different scaling locations:

macroforecast.preprocessing.standardize_panel() standardizes a panel before model fitting. If it is run on the full sample outside the forecasting runner, it uses full-sample moments. In a leakage-safe run, use runner preprocessing specs and window policies so the scaling state is fitted only on allowed rows.
model(..., standardize=True) standardizes inside that model’s own fit call. It is useful when only selected models need model-local scaling, or when a lasso/elastic-net replication requires the penalty grid to be defined on window-local standardized predictors. For broader model-specific transformations beyond scaling, run separate model pipelines or use a model-pipeline runner layer rather than hiding those transformations inside a single estimator.

Current scale-sensitive callable models:

Model	Backend	Scaling policy
`svr`	`sklearn.svm.SVR`	Standardize predictors with `preprocessing.standardize_panel()` or a runner preprocessing spec before fitting.
`linear_svr`	`sklearn.svm.LinearSVR`	Standardize predictors before fitting.
`nu_svr`	`sklearn.svm.NuSVR`	Standardize predictors before fitting.
`nn`	`torch.nn.Sequential`	Standardizes `X` and `y` inside each fit window and maps predictions back to target units.
`transformer`	`torch.nn.TransformerEncoder`	Standardizes `X` and `y` inside each fit window and maps predictions back to target units.
`hemisphere_nn`	torch dual-head dense network	Standardizes `X` inside each fit window, fits mean and variance heads, and returns point, variance, and normal-approximation quantile forecasts.
`density_hnn`	torch-native Aionx DensityHNN port	Standardizes `X` and `y`, estimates prior-DNN OOB volatility emphasis, fits a density HNN ensemble, and returns point, variance, volatility, and quantile forecasts.

nn, lstm, gru, transformer, and density_hnn standardize X and y inside each fit window and map predictions back to the target scale. hemisphere_nn standardizes X and keeps the target in original units because its variance head is a compact density-forecast object. Their metadata records requires_extra="deep" and requires_scaling=False.

Registered Model Catalog#

Model	Family	Input kind	Default search	Presets
`ols`	linear	supervised	`grid`	none
`ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`nonneg_ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`shrink_to_target_ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`fused_difference_ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`supervised_aggregation`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`component_aggregation`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`rank_aggregation`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`assemblage_regression`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`albacore_components`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`albacore_ranks`	assemblage	supervised	`cv_path`	`small`, `standard`, `wide`
`random_walk_ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`tvp_ridge`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`lasso`	linear	supervised	`cv_path`	`small`, `standard`, `wide`
`elastic_net`	linear	supervised	`grid`	`small`, `standard`, `wide`
`adaptive_lasso`	linear	supervised	`grid`	`small`, `standard`, `wide`
`adaptive_elastic_net`	linear	supervised	`grid`	`small`, `standard`, `wide`
`group_lasso`	linear	supervised	`grid`	`small`, `standard`, `wide`
`sparse_group_lasso`	linear	supervised	`grid`	`small`, `standard`, `wide`
`bayesian_ridge`	linear	supervised	`grid`	none
`huber`	linear	supervised	`grid`	`small`, `standard`, `wide`
`kernel_ridge`	nonparametric	supervised	`random`	`small`, `standard`, `wide`
`knn`	nonparametric	supervised	`random`	`small`, `standard`, `wide`
`glmboost`	linear	supervised	`grid`	`small`, `standard`, `wide`
`svr`	support_vector	supervised	`random`	`small`, `standard`, `wide`
`linear_svr`	support_vector	supervised	`random`	`small`, `standard`, `wide`
`nu_svr`	support_vector	supervised	`random`	`small`, `standard`, `wide`
`nn`	neural	supervised	`random`	`small`, `standard`, `wide`
`lstm`	neural	supervised	`random`	`small`, `standard`, `wide`
`gru`	neural	supervised	`random`	`small`, `standard`, `wide`
`transformer`	neural	supervised	`random`	`small`, `standard`, `wide`
`hemisphere_nn`	neural	supervised	`random`	`small`, `standard`, `wide`
`density_hnn`	neural	supervised	`random`	`small`, `standard`, `wide`
`pls`	composite	supervised	`grid`	`small`, `standard`, `wide`
`scaled_pca`	composite	supervised	`grid`	`small`, `standard`, `wide`
`supervised_pca`	composite	supervised	`grid`	`small`, `standard`, `wide`
`supervised_scaled_pca`	composite	supervised	`grid`	`small`, `standard`, `wide`
`ar`	timeseries	target	`grid`	`small`, `standard`, `wide`
`var`	timeseries	panel	`grid`	`small`, `standard`, `wide`
`bvar_minnesota`	timeseries	panel	`grid`	`small`, `standard`, `wide`
`bvar_normal_inverse_wishart`	timeseries	panel	`grid`	`small`, `standard`, `wide`
`ets`	timeseries	target	`grid`	none
`holt_winters`	timeseries	target	`grid`	none
`theta_method`	timeseries	target	`grid`	none
`dfm_mixed_mariano_murasawa`	mixed_frequency	panel	`grid`	`small`, `standard`, `wide`
`midas_almon`	mixed_frequency	supervised	`grid`	`small`, `standard`, `wide`
`midas_beta`	mixed_frequency	supervised	`grid`	`small`, `standard`, `wide`
`midas_step`	mixed_frequency	supervised	`grid`	`small`, `standard`, `wide`
`restricted_midas`	mixed_frequency	supervised	`grid`	`small`, `standard`, `wide`
`unrestricted_midas`	mixed_frequency	supervised	`grid`	`small`, `standard`, `wide`
`dfm_unrestricted_midas`	mixed_frequency	panel	`grid`	`small`, `standard`, `wide`
`far`	factor	supervised	`grid`	`small`, `standard`, `wide`
`favar`	factor	supervised	`grid`	`small`, `standard`, `wide`
`decision_tree`	tree	supervised	`grid`	`small`, `standard`, `wide`
`random_forest`	tree	supervised	`random`	`small`, `standard`, `wide`
`extra_trees`	tree	supervised	`random`	`small`, `standard`, `wide`
`gradient_boosting`	tree	supervised	`random`	`small`, `standard`, `wide`
`mars`	spline	supervised	`random`	`small`, `standard`, `wide`
`xgboost`	tree	supervised	`random`	`small`, `standard`, `wide`
`lightgbm`	tree	supervised	`random`	`small`, `standard`, `wide`
`lgb_plus`	tree	supervised	`random`	`small`, `standard`, `wide`
`lgba_plus`	tree	supervised	`random`	`small`, `standard`, `wide`
`catboost`	tree	supervised	`random`	`small`, `standard`, `wide`
`quantile_regression_forest`	tree	supervised	`random`	`small`, `standard`, `wide`
`macro_random_forest`	tree	supervised	`random`	`small`, `standard`, `wide`
`garch11`	volatility	volatility	`grid`	`small`, `standard`, `wide`
`egarch`	volatility	volatility	`grid`	`small`, `standard`, `wide`
`realized_garch`	volatility	volatility	`grid`	`small`, `standard`, `wide`

Linear Models#

Linear implementation map#

The linear family mixes thin sklearn wrappers, hybrid macroforecast code, and package-native solvers. backend in ModelSpec records that distinction so metadata exported by describe_model(), list_model_specs(), and saved ModelFit objects is inspectable.

Model	Implementation class	Runtime backend
`ols`	external wrapper	`sklearn.linear_model.LinearRegression`
`ridge`	external wrapper	`sklearn.linear_model.Ridge`
`lasso`	external wrapper	`sklearn.linear_model.Lasso`
`elastic_net`	external wrapper	`sklearn.linear_model.ElasticNet`
`bayesian_ridge`	external wrapper	`sklearn.linear_model.BayesianRidge`
`huber`	external wrapper	`sklearn.linear_model.HuberRegressor`
`adaptive_lasso`	hybrid	macroforecast adaptive weights, final `sklearn.linear_model.Lasso`
`adaptive_elastic_net`	hybrid	macroforecast adaptive weights, final `sklearn.linear_model.ElasticNet`
`nonneg_ridge`	package-native	augmented ridge design solved by `scipy.optimize.nnls`
`shrink_to_target_ridge`	package-native	custom objective solved by `scipy.optimize.minimize(method="SLSQP")`
`fused_difference_ridge`	package-native	custom difference-penalty objective solved by SLSQP
`supervised_aggregation`, `component_aggregation`, `rank_aggregation`, `assemblage_regression`, `albacore_components`, `albacore_ranks`	package-native	Albacore/assemblage-derived constrained aggregation objectives solved by SLSQP
`random_walk_ridge`	package-native	expanded time-varying design solved by `numpy.linalg.lstsq`
`tvp_ridge`	package-native	Python port of `TVPRidge` R `tvp.ridge`, `Zfun`, `dualGRR`, and CV helpers
`group_lasso`	package-native	proximal-gradient group-lasso solver
`sparse_group_lasso`	package-native	proximal-gradient sparse-group-lasso solver
`glmboost`	package-native	componentwise L2 boosting loop

external wrapper means the statistical estimator is delegated to an external package and macroforecast only standardizes the callable contract, metadata, diagnostics, and persistence. hybrid means macroforecast owns the macro-level algorithmic transformation and delegates the final convex solver. package-native means the objective, iteration, or coefficient path logic is implemented inside macroforecast, using NumPy/SciPy only for basic numerical linear algebra or generic optimization.

R source comparison map#

The following R sources are the comparison surface for the linear models where macroforecast owns nontrivial logic. These sources are not vendored into the package. They are used as independent algorithm references; the Python code keeps short source cues in comments and implements the corresponding mathematical objective in macroforecast’s callable API.

macroforecast model	R package/source to inspect	Comparison target	Current equivalence status
`adaptive_lasso`	`glmnet`, `R/glmnet.R`: https://rdrr.io/cran/glmnet/src/R/glmnet.R	Adaptive lasso is expressible in R by computing initial weights and passing them as `penalty.factor` to lasso.	Same fixed-weight objective after macroforecast standardizes `X`, rescales columns, and normalizes weights to mean one; macroforecast fits one chosen `alpha`, while `glmnet` usually builds a lambda path.
`adaptive_elastic_net`	`glmnet`, `R/glmnet.R`: https://rdrr.io/cran/glmnet/src/R/glmnet.R	Adaptive elastic net uses the same adaptive weights with elastic-net mixing.	Same fixed-weight idea with mean-one penalty weights; macroforecast delegates the final fit to sklearn `ElasticNet` rather than glmnet’s path solver.
`nonneg_ridge`	`nnls`, `R/nnls.R`: https://rdrr.io/cran/nnls/src/R/nnls.R; also `glmnet` lower bounds	NNLS solves least squares under coefficient non-negativity.	Equivalent to NNLS on the augmented design `[X; sqrt(alpha) I]` and response `[y; 0]`, after optional centering.
`shrink_to_target_ridge`	`penalized`: https://search.r-project.org/CRAN/refmans/penalized/html/penalized.html; target ridge family in `rags2ridges`: https://rdrr.io/cran/rags2ridges/src/R/rags2ridges.R	Compare target-shrinkage/tikhonov logic, not an identical regression API.	No exact same R regression callable found. macroforecast solves `
`fused_difference_ridge`	fused L2 ridge family in `rags2ridgesFused.R`: https://rdrr.io/cran/rags2ridges/src/R/rags2ridgesFused.R	Compare L2 fusion/smoothness penalty structure.	Not identical domain: R source is primarily fused ridge for precision matrices; macroforecast applies an L2 finite-difference penalty directly to regression coefficients.
`component_aggregation`, `albacore_components`	`assemblage`, `R/assemblage_v240228.R`: `nonneg.ridge.sum1`	Nonnegative component weights, optional target-weight shrinkage, sum-to-one basket constraint.	Same fixed-alpha objective family as the R CVXR fit: SSE plus feature-std-scaled target shrinkage, `w >= 0`, and `sum(w)=1`. R owns block CV for lambda; macroforecast delegates alpha selection to model selection/forecasting.
`rank_aggregation`, `albacore_ranks`	`assemblage`, `R/assemblage_v240228.R`: `x.transformation`, `nonneg.ridge.meanD`	Sort components into rank space, estimate nonnegative smooth rank weights with a mean-matching constraint.	Same fixed-alpha rank objective family: row sorting, fused difference penalty on scaled rank weights, `w >= 0`, and `mean(Xw)=mean(y)`.
`supervised_aggregation`, `assemblage_regression`	`assemblage`, `R/assemblage_v240228.R`: `assemblage`, `nonneg.ridge`, `nonneg.ridge.mean`, `nonneg.ridge.sum1`, `nonneg.ridge.meanD`	Generic component/rank supervised aggregation.	Exposes the reusable primitives without requiring inflation data. Paper-specific inflation semantics live in the `albacore_*` wrappers.
`random_walk_ridge`	`walker`, `R/walker.R`: https://rdrr.io/cran/walker/man/walker.html	Random-walk coefficients in a time-varying regression.	Same modeling prior idea, different inference: `walker` is Bayesian/state-space via Stan; macroforecast computes the penalized least-squares MAP-style coefficient path and predicts with the final vector.
`tvp_ridge`	`TVPRidge`, local source `wiki/raw/paper_code/coulombe_site_github_20260530/tvpridge/R/MV2SRR_v210407.R`; upstream philgoucou/tvpridge	Goulet Coulombe TVP ridge / two-step ridge regression.	Direct Python port of the R `tvp.ridge` pipeline: `Zfun` expansion, `dualGRR` dual/primal generalized ridge, R-style random-fold CV helpers, 2SRR coefficient-innovation reweighting, residual-volatility reweighting, and R return fields.
`group_lasso`	`grpreg`, `R/grpreg.R`: https://rdrr.io/cran/grpreg/src/R/grpreg.R	Group penalty over coefficient blocks.	Same group-lasso penalty family for Gaussian loss; macroforecast uses a single-alpha proximal-gradient solver rather than a full regularization path.
`sparse_group_lasso`	`sparsegl`, `R/sparsegl.R`: https://rdrr.io/cran/sparsegl/src/R/sparsegl.R	Sparse group lasso objective with group and feature-level penalties.	Same penalty decomposition; macroforecast uses one selected `alpha` and `l1_ratio`, while `sparsegl` is a full path solver with additional bounds/families.
`glmboost`	`mboost`, `R/mboost.R`: https://rdrr.io/cran/mboost/src/R/mboost.R; component learner in `R/bolscw.R`; Goulet Coulombe et al. (2021) Appendix A.6 for random candidate sampling	Componentwise gradient/L2 boosting.	Same Gaussian componentwise L2 update: center predictors by default, select the base learner by normalized correlation, and apply shrinkage. The paper’s per-step random candidate rule is expressed as `candidate_sampling="random"`, `candidate_fraction=1/3`, `candidate_cap=200`, `candidate_rounding="floor"`, not as a hidden preset. macroforecast omits formula handling, weights, families, hat values, and stopping machinery.

The direct implementations should be reviewed against the objective, scaling, intercept handling, penalty normalization, and solver stopping rule separately. Matching an R package name is not enough: several R implementations solve a path problem, a Bayesian state-space problem, or a matrix-estimation problem, whereas macroforecast exposes a single callable forecasting estimator.

ols#

macroforecast.models.ols(X, y)

Fits ordinary least squares.

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Backend	`sklearn.linear_model.LinearRegression`
Default params	none
Tunable params	none
Preset search spaces	none

ridge#

macroforecast.models.ridge(X, y, *, alpha=1.0)

Fits ridge regression with an L2 penalty.

Backend: sklearn.linear_model.Ridge.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	L2 penalty strength.

Preset	`alpha`
`small`	`(0.01, 0.1, 1.0)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0)`

Default model-selection method: cv_path.

nonneg_ridge#

macroforecast.models.nonneg_ridge(X, y, *, alpha=1.0, fit_intercept=True)

Fits ridge regression with coefficients constrained to be non-negative. This uses SciPy NNLS on an augmented ridge design, so it does not require cvxpy.

Backend: package-native augmented ridge design plus scipy.optimize.nnls.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	L2 penalty strength.
`fit_intercept`	`True`	fixed by preset	Fit an intercept outside the constrained coefficients.

Default model-selection method: cv_path.

shrink_to_target_ridge#

macroforecast.models.shrink_to_target_ridge(
    X,
    y,
    *,
    alpha=1.0,
    prior_target=None,
    simplex=False,
    nonneg=False,
    fit_intercept=True,
    max_iter=1000,
    tol=1e-9,
)

Fits a ridge-type model where coefficients are shrunk toward a user-specified target vector. prior_target can be a scalar, a sequence ordered like X columns, or a mapping from column name to target coefficient. If prior_target=None, the target is zero, except under simplex=True, where the target is a uniform coefficient vector. simplex=True constrains coefficients to sum to one and uses no intercept; nonneg=True also enforces non-negative coefficients. The solver is SciPy SLSQP.

Backend: package-native objective plus scipy.optimize.minimize(method="SLSQP").

R comparison: this is a regression analogue of target-ridge/Tikhonov shrinkage. rags2ridges uses target ridge for covariance and precision matrices, not a direct X, y regression callable, but the same target idea is present: shrink an estimated parameter object toward a target rather than toward zero. In the unconstrained regression case, macroforecast solves

min_beta ||y - X beta||^2 + alpha ||beta - beta0||^2

with closed-form normal equation

(X'X + alpha I) beta = X'y + alpha beta0

after optional centering for the intercept. simplex=True changes the problem into a forecast-combination form: coefficients must sum to one, no intercept is fit, and prior_target=None means a uniform target vector.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Strength of shrinkage toward `prior_target`.
`prior_target`	`None`	fixed by preset	Scalar, sequence, mapping, or `None`.
`simplex`	`False`	fixed by preset	Constrain coefficients to sum to one.
`nonneg`	`False`	fixed by preset	Constrain coefficients to be non-negative.
`fit_intercept`	`True`	fixed by preset	Fit an intercept unless `simplex=True`.
`max_iter`	`1000`	fixed by preset	SLSQP iteration cap.
`tol`	`1e-9`	fixed by preset	SLSQP tolerance.

Default model-selection method: cv_path.

fused_difference_ridge#

macroforecast.models.fused_difference_ridge(
    X,
    y,
    *,
    alpha=1.0,
    difference_order=1,
    mean_equality=False,
    nonneg=False,
    fit_intercept=True,
    max_iter=1000,
    tol=1e-9,
)

Fits ridge regression with a finite-difference penalty on adjacent coefficients. This is useful when columns have an ordered meaning such as lag age, maturity, or horizon and neighboring coefficients should vary smoothly. difference_order=1 penalizes first differences; larger orders penalize higher order coefficient curvature. mean_equality=True adds a conservation-style constraint that the fitted and observed sums match and uses no intercept.

Backend: package-native finite-difference objective plus SLSQP.

R comparison: rags2ridges::ridgeP.fused uses a fused L2 penalty for related precision matrices. fused_difference_ridge() uses the same penalty idea on a single ordered regression-coefficient vector. With no sign or equality constraints, macroforecast solves

min_beta ||y - X beta||^2 + alpha ||D beta||^2

where D is the finite-difference matrix over adjacent coefficients. The closed-form normal equation is

(X'X + alpha D'D) beta = X'y

after optional centering for the intercept. mean_equality=True is a macro-forecasting conservation variant; it constrains fitted and observed sums to match and is intentionally outside the rags2ridges precision-matrix API.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Strength of the smoothness penalty.
`difference_order`	`1`	fixed by preset	Finite-difference order applied to coefficients.
`mean_equality`	`False`	fixed by preset	Constrain fitted and observed sums to match.
`nonneg`	`False`	fixed by preset	Constrain coefficients to be non-negative.
`fit_intercept`	`True`	fixed by preset	Fit an intercept unless `mean_equality=True`.
`max_iter`	`1000`	fixed by preset	SLSQP iteration cap.
`tol`	`1e-9`	fixed by preset	SLSQP tolerance.

Default model-selection method: cv_path.

Assemblage / Supervised Aggregation#

This family is derived from Goulet Coulombe, Klieber, Barrette, and Goebel, Maximally Forward-Looking Core Inflation, and the R package assemblage. The package splits the paper model into generic reusable primitives plus thin inflation-specific wrappers.

The generic problem is:

given components X_t and future aggregate target y_t,h,
learn weights w so X_t w predicts y_t,h

This is not ordinary ridge in disguise. The weights can be constrained to be nonnegative, sum to one, match the target mean, shrink toward reference basket weights, or vary smoothly across ranks. For inflation, those weights form an Albacore core-inflation measure. Outside inflation, the same functions can aggregate sectors, states, industries, survey items, or regional indicators.

supervised_aggregation#

macroforecast.models.supervised_aggregation(
    X,
    y,
    *,
    space="component",
    penalty="ridge",
    alpha=1.0,
    reference_weights=None,
    nonneg=True,
    simplex=False,
    mean_match=False,
    difference_order=1,
    fit_intercept=False,
    penalty_scale="feature_std",
    max_iter=1000,
    tol=1e-9,
)

Parameter	Default	Choices	Meaning
`space`	`"component"`	`"component"`, `"rank"`	Use named components or row-wise sorted order statistics.
`penalty`	`"ridge"`	`"ridge"`, `"target_shrinkage"`, `"fused_difference"`	Coefficient penalty family.
`alpha`	`1.0`	nonnegative float	Penalty strength; tune with `model_selection` or `forecasting`.
`reference_weights`	`None`	mapping, sequence, `Series`, or `None`	Target weights for `target_shrinkage`.
`nonneg`	`True`	bool	Enforce `w_j >= 0`.
`simplex`	`False`	bool	Enforce `sum(w)=1`.
`mean_match`	`False`	bool	Enforce `mean(Xw)=mean(y)`.
`difference_order`	`1`	positive int	Difference order for fused rank weights.
`fit_intercept`	`False`	bool	Fit an intercept outside the aggregation weights when no equality constraint is active.
`penalty_scale`	`"feature_std"`	`"feature_std"`, `"none"`	Match the R assemblage convention by scaling penalties with feature standard deviations.

Output: ModelFit. The fitted estimator exposes coef_, weights_, and, for rank space, rank_weight_curve_. Diagnostics include fitted values, residuals, metrics, and coefficient weights.

component_aggregation#

macroforecast.models.component_aggregation(
    X,
    y,
    *,
    alpha=1.0,
    reference_weights=None,
    penalty=None,
    simplex=True,
    nonneg=True,
    penalty_scale="feature_std",
    max_iter=1000,
    tol=1e-9,
)

Component-space aggregation estimates weights on named columns. With reference_weights supplied, penalty=None selects target_shrinkage, making this the generic version of Albacorecomps. Without reference weights, it is a nonnegative simplex ridge basket.

R source cue: nonneg.ridge.sum1 in assemblage_v240228.R.

rank_aggregation#

macroforecast.models.rank_aggregation(
    X,
    y,
    *,
    alpha=1.0,
    penalty="fused_difference",
    mean_match=True,
    nonneg=True,
    difference_order=1,
    penalty_scale="feature_std",
    max_iter=1000,
    tol=1e-9,
)

Rank-space aggregation sorts each row of X before fitting, then learns weights on rank_1, rank_2, … rather than on named components. This is a generic supervised trimmed-mean model. The fitted object stores estimator.rank_weight_curve_, a table with rank, percentile, and weight.

R source cue: x.transformation plus nonneg.ridge.meanD.

assemblage_regression#

macroforecast.models.assemblage_regression(
    X,
    y,
    *,
    space="component",
    alpha=1.0,
    reference_weights=None,
    penalty=None,
    max_iter=1000,
    tol=1e-9,
)

Convenience wrapper over component_aggregation() and rank_aggregation(). Use it when the model family is known to be assemblage-style but the final choice between component and rank space is part of the experiment design.

albacore_components#

macroforecast.models.albacore_components(
    X,
    y,
    *,
    reference_weights=None,
    alpha=1.0,
    max_iter=1000,
    tol=1e-9,
)

Inflation-specific wrapper for component-space Albacore. X should be a panel of price-component changes, y should be the forward average headline inflation target, and reference_weights should be official basket or expenditure weights when available. The wrapper sets nonneg=True, simplex=True, penalty="target_shrinkage", and fit_intercept=False.

albacore_ranks#

macroforecast.models.albacore_ranks(
    X,
    y,
    *,
    alpha=1.0,
    difference_order=1,
    max_iter=1000,
    tol=1e-9,
)

Inflation-specific wrapper for rank-space Albacore. X should be price component changes and y should be the forward average headline inflation target. The wrapper sorts components row by row, estimates nonnegative fused rank weights, and enforces the Albacoreranks mean-matching constraint.

Low-Level Solver Helpers#

These return a weight Series rather than a full ModelFit:

Function	Meaning
`solve_nonnegative_ridge(X, y, alpha=...)`	R `nonneg.ridge`-style nonnegative ridge weights.
`solve_simplex_ridge(X, y, alpha=...)`	Nonnegative weights constrained to sum to one.
`solve_target_shrinkage_ridge(X, y, reference_weights=..., alpha=...)`	R `nonneg.ridge.sum1`-style component basket weights.
`solve_mean_aligned_ridge(X, y, alpha=...)`	R `nonneg.ridge.mean`-style nonnegative mean-aligned weights.
`solve_fused_difference_ridge(X, y, alpha=...)`	R `nonneg.ridge.meanD`-style fused rank-weight primitive.

These helpers are deliberately not inflation-specific. They exist so users can compose custom supervised aggregation models without taking the Albacore wrappers.

random_walk_ridge#

macroforecast.models.random_walk_ridge(
    X,
    y,
    *,
    alpha=1.0,
    initial_alpha=1.0,
    fit_intercept=True,
)

Fits a time-varying coefficient path with a random-walk penalty:

sum_t (y_t - x_t beta_t)^2
+ initial_alpha * ||beta_1||^2
+ alpha * sum_t ||beta_t - beta_{t-1}||^2

Predictions use the final estimated coefficient vector. The full fitted path is stored on the estimator as coef_path_, and standard ModelFit diagnostics record the final coefficients, fitted values, and residuals.

Backend: package-native expanded design solved by numpy.linalg.lstsq.

R comparison: walker::walker_rw1 is the closest R source. It treats coefficients as random-walk state variables and estimates a Bayesian posterior with Stan / state-space smoothing. random_walk_ridge() keeps the same RW1 prior idea but solves the penalized least-squares MAP-style objective as one augmented linear system over the full coefficient path:

min_{beta_1,...,beta_T}
sum_t (y_t - x_t beta_t)^2
+ initial_alpha ||beta_1||^2
+ alpha sum_t ||beta_t - beta_{t-1}||^2

The fitted coef_path_ is the estimated path. predict() uses only the final coefficient vector, because this callable is a deterministic forecasting model, not a posterior simulation or Kalman-smoothing interface. fit_intercept=True centers the fit and recovers a static intercept from the final coefficient vector.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Penalty on adjacent coefficient changes.
`initial_alpha`	`1.0`	fixed by preset	Penalty on the first coefficient vector.
`fit_intercept`	`True`	fixed by preset	Fit an intercept outside the time-varying coefficient path.

Default model-selection method: cv_path.

tvp_ridge#

macroforecast.models.tvp_ridge(
    X,
    y,
    *,
    lambda_candidates=None,
    oosX=None,
    lambda2=0.1,
    kfold=5,
    cv_plot=False,
    cv_2srr=True,
    sig_u_param=0.75,
    sig_eps_param=0.75,
    ols_prior=False,
    random_state=1071,
    use_garch=True,
)

Fits Philippe Goulet Coulombe’s TVP ridge / two-step ridge regression estimator from Time-varying parameters as ridge regressions (International Journal of Forecasting, DOI https://doi.org/10.1016/j.ijforecast.2024.08.006). The implementation is a Python port of the R package TVPRidge, source file R/MV2SRR_v210407.R, local snapshot:

wiki/raw/paper_code/coulombe_site_github_20260530/tvpridge/R/MV2SRR_v210407.R

This is not a thin wrapper around random_walk_ridge(). Both models use a random-walk coefficient idea, but tvp_ridge() ports the paper’s full estimator:

Stage	R function	Python implementation
Basis expansion	`Zfun`	`_tvp_z_basis`
Generalized ridge solve	`dualGRR`	`_dual_generalized_ridge`
Initial lambda CV	`CV.KF.MV`	`_cv_kfold_multivariate`
Second-step lambda CV	`cv.univariate`	`_cv_univariate`
Dropout correction	`cumul_zeros`	`_cumul_zeros`
Public callable	`tvp.ridge`	`tvp_ridge` / `TVPRidgeRegressor`

The estimated path solves the paper’s time-varying parameter ridge problem:

min_{beta_1,...,beta_T}
sum_t (y_t - x_t beta_t)^2
+ lambda * sum_t ||beta_t - beta_{t-1}||^2
+ lambda2 * ||beta_0||^2

lambda controls the amount of time variation. Large values force smoother coefficient paths; small values allow more movement. lambda2 is the soft penalty on the starting coefficient values. The R code standardizes by sample standard deviation without centering; macroforecast follows that convention and rescales coefficient paths and fitted values back to the original data scale.

The 2SRR step follows the R package logic. First, the homogeneous ridge TVP is estimated. Then coefficient innovations are used to build coefficient-specific variance weights, and residual volatility weights are optionally estimated by a GARCH(1,1) backend. The model is refit with those weights. If Python package arch is unavailable or the GARCH fit fails, residual-volatility weights fall back to ones and the reason is recorded in fit.estimator.diagnostics_["garch_status"]; the ridge/2SRR fit still runs.

Input:

Argument	Required	Expected object	Meaning
`X`	yes	pandas DataFrame, NumPy array, or `FeatureSet`	Predictor matrix with shape `T x K`.
`y`	yes unless `X` is a `FeatureSet`	pandas Series or one-column DataFrame for the public `ModelFit` wrapper	Target series aligned to `X`.
`lambda_candidates`	no	sequence of positive floats or `None`	Candidate values for the time-variation penalty. `None` uses the R default grid.
`oosX`	no	one predictor vector of length `K`	Optional one-step forecast using the final coefficient vector.

Output:

Object	Type	Contents
return value	`ModelFit`	Standard macroforecast fitted model wrapper.
`fit.estimator.betas_rr_`	NumPy array, shape `M x (K+1) x T`	First-step ridge TVP coefficient paths, original scale.
`fit.estimator.betas_2srr_`	NumPy array, shape `M x (K+1) x T`	2SRR coefficient paths, original scale.
`fit.estimator.lambdas_`	NumPy array	Initial CV lambda for each target.
`fit.estimator.lambda_step2_`	NumPy array	Second-step lambda used after reweighting.
`fit.estimator.yhat_rr_`	DataFrame	In-sample first-step fitted values.
`fit.estimator.yhat_2srr_`	DataFrame	In-sample 2SRR fitted values.
`fit.estimator.sig_eps_`	DataFrame	Normalized residual-volatility weights.
`fit.estimator.forecast_`	NumPy array	Optional forecast when `oosX` is supplied.
`fit.estimator.coef_path_`	DataFrame	Final 2SRR path for the first target, excluding intercept.
`fit.estimator.coef_path_full_`	DataFrame	MultiIndex coefficient path including intercept and target names.

Prediction rule:

Call	Behavior
`fit.predict(X_train)` with the original training index	Returns the time-varying in-sample `yhat_2srr_` path.
`fit.predict(X_new)` with new rows	Uses the final estimated coefficient vector `beta_T`.

Default parameters:

Parameter	Default	Tunable	Meaning
`lambda_candidates`	`exp(linspace(-6, 20, 15))`	yes	R default candidate grid for the time-variation penalty.
`lambda2`	`0.1`	fixed by preset	Soft penalty on starting coefficient values.
`kfold`	`5`	fixed by preset	Number of random CV folds.
`cv_2srr`	`True`	fixed by preset	Re-run lambda CV after variance reweighting.
`sig_u_param`	`0.75`	fixed by preset	Shrinkage exponent for coefficient-innovation variance weights.
`sig_eps_param`	`0.75`	fixed by preset	Shrinkage exponent for residual-volatility weights.
`ols_prior`	`False`	fixed by preset	Shrink starting coefficients toward OLS rather than zero.
`random_state`	`1071`	fixed by preset	Fold seed matching the R source’s `set.seed(1071)` convention.
`use_garch`	`True`	fixed by preset	Use optional Python `arch` GARCH(1,1) for residual-volatility weights.

R parity notes:

Topic	Status
Standardization	Matches R: divide `X` and `Y` by sample standard deviation, no centering.
Basis columns	Matches R `Zfun`: innovation blocks by coefficient, then static intercept/predictor block.
Dual/primal solve	Matches R `dualGRR` algebra, with `numpy.linalg.solve` and pseudo-inverse fallback for singular systems.
CV folds	Same random-fold design and default seed, but NumPy’s RNG is not bit-identical to R’s `sample()`.
GARCH volatility	Uses optional Python `arch` backend; if unavailable, records fallback and continues with homogeneous residual weights.
Multivariate `Y`	Estimator internals preserve `M x (K+1) x T` arrays; the public `ModelFit` wrapper is optimized for one target, consistent with macroforecast’s standard supervised callable.

Default model-selection method: cv_path.

lasso#

macroforecast.models.lasso(
    X,
    y,
    *,
    alpha=1.0,
    max_iter=20000,
    standardize=False,
)

Fits lasso regression with an L1 penalty. There is no lasso_path() model callable; use get_model("lasso") and model_selection.select_params().

Backend: sklearn.linear_model.Lasso.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	L1 penalty strength.
`max_iter`	`20000`	fixed by preset	Optimization iteration cap.
`standardize`	`False`	fixed by preset	Standardize predictors inside the fitted estimator. Defaults to `False` because preprocessing/window policy usually owns scaling. Set `True` for lasso-style replications where scaling must be fit inside each model window.

Preset	`alpha`
`small`	`(0.01, 0.1, 1.0)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0)`

Default model-selection method: cv_path.

elastic_net#

macroforecast.models.elastic_net(
    X,
    y,
    *,
    alpha=1.0,
    l1_ratio=0.5,
    max_iter=20000,
    standardize=False,
)

Fits elastic net regression.

Backend: sklearn.linear_model.ElasticNet.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Overall penalty strength.
`l1_ratio`	`0.5`	yes	L1 share of the elastic-net penalty.
`max_iter`	`20000`	fixed by preset	Optimization iteration cap.
`standardize`	`False`	fixed by preset	Standardize predictors inside the fitted estimator. Defaults to `False`; set `True` for glmnet/MATLAB-style elastic-net replications whose penalty grid assumes window-local standardized predictors.

Preset	`alpha`	`l1_ratio`
`small`	`(0.01, 0.1, 1.0)`	`(0.25, 0.5, 0.75)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.1, 0.25, 0.5, 0.75, 0.9)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0)`	`(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)`

adaptive_lasso#

macroforecast.models.adaptive_lasso(
    X,
    y,
    *,
    alpha=1.0,
    gamma=1.0,
    initial="ridge",
    initial_alpha=1.0,
    eps=1e-4,
    normalize_weights=True,
    max_iter=20000,
    tol=1e-4,
    random_state=None,
)

Fits adaptive lasso. The model first estimates initial coefficients with initial="ridge" or initial="ols", builds feature weights 1 / (abs(beta_init) + eps) ** gamma, and fits lasso on weighted standardized predictors. Predictions are mapped back to the original target scale.

Backend: macroforecast adaptive-weight construction plus final sklearn.linear_model.Lasso.

R/glmnet comparison: glmnet accepts the same idea through penalty.factor. It internally rescales penalty factors to sum to the number of predictors, so macroforecast defaults to normalize_weights=True, which rescales adaptive weights to mean one before fitting the final lasso. Set normalize_weights=False only when the absolute weight scale should change the effective penalty strength.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Final adaptive lasso penalty strength.
`gamma`	`1.0`	yes	Exponent applied to initial coefficient weights.
`initial`	`"ridge"`	manual	Initial model: `"ridge"` or `"ols"`.
`initial_alpha`	`1.0`	fixed by preset	Initial ridge penalty.
`eps`	`1e-4`	fixed by preset	Small denominator floor for adaptive weights.
`normalize_weights`	`True`	fixed by preset	Rescale adaptive weights to mean one, matching `glmnet` penalty-factor scaling.
`max_iter`	`20000`	fixed by preset	Final solver iteration cap.
`tol`	`1e-4`	fixed by preset	Final solver convergence tolerance.
`random_state`	`None`	fixed by preset	Final solver random seed.

Preset	`alpha`	`gamma`
`small`	`(0.01, 0.1, 1.0)`	`(1.0,)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.5, 1.0, 2.0)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.5, 1.0, 1.5, 2.0)`

adaptive_elastic_net#

macroforecast.models.adaptive_elastic_net(
    X,
    y,
    *,
    alpha=1.0,
    l1_ratio=0.5,
    gamma=1.0,
    initial="ridge",
    initial_alpha=1.0,
    eps=1e-4,
    normalize_weights=True,
    max_iter=20000,
    tol=1e-4,
    random_state=None,
)

Fits an adaptive elastic-net variant with the same initial coefficient weights as adaptive_lasso, followed by an elastic-net fit on weighted standardized predictors.

Backend: macroforecast adaptive-weight construction plus final sklearn.linear_model.ElasticNet.

R/glmnet comparison: this is the elastic-net analogue of adaptive lasso. normalize_weights=True gives the same mean-one penalty-factor convention as glmnet; the remaining difference is solver style, because macroforecast fits one selected alpha while glmnet usually estimates a regularization path.

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Final adaptive elastic-net penalty strength.
`l1_ratio`	`0.5`	yes	L1 share of the final elastic-net penalty.
`gamma`	`1.0`	yes	Exponent applied to initial coefficient weights.
`initial`	`"ridge"`	manual	Initial model: `"ridge"` or `"ols"`.
`initial_alpha`	`1.0`	fixed by preset	Initial ridge penalty.
`eps`	`1e-4`	fixed by preset	Small denominator floor for adaptive weights.
`normalize_weights`	`True`	fixed by preset	Rescale adaptive weights to mean one, matching `glmnet` penalty-factor scaling.
`max_iter`	`20000`	fixed by preset	Final solver iteration cap.
`tol`	`1e-4`	fixed by preset	Final solver convergence tolerance.
`random_state`	`None`	fixed by preset	Final solver random seed.

Preset	`alpha`	`l1_ratio`	`gamma`
`small`	`(0.01, 0.1, 1.0)`	`(0.25, 0.5, 0.75)`	`(1.0,)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.1, 0.25, 0.5, 0.75, 0.9)`	`(0.5, 1.0, 2.0)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.05, 0.1, 0.25, 0.5, 0.75, 0.9)`	`(0.5, 1.0, 1.5, 2.0)`

group_lasso#

macroforecast.models.group_lasso(
    X,
    y,
    *,
    groups=None,
    alpha=1.0,
    group_weights=None,
    max_iter=5000,
    tol=1e-5,
    scale=True,
)

Fits group lasso with a package-native proximal-gradient solver. groups must contain one label per predictor column. If groups=None, each predictor is treated as its own group.

Backend: package-native proximal-gradient solver.

R comparison: this follows the Gaussian group-lasso objective used by grpreg::grpreg(..., penalty = "grLasso"): standardized predictors, group-level L2 shrinkage, and default group weights proportional to sqrt(group_size). macroforecast fits one selected alpha and does not reproduce grpreg’s full path solver, GLM families, C backend, or within-group orthogonalization step.

Parameter	Default	Tunable	Meaning
`groups`	`None`	manual	One group label per predictor.
`alpha`	`1.0`	yes	Group penalty strength.
`group_weights`	`None`	manual	Optional group penalty weights; default is `sqrt(group_size)`.
`max_iter`	`5000`	fixed by preset	Proximal-gradient iteration cap.
`tol`	`1e-5`	fixed by preset	Proximal-gradient convergence tolerance.
`scale`	`True`	fixed by preset	Whether to standardize predictors inside the model.

Preset	`alpha`
`small`	`(0.01, 0.1, 1.0)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0)`

sparse_group_lasso#

macroforecast.models.sparse_group_lasso(
    X,
    y,
    *,
    groups=None,
    alpha=1.0,
    l1_ratio=0.5,
    group_weights=None,
    max_iter=5000,
    tol=1e-5,
    scale=True,
)

Fits sparse group lasso. l1_ratio controls the feature-level L1 share; the remaining penalty share is applied at the group level.

Backend: package-native proximal-gradient solver.

R comparison: this follows the sparse-group penalty decomposition used by sparsegl::sparsegl: a feature-level L1 part plus a group L2 part with default sqrt(group_size) group weights. macroforecast fits one selected alpha and l1_ratio; it does not reproduce sparsegl’s full lambda path, bounds, GLM families, or C++ backend.

Parameter	Default	Tunable	Meaning
`groups`	`None`	manual	One group label per predictor.
`alpha`	`1.0`	yes	Total sparse-group penalty strength.
`l1_ratio`	`0.5`	yes	Feature-level L1 share.
`group_weights`	`None`	manual	Optional group penalty weights; default is `sqrt(group_size)`.
`max_iter`	`5000`	fixed by preset	Proximal-gradient iteration cap.
`tol`	`1e-5`	fixed by preset	Proximal-gradient convergence tolerance.
`scale`	`True`	fixed by preset	Whether to standardize predictors inside the model.

Preset	`alpha`	`l1_ratio`
`small`	`(0.01, 0.1, 1.0)`	`(0.25, 0.5, 0.75)`
`standard`	`(0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.1, 0.25, 0.5, 0.75, 0.9)`
`wide`	`(0.0001, 0.001, 0.01, 0.1, 1.0, 10.0)`	`(0.05, 0.1, 0.25, 0.5, 0.75, 0.9)`

bayesian_ridge#

macroforecast.models.bayesian_ridge(X, y)

Fits sklearn empirical-Bayes Bayesian ridge.

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Backend	`sklearn.linear_model.BayesianRidge`
Default params	sklearn defaults
Tunable params	none in the clean preset catalog
Preset search spaces	none

huber#

macroforecast.models.huber(X, y, *, epsilon=1.35, max_iter=1000)

Fits robust Huber regression.

Backend: sklearn.linear_model.HuberRegressor.

Parameter	Default	Tunable	Meaning
`epsilon`	`1.35`	yes	Huber loss transition threshold.
`max_iter`	`1000`	fixed by preset	Optimization iteration cap.

Preset	`epsilon`
`small`	`(1.1, 1.35, 1.75)`
`standard`	`(1.1, 1.35, 1.5, 1.75, 2.0)`
`wide`	`(1.01, 1.1, 1.35, 1.5, 1.75, 2.0, 2.5)`

Kernel And Nonparametric Models#

These models are external sklearn wrappers, not package-native numerical solvers. They live in macroforecast.models.nonparametric and are re-exported from macroforecast.models and top-level macroforecast.

kernel_ridge#

macroforecast.models.kernel_ridge(
    X,
    y,
    *,
    alpha=1.0,
    kernel="linear",
    gamma=None,
    degree=3,
    coef0=1.0,
)

Fits sklearn kernel ridge regression. This model is scale-sensitive for nonlinear kernels, so standardize predictors before rbf, poly, or sigmoid kernels.

Backend: sklearn.kernel_ridge.KernelRidge.

R parity is intentionally not claimed for this callable. It is a thin sklearn backend wrapper; macroforecast owns only the pandas X, y contract, ModelFit metadata/diagnostics, and search-space registration.

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Internal scaling	none
`ModelSpec.requires_scaling`	`True`
Default model-selection method	`random`

Parameter	Default	Tunable	Meaning
`alpha`	`1.0`	yes	Ridge penalty strength.
`kernel`	`"linear"`	search option	Kernel name.
`gamma`	`None`	search option	Kernel coefficient.
`degree`	`3`	search option	Polynomial kernel degree.
`coef0`	`1.0`	fixed by preset	Independent term for polynomial/sigmoid kernels.

Preset	`alpha`	`kernel`	extra searched params
`small`	`(0.1, 1.0, 10.0)`	`("linear", "rbf")`	none
`standard`	`(0.01, 0.1, 1.0, 10.0)`	`("linear", "rbf", "poly")`	`gamma=(None, 0.01, 0.1)`
`wide`	`(0.001, 0.01, 0.1, 1.0, 10.0, 100.0)`	`("linear", "rbf", "poly", "sigmoid")`	`gamma=(None, 0.001, 0.01, 0.1, 1.0)`, `degree=(2, 3, 4)`

knn#

macroforecast.models.knn(
    X,
    y,
    *,
    n_neighbors=5,
    weights="uniform",
    metric="minkowski",
    p=2,
)

Fits sklearn k-nearest-neighbor regression. This is distance-based and should usually receive standardized predictors.

Backend: sklearn.neighbors.KNeighborsRegressor.

R parity is intentionally not claimed for this callable. It is a thin sklearn backend wrapper; macroforecast owns only the pandas X, y contract, small-window n_neighbors resolution, ModelFit metadata/diagnostics, and search-space registration.

If the requested n_neighbors is larger than the fitted sample size, macroforecast resolves it down to n_obs before constructing the sklearn estimator. The fit metadata records the effective n_neighbors and, when different, requested_n_neighbors. This avoids small-window forecasting runs failing at prediction time.

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Internal scaling	none
`ModelSpec.requires_scaling`	`True`
Default model-selection method	`random`

Parameter	Default	Tunable	Meaning
`n_neighbors`	`5`	yes	Number of nearest neighbors.
`weights`	`"uniform"`	yes	`"uniform"` or `"distance"`.
`metric`	`"minkowski"`	fixed by preset	Distance metric.
`p`	`2`	search option	Minkowski distance order.

Preset	`n_neighbors`	`weights`	extra searched params
`small`	`(3, 5, 10)`	`("uniform", "distance")`	none
`standard`	`(3, 5, 10, 20)`	`("uniform", "distance")`	`p=(1, 2)`
`wide`	`(1, 3, 5, 10, 20, 40)`	`("uniform", "distance")`	`p=(1, 2)`

Linear Boosting#

glmboost#

macroforecast.models.glmboost(
    X,
    y,
    *,
    n_iter=100,
    learning_rate=0.1,
    center=True,
    candidate_sampling="all",
    candidate_count=None,
    candidate_fraction=None,
    candidate_cap=None,
    candidate_min=1,
    candidate_rounding="floor",
    random_state=None,
)

Fits componentwise L2 boosting with linear base learners.

Backend: package-native componentwise L2 boosting loop. The R comparison target is mboost::glmboost. macroforecast implements the matrix-input Gaussian path: predictors are centered by default, each iteration selects the column with the largest normalized correlation with the current residual, and the selected least-squares coefficient is shrunk by learning_rate. Candidate sampling is deliberately decomposed into separate arguments. For Goulet Coulombe, Leroux, Stevanovic, and Surprenant (2021), Appendix A.6, use candidate_sampling="random", candidate_fraction=1/3, candidate_cap=200, and candidate_rounding="floor", which gives m=min(200, floor(n_features / 3)) sampled predictors at each boosting step.

Parameter	Default	Tunable	Meaning
`n_iter`	`100`	yes	Number of boosting iterations.
`learning_rate`	`0.1`	yes	Shrinkage applied to each update.
`center`	`True`	no	Center predictors before componentwise updates, matching `mboost::glmboost(..., center = TRUE)`.
`candidate_sampling`	`"all"`	fixed by preset	`"all"` searches every usable predictor each step; `"random"` samples a candidate subset before selecting the best base learner.
`candidate_count`	`None`	fixed by preset	Fixed sampled candidate count when `candidate_sampling="random"`. Mutually exclusive with `candidate_fraction`.
`candidate_fraction`	`None`	fixed by preset	Fraction of predictors sampled each step when `candidate_sampling="random"`.
`candidate_cap`	`None`	fixed by preset	Maximum sampled candidate count after resolving `candidate_count` or `candidate_fraction`.
`candidate_min`	`1`	fixed by preset	Minimum sampled candidate count.
`candidate_rounding`	`"floor"`	fixed by preset	Rounding rule for `candidate_fraction`: `"floor"`, `"ceil"`, or `"round"`.
`random_state`	`None`	fixed by preset	Seed for per-step candidate feature sampling when `candidate_sampling="random"`.

Preset	`n_iter`	`learning_rate`
`small`	`(50, 100)`	`(0.05, 0.1)`
`standard`	`(50, 100, 200, 500)`	`(0.01, 0.05, 0.1)`
`wide`	`(50, 100, 200, 500, 1000)`	`(0.005, 0.01, 0.05, 0.1, 0.2)`

Support-Vector Models#

Support-vector models are sklearn-backed and live in the base dependency set. They are useful when nonlinear margins or robust epsilon-insensitive losses are preferred over a pure least-squares fit. The forecasting runner treats them as ordinary supervised models: call model(X, y, **params), tune only model-owned hyperparameters through model_selection, and let window decide the train/validation/test dates.

Forecasting-runner example:

pre = macroforecast.preprocessing.preprocess_spec(
    transform="none",
    outliers="none",
    impute="mean",
    standardize="zscore",
    standardize_columns="predictors",
)
features = macroforecast.feature_engineering.feature_spec(
    target="y",
    horizon=1,
    predictors=["x1", "x2"],
    lags=(0, 1),
)
result = macroforecast.forecasting.run(
    panel,
    "svr",
    preprocessing=pre,
    features=features,
    window=macroforecast.window.last_block(validation_size=24),
    model_selection=macroforecast.model_selection.grid({"C": [0.1, 1.0], "epsilon": [0.01, 0.1]}),
)

svr#

macroforecast.models.svr(
    X,
    y,
    *,
    kernel="rbf",
    C=1.0,
    epsilon=0.1,
    gamma="scale",
    degree=3,
    coef0=0.0,
    shrinking=True,
    tol=1e-3,
    cache_size=200.0,
    max_iter=-1,
)

Fits sklearn SVR.

Backend: sklearn.svm.SVR.

kernel="precomputed" is intentionally not supported because macroforecast ModelFit expects X to be a feature matrix with stable column names. Use "linear", "poly", "rbf", or "sigmoid".

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Internal scaling	none
`ModelSpec.requires_scaling`	`True`
Default model-selection method	`random`

Parameter	Default	Tunable	Meaning
`kernel`	`"rbf"`	fixed by preset	Kernel: `"linear"`, `"poly"`, `"rbf"`, or `"sigmoid"`.
`C`	`1.0`	yes	Inverse regularization strength.
`epsilon`	`0.1`	yes	Epsilon-insensitive tube width.
`gamma`	`"scale"`	yes	Kernel coefficient for RBF/poly/sigmoid kernels.
`degree`	`3`	fixed by preset	Polynomial kernel degree.
`coef0`	`0.0`	fixed by preset	Independent term for poly/sigmoid kernels.
`shrinking`	`True`	fixed by preset	Whether to use the shrinking heuristic.
`tol`	`1e-3`	fixed by preset	Optimization tolerance.
`cache_size`	`200.0`	fixed by preset	Kernel cache size in MB.
`max_iter`	`-1`	fixed by preset	Solver iteration cap; `-1` means no cap.

Preset	`C`	`epsilon`	`gamma`
`small`	`(0.1, 1.0)`	`(0.01, 0.1)`	`("scale",)`
`standard`	`(0.1, 1.0, 10.0)`	`(0.01, 0.1, 0.2)`	`("scale", "auto")`
`wide`	`(0.01, 0.1, 1.0, 10.0, 100.0)`	`(0.001, 0.01, 0.1, 0.2)`	`("scale", "auto")`

linear_svr#

macroforecast.models.linear_svr(
    X,
    y,
    *,
    C=1.0,
    epsilon=0.0,
    loss="epsilon_insensitive",
    tol=1e-4,
    max_iter=10000,
    random_state=0,
)

Fits sklearn LinearSVR. Use this when a linear support-vector loss is wanted without kernel overhead.

Backend: sklearn.svm.LinearSVR.

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Internal scaling	none
`ModelSpec.requires_scaling`	`True`
Default model-selection method	`random`

Parameter	Default	Tunable	Meaning
`C`	`1.0`	yes	Inverse regularization strength.
`epsilon`	`0.0`	yes	Epsilon-insensitive tube width.
`loss`	`"epsilon_insensitive"`	fixed by preset	LinearSVR loss function.
`tol`	`1e-4`	fixed by preset	Optimization tolerance.
`max_iter`	`10000`	fixed by preset	Solver iteration cap.
`random_state`	`0`	fixed by preset	Random seed; can be `None`.

Preset	`C`	`epsilon`
`small`	`(0.1, 1.0)`	`(0.0, 0.1)`
`standard`	`(0.01, 0.1, 1.0, 10.0)`	`(0.0, 0.01, 0.1)`
`wide`	`(0.001, 0.01, 0.1, 1.0, 10.0, 100.0)`	`(0.0, 0.001, 0.01, 0.1, 0.2)`

nu_svr#

macroforecast.models.nu_svr(
    X,
    y,
    *,
    kernel="rbf",
    C=1.0,
    nu=0.5,
    gamma="scale",
    degree=3,
    coef0=0.0,
    shrinking=True,
    tol=1e-3,
    cache_size=200.0,
    max_iter=-1,
)

Fits sklearn NuSVR, where nu controls the admissible training-error and support-vector fractions.

Backend: sklearn.svm.NuSVR.

kernel="precomputed" is intentionally not supported for the same feature-matrix contract reason as svr().

Item	Value
Input	`X`, `y`
Output	`ModelFit`
Internal scaling	none
`ModelSpec.requires_scaling`	`True`
Default model-selection method	`random`

Parameter	Default	Tunable	Meaning
`kernel`	`"rbf"`	fixed by preset	Kernel: `"linear"`, `"poly"`, `"rbf"`, or `"sigmoid"`.
`C`	`1.0`	yes	Inverse regularization strength.
`nu`	`0.5`	yes	Upper/lower control for training-error and support-vector fractions.
`gamma`	`"scale"`	yes	Kernel coefficient for RBF/poly/sigmoid kernels.
`degree`	`3`	fixed by preset	Polynomial kernel degree.
`coef0`	`0.0`	fixed by preset	Independent term for poly/sigmoid kernels.
`shrinking`	`True`	fixed by preset	Whether to use the shrinking heuristic.
`tol`	`1e-3`	fixed by preset	Optimization tolerance.
`cache_size`	`200.0`	fixed by preset	Kernel cache size in MB.
`max_iter`	`-1`	fixed by preset	Solver iteration cap; `-1` means no cap.

Preset	`C`	`nu`	`gamma`
`small`	`(0.1, 1.0)`	`(0.25, 0.5)`	`("scale",)`
`standard`	`(0.1, 1.0, 10.0)`	`(0.25, 0.5, 0.75)`	`("scale", "auto")`
`wide`	`(0.01, 0.1, 1.0, 10.0, 100.0)`	`(0.1, 0.25, 0.5, 0.75, 0.9)`	`("scale", "auto")`

Neural Models#

nn, lstm, gru, transformer, hemisphere_nn, and density_hnn are all torch-backed neural-network models and require macroforecast[deep]. nn is the feed-forward neural network for tabular feature matrices; lstm and gru are recurrent neural networks that consume trailing row sequences; transformer is a compact Transformer encoder using the same trailing-row sequence contract. hemisphere_nn is a compact bagged dual-head network for mean and variance forecasts, while density_hnn follows the Aionx/Paper DensityHNN procedure with prior-DNN OOB volatility emphasis and OOB volatility recalibration. The deep extra is intentionally separate from macroforecast[all] because torch is large and platform-sensitive.

Torch recurrent example:

result = macroforecast.forecasting.run(
    panel,
    "lstm",
    features=features,
    window=macroforecast.window.last_block(validation_size=24),
    params={"lstm": {"sequence_length": 4, "hidden_size": 32, "device": "auto"}},
    model_selection={"lstm": None},
)

nn#

macroforecast.models.nn(
    X,
    y,
    *,
    hidden_layer_sizes=(100,),
    activation="relu",
    dropout=0.0,
    learning_rate=0.001,
    max_epochs=100,
    batch_size=32,
    weight_decay=0.0,
    optimizer="adam",
    loss="mse",
    random_state=0,
    device="auto",
)

Fits a torch-backed feed-forward neural-network regressor. The estimator standardizes X and y inside each fit window and maps predictions back to target units. Use feature engineering for lagged, rolling, PCA, or MARX-style inputs before fitting this model.

Forecasting-runner example:

result = macroforecast.forecasting.run(
    panel,
    "nn",
    features=features,
    window=macroforecast.window.last_block(validation_size=24),
    params={"nn": {"max_epochs": 100, "device": "auto"}},
    model_selection=macroforecast.model_selection.grid({
        "hidden_layer_sizes": [(32,), (64,)],
        "weight_decay": [0.0, 0.0001],
    }),
)

Parameter	Default	Tunable	Meaning
`hidden_layer_sizes`	`(100,)`	yes	Feed-forward hidden layer widths.
`activation`	`"relu"`	fixed by preset	Activation: `"identity"`, `"logistic"`, `"sigmoid"`, `"tanh"`, `"relu"`, or `"gelu"`.
`dropout`	`0.0`	yes	Dropout rate between hidden layers.
`learning_rate`	`0.001`	yes	Optimizer learning rate.
`max_epochs`	`100`	fixed by preset	Training epoch cap.
`batch_size`	`32`	fixed by preset	Mini-batch size.
`weight_decay`	`0.0`	yes	L2 weight decay.
`optimizer`	`"adam"`	fixed by preset	Torch optimizer: `"adam"`, `"sgd"`, or `"rmsprop"`.
`loss`	`"mse"`	fixed by preset	Torch loss: `"mse"` or `"huber"`.
`random_state`	`0`	fixed by preset	Random seed.
`device`	`"auto"`	fixed by preset	Torch device: `"auto"`, `"cpu"`, or `"cuda"`.

Preset	`hidden_layer_sizes`	`dropout`	`learning_rate`	`weight_decay`
`small`	`((32,), (64,))`	`(0.0,)`	`(0.001,)`	`(0.0, 0.0001)`
`standard`	`((64,), (100,), (64, 32))`	`(0.0, 0.1)`	`(0.0005, 0.001)`	`(0.0, 0.0001, 0.001)`
`wide`	`((32,), (64,), (100,), (128,), (100, 50), (128, 64))`	`(0.0, 0.1, 0.25)`	`(0.0001, 0.0005, 0.001, 0.005)`	`(0.0, 0.00001, 0.0001, 0.001, 0.01)`

lstm#

macroforecast.models.lstm(
    X,
    y,
    *,
    sequence_length=4,
    hidden_size=32,
    num_layers=1,
    dropout=0.0,
    learning_rate=0.001,
    max_epochs=100,
    batch_size=32,
    random_state=0,
    device="auto",
)

Fits a compact torch-backed LSTM regressor. sequence_length controls how many trailing rows are passed to the recurrent network for each target date. The fitted estimator stores the trailing training rows, so predict(X_test) can create the first test sequences without the caller manually prepending training history. The backend is a regular torch.nn.Module, switches to train() during fitting and eval() during prediction, and uses device to choose CPU or CUDA. The fit diagnostics include sequence_context, recording sequence_length, fit_sample_size, train_tail_rows, and the test_sequence_prefix policy. The prefix is always the last fitted rows only, so the forecasting runner can pass the test feature block directly without leaking future rows.

Parameter	Default	Tunable	Meaning
`sequence_length`	`4`	yes	Trailing rows per recurrent sequence.
`hidden_size`	`32`	yes	Recurrent hidden-state width.
`num_layers`	`1`	fixed by preset	Number of recurrent layers.
`dropout`	`0.0`	fixed by preset	Dropout between recurrent layers.
`learning_rate`	`0.001`	yes	Adam learning rate.
`max_epochs`	`100`	fixed by preset	Training epoch cap.
`batch_size`	`32`	fixed by preset	Mini-batch size.
`random_state`	`0`	fixed by preset	Random seed.
`device`	`"auto"`	fixed by preset	Torch device: `"auto"`, `"cpu"`, or `"cuda"`.

Preset	`sequence_length`	`hidden_size`	`learning_rate`
`small`	`(2, 4)`	`(16, 32)`	`(0.001,)`
`standard`	`(2, 4, 8)`	`(16, 32, 64)`	`(0.0005, 0.001)`
`wide`	`(2, 4, 8, 12)`	`(16, 32, 64, 128)`	`(0.0001, 0.0005, 0.001, 0.005)`

gru#

macroforecast.models.gru(
    X,
    y,
    *,
    sequence_length=4,
    hidden_size=32,
    num_layers=1,
    dropout=0.0,
    learning_rate=0.001,
    max_epochs=100,
    batch_size=32,
    random_state=0,
    device="auto",
)

Fits a compact torch-backed GRU regressor with the same input/output contract as lstm.

transformer#

macroforecast.models.transformer(
    X,
    y,
    *,
    sequence_length=4,
    hidden_size=32,
    num_layers=1,
    dropout=0.0,
    learning_rate=0.001,
    max_epochs=100,
    batch_size=32,
    random_state=0,
    device="auto",
)

Fits a compact torch-backed Transformer encoder regressor. The input/output contract matches lstm and gru: rows are standardized inside each fit window, trailing sequences are built from the fitted sample, and predictions for new rows are mapped back to target units. hidden_size is the Transformer feed-forward width, not the input dimension; the encoder uses d_model = n_features and nhead=1 to keep the public callable small and stable for macro panels.

hemisphere_nn#

macroforecast.models.hemisphere_nn(
    X,
    y,
    *,
    lc=2,
    lm=2,
    lv=2,
    neurons=64,
    dropout=0.2,
    learning_rate=0.001,
    max_epochs=100,
    n_estimators=100,
    subsample=0.8,
    nu=None,
    variance_penalty=1.0,
    patience=15,
    validation_fraction=0.2,
    random_state=0,
    device="auto",
    quantile_levels=(0.05, 0.5, 0.95),
)

Fits a compact Hemisphere neural network inspired by Goulet Coulombe, Frenette, and Klieber’s dual-head density-forecast architecture. The network has a shared common core, a mean head, and a positive variance head. The loss is Gaussian negative log likelihood plus a soft variance-emphasis penalty:

mean((y - h_m(X))^2 / h_v(X) + log h_v(X))
+ variance_penalty * (mean(h_v(X)) - nu * var(y))^2 / var(y)^2

predict() returns the ensemble mean forecast. The fitted estimator also exposes predict_variance(X), predict_distribution(X), and predict_quantiles(X, levels=None). The forecasting runner stores the variance and quantile outputs in variance_prediction and quantile_predictions. The public callable accepts legacy aliases lr, n_epochs, B, sub_rate, lambda_emphasis, and val_frac; normalized metadata records them as learning_rate, max_epochs, n_estimators, subsample, variance_penalty, and validation_fraction.

Parameter	Default	Tunable	Meaning
`lc`	`2`	fixed by preset	Shared common-core depth.
`lm`	`2`	fixed by preset	Mean-head depth after the common core.
`lv`	`2`	fixed by preset	Variance-head depth after the common core.
`neurons`	`64`	yes	Hidden width.
`dropout`	`0.2`	fixed by preset	Dropout rate.
`learning_rate`	`0.001`	yes	Adam learning rate.
`max_epochs`	`100`	fixed by preset	Training epoch cap.
`n_estimators`	`100`	yes	Number of blocked-subsample bags.
`subsample`	`0.8`	fixed by preset	Blocked-subsample fraction.
`nu`	`None`	fixed by preset	Variance-emphasis target ratio; `None` uses `0.5`.
`variance_penalty`	`1.0`	fixed by preset	Soft penalty on the variance-emphasis target.
`patience`	`15`	fixed by preset	Early-stopping patience.
`validation_fraction`	`0.2`	fixed by preset	Chronological validation fraction.
`device`	`"auto"`	fixed by preset	Torch device.
`quantile_levels`	`(0.05, 0.5, 0.95)`	fixed by preset	Default normal-approximation quantile levels returned by `predict_quantiles()`.

density_hnn#

macroforecast.models.density_hnn(
    X,
    y,
    *,
    common_layers=2,
    mean_layers=2,
    volatility_layers=2,
    prior_layers=3,
    neurons=400,
    dropout=0.2,
    learning_rate=0.001,
    max_epochs=100,
    n_estimators=100,
    prior_estimators=50,
    subsample=0.8,
    block_size=8,
    volatility_emphasis=None,
    rescale_volatility=True,
    patience=15,
    random_state=0,
    device="auto",
    quantile_levels=(0.05, 0.5, 0.95),
    volatility_clip=0.05,
)

Fits the Density Hemisphere Neural Network from Goulet Coulombe, Frenette, and Klieber, “From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks” (Journal of Applied Econometrics, 2025). The implementation is a torch-native port of the public Aionx DensityHNN logic, not a TensorFlow dependency wrapper. It is included because the method is a macro density forecast model: macroforecast uses it to produce conditional means, conditional variances, volatility forecasts, and normal-approximation quantiles. It does not create portfolio weights.

The Aionx source-code correspondence is:

Aionx source item	`macroforecast` implementation
`aionx.models.DensityHNN.prior_dnn_architecture`	`prior_estimators` plain DNN ensemble fitted before the HNN.
`aionx.bootstrap.TimeSeriesBlockBootstrap`	`block_size` and `subsample` create time-series block bootstrap samples.
`aionx.kerasnn.ensemble.OutOfBagPredictor`	OOB forecasts use the Aionx denominator formula `sum(oob forecast) / ((1 - subsample) * n_estimators)`.
`aionx.models.DensityHNN.base_architecture`	shared common core plus mean and volatility hemispheres; the volatility head is positive and normalized to the volatility-emphasis value.
`aionx.models.DensityHNN.volatility_rescaling_algorithm`	OOB log squared residuals are regressed on log predicted volatility squared, then all volatility forecasts are rescaled.

The callable consumes the standard supervised model contract density_hnn(X, y, **params). In Aionx, lags and trend terms are created inside DensityHNN.run(...); in macroforecast, lags, MARX/MAF features, PCA, trends, and seasonal/time features should be built explicitly with macroforecast.feature_engineering before calling the model or through the forecasting runner. This keeps the model callable small and lets the same feature construction be reused by other models.

Fit sequence:

Standardize X and y inside the fit window.
Fit a prior DNN ensemble on blocked bootstrap samples.
Compute prior-DNN OOB mean squared error; this becomes the Aionx volatility_emphasis unless the user supplies an override.
Fit a DensityHNN ensemble with shared core, conditional-mean head, and conditional-volatility head.
Compute HNN OOB mean and volatility forecasts.
Recalibrate volatility using the Aionx log residual-square regression.
Return forecasts on the original target scale.

Output:

Method	Output
`predict(X)`	pandas Series of conditional mean forecasts through `ModelFit`.
`predict_variance(X)`	numpy array of conditional variance forecasts in target units squared.
`predict_volatility(X)`	numpy array of conditional standard-deviation forecasts in target units.
`predict_distribution(X)`	`(mean, variance)` arrays in target units.
`predict_quantiles(X, levels=None)`	dictionary from quantile level to normal-approximation quantile forecast.

Diagnostics:

Field	Meaning
`fit.diagnostics["density"]["volatility_emphasis"]`	Aionx volatility-emphasis value used by the HNN volatility head.
`fit.diagnostics["density"]["prior_oob_mse"]`	Prior-DNN OOB mean squared error used when `volatility_emphasis=None`.
`fit.diagnostics["density"]["oob_rescaling"]`	Intercept, slope, scaler, and OOB count for the log residual-square volatility recalibration.
`fit.estimator.oob_prediction_`	Fit-window OOB conditional mean, volatility, and variance table.

Parameter	Default	Tunable	Meaning
`common_layers`	`2`	fixed by preset	Shared common-core depth.
`mean_layers`	`2`	fixed by preset	Conditional-mean hemisphere depth.
`volatility_layers`	`2`	fixed by preset	Conditional-volatility hemisphere depth.
`prior_layers`	`3`	fixed by preset	Prior-DNN hidden depth.
`neurons`	`400`	yes	Hidden width. The paper/Aionx default is 400; smaller values are useful for smoke tests.
`dropout`	`0.2`	fixed by preset	Dropout rate.
`learning_rate`	`0.001`	yes	Adam learning rate.
`max_epochs`	`100`	fixed by preset	Training epoch cap.
`n_estimators`	`100`	yes	DensityHNN bootstrap ensemble size.
`prior_estimators`	`50`	yes	Prior-DNN bootstrap ensemble size used to estimate volatility emphasis.
`subsample`	`0.8`	fixed by preset	Blocked bootstrap sampling rate.
`block_size`	`8`	fixed by preset	Time-series block size. The paper uses blocked subsampling to preserve temporal dependence.
`volatility_emphasis`	`None`	fixed by preset	`None` estimates the value from prior-DNN OOB MSE. Passing a float overrides it. Values outside Aionx’s `[0.01, 1.0]` range are mapped to `0.99`, following the source code.
`rescale_volatility`	`True`	fixed by preset	Apply the blocked-OOB volatility reality-check recalibration.
`patience`	`15`	fixed by preset	Early-stopping patience.
`device`	`"auto"`	fixed by preset	Torch device.
`quantile_levels`	`(0.05, 0.5, 0.95)`	fixed by preset	Default normal-approximation quantile levels.
`volatility_clip`	`0.05`	fixed by preset	Minimum volatility used in Gaussian negative log likelihood, matching Aionx’s numerical-stability clip.

Factor And Time-Series Models#

pls#

macroforecast.models.pls(
    X,
    y,
    *,
    n_components=3,
    scale=True,
    max_iter=500,
    tol=1e-6,
    control_columns=None,
    include_constant=True,
    drop_control_columns=True,
    quadratic_factors=False,
)

Fits partial least squares regression. Unlike unsupervised PCA, PLS uses the target while constructing latent components, so it belongs in models rather than preprocessing or feature_engineering.

n_components is treated as a requested upper bound. At fit time, the model resolves it to min(requested, n_factor_predictors, n_observations) so the default is safe for small feature sets. Metadata records both requested_n_components and resolved_n_components; n_components stores the resolved value used by sklearn.cross_decomposition.PLSRegression.

Implementation map:

Item	Value
Backend	`sklearn.cross_decomposition.PLSRegression`
Paper-code comparison	Hounyo-Li `PLS_emp002.m` and `PLS_tune.m`
Control handling	Optional `control_columns` block is fitted first, target residuals are passed to PLS, and the control forecast is added back.
Difference from MATLAB code	MATLAB `plsregress` exposes `stats.W`; macroforecast uses sklearn PLS latent scores and records factor loadings/metadata. The forecasting contract is the same control-residualized PLS factor regression.
PC2 support	Set `quadratic_factors=True` to add the `PLS_PC2.m` squared-factor forecast head.

Hounyo-Li PLS baseline:

alphawhat = y_insample * wt_insample' * inv(wt_insample * wt_insample')
y_resid   = y_insample - alphawhat * wt_insample
[~,~,~,~,~,~,~,stats] = plsregress(X_insample', y_resid', K)
B         = stats.W
Fhat      = B' * X_insample
alphahat  = y_resid * Fhat' * inv(Fhat * Fhat')
yhat      = (alphahat * B') * X_out + alphawhat * wt_out

macroforecast.models.pls() mirrors this as:

MATLAB step	`macroforecast` implementation
`wt`	`control_columns` plus optional constant
residualize `ytplush` on `wt`	`_PLSCompositeRegressor.control_coef_` and target residual
`plsregress(..., K)`	`sklearn.cross_decomposition.PLSRegression`
`Fhat` PLS scores	`transform(...)` / `x_scores_`
`alphahat` on PLS scores	`factor_coefs_`
add `alphawhat * wt_out`	`predict()` control block addition

Parameter	Default	Tunable	Meaning
`n_components`	`3`	yes	Requested maximum number of latent PLS components.
`scale`	`True`	fixed by preset	Whether to standardize predictors before PLS.
`max_iter`	`500`	fixed by preset	NIPALS iteration cap.
`tol`	`1e-6`	fixed by preset	NIPALS convergence tolerance.
`control_columns`	`None`	fixed by preset	Optional X columns used as forecasting controls.
`include_constant`	`True`	fixed by preset	Whether to include a constant in the control block.
`drop_control_columns`	`True`	fixed by preset	Whether controls are excluded from the PLS block.
`quadratic_factors`	`False`	fixed by preset	Whether to add the Hounyo-Li PC2 squared-factor forecast head.

Preset	`n_components`
`small`	`(1, 2, 3)`
`standard`	`(1, 2, 3, 5, 8)`
`wide`	`(1, 2, 3, 5, 8, 10, 12, 20)`

scaled_pca#

macroforecast.models.scaled_pca(
    X,
    y,
    *,
    n_components=3,
    scale=True,
    control_columns=None,
    include_constant=True,
    drop_control_columns=True,
    winsorize_slopes=None,
    quadratic_factors=False,
)

Fits Huang, Jiang, Li, Tong, and Zhou scaled PCA (sPCA) with a linear forecast head. The factor extraction step follows the original spcaest.m contract: standardize predictors, estimate one marginal predictive slope for each predictor, scale each standardized predictor by that slope, then run PCA on the scaled panel.

Mathematical contract:

Let \(X \in \mathbb{R}^{T \times N}\) be the model-window predictor matrix and \(y \in \mathbb{R}^{T}\) be the target. With scale=True, each predictor is standardized inside the active model window using MATLAB’s default sample standard deviation convention:

\[ X^s_{tj} = \frac{X_{tj}-\bar X_j}{s_j}. \]

For each predictor, estimate the marginal predictive slope from an intercept regression:

\[ \hat\beta_j = \frac{\sum_{t=1}^{T}(X^s_{tj}-\bar X^s_j)(y_t-\bar y)} {\sum_{t=1}^{T}(X^s_{tj}-\bar X^s_j)^2}. \]

Build the scaled panel:

\[ X^{\mathrm{sPCA}}_{tj} = X^s_{tj}\hat\beta_j. \]

Then compute principal components with Huang’s normalization \(\hat F'\hat F/T = I\):

\[ X^{\mathrm{sPCA}}X^{\mathrm{sPCA}\prime} = UDU', \qquad \hat F = \sqrt{T}\,U_{[:,1:K]}. \]

For forecasting, macroforecast regresses the target residual after optional controls on these factors, then projects new scaled observations into the same factor space. This forecast head is the package wrapper; the factor extraction itself matches Huang’s spcaest.m design.

Set quadratic_factors=True to reproduce the Hounyo-Li scaledPCA_PC2.m forecast head:

\[ \hat y_* = w_*\hat a_W + \hat\alpha_1 \hat f_* + \hat\alpha_2 \hat f_*^2. \]

For multiple factors, the squared term is applied componentwise, matching the MATLAB code’s alphahat2 * ((leftvector * scaleXs_outofsample).^2) contract.

Original-code match:

Huang `spcaest.m` step	`macroforecast` implementation
`Xs = standard(X)`	model-window predictor standardization with `ddof=1`
`xvar = [ones(T,1) Xs(:,j)]`	`_marginal_slopes(factor_values, y_values)`
`parm = xvar\target` and `beta(j)=parm(2)`	closed-form marginal slope stored in `scaling_slopes_`
`scaleXs(:,j)=Xs(:,j)*beta(j)`	`scaled_values = factor_values * slopes`
`pc_T(scaleXs,nfac)`	`_huang_scaled_pca_state(...)`
`fhat=Fhat0(:,1:nfac)*sqrt(T)`	`factor_scores_` with `F'F/T=I`

Scaling note: Huang’s spcaest.m standardizes only X before estimating the marginal slopes, while the target stays in its raw units. Therefore scaling_slopes_ in scaled_pca is in target-scale units. This differs from the Hounyo-Li macro SsPCA code below, where both X and the target are standardized before the slope step. The two slope vectors can differ by the target scale, but that is a global scalar difference for the factor/forecast structure; the practical scaling logic is the same once predictions are mapped back to the target units.

Parameter	Default	Tunable	Meaning
`n_components`	`3`	yes	Number of Huang scaled-PCA factors.
`scale`	`True`	fixed by preset	Whether to standardize predictors inside the model.
`control_columns`	`None`	fixed by preset	Optional X columns used as forecasting controls.
`include_constant`	`True`	fixed by preset	Whether to include a constant in the control block.
`drop_control_columns`	`True`	fixed by preset	Whether controls are excluded from the PCA block.
`winsorize_slopes`	`None`	fixed by preset	Optional percentile winsorization for scaling slopes.
`quadratic_factors`	`False`	fixed by preset	Whether to add the Hounyo-Li PC2 squared-factor forecast head.

The presets tune n_components. Inspect exact candidate lists with describe_model("scaled_pca").

supervised_pca#

macroforecast.models.supervised_pca(
    X,
    y,
    *,
    n_components=3,
    n_selected=50,
    min_abs_corr=0.0,
    scale=True,
    control_columns=None,
    include_constant=True,
    drop_control_columns=True,
    preselect="none",
    t_threshold=1.28,
    elastic_net_alpha=0.0002,
    elastic_net_l1_ratio=0.5,
    quadratic_factors=False,
    random_state=0,
)

Fits original-style supervised PCA (SPCA). The implementation follows the MATLAB reproducibility code structure from Hounyo and Li’s IJF weak-factor package: residualize the target on optional controls, rank the current predictors by absolute correlation with the current target residual, select n_selected predictors, extract one SVD factor, project both target and predictor residuals, then repeat for n_components.

This is different from feature_engineering.pca_features(), which is unsupervised and belongs in macroforecast.feature_engineering. Use this model when the target is intentionally allowed to guide component construction inside each model fit window.

Mathematical contract:

Let \(X \in \mathbb{R}^{T \times N}\) be the model-window predictor matrix, \(y \in \mathbb{R}^{T}\) be the target, and \(W \in \mathbb{R}^{T \times c}\) be the optional control block. With scale=True, each predictor and the target are standardized inside the active model window using the same sample standard deviation convention as MATLAB std(...,0,dim).

First residualize the target on controls. The paper code writes this with an ordinary inverse; macroforecast uses the Moore-Penrose inverse for numerical stability when the control block is singular or nearly singular.

\[ \hat a_W = W^{+}y, \qquad r_y^{(0)} = y - W\hat a_W, \qquad R_X^{(0)} = X. \]

For component \(k = 1,\ldots,K\), compute residual correlations, select a subset \(I_k\), extract one SVD loading, and project:

\[ c_j^{(k)} = \left|\operatorname{corr}\left(R_{X,j}^{(k-1)}, r_y^{(k-1)}\right)\right|, \qquad I_k = \operatorname{top}_{q}\{c_j^{(k)}\}_{j=1}^{N}. \]

\[ u_k = \operatorname{first\ left\ singular\ vector} \left(R_{X,I_k}^{(k-1)\prime}\right), \qquad \ell_{k,j}=0\ \text{for}\ j \notin I_k. \]

\[ f_k = R_X^{(k-1)}\ell_k,\qquad \hat\alpha_k = \frac{r_y^{(k-1)\prime}f_k}{f_k'f_k},\qquad \hat\lambda_k = \frac{R_X^{(k-1)\prime}f_k}{f_k'f_k}. \]

\[ r_y^{(k)} = r_y^{(k-1)}-\hat\alpha_k f_k,\qquad R_X^{(k)} = R_X^{(k-1)}-f_k\hat\lambda_k'. \]

Prediction for a new row \(x_*\) and controls \(w_*\) is:

\[ \hat y_* = w_*\hat a_W + x_*\left(\sum_{k=1}^{K}\hat\alpha_k\ell_k'\right). \]

Set quadratic_factors=True for the Hounyo-Li SPCA_PC2.m variant. In that case each extraction step also estimates

\[ \hat\alpha_{2,k} = \frac{r_y^{(k-1)\prime}(f_k^2)}{(f_k^2)'(f_k^2)} \]

and updates the target residual as

\[ r_y^{(k)} = r_y^{(k-1)} -\hat\alpha_{1,k}f_k -\hat\alpha_{2,k}f_k^2. \]

Original-code match:

MATLAB variable / step	`macroforecast` implementation
`alphawhat = yt_insamplewt_insample'inv(wt_insample*wt_insample')`	`_least_squares_coef(control_values, y_values)`
`COR = abs(corr(xt0', ytplush0'))`	`_absolute_correlations(work_x, work_y)`
`idx_sorted(1:N1)`	`_selected_indices(..., n_selected=q)`
`[U,~,~] = svds(xt0(II,:),1)`	`np.linalg.svd(work_x[:, selected], ...)` and first right singular vector
`Fhat = leftvector * xt0`	`factor = work_x @ loading`
`alphahat = ytplush0Fhat' / (FhatFhat')`	`alpha = work_y @ factor / (factor @ factor)`
`lambdahat = xt0Fhat' / (FhatFhat')`	`lambdas = work_x.T @ factor / (factor @ factor)`
`ytplush0 = ytplush0 - alphahat*Fhat`	`work_y = work_y - alpha * factor`
`xt0 = xt0 - lambdahat * Fhat`	`work_x = work_x - np.outer(factor, lambdas)`

Verification status: unit tests include a compact MATLAB-style reference recursion for both SPCA and SsPCA and compare generated predictions against models.supervised_pca() and models.supervised_scaled_pca().

Parameter	Default	Tunable	Meaning
`n_components`	`3`	yes	Number of sequential supervised components.
`n_selected`	`50`	yes	Predictors selected at each SPCA step.
`min_abs_corr`	`0.0`	yes	Minimum absolute residual correlation retained before PCA.
`scale`	`True`	fixed by preset	Whether to standardize predictors and target inside the model.
`control_columns`	`None`	fixed by preset	Optional X columns used as forecasting controls.
`include_constant`	`True`	fixed by preset	Whether to include a constant in the control block.
`drop_control_columns`	`True`	fixed by preset	Whether controls are excluded from the PCA block.
`preselect`	`"none"`	fixed by preset	Optional pre-selection: `"none"`, `"hard_tstat"`, or `"elastic_net"`.
`t_threshold`	`1.28`	fixed by preset	Hard t-stat pre-selection threshold.
`elastic_net_alpha`	`0.0002`	fixed by preset	Elastic-net pre-selection penalty.
`elastic_net_l1_ratio`	`0.5`	fixed by preset	Elastic-net pre-selection L1 ratio.
`quadratic_factors`	`False`	fixed by preset	Whether to add the Hounyo-Li PC2 squared-factor forecast head.
`random_state`	`0`	fixed by preset	Elastic-net pre-selection random seed.

The presets tune n_components, n_selected, and min_abs_corr. Inspect exact candidate lists with describe_model("supervised_pca").

supervised_scaled_pca#

macroforecast.models.supervised_scaled_pca(
    X,
    y,
    *,
    n_components=3,
    n_selected=50,
    min_abs_corr=0.0,
    scale=True,
    control_columns=None,
    include_constant=True,
    drop_control_columns=True,
    preselect="none",
    t_threshold=1.28,
    elastic_net_alpha=0.0002,
    elastic_net_l1_ratio=0.5,
    quadratic_factors=False,
    random_state=0,
)

Fits Hounyo-Li supervised scaled PCA (SsPCA). This adds the paper’s predictive-slope scaling step before the SPCA loop: each standardized predictor is first multiplied by its marginal predictive slope for the target. The scaled panel is then passed through the same iterative supervised selection, SVD factor extraction, and projection loop as supervised_pca.

Mathematical contract:

After the within-window standardization used above, estimate one marginal predictive slope per predictor:

\[ \hat\gamma_j = \frac{\sum_{t=1}^{T}(x_{tj}-\bar x_j)(y_t-\bar y)} {\sum_{t=1}^{T}(x_{tj}-\bar x_j)^2}. \]

Build the supervised-scaled panel

\[ X^{\mathrm{scaled}}_j = \hat\gamma_j X_j, \qquad j=1,\ldots,N. \]

Then run the same SPCA recursion as supervised_pca with \(R_X^{(0)} = X^{\mathrm{scaled}}\). The forecast is therefore

\[ \hat y_* = w_*\hat a_W + x_*^{\mathrm{scaled}} \left(\sum_{k=1}^{K}\hat\alpha_k\ell_k'\right). \]

This corresponds to Hounyo-Li SsPCA as implemented in the local MATLAB package: scaledPCA_emp002.m supplies the predictive-slope scaling idea, SPCA_emp002.m supplies the supervised selection/projection recursion, and SsPCA_emp002.m combines the two by applying SPCA to scaleXs.

Source checked: the local MATLAB reproducibility package for Hounyo and Li, SsPCA_emp002.m, SsPCA_tune.m, SPCA_emp002.m, scaledPCA_emp002.m, and inflation_linear_tune.m. The Python implementation is a clean port of the algorithmic contract, not copied MATLAB code.

Set quadratic_factors=True for the SsPCA_PC2.m variant. This keeps the same predictive-slope scaling and supervised selection recursion, then adds the componentwise squared-factor forecast head used in the paper’s PC2 scripts.

Original-code match for the scaling step:

MATLAB variable / step	`macroforecast` implementation
`xvar = [ones(1,T); xt_standardized(j,:)]`	`_marginal_slopes(factor_values, y_values)` uses an intercept-equivalent centered OLS slope
`parm = ytplushxvar'inv(xvar*xvar')`	closed-form marginal slope
`beta_scaled(j) = parm(2)`	`scaling_slopes_`
`scaleXs(j,:) = xt_standardized(j,:) * beta_scaled(j)`	`factor_values = factor_values * slopes`
`SsPCA_emp002(scaleXs, ytplush, wt, Khat, number)`	`SupervisedScaledPCARegressor` then calls the same SPCA extraction path

Target scaling note: the Hounyo-Li macro code standardizes the target and predictors before computing beta_scaled. Huang’s spcaest.m standardizes only predictors and keeps the target raw. Consequently, supervised_scaled_pca stores standardized-target slopes when scale=True, while scaled_pca stores raw-target slopes. These stored slope magnitudes are not directly comparable without the target standard deviation. For factor construction and forecast generation, however, the difference is a global target-scale multiplier rather than a different screening or projection rule.

Parameter	Default	Tunable	Meaning
`n_components`	`3`	yes	Number of sequential SsPCA components.
`n_selected`	`50`	yes	Predictors selected at each SPCA step after slope scaling.
`min_abs_corr`	`0.0`	yes	Minimum absolute residual correlation retained before PCA.
`scale`	`True`	fixed by preset	Whether to standardize predictors and target inside the model.
`control_columns`	`None`	fixed by preset	Optional X columns used as forecasting controls.
`include_constant`	`True`	fixed by preset	Whether to include a constant in the control block.
`drop_control_columns`	`True`	fixed by preset	Whether controls are excluded from the PCA block.
`preselect`	`"none"`	fixed by preset	Optional pre-selection: `"none"`, `"hard_tstat"`, or `"elastic_net"`.
`t_threshold`	`1.28`	fixed by preset	Hard t-stat pre-selection threshold.
`elastic_net_alpha`	`0.0002`	fixed by preset	Elastic-net pre-selection penalty.
`elastic_net_l1_ratio`	`0.5`	fixed by preset	Elastic-net pre-selection L1 ratio.
`quadratic_factors`	`False`	fixed by preset	Whether to add the Hounyo-Li PC2 squared-factor forecast head.
`random_state`	`0`	fixed by preset	Elastic-net pre-selection random seed.

The original empirical MATLAB code uses lagged target plus constant controls. In macroforecast, pass the lagged target as an X column and list it in control_columns when that exact control block is needed.

ar#

macroforecast.models.ar(y, *, n_lag=1)

Fits a univariate autoregression on the target series.

Parameter	Default	Tunable	Meaning
`n_lag`	`1`	yes	Autoregressive lag order.

Preset	`n_lag`
`small`	`(1, 2, 4)`
`standard`	`(1, 2, 4, 6, 12)`
`wide`	`(1, 2, 3, 4, 6, 9, 12, 18, 24)`

stlf#

macroforecast.models.stlf(y, *, period=None, sa_method="ets")

STL decomposition forecaster (R forecast::stlf). Seasonally adjusts the target with STL, forecasts the seasonally-adjusted series (additive-trend exponential smoothing, random-walk-drift fallback), and adds back the last seasonal cycle.

Parameter	Default	Tunable	Meaning
`period`	`None`	no	Seasonal period; inferred from the index frequency if omitted.
`sa_method`	`"ets"`	no	Forecaster for the seasonally-adjusted series.

naive#

macroforecast.models.naive(y)

Random-walk baseline (R forecast::naive). Carries the last observed target value forward, so the h-step path is constant at y_T. Target-only.

seasonal_naive#

macroforecast.models.seasonal_naive(y, *, period=None)

Seasonal-naive baseline (R forecast::snaive). Repeats the last full seasonal cycle of length period, so step k returns the value from one season earlier.

Parameter	Default	Tunable	Meaning
`period`	`None`	no	Seasonal period `m`; defaults to 1 (plain naive).

random_walk_drift#

macroforecast.models.random_walk_drift(y)

Random-walk-with-drift baseline (R forecast::rwf(drift=TRUE)). Extrapolates the last value by the average historical change: y_T + h * (y_T - y_1) / (T - 1).

var#

macroforecast.models.var(panel, *, target=None, n_lag=1, type="const", season=None)

Fits a VAR on a multivariate panel. target chooses the forecast output column. If omitted, the first column is used. The callable now uses an internal OLS implementation aligned with R vars::VAR and predict.varest: lagged endogenous variables are stacked in lag order, deterministic terms are controlled by R-style type, and predict() recursively rolls the VAR state forward for point forecasts.

Parameter	Default	Tunable	Meaning
`target`	`None`	fixed by preset	Target column in the panel.
`n_lag`	`1`	yes	VAR lag order.
`type`	`"const"`	fixed by preset	R `vars::VAR` deterministic terms: `"const"`, `"trend"`, `"both"`, or `"none"`. Short aliases `"c"`, `"t"`, `"ct"`, and `"n"` are accepted.
`season`	`None`	fixed by preset	Optional centered seasonal dummies, matching `vars::VAR(season=...)`.

Preset	`n_lag`
`small`	`(1, 2, 4)`
`standard`	`(1, 2, 4, 6, 12)`
`wide`	`(1, 2, 3, 4, 6, 9, 12, 18, 24)`

bvar_minnesota#

macroforecast.models.bvar_minnesota(
    panel,
    *,
    target=None,
    n_lag=1,
    kappa0=2.0,
    kappa1=0.5,
    nu0=0.0,
    s0=0.0,
    iter=10000,
    burnin=5000,
    random_state=0,
)

Fits a Bayesian VAR posterior sampler with the Minnesota prior variance logic used by R FAVAR::BVAR and bvartools::minnesota_prior. Saved posterior coefficient and covariance draws are available in diagnostics; predict() uses posterior-mean VAR coefficients for recursive point forecasts. BVAR forecasting is not a macroforecast-only extension: CRAN BVAR exposes predict.bvar, while this callable is macroforecast’s R-aligned ModelFit surface for the same class of BVAR forecast object.

Parameter	Default	Tunable	Meaning
`target`	`None`	fixed by preset	Target column in the panel.
`n_lag`	`1`	yes	VAR lag order.
`kappa0`	`2.0`	yes	Minnesota own-lag prior scale.
`kappa1`	`0.5`	yes	Minnesota lag-decay exponent.
`nu0`	`0.0`	fixed by preset	Inverse-Wishart degrees-of-freedom prior parameter.
`s0`	`0.0`	fixed by preset	Inverse-Wishart scale prior parameter.
`iter`	`10000`	fixed by preset	Total Gibbs iterations.
`burnin`	`5000`	fixed by preset	Burn-in iterations removed before summaries.
`random_state`	`0`	fixed by preset	Random seed for posterior draws.

bvar_normal_inverse_wishart#

macroforecast.models.bvar_normal_inverse_wishart(
    panel,
    *,
    target=None,
    n_lag=1,
    b0=0.0,
    vb0=0.0,
    nu0=0.0,
    s0=0.0,
    iter=10000,
    burnin=5000,
    random_state=0,
)

Fits the same FAVAR-style Bayesian VAR posterior sampler with direct controls for coefficient prior mean/variance and inverse-Wishart covariance prior terms. Saved diagnostics include coefficient posterior mean, standard deviation, credible interval bounds, and posterior mean covariance.

ets#

macroforecast.models.ets(
    y,
    *,
    error="add",
    trend=None,
    seasonal=None,
    seasonal_periods=None,
    damped_trend=False,
)

Fits a target-only statsmodels exponential-smoothing model through ETS-style arguments. In forecasting.run(...), this model ignores X and fits on the stage target vector.

Parameter	Default	Tunable	Meaning
`error`	`"add"`	fixed by preset	Error component; currently additive by default.
`trend`	`None`	fixed by preset	Optional trend component such as `"add"`.
`seasonal`	`None`	fixed by preset	Optional seasonal component such as `"add"`.
`seasonal_periods`	`None`	fixed by preset	Seasonal period length.
`damped_trend`	`False`	fixed by preset	Whether to damp the trend component.

Output is ModelFit; predict(X_future) uses only the number of requested future rows and preserves the provided index.

holt_winters#

macroforecast.models.holt_winters(
    y,
    *,
    trend="add",
    seasonal=None,
    seasonal_periods=None,
    damped_trend=False,
)

Fits a target-only Holt-Winters exponential-smoothing model. In the forecasting runner it is a target-input model: feature matrices are used only to provide the forecast index and horizon length.

Parameter	Default	Tunable	Meaning
`trend`	`"add"`	fixed by preset	Trend component.
`seasonal`	`None`	fixed by preset	Optional seasonal component.
`seasonal_periods`	`None`	fixed by preset	Seasonal period length.
`damped_trend`	`False`	fixed by preset	Whether to damp the trend component.

Output is ModelFit; predictions are indexed like the supplied future frame.

theta_method#

macroforecast.models.theta_method(
    y,
    *,
    period=None,
    deseasonalize=True,
    use_test=True,
)

Fits statsmodels’ target-only Theta method wrapper. Use it as a benchmark univariate model; it does not consume predictor columns.

Parameter	Default	Tunable	Meaning
`period`	`None`	fixed by preset	Seasonal period passed to statsmodels.
`deseasonalize`	`True`	fixed by preset	Whether statsmodels deseasonalizes before fitting.
`use_test`	`True`	fixed by preset	Whether statsmodels uses its internal seasonality test.

Output is ModelFit; predict(X_future) returns a point forecast series.

dfm_mixed_mariano_murasawa#

macroforecast.models.dfm_mixed_mariano_murasawa(
    panel,
    *,
    target=None,
    metadata=None,
    monthly_columns=None,
    quarterly_columns=None,
    unsupported="raise",
    n_factors=1,
    factor_order=1,
    idiosyncratic_ar1=True,
    standardize=True,
    maxiter=500,
    tolerance=1e-6,
)

Fits a monthly/quarterly dynamic factor model through statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ. The callable uses the Mariano-Murasawa state-space aggregation for quarterly variables by ordering monthly columns first, quarterly columns second, and passing k_endog_monthly to statsmodels.

R comparison: this is a backend-wrapper analogue of the mixed-frequency DFM contract used by dfms::DFM(X, quarterly.vars=...) and archived nowcasting::nowcast(method="EM"). Those R implementations require quarterly series to be positioned after monthly series and impose the Mariano-Murasawa [1, 2, 3, 2, 1] temporal aggregation restriction for quarterly growth/flow variables. macroforecast delegates the Kalman/EM likelihood to statsmodels rather than reimplementing the R/C++ filter code.

The preferred input is a native mixed-frequency bundle:

mixed = mf.data.combine(monthly_bundle, quarterly_bundle, frequency="native")
fit = mf.models.dfm_mixed_mariano_murasawa(mixed, target="GDPC1")

metadata["native_frequency_by_column"] is used to split monthly and quarterly columns. If metadata are absent, the function infers frequencies from observed date spacing. Explicit monthly_columns and quarterly_columns override metadata. Unsupported frequencies raise by default; set unsupported="drop" to drop those columns before fitting.

Parameter	Default	Tunable	Meaning
`target`	`None`	fixed by preset	Forecasted panel column. Defaults to first quarterly column, otherwise first column.
`metadata`	`None`	fixed by preset	Metadata with `native_frequency_by_column`; normally supplied by `DataBundle`.
`monthly_columns`	`None`	fixed by preset	Explicit monthly columns.
`quarterly_columns`	`None`	fixed by preset	Explicit quarterly columns.
`unsupported`	`"raise"`	fixed by preset	Unsupported frequency policy: `"raise"` or `"drop"`.
`n_factors`	`1`	yes	Number of dynamic factors.
`factor_order`	`1`	yes	VAR order for factor dynamics.
`idiosyncratic_ar1`	`True`	fixed by preset	Model idiosyncratic components as AR(1).
`standardize`	`True`	fixed by preset	Let statsmodels standardize observed variables before fitting.
`maxiter`	`500`	fixed by preset	EM iteration cap.
`tolerance`	`1e-6`	fixed by preset	EM convergence tolerance.

Preset	`n_factors`	`factor_order`
`small`	`(1,)`	`(1,)`
`standard`	`(1, 2)`	`(1, 2)`
`wide`	`(1, 2, 3)`	`(1, 2, 3)`

Diagnostics include filtered factors, fitted target values when available, target residuals, likelihood, and fitted parameter estimates.

dfm_unrestricted_midas#

macroforecast.models.dfm_unrestricted_midas(
    panel,
    *,
    target,
    metadata=None,
    lag_columns=None,
    lags=(0, 1, 2),
    factor_lags=(0,),
    target_frequency="quarterly",
    anchor_position="period_end",
    n_factors=1,
    factor_order=1,
    idiosyncratic_ar1=True,
    standardize=True,
    maxiter=500,
    tolerance=1e-6,
    alpha=0.0,
    fit_intercept=True,
    drop_missing=True,
)

Fits a composite mixed-frequency model:

Fit dfm_mixed_mariano_murasawa(...) on the native mixed-frequency panel.
Extract filtered DFM factors at the target anchor dates.
Add optional observed lag blocks from mixed_frequency_lags(...).
Fit unrestricted_midas(...) as the forecast head.

This is a convenience composite, not a new state-space likelihood. The returned fit’s predict() method accepts a prepared feature matrix with the same columns as fit.estimator.design_. The lower-level fit.estimator.predict_from_panel(...) method rebuilds the composite design from a native mixed-frequency panel. forecasting.run(...) uses that method: it fits the MIDAS head on the training panel, masks the test target values, then refits the DFM on the available native panel so current monthly information can enter the test-origin factor design without using the held-out target.

R comparison: this is the explicit callable version of a two-stage workflow, not a single R estimator. The first stage follows the DFM contract above. The forecast head is aligned with midasr::midas_u when alpha=0; alpha>0 is a macroforecast ridge extension.

Parameter	Default	Tunable	Meaning
`target`	required	fixed by preset	Forecasted target column.
`metadata`	`None`	fixed by preset	Metadata with native frequencies; normally supplied by `DataBundle`.
`lag_columns`	`None`	fixed by preset	Observed columns added as unrestricted MIDAS lags.
`lags`	`(0, 1, 2)`	yes	Native-frequency lags for observed columns.
`factor_lags`	`(0,)`	yes	Monthly lags of filtered DFM factors.
`target_frequency`	`"quarterly"`	fixed by preset	Frequency used to position target anchor dates.
`anchor_position`	`"period_end"`	fixed by preset	Anchor positioning; useful for FRED-QD quarter-start dates.
`n_factors`	`1`	yes	Number of DynamicFactorMQ factors.
`factor_order`	`1`	yes	VAR order for factor dynamics.
`idiosyncratic_ar1`	`True`	fixed by preset	Model DFM idiosyncratic components as AR(1).
`standardize`	`True`	fixed by preset	Let DynamicFactorMQ standardize observed variables.
`maxiter`	`500`	fixed by preset	DFM EM iteration cap.
`tolerance`	`1e-6`	fixed by preset	DFM EM convergence tolerance.
`alpha`	`0.0`	yes	Ridge penalty on the unrestricted MIDAS head.
`fit_intercept`	`True`	fixed by preset	Whether the unrestricted MIDAS head includes an intercept.
`drop_missing`	`True`	fixed by preset	Drop incomplete composite design rows before fitting the head.

midas_almon#

macroforecast.models.midas_almon(
    X,
    y,
    *,
    polynomial_order=2,
    theta=None,
    alpha=0.0,
    fit_intercept=True,
)

Fits a MIDAS regression where each lag group is compressed with normalized exponential Almon weights before a linear or ridge head is fit.

R comparison: midasr::midas_r(..., nealmon) jointly estimates the aggregate scale and Almon shape by nonlinear least squares. macroforecast keeps the shape fixed as a hyperparameter and estimates only the aggregate regression coefficient in a linear/ridge head. The weight shape matches the scale-free part of midasr::nealmon.

Parameter	Default	Tunable	Meaning
`polynomial_order`	`2`	yes	Degree of the Almon polynomial.
`theta`	`None`	fixed by preset	Shape coefficients for the scale-free part of `midasr::nealmon`. `None` gives equal weights.
`alpha`	`0.0`	yes	Ridge penalty on the regression head.
`fit_intercept`	`True`	fixed by preset	Whether the regression head includes an intercept.

Weight formula:

h_j = j, j = 1, ..., d
w_j = exp(theta_1 h_j + ... + theta_p h_j^p) / sum_j exp(...)

If theta is supplied, it must contain polynomial_order values. The aggregate coefficient scale is estimated by the regression head, corresponding to the first scale parameter in midasr::nealmon.

midas_beta#

macroforecast.models.midas_beta(
    X,
    y,
    *,
    beta_params=(1.0, 1.0),
    alpha=0.0,
    fit_intercept=True,
)

Fits a MIDAS regression where each lag group is compressed with normalized beta weights before a linear or ridge head is fit.

R comparison: this uses the scale-free form of midasr::nbetaMT with p=(1, a, b, 0): endpoints are shifted by machine epsilon, the beta density is normalized, and the aggregate scale is estimated by the regression head.

Parameter	Default	Tunable	Meaning
`beta_params`	`(1.0, 1.0)`	yes	Positive beta-shape parameters `(a, b)`.
`alpha`	`0.0`	yes	Ridge penalty on the regression head.
`fit_intercept`	`True`	fixed by preset	Whether the regression head includes an intercept.

Weight formula:

z_j = (j - 1) / (d - 1), with endpoint epsilon adjustment
w_j = z_j^(a-1) (1-z_j)^(b-1) / sum_j z_j^(a-1) (1-z_j)^(b-1)

Both beta parameters must be strictly positive.

midas_step#

macroforecast.models.midas_step(
    X,
    y,
    *,
    n_steps=3,
    step_bounds=None,
    step_weights=None,
    alpha=0.0,
    fit_intercept=True,
)

Fits a MIDAS regression where lags are grouped into piecewise-constant step blocks. If step_bounds and step_weights are omitted, the lag range is split into n_steps blocks with equal raw step heights, then normalized to a scale-free weight vector.

R comparison: midasr::polystep(p, d, m, a) repeats raw step coefficients between interior cut points. macroforecast exposes the same idea through step_bounds=a and step_weights=p, then normalizes the resulting shape because the aggregate scale is estimated by the regression head.

Parameter	Default	Tunable	Meaning
`n_steps`	`3`	yes	Number of lag buckets when `step_bounds` is omitted.
`step_bounds`	`None`	fixed by preset	Optional interior cut points, matching `midasr::polystep(..., a=...)`.
`step_weights`	`None`	fixed by preset	Optional raw step heights, one per bucket.
`alpha`	`0.0`	yes	Ridge penalty on the regression head.
`fit_intercept`	`True`	fixed by preset	Whether the regression head includes an intercept.

n_steps must be positive. If supplied, step_bounds must be strictly increasing and smaller than the number of lag columns; step_weights must contain one value per resulting bucket.

restricted_midas#

macroforecast.models.restricted_midas(
    X,
    y,
    *,
    weighting="almon",
    polynomial_order=2,
    start_params=None,
    n_steps=3,
    step_bounds=None,
    fit_intercept=True,
    maxiter=1000,
    tolerance=1e-8,
)

Fits a nonlinear restricted MIDAS regression over an explicit lag matrix. This is the direct callable counterpart to midasr::midas_r when the formula has already been expanded into columns such as PAYEMS_lag0, PAYEMS_lag1, and PAYEMS_lag2.

R comparison: midasr::midas_r maps each low-dimensional restriction parameter vector into full lag coefficients and minimizes the nonlinear least-squares objective. restricted_midas() uses the same objective and the same nealmon, nbetaMT, or polystep coefficient maps. It uses SciPy least_squares instead of R’s default optim(method="BFGS"), so optimizer traces are not bit-identical, but the restricted regression equation is the same. Formula parsing, AR* common-factor terms, HAC covariance, model tables, and S3 forecast utilities are not reproduced here.

Parameter	Default	Tunable	Meaning
`weighting`	`"almon"`	yes	Restriction map: `"almon"`/`"nealmon"`, `"beta"`/`"nbetaMT"`, or `"step"`/`"polystep"`.
`polynomial_order`	`2`	yes	Number of Almon shape terms after the aggregate scale parameter.
`start_params`	`None`	fixed by preset	Starting values. Pass one sequence for all lag groups, or a mapping from group name to sequence.
`n_steps`	`3`	yes	Number of step buckets when `weighting="step"` and `step_bounds` is omitted.
`step_bounds`	`None`	fixed by preset	Interior cut points for `polystep`-style step coefficients.
`fit_intercept`	`True`	fixed by preset	Whether to estimate an intercept outside the restricted lag coefficients.
`maxiter`	`1000`	fixed by preset	Maximum SciPy least-squares function evaluations.
`tolerance`	`1e-8`	fixed by preset	Shared `xtol`, `ftol`, and `gtol` stopping tolerance.

Outputs include fitted values, residuals, unrestricted effective lag coefficients, the optimized restricted parameter vector, convergence metadata, and the lag-group metadata used to expand coefficients.

unrestricted_midas#

macroforecast.models.unrestricted_midas(
    X,
    y,
    *,
    alpha=0.0,
    fit_intercept=True,
)

Fits an unrestricted MIDAS regression. Unlike the weighted variants, it does not collapse lag groups; each supplied lag column receives its own coefficient.

R comparison: this matches midasr::midas_u when alpha=0, because every lag coefficient is free and the regression is ordinary least squares. alpha>0 is a macroforecast ridge extension for high-dimensional lag matrices.

Parameter	Default	Tunable	Meaning
`alpha`	`0.0`	yes	Ridge penalty. `0.0` gives an ordinary linear head.
`fit_intercept`	`True`	fixed by preset	Whether the regression head includes an intercept.

MIDAS Input Contract#

The MIDAS callables expect lag-grouped predictor columns, typically names like PAYEMS_lag0, PAYEMS_lag1, and PAYEMS_lag2. Columns sharing the same prefix before _lag# are collapsed into one weighted aggregate before a linear or ridge regression is fit. This keeps mixed-frequency weighting as a model choice while leaving calendar alignment and lag construction in data, preprocessing, and feature_engineering.

These callables are small model functions, not workflow recipes. They do not infer target anchors, release calendars, or future design matrices. Build the lag matrix explicitly with mixed_frequency_lags(...), align X and y, then call the model.

The weighted MIDAS callables midas_almon(), midas_beta(), and midas_step() treat shape parameters as fixed or selected hyperparameters. Use restricted_midas() when the shape and scale parameters should be estimated jointly by nonlinear least squares, matching the midasr::midas_r estimation target.

For unrestricted_midas(), build its input matrix with mf.feature_engineering.mixed_frequency_lags(...) when the source data are native mixed-frequency panels.

All MIDAS callables preserve lag metadata. Weighted variants record lag groups, resolved weights, weighted aggregate column names, aggregate coefficients, and effective lag coefficients. unrestricted_midas() records the original lag groups and per-lag coefficients.

X_midas = mf.feature_engineering.mixed_frequency_lags(
    mixed,
    target="GDPC1",
    columns=["PAYEMS", "INDPRO"],
    lags=range(0, 12),
    target_frequency="quarterly",
    anchor_position="period_end",
    drop_missing=True,
)

y = mixed.panel["GDPC1"].dropna()
y.index = y.index.to_period("Q").asfreq("M", how="end").to_timestamp()
aligned = X_midas.join(y.rename("GDPC1")).dropna()

fit = mf.models.midas_beta(
    aligned.drop(columns="GDPC1"),
    aligned["GDPC1"],
    beta_params=(1.0, 2.0),
    alpha=0.1,
)

fit.metadata["weights"]
fit.diagnostics["effective_lag_coefficients"]

far#

macroforecast.models.far(
    X,
    y,
    *,
    n_factors=3,
    n_lag=1,
    random_state=0,
)

Fits factor-augmented autoregression: PCA factors from X plus AR lags of y.

Parameter	Default	Tunable	Meaning
`n_factors`	`3`	yes	Number of PCA factors.
`n_lag`	`1`	yes	Autoregressive lag order.
`random_state`	`0`	fixed by preset	PCA random seed.

Preset	`n_factors`	`n_lag`
`small`	`(1, 2, 3)`	`(1, 2, 4)`
`standard`	`(1, 2, 3, 5, 8)`	`(1, 2, 4, 6, 12)`
`wide`	`(1, 2, 3, 5, 8, 10, 12)`	`(1, 2, 3, 4, 6, 9, 12, 18, 24)`

favar#

macroforecast.models.favar(
    X,
    y,
    *,
    n_factors=2,
    n_lag=2,
    fctmethod="BBE",
    slowcode=None,
    factorprior=None,
    varprior=None,
    nburn=5000,
    nrep=15000,
    standardize=True,
    random_state=0,
)

Fits a Bayesian FAVAR aligned with CRAN FAVAR::FAVAR: optional R-style standardization, ExtrPC() factor extraction, BBE facrot() or BGM factor identification, conjugate loading-equation draws, and the internal FAVAR::BVAR posterior sampler for the [factors, y] VAR block.

Important boundary: BVAR forecasting is standard, and CRAN BVAR has predict.bvar. The macroforecast-specific extension here is narrower: CRAN FAVAR exposes summaries, coefficients, and impulse responses for favar objects, but not predict.favar. Therefore macroforecast.models.favar(...).predict(...) is a ModelFit forecast wrapper over the fitted FAVAR posterior VAR state using posterior-mean coefficients.

Parameter	Default	Tunable	Meaning
`n_factors`	`2`	yes	Number of latent factors.
`n_lag`	`2`	yes	VAR lag order on the target plus factors.
`fctmethod`	`"BBE"`	fixed by preset	Factor identification method: `"BBE"` or `"BGM"`.
`slowcode`	`None`	fixed by preset	Boolean slow-variable mask required by BBE.
`factorprior`	`None`	fixed by preset	Factor loading prior controls.
`varprior`	`None`	fixed by preset	BVAR prior controls for the factor VAR block.
`nburn`	`5000`	fixed by preset	Burn-in iterations for posterior draws.
`nrep`	`15000`	fixed by preset	Saved posterior draw count.
`standardize`	`True`	fixed by preset	Use R `scale()` semantics for X and y before factor extraction.
`random_state`	`0`	fixed by preset	Random seed for posterior draws.

Preset	`n_factors`	`n_lag`
`small`	`(1, 2, 3)`	`(1, 2, 4)`
`standard`	`(1, 2, 3, 5, 8)`	`(1, 2, 4, 6, 12)`
`wide`	`(1, 2, 3, 5, 8, 10, 12)`	`(1, 2, 3, 4, 6, 9, 12, 18, 24)`

Tree And Machine-Learning Models#

Tree Implementation Map#

Tree callables use backend wrappers, hybrid wrappers, or package-native code. Fit-time model ensembles such as bagging, subagging, stacking, Super Learner, and Booging live in Model Ensemble.

Model	Implementation class	Runtime backend
`decision_tree`	backend wrapper	`sklearn.tree.DecisionTreeRegressor`
`random_forest`	backend wrapper	`sklearn.ensemble.RandomForestRegressor`
`extra_trees`	backend wrapper	`sklearn.ensemble.ExtraTreesRegressor`
`gradient_boosting`	backend wrapper	`sklearn.ensemble.GradientBoostingRegressor`
`xgboost`	optional backend wrapper	`xgboost.XGBRegressor`
`lightgbm`	optional backend wrapper	`lightgbm.LGBMRegressor`
`lgb_plus`	package-native hybrid	LGB+ competition algorithm aligned to `philgoucou/lgbplus`, using `lightgbm.train` for residual tree candidates
`lgba_plus`	package-native hybrid	LGB^A+ alternating algorithm aligned to `philgoucou/lgbplus`, using `lightgbm.train` for residual tree blocks
`catboost`	optional backend wrapper	`catboost.CatBoostRegressor`
`quantile_regression_forest`	hybrid	sklearn forest plus macroforecast leaf-target quantile store
`macro_random_forest`	hybrid adapter	vendored `macroforecast.models._mrf_reference.MacroRandomForest`

backend wrapper means the statistical estimator is delegated to the named package and macroforecast standardizes callable input, metadata, diagnostics, and persistence. hybrid means macroforecast owns part of the algorithmic contract, such as resampling, leaf-distribution storage, feature augmentation, or pandas-to-reference-package adaptation. package-native means the estimator logic itself is implemented inside macroforecast.

decision_tree#

macroforecast.models.decision_tree(
    X,
    y,
    *,
    max_depth=None,
    min_samples_leaf=1,
    random_state=0,
)

Fits sklearn CART regression.

R parity is intentionally not claimed for the sklearn tree wrappers. The named backend owns the estimator; macroforecast owns the pandas X, y contract, metadata, diagnostics, and search-space registration.

Parameter	Default	Tunable	Meaning
`max_depth`	`None`	yes	Maximum tree depth.
`min_samples_leaf`	`1`	yes	Minimum samples per terminal leaf.
`random_state`	`0`	fixed by preset	Tree random seed.

Preset	`max_depth`	`min_samples_leaf`
`small`	`(3, 5, None)`	`(1, 3)`
`standard`	`(3, 5, 10, None)`	`(1, 3, 5)`
`wide`	`(2, 3, 5, 10, 20, None)`	`(1, 2, 3, 5, 10)`

random_forest#

macroforecast.models.random_forest(
    X,
    y,
    *,
    n_estimators=200,
    max_depth=None,
    min_samples_leaf=1,
    random_state=0,
    n_jobs=1,
)

Fits sklearn random forest regression.

The fitted wrapper exposes sklearn feature importances through fit.diagnostics["feature_importance"].

Parameter	Default	Tunable	Meaning
`n_estimators`	`200`	yes	Number of trees.
`max_depth`	`None`	yes	Maximum depth per tree.
`min_samples_leaf`	`1`	yes	Minimum samples per terminal leaf.
`random_state`	`0`	fixed by preset	Forest random seed.
`n_jobs`	`1`	fixed by preset	Parallel worker count.

Preset	`n_estimators`	`max_depth`	`min_samples_leaf`
`small`	`(50, 100)`	`(3, 5, None)`	`(1, 3)`
`standard`	`(100, 200, 500)`	`(3, 5, 10, None)`	`(1, 3, 5)`
`wide`	`(100, 200, 500, 1000)`	`(3, 5, 10, 20, None)`	`(1, 2, 3, 5, 10)`

Default model-selection method: random.

extra_trees#

macroforecast.models.extra_trees(
    X,
    y,
    *,
    n_estimators=200,
    max_depth=None,
    min_samples_leaf=1,
    random_state=0,
    n_jobs=1,
)

Fits sklearn extremely randomized trees. Parameters and presets match random_forest.

Default model-selection method: random.

gradient_boosting#

macroforecast.models.gradient_boosting(
    X,
    y,
    *,
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    random_state=0,
)

Fits sklearn gradient-boosted regression trees.

This is one boosting estimator. Fit-time boosting ensembles such as Booging live in Model Ensemble.

Parameter	Default	Tunable	Meaning
`n_estimators`	`200`	yes	Number of boosting stages.
`learning_rate`	`0.1`	yes	Shrinkage per stage.
`max_depth`	`3`	yes	Maximum tree depth.
`random_state`	`0`	fixed by preset	Boosting random seed.

Preset	`n_estimators`	`learning_rate`	`max_depth`
`small`	`(50, 100)`	`(0.05, 0.1)`	`(2, 3)`
`standard`	`(100, 200, 500)`	`(0.03, 0.05, 0.1)`	`(2, 3, 5)`
`wide`	`(100, 200, 500, 1000)`	`(0.01, 0.03, 0.05, 0.1)`	`(2, 3, 5, 8)`

Default model-selection method: random.

mars#

macroforecast.models.mars(
    X,
    y,
    *,
    max_terms=20,
    max_degree=1,
    n_knots=10,
    min_improvement=1e-6,
    penalty=2.0,
    prune=True,
)

Fits a package-native MARS-style hinge-basis regression. It uses forward insertion of hinge basis pairs and optional backward pruning by generalized cross-validation. This avoids the unmaintained pyearth dependency; it is a clean internal implementation and does not claim bit-level equivalence to other MARS backends.

Parameter	Default	Tunable	Meaning
`max_terms`	`20`	yes	Maximum number of basis terms including intercept.
`max_degree`	`1`	yes	Maximum interaction degree.
`n_knots`	`10`	yes	Candidate quantile knots per predictor.
`min_improvement`	`1e-6`	fixed by preset	Forward-step relative RSS improvement floor.
`penalty`	`2.0`	fixed by preset	GCV pruning complexity penalty.
`prune`	`True`	fixed by preset	Whether to prune terms by GCV.

Default model-selection method: random.

xgboost#

macroforecast.models.xgboost(
    X,
    y,
    *,
    n_estimators=300,
    learning_rate=0.1,
    max_depth=6,
    subsample=1.0,
    random_state=0,
    **kwargs,
)

Fits xgboost.XGBRegressor. Requires macroforecast[xgboost].

Parameter	Default	Tunable	Meaning
`n_estimators`	`300`	yes	Number of boosting stages.
`learning_rate`	`0.1`	yes	Shrinkage per stage.
`max_depth`	`6`	yes	Maximum tree depth.
`subsample`	`1.0`	yes	Row subsample share.
`random_state`	`0`	fixed by preset	Boosting random seed.

Preset spaces match gradient_boosting plus subsample=(0.6, 0.8, 1.0). Default model-selection method: random.

lightgbm#

macroforecast.models.lightgbm(
    X,
    y,
    *,
    n_estimators=300,
    learning_rate=0.1,
    max_depth=-1,
    num_leaves=31,
    random_state=0,
    **kwargs,
)

Fits lightgbm.LGBMRegressor. Requires macroforecast[lightgbm].

Parameter	Default	Tunable	Meaning
`n_estimators`	`300`	yes	Number of boosting stages.
`learning_rate`	`0.1`	yes	Shrinkage per stage.
`max_depth`	`-1`	yes	Maximum tree depth; `-1` means no limit.
`num_leaves`	`31`	yes	Maximum leaves per tree.
`random_state`	`0`	fixed by preset	Boosting random seed.

Preset	`n_estimators`	`learning_rate`	`max_depth`	`num_leaves`
`small`	`(50, 100)`	`(0.05, 0.1)`	`(-1, 3, 5)`	`(15, 31)`
`standard`	`(100, 200, 500)`	`(0.03, 0.05, 0.1)`	`(-1, 3, 5, 10)`	`(15, 31, 63)`
`wide`	`(100, 200, 500, 1000)`	`(0.01, 0.03, 0.05, 0.1)`	`(-1, 3, 5, 10, 20)`	`(15, 31, 63, 127)`

Default model-selection method: random.

lgb_plus#

Paper Citation And Source#

Primary paper:

Goulet Coulombe, Philippe. 2026. “LGB+: A Macroeconomic Forecasting Road Test.” Draft dated March 18, 2026. SSRN abstract 6439178. DOI: 10.2139/ssrn.6439178. Paper page: https://ssrn.com/abstract=6439178.

Reference implementation:

Source	Role in `macroforecast`
`philgoucou/lgbplus`	Original R/Python implementation repository.
`R/lgb_plus.R`	Competition algorithm, `linear_candidate_fraction`, component prediction helpers, and ensemble helper.
`python/lgb_plus.py`	Competition estimator class with in-class ensemble storage and step histories.
`R/lgb_plus_A.R`	Alternating algorithm and `lgb_plus_A_ensemble` helper.
`python/lgb_plus_A.py`	Alternating estimator class.

Paper Motivation#

The paper targets a macro forecasting problem that appears repeatedly in small and medium macro samples. Tree boosting is strong for nonlinearities and interactions, but a large share of macro predictive content can be simple: autoregressive persistence, slowly moving accounting-like relationships, or near-mechanical links such as claims before unemployment and permits before housing starts. A standard tree booster can approximate those linear slopes, but it spends splits and boosting capacity to do work that a one-variable linear update could do cheaply.

LGB+ expands the boosting basis from “trees only” to “trees plus greedy linear updates.” The forecast remains additive:

yhat(x) = intercept + tree_component(x) + linear_component(x)

That additive form is not a cosmetic detail. It lets the user inspect the forecast through two channels:

Channel	Intended role	Caveat
Linear	Persistence, autoregressive slopes, near-accounting links, and other simple one-variable residual corrections.	The split is operational, not metaphysical; a tree can still learn linear-looking structure.
Tree	Nonlinear states, interactions, thresholds, and regime-dependent effects.	Tree gains can include simple structures if the linear candidate loses the competition.

The paper emphasizes that the linear/nonlinear split should be read as an algorithmic decomposition generated by the boosting path, not as proof that the data-generating process is literally separated into pure linear and pure nonlinear blocks.

Paper Empirical Design#

The empirical road test uses transformed quarterly U.S. macro data from FRED-QD. The review file summarized the design as six targets: headline CPI inflation, GDP growth, unemployment, housing starts growth, industrial production, and the term spread. Predictors include FRED-QD transformations, four lags, MARX moving-average features, and principal components from the transformed panel. The out-of-sample design is expanding-window forecasting, with a pre-COVID evaluation period and a post-COVID stress period.

This matters for using macroforecast:

Paper object	`macroforecast` stage
FRED-QD transformed panel	`macroforecast.data.load_fred_qd(...)` then `macroforecast.preprocessing.reprocess(...)`.
Four lags	`macroforecast.feature_engineering.lag(...)` or runner `feature_spec(...)`.
MARX moving-average features	`macroforecast.feature_engineering.moving_average_ladder(...)` or `marx_step(...)`.
Principal components	`macroforecast.feature_engineering.pca_features(...)` or preprocessing/factor features when fit-window-safe execution is needed.
Expanding OOS design	`macroforecast.window.estimation_expanding(...)` plus `test_origins(...)`.
Re-estimation schedule	`macroforecast.window.estimation_expanding(..., retrain_every=...)`.
LGB+ model	`macroforecast.models.lgb_plus(...)`.
LGB^A+ model	`macroforecast.models.lgba_plus(...)`.
Linear/tree forecast decomposition	`fit.estimator.predict_components(...)` and diagnostics in `ModelFit`.

Method In The Paper#

The paper has two closely related estimators.

LGB+ is the competition version. At each boosting step:

Step	Operation
1	Start from the current fitted value.
2	Compute residuals on the training sample.
3	Draw a row subsample.
4	Fit one small LightGBM residual tree on the subsample.
5	Fit one greedy univariate linear residual update on the same subsample.
6	Evaluate both candidate updates using `oob`, fixed `validation`, or `training` loss.
7	Accept only the lower-loss candidate.
8	Record which channel won, the selected linear feature when relevant, and the candidate losses.

LGB^A+ is the alternating version. It does not run a per-step competition. Instead, each cycle applies a block of residual trees and then a greedy univariate linear correction. This is computationally simpler and can be more stable when the OOB judge is noisy in macro-sized samples.

Main Paper Findings To Keep In Mind#

The paper’s simulations and empirical road test support the following working interpretation:

Finding	Practical implication
In mostly linear designs, the linear channel can absorb much of the signal and avoid forcing trees to approximate simple slopes.	Include autoregressive and near-accounting predictors explicitly; then inspect the linear channel.
In nonlinear designs, the tree channel remains active and the hybrid does not have to behave like a linear model.	LGB+ is a flexible hybrid, not a linear model with tree residuals fixed in advance.
The linear channel is often useful for short-horizon unemployment and industrial production before COVID.	Channel diagnostics can reveal whether gains come from persistence-like relations or nonlinear state recognition.
In post-COVID stress periods, the linear channel can become harmful for some real-activity targets.	Report channel-specific diagnostics; do not rely only on total RMSE.
Forecasts can be decomposed natively into tree and linear pieces.	Use `predict_components()` and store `ModelFit.diagnostics` when writing replication outputs.

What `macroforecast` Implements#

macroforecast.models.lgb_plus implements the competition estimator as a package-native hybrid model. LightGBM supplies the residual tree candidate, but the step loop, linear candidate, OOB/validation/training selection, ensemble aggregation, channel accounting, and pandas metadata are implemented inside macroforecast.

The implementation is deliberately not just a thin LGBMRegressor wrapper:

Feature	Implemented?	Notes
Tree candidate per step	yes	Uses `lightgbm.train(..., num_boost_round=1)`.
Greedy univariate linear candidate	yes	Uses the same no-intercept residual slope as the competition reference code.
`linear_candidate_fraction`	yes	Kept from the R implementation.
`selection_method="oob"`	yes	Default; requires `subsample < 1`.
`selection_method="validation"`	yes	Uses a fixed random validation split inside the current fit window.
`selection_method="training"`	yes	Available for reference parity, but not recommended for macro evaluation.
Ensemble members	yes	Controlled by `n_ensemble`; predictions aggregate by mean or median.
Linear component prediction	yes	`predict_components(...).prediction_linear`.
Tree component prediction	yes	`predict_components(...).prediction_tree`.
Channel diagnostics	yes	`channel_summary`, `channel_importance`, and `training_history`.
AXIL historical weights	no	This belongs in interpretation/forecast analysis later, not inside the estimator.
Full paper table replication	no	The callable model is implemented; full empirical replication should be a separate replication package.

macroforecast.models.lgb_plus(
    X,
    y,
    *,
    n_ensemble=10,
    n_steps=200,
    learning_rate=0.05,
    subsample=0.7,
    num_leaves=5,
    min_data_in_leaf=20,
    lambda_l2=0.1,
    linear_candidate_fraction=0.5,
    selection_method="oob",
    val_fraction=0.2,
    early_stop_patience=50,
    aggregation="mean",
    random_state=0,
    verbose=False,
    **kwargs,
)

Fits LGB+ competition boosting from Goulet Coulombe’s philgoucou/lgbplus reference code. This is not ordinary lightgbm with extra linear features. At every boosting step the estimator builds two residual updates:

Candidate	Reference-code operation	Accepted when
Tree	Fit one small `lightgbm.train(...)` residual tree with manual shrinkage.	Candidate loss is no larger than the linear candidate.
Linear	Sample `linear_candidate_fraction` of features, choose the largest absolute residual correlation, and fit one no-intercept residual slope.	Candidate loss is lower than the tree candidate.

The R reference file R/lgb_plus.R includes linear_candidate_fraction; the Python reference file python/lgb_plus.py embeds the ensemble in the estimator but does not expose that candidate-fraction argument. macroforecast combines those two reference surfaces: n_ensemble controls independent runs and linear_candidate_fraction controls greedy linear candidate subsampling.

Input is the standard supervised model contract:

Input	Required shape	Meaning
`X`	`pandas.DataFrame`, `FeatureSet`, or array-like with shape `(n_obs, n_features)`	Predictor matrix. DataFrame column names are preserved in diagnostics.
`y`	`pandas.Series` or array-like with length `n_obs`	Forecast target for the current fit window.

Output is a ModelFit with model="lgb_plus". Use fit.predict(X_new) for the total prediction. The estimator also exposes:

Method or diagnostic	Output	Meaning
`fit.estimator.predict_components(X_new)`	DataFrame	`prediction_total`, `prediction_init`, `prediction_tree`, `prediction_linear`.
`fit.estimator.predict_individual(X_new)`	ndarray	One total prediction path per ensemble member.
`fit.estimator.channel_importance()`	DataFrame	Tree gain, linear selection count, and absolute linear update by feature.
`fit.diagnostics["channel_summary"]`	dict	Total tree and linear steps plus per-member counts.
`fit.diagnostics["training_history"]`	dict	Per-step candidate losses and selected channel metadata.

Parameter	Default	Tunable	Meaning
`n_ensemble`	`10`	yes	Independent boosting runs. Predictions are aggregated across runs.
`n_steps`	`200`	yes	Maximum tree/linear competition steps per run.
`learning_rate`	`0.05`	yes	Shared shrinkage for accepted tree or linear updates.
`subsample`	`0.7`	yes	Row subsample share per step. `selection_method="oob"` requires `< 1`.
`num_leaves`	`5`	yes	Maximum leaves for the one-step LightGBM tree candidate.
`min_data_in_leaf`	`20`	yes	Minimum rows per LightGBM leaf.
`lambda_l2`	`0.1`	fixed by preset	L2 penalty for LightGBM tree candidates.
`linear_candidate_fraction`	`0.5`	yes	Fraction of predictors sampled before greedy linear selection.
`selection_method`	`"oob"`	fixed by preset	Candidate judge: `"oob"`, `"validation"`, or `"training"`.
`val_fraction`	`0.2`	fixed by preset	Fixed validation share when `selection_method="validation"`.
`early_stop_patience`	`50`	fixed by preset	Stop after this many non-improving selected losses; `None` disables.
`aggregation`	`"mean"`	fixed by preset	Ensemble aggregation: `"mean"` or `"median"`.
`random_state`	`0`	fixed by preset	Base random seed. Each ensemble member increments it.
`**kwargs`	none	fixed by caller	Additional `lightgbm.train` parameters for residual tree candidates.

Preset	Main search dimensions
`small`	`n_ensemble`, `n_steps`, `learning_rate`, `subsample`, `num_leaves`, `min_data_in_leaf`, `linear_candidate_fraction` over narrow ranges.
`standard`	Same dimensions with 5-10 members, 100-400 steps, and candidate fractions `(0.33, 0.5, 1.0)`.
`wide`	Same dimensions with up to 20 members, 600 steps, lower learning rates, and broader subsampling.

Default model-selection method: random.

lgba_plus#

Paper Link#

lgba_plus is the macroforecast callable for the paper’s alternating variant, LGB^A+. It uses the same paper and source references as lgb_plus: Goulet Coulombe (2026), “LGB+: A Macroeconomic Forecasting Road Test,” SSRN 6439178, DOI 10.2139/ssrn.6439178, and the philgoucou/lgbplus source repository.

What Changes Relative To `lgb_plus`#

The alternating version is easier to read and usually cheaper to fit:

Dimension	`lgb_plus`	`lgba_plus`
Update schedule	Tree and linear candidates compete; one winner advances.	Every cycle applies both a tree block and one linear correction.
Main count parameter	`n_steps` per ensemble member.	`n_cycles` and `trees_per_cycle`.
Tree learning rate	Shared `learning_rate`.	`lr_tree`.
Linear learning rate	Shared `learning_rate`.	`lr_linear`.
Linear update	No-intercept residual slope in the competition reference code.	Intercept plus slope, matching the alternating reference code.
Ensemble control	`n_ensemble`.	`n_runs`.
Main diagnostic	Winner path and candidate losses.	Cycle path and selected linear feature after each tree block.

Use lgba_plus when the goal is a stable hybrid path rather than estimating the best tree/linear mix at every individual step. Use lgb_plus when the winner sequence itself is part of the analysis.

What `macroforecast` Implements#

Feature	Implemented?	Notes
Residual tree blocks	yes	Uses `lightgbm.train(..., num_boost_round=trees_per_cycle)`.
Greedy linear correction after every tree block	yes	Selects the largest absolute residual correlation.
Separate tree and linear learning rates	yes	`lr_tree`, `lr_linear`.
`n_runs` alternating ensemble	yes	Folds the R `lgb_plus_A_ensemble` helper into the estimator API.
Component prediction	yes	Same `predict_components()` output columns as `lgb_plus`.
Channel importance	yes	Tree gain, linear selection count, and absolute linear update.
Full AXIL dual interpretation	no	Planned for interpretation/forecast analysis rather than model fitting.

macroforecast.models.lgba_plus(
    X,
    y,
    *,
    n_runs=1,
    n_cycles=25,
    trees_per_cycle=10,
    lr_tree=0.02,
    lr_linear=0.1,
    num_leaves=15,
    min_data_in_leaf=20,
    subsample=1.0,
    random_state=0,
    verbose=False,
    **kwargs,
)

Fits LGB^A+, the alternating variant from philgoucou/lgbplus. Each cycle first fits a block of LightGBM residual trees, then fits one greedy univariate linear residual update with an intercept. Unlike lgb_plus, there is no per-step winner selection: both channels are updated every cycle.

The R reference file R/lgb_plus_A.R also provides an ensemble helper lgb_plus_A_ensemble. macroforecast folds that helper into this estimator via n_runs, so the same callable can represent both a single alternating model and an averaged alternating ensemble.

Input and output follow the same supervised contract as lgb_plus. fit.estimator.predict_components(X_new) returns the total, intercept, tree, and linear channels; fit.estimator.channel_importance() reports tree gain and linear update frequency by feature.

Parameter	Default	Tunable	Meaning
`n_runs`	`1`	yes	Independent alternating runs.
`n_cycles`	`25`	yes	Tree-block plus linear-update cycles per run.
`trees_per_cycle`	`10`	yes	Residual LightGBM trees per cycle.
`lr_tree`	`0.02`	yes	Shrinkage for tree-block predictions.
`lr_linear`	`0.1`	yes	Shrinkage for linear residual updates.
`num_leaves`	`15`	yes	Maximum leaves for each residual tree.
`min_data_in_leaf`	`20`	yes	Minimum rows per LightGBM leaf.
`subsample`	`1.0`	yes	LightGBM bagging fraction for tree blocks.
`random_state`	`0`	fixed by preset	Base random seed. Each run increments it.
`**kwargs`	none	fixed by caller	Additional `lightgbm.train` parameters.

The linear slope is computed by the centered OLS identity sum((x - mean(x)) * (r - mean(r))) / sum((x - mean(x))^2). This is statistically equivalent to the R code’s cov(x, residual) / var(x) and avoids the denominator mismatch in the reference Python expression that combines np.cov(...) with x.var().

Preset	Main search dimensions
`small`	Short alternating runs for smoke or narrow-window use.
`standard`	1 or 5 runs, 10 or 25 cycles, and tree/linear learning-rate grids.
`wide`	Up to 10 runs, 50 cycles, broader tree-block size and learning-rate ranges.

Default model-selection method: random.

catboost#

macroforecast.models.catboost(
    X,
    y,
    *,
    n_estimators=300,
    learning_rate=0.1,
    max_depth=6,
    random_state=0,
    verbose=False,
    **kwargs,
)

Fits catboost.CatBoostRegressor. Requires macroforecast[catboost].

Parameter	Default	Tunable	Meaning
`n_estimators`	`300`	yes	Number of boosting stages.
`learning_rate`	`0.1`	yes	Shrinkage per stage.
`max_depth`	`6`	yes	Tree depth.
`random_state`	`0`	fixed by preset	Boosting random seed.
`verbose`	`False`	fixed by preset	CatBoost console output flag.

Preset spaces match gradient_boosting. Default model-selection method: random.

Macro-Specific Tree Models#

quantile_regression_forest#

macroforecast.models.quantile_regression_forest(
    X,
    y,
    *,
    n_estimators=200,
    max_depth=None,
    min_samples_leaf=1,
    random_state=0,
    quantile_levels=(0.05, 0.5, 0.95),
)

Fits a random forest and stores per-leaf training-target distributions. The underlying estimator exposes predict_quantiles(X, levels=None). The forecasting runner stores those outputs in the quantile_predictions column as per-row dictionaries keyed by quantile level.

The fitted wrapper also exposes sklearn forest feature importances through fit.diagnostics["feature_importance"].

Quantiles use tree-equal leaf weighting: within each tree, all training observations that share the test row’s terminal leaf receive equal weight, and each tree contributes the same total weight. This avoids letting large leaves dominate the empirical quantile solely because they contain more observations.

Parameter	Default	Tunable	Meaning
`n_estimators`	`200`	yes	Number of trees.
`max_depth`	`None`	yes	Maximum depth per tree.
`min_samples_leaf`	`1`	yes	Minimum samples per terminal leaf.
`random_state`	`0`	fixed by preset	Forest random seed.
`quantile_levels`	`(0.05, 0.5, 0.95)`	fixed by preset	Default levels returned by `predict_quantiles()`.

Preset spaces match random_forest. Default model-selection method: random.

macro_random_forest#

macroforecast.models.macro_random_forest(
    X,
    y,
    *,
    x_columns=None,
    S_columns=None,
    x_pos=None,
    S_pos=None,
    y_pos=0,
    B=50,
    minsize=10,
    mtry_frac=1/3,
    min_leaf_frac_of_x=1.0,
    VI=False,
    ERT=False,
    quantile_rate=None,
    S_priority_vec=None,
    random_x=False,
    trend_push=1,
    howmany_random_x=1,
    howmany_keep_best_VI=20,
    cheap_look_at_GTVPs=True,
    prior_var=None,
    prior_mean=None,
    subsampling_rate=0.75,
    rw_regul=0.75,
    keep_forest=False,
    block_size=12,
    fast_rw=True,
    ridge_lambda=0.1,
    HRW=0,
    resampling_opt=2,
    print_b=False,
    parallelise=False,
    n_cores=1,
    **kwargs,
)

Adapter for Ryan Lucas’s MacroRandomForest reference backend. The reference implementation is vendored from MacroRandomForest 1.0.6 under the MIT license, with source attribution in macroforecast.models._mrf_reference. Install the optional runtime dependencies with macroforecast[macro_random_forest]. The adapter fits on the in-sample X/y and calls the reference _ensemble_loop() during predict(X_test). Repeated calls to predict() with the same test matrix reuse the previous reference-backend output, so repeated result materialization does not rerun the expensive forest loop.

If the reference backend returns multiple prediction columns for the requested test rows, the adapter averages them row by row. If the backend returns no recognized prediction field or fewer than the requested number of predictions, the adapter raises a runtime error instead of silently returning a misaligned forecast vector.

By default all columns in X are used both as the time-varying linear equation variables (x_columns) and the forest state variables (S_columns). Pass x_columns and S_columns when those sets should differ.

The reference backend distinguishes two predictor sets:

Argument	Role
`x_columns`	Columns in the local linear forecasting equation. These are the variables whose coefficients are allowed to vary over time.
`S_columns`	State variables used by the forest to split the sample and estimate those local coefficients.

Use either column names or reference-style positions for each role. Passing both x_columns and x_pos, or both S_columns and S_pos, raises an error rather than silently prioritizing one selector.

For example, a compact MRF can use a small local-linear equation but a wider state vector for the tree:

fit = macroforecast.models.macro_random_forest(
    X_train,
    y_train,
    x_columns=["INDPRO_lag0", "UNRATE_lag0"],
    S_columns=[
        "INDPRO_lag0",
        "UNRATE_lag0",
        "CPIAUCSL_lag0",
        "FEDFUNDS_lag0",
        "S&P500_lag0",
    ],
    B=50,
    minsize=10,
    mtry_frac=1.0,
    ridge_lambda=0.1,
    rw_regul=0.75,
    parallelise=False,
    print_b=False,
)

pred = fit.predict(X_test)

With the forecasting runner, pass model parameters through the model-keyed params mapping. If you want fixed parameters rather than model-owned tuning, also disable model selection for this model:

features = macroforecast.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["UNRATE", "CPIAUCSL", "FEDFUNDS", "S&P500"],
    lags=(0, 1),
)

window = macroforecast.window.spec(
    estimation=macroforecast.window.estimation_expanding(min_size=120),
    val=macroforecast.window.val_last_block(size=24),
    test=macroforecast.window.test_origins(horizon=1, step=1),
)

result = macroforecast.forecasting.run(
    panel,
    "macro_random_forest",
    window=window,
    features=features,
    params={
        "macro_random_forest": {
            "x_columns": ["UNRATE_lag0", "FEDFUNDS_lag0"],
            "S_columns": [
                "UNRATE_lag0",
                "UNRATE_lag1",
                "CPIAUCSL_lag0",
                "FEDFUNDS_lag0",
                "S&P500_lag0",
            ],
            "B": 100,
            "minsize": 10,
            "mtry_frac": 1.0,
            "parallelise": False,
            "print_b": False,
        }
    },
    model_selection={"macro_random_forest": None},
)

The reference implementation is sensitive to panel shape. Use numeric, non-missing features after preprocessing and feature engineering. Keep at least one x_columns variable, and prefer at least five S_columns variables; with very small state sets, set mtry_frac=1.0 so at least one state variable is considered at each split. Small training samples can also fail when minsize is too large relative to the number of local-linear variables.

Parameter	Default	Tunable	Meaning
`x_columns`	`None`	fixed by preset	Feature columns in the time-varying linear equation.
`S_columns`	`None`	fixed by preset	Feature columns used as forest state variables.
`x_pos`	`None`	fixed by preset	Reference-package predictor positions after the target column.
`S_pos`	`None`	fixed by preset	Reference-package state positions after the target column.
`y_pos`	`0`	fixed by preset	Fixed target position for the separated `X/y` adapter; must remain `0`.
`B`	`50`	yes	Number of MRF trees.
`minsize`	`10`	yes	Minimum node size before split attempts.
`mtry_frac`	`1/3`	yes	Fraction of state variables considered at each split.
`min_leaf_frac_of_x`	`1.0`	yes	Minimum leaf-size multiplier relative to local x dimension.
`VI`	`False`	fixed by preset	Enable variable-importance split search mode.
`ERT`	`False`	fixed by preset	Enable extremely randomized tree split mode.
`quantile_rate`	`None`	fixed by preset	Optional quantile rate for quantile-oriented output.
`S_priority_vec`	`None`	fixed by preset	Optional priority weights over state variables.
`random_x`	`False`	fixed by preset	Use random subsets of local-linear predictors.
`trend_push`	`1`	fixed by preset	Reference-package trend-push option.
`howmany_random_x`	`1`	fixed by preset	Number of random local-linear predictor draws.
`howmany_keep_best_VI`	`20`	fixed by preset	Number of best VI candidates retained.
`cheap_look_at_GTVPs`	`True`	fixed by preset	Use the reference package’s cheaper GTVP inspection.
`prior_var`	`None`	fixed by preset	Optional prior variances for local coefficients.
`prior_mean`	`None`	fixed by preset	Optional prior means for local coefficients.
`subsampling_rate`	`0.75`	yes	Subsample share used by each tree.
`rw_regul`	`0.75`	yes	Random-walk shrinkage strength.
`keep_forest`	`False`	fixed by preset	Keep full reference forest object in memory.
`block_size`	`12`	fixed by preset	Reference-package block size for time-series resampling.
`fast_rw`	`True`	fixed by preset	Use fast random-walk regularization path.
`ridge_lambda`	`0.1`	yes	Ridge penalty for local linear fits.
`HRW`	`0`	fixed by preset	Reference-package hierarchical random-walk option.
`resampling_opt`	`2`	yes	Reference MRF resampling option.
`parallelise`	`False`	fixed by preset	Whether to use reference-package parallel execution.
`n_cores`	`1`	fixed by preset	Worker count for the reference package.
`print_b`	`False`	fixed by preset	Reference-package progress printing.

The MRF presets tune B, minsize, mtry_frac, min_leaf_frac_of_x, subsampling_rate, rw_regul, ridge_lambda, and resampling_opt; inspect the exact candidate lists with describe_model("macro_random_forest").

Volatility Models#

Volatility model fits return VolatilityFit. In addition to predict_variance(horizon=...), their diagnostics include fitted parameter estimates under params and the in-sample conditional_volatility path when the backend exposes it.

These models are for volatility/variance forecasting, not ordinary conditional mean macro forecasting. They accept a univariate return-like series y and return both a point-mean prediction interface and a variance forecast interface.

Function	Implementation	R/source comparison	Boundary
`garch11`	Optional Python `arch.arch_model` backend.	Same surface as `rugarch::ugarchspec(variance.model=list(model="sGARCH"))` plus `ugarchfit()`, but the likelihood is delegated to `arch`, not reimplemented.	Backend controls solver details, distribution aliases, convergence behavior, and forecast internals.
`egarch`	Optional Python `arch.arch_model` backend.	Same surface as `rugarch::ugarchspec(variance.model=list(model="eGARCH"))` plus `ugarchfit()`, but the likelihood is delegated to `arch`, not reimplemented.	Backend controls solver details, distribution aliases, convergence behavior, and forecast internals.
`realized_garch`	Internal compact Gaussian log-linear MLE.	Aligned with the p=q=1 Hansen-Huang-Shek / `rugarch` `realGARCH` state and measurement equations.	Not a full `rugarch` clone: no ARMA/ARFIMA mean, variance regressors, non-Gaussian distributions, fixed-parameter SE machinery, simulation/path/roll helpers, or xts-specific checks.

Common Output#

fit = macroforecast.models.garch11(y)
variance = fit.predict_variance(horizon=12)
sigma = fit.conditional_volatility
metadata = fit.to_metadata()

Output	Type	Meaning
`fit`	`VolatilityFit`	Fitted wrapper with `predict()`, `predict_variance()`, `summary()`, `to_dict()`, and `to_metadata()`.
`fit.predict(X)`	`pandas.Series`	Conditional mean prediction. For these models this is usually a constant mean from the volatility backend.
`fit.predict_variance(horizon)`	`pandas.Series`	Variance forecast indexed from `0` to `horizon - 1`. `horizon` must be positive.
`fit.conditional_volatility`	`pandas.Series` or `None`	In-sample conditional volatility path if available.
`fit.diagnostics["params"]`	`dict`	Fitted parameters. Names depend on the backend/model.
`fit.diagnostics["conditional_volatility"]`	`pandas.Series`	Same path as `fit.conditional_volatility`, stored for metadata export.

garch11#

macroforecast.models.garch11(
    y,
    *,
    X=None,
    p=1,
    q=1,
    mean_model="constant",
    dist="normal",
    rescale=False,
)

Fits GARCH using the optional arch package. Requires macroforecast[arch]. The default is GARCH(1,1):

\[ r_t = \mu + \epsilon_t,\qquad \epsilon_t = \sigma_t z_t \]

\[ \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \]

For higher p/q, the lag orders are passed directly to arch.arch_model(vol="GARCH", p=p, q=q).

Parameter	Default	Tunable	Meaning
`p`	`1`	yes	GARCH innovation lag order.
`q`	`1`	yes	GARCH variance lag order.
`mean_model`	`"constant"`	manual	Conditional mean model.
`dist`	`"normal"`	yes	Innovation distribution.
`rescale`	`False`	fixed by preset	`arch` package rescale option.

Preset	`p`	`q`	`dist`
`small`	`(1,)`	`(1,)`	`("normal", "t")`
`standard`	`(1, 2)`	`(1, 2)`	`("normal", "t")`
`wide`	`(1, 2, 3)`	`(1, 2, 3)`	`("normal", "t", "skewt")`

Implementation notes:

Item	Value
Backend	`arch.arch_model`
Required extra	`macroforecast[arch]`
R comparison	`rugarch::ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(p, q)))`
Internal likelihood?	No. macroforecast validates orders, passes inputs to `arch`, and records metadata/diagnostics.
Minimum data	30 non-missing observations.

egarch#

macroforecast.models.egarch(
    y,
    *,
    X=None,
    p=1,
    o=0,
    q=1,
    mean_model="constant",
    dist="normal",
    rescale=False,
)

Fits EGARCH using the optional arch package. Requires macroforecast[arch]. The backend receives:

arch.arch_model(y, vol="EGARCH", p=p, o=o, q=q, mean=mean_model, dist=dist)

For EGARCH(1,1), the log-variance structure is backend-defined by arch; conceptually it is the exponential GARCH family where log variance reacts to standardized shock magnitude and asymmetry terms rather than modeling variance directly in levels.

Parameter	Default	Tunable	Meaning
`p`	`1`	yes	EGARCH innovation lag order.
`o`	`0`	yes	Asymmetric innovation lag order.
`q`	`1`	yes	EGARCH variance lag order.
`mean_model`	`"constant"`	manual	Conditional mean model.
`dist`	`"normal"`	yes	Innovation distribution.
`rescale`	`False`	fixed by preset	`arch` package rescale option.

Preset	`p`	`o`	`q`	`dist`
`small`	`(1,)`	`(0, 1)`	`(1,)`	`("normal", "t")`
`standard`	`(1, 2)`	`(0, 1)`	`(1, 2)`	`("normal", "t")`
`wide`	`(1, 2, 3)`	`(0, 1, 2)`	`(1, 2, 3)`	`("normal", "t", "skewt")`

Implementation notes:

Item	Value
Backend	`arch.arch_model`
Required extra	`macroforecast[arch]`
R comparison	`rugarch::ugarchspec(variance.model=list(model="eGARCH", garchOrder=c(p, q)))`
Internal likelihood?	No. macroforecast validates orders, passes inputs to `arch`, and records metadata/diagnostics.
Minimum data	30 non-missing observations.

realized_garch#

macroforecast.models.realized_garch(
    y,
    *,
    X=None,
    rv=None,
    realized_variance=None,
    max_iter=2000,
    n_starts=5,
    random_state=0,
)

Fits a compact p=q=1 Gaussian log-linear realized-GARCH joint likelihood. Provide rv directly or set realized_variance to the column in X containing the realized measure. If neither is supplied, macroforecast uses y ** 2 as an explicit rv_proxy so the caller can still inspect the model contract. For empirical realized-GARCH work, pass a true realized variance or realized volatility measure.

The implemented state and measurement equations are:

\[ \log h_t = \omega + \alpha \log x_{t-1} + \beta \log h_{t-1}, \]

\[ z_t = \frac{r_t - \mu}{\sqrt{h_t}}, \qquad \tau(z_t) = \eta_1 z_t + \eta_2(z_t^2 - 1), \]

\[ \log x_t = \xi + \delta \log h_t + \tau(z_t) + u_t, \qquad u_t \sim N(0, \sigma_u^2). \]

This matches the compact rugarch realGARCH recursion in rugarch/src/filters.c::realgarchfilter() for the p=q=1 case: lagged log realized volatility enters through alpha, lagged log latent variance enters through beta, and the measurement equation uses xi, delta, eta1, and eta2. The stationarity-style persistence diagnostic is:

\[ \text{persistence} = \beta + \delta \alpha. \]

The multi-step variance forecast uses the conditional expectation recursion: the first step uses the latest observed realized measure, then future tau(z_t) and measurement shocks are set to zero so E[\log x_t \mid h_t] = \xi + \delta \log h_t.

Parameter	Default	Tunable	Meaning
`realized_variance`	`None`	manual	Column name for realized variance.
`max_iter`	`2000`	fixed by preset	Optimizer iteration cap.
`n_starts`	`5`	yes	Number of optimizer starting points.
`random_state`	`0`	fixed by preset	Optimizer random seed.

Preset	`n_starts`
`small`	`(3, 5)`
`standard`	`(3, 5, 10)`
`wide`	`(3, 5, 10, 20)`

Implementation notes:

Item	Value
Backend	Internal SciPy `L-BFGS-B` optimizer.
Required extra	None beyond base SciPy stack.
R comparison	Compact p=q=1 version of `rugarch::ugarchspec(variance.model=list(model="realGARCH"))` with `realizedVol`.
Parameter names	`mu`, `omega`, `alpha`, `beta`, `xi`, `delta`, `eta_1`, `eta_2`, `log_sigma_u`, `persistence`.
Restrictions	`alpha > 0`, `beta > 0`, `delta > 0`, and `beta + delta * alpha < 1` during optimization.
Minimum data	30 aligned observations of `y` and realized measure.

Example:

fit = macroforecast.models.realized_garch(
    returns,
    rv=realized_variance,
    max_iter=2000,
    n_starts=5,
    random_state=0,
)

fit.diagnostics["params"]
fit.predict_variance(horizon=12)

Omitted From The Clean Model API#

Legacy name	Decision
`lasso_path`	Removed. Use `get_model("lasso")` and `model_selection.select_params()`.
`pcr`	Removed. Use `feature_engineering.feature_spec(pca_components=...)` with a regression model.

var_select_order – VAR lag-order selection by AIC/BIC/HQ/FPE (vars::VARselect), via statsmodels VAR.select_order.
gjr_garch – GJR-GARCH (Glosten-Jagannathan-Runkle) asymmetric/leverage volatility (arch GARCH, o>0; rugarch gjrGARCH).
tgarch – Threshold GARCH (TGARCH/Zakoian), absolute-value (power=1) asymmetric volatility.
risk_forecast – Value-at-Risk and Expected Shortfall forecast from a fitted volatility model (Normal / standardized-t).
value_at_risk – lower-tail VaR return quantile(s) from a fitted volatility model.
expected_shortfall – Expected Shortfall (mean return below VaR) from a fitted volatility model.
news_impact_curve – Engle-Ng (1993) news impact curve: conditional variance vs lagged shock for a fitted GARCH-family model.
garch_roll – rolling one-step volatility / VaR backtest with periodic refit and coverage summary (rugarch::ugarchroll).
var_roots – VAR stability check: moduli of the companion-matrix eigenvalues, spectral radius, and is_stable (vars::roots).
var_restrict – restricted VAR by sequential elimination of insignificant regressors with restriction matrix (vars::restrict).
arima – (seasonal) ARIMA model via statsmodels, order (p,d,q) and seasonal_order (P,D,Q,m).
auto_arima – automatic (seasonal) ARIMA order selection (forecast::auto.arima): KPSS-based d, AICc grid over (p,q[,P,Q]).

arima#

macroforecast.models.arima(y, *, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None)

auto_arima#

macroforecast.models.auto_arima(y, *, max_p=5, max_q=5, max_d=2, seasonal=False, m=1, ic="aicc", trend=None)

gjr_garch#

macroforecast.models.gjr_garch(y, *, X=None, p=1, o=1, q=1, mean_model="constant", dist="normal", rescale=False)

tgarch#

macroforecast.models.tgarch(y, *, X=None, p=1, o=1, q=1, mean_model="constant", dist="normal", rescale=False)