macroforecast.feature_engineering#

Purpose#

macroforecast.feature_engineering is the direct pandas surface for building forecast targets and model-ready feature matrices. It accepts the same direct Python inputs used by previous stages: PreprocessedData, DataSpec, DataBundle, (panel, metadata), or a canonical pandas.DataFrame.

For strict windowed forecasting, use feature_spec(...). The spec is fitted by macroforecast.forecasting.run(...) inside each train window and then transformed for the matching test rows. Individual functions such as lag(), rolling_mean(), pca_features(), and feature_matrix() remain callable one-shot helpers; runner composition belongs in macroforecast.forecasting.

The preferred flow is:

import macroforecast as mf

bundle = mf.data.load_fred_md()
data_spec = mf.data.spec(bundle, target="INDPRO", horizons=[1, 3, 6], predictors="all")
processed = mf.preprocessing.reprocess(data_spec)

features = mf.feature_engineering.build_features(
    processed,
    lags=(0, 1, 2, 3),
    rolling_windows=(3, 6),
    add_time=True,
)

X = features.X
y = features.y
metadata = features.metadata

build_features() emits a warning when it receives a canonical panel that does not carry metadata["preprocessing"]. This is allowed, but the default package workflow is data -> preprocessing -> feature engineering.

Common callable examples:

# Direct horizon targets: y[t+h].
y_direct = mf.feature_engineering.direct_target(processed, target="INDPRO", horizons=[1, 3, 6])

# Direct average target: one y column per requested horizon.
y_avg = mf.feature_engineering.average_target(processed, target="INDPRO", horizon=12, transform="growth")

# Path target: one y column per future step. Model fits/forecasts each step;
# evaluation averages the step forecasts.
y_path = mf.feature_engineering.path_targets(processed, target="INDPRO", horizon=12, transform="growth")

# Simple lagged predictors.
X_lag = mf.feature_engineering.lag(processed, columns=["PAYEMS", "INDPRO"], lags=range(0, 13))

Public Functions#

Group	Functions	Purpose
Target construction	`direct_target`, `average_target`, `path_targets`, `forward_average_target`	Build direct, average, or step-path forecast targets. `forward_average_target` is the Albacore/assemblage-named reusable helper.
Basic predictor transforms	`lag`, `rolling_mean`, `moving_average_ladder`, `mixed_frequency_lags`, `seasonal_lag`, `season_dummy`, `time_features`, `fourier_features`	Add lags, rolling blocks, mixed-frequency lag blocks, and deterministic date features.
ML-side value transforms	`transform_features`, `log_features`, `diff_features`, `log_diff_features`, `pct_change_features`, `cumsum_features`, `scale_features`	Add post-preprocessing transformations and scaling used as model features.
Nonlinear expansions	`polynomial_features`, `interaction_features`, `wavelet_features`, `savitzky_golay_features`, `adaptive_ma_rf_features`, `asymmetric_trim_features`, `rank_space_features`	Add nonlinear, smooth, rank, or multi-resolution feature blocks. `rank_space_features` is the generic order-statistic primitive used by Albacoreranks.
Supervised aggregation helpers	`moving_average_changes`, `align_reference_weights`, `weighted_aggregate`	Reusable component-to-aggregate helpers derived from the Albacore/assemblage R package. They are generic and can be used outside inflation.
Trend/cycle filter wrappers	`hamilton_filter_features`, `hp_filter_features`	Turn `macroforecast.filters` one-series filter outputs into panel feature columns with leakage warnings where needed.
Factor features	`pca_features`, `dfm_features`, `group_pca`, `varimax_features`, `sparse_pca_chen_rohe_features`, `partial_least_squares_features`, `sliced_inverse_regression_features`, `random_projection_features`, `nystroem_features`	Build fitted factor, supervised-factor, sparse-factor, rotation, random projection, and kernel approximation features.
Paper-style combinations	`feature_matrix`, `maf_features`, `moving_average_pca_lags`, `pca_then_lags`, `lags_then_pca`	Materialize named macro-ML blocks such as `X`, `F`, `MARX`, `MAF`, and compositions.
Composition	`compose_features`, `custom_features`, `custom_step`	Run sequential feature steps or user-supplied transforms.
Runner-safe specs	`feature_spec` plus step builders ending in `_step`	Store fit-aware feature construction for `forecasting.run(...)`.
Feature selection	`select_features`, `variance_selection`, `correlation_selection`, `lasso_selection`, `lasso_path_selection`, `rfe_selection`, `boruta_selection`, `stability_selection`, `genetic_selection`	Select columns with variance, target association, sparse-model, wrapper, or search rules.
Selection utilities	`normalize_feature_selection_method`, `feature_selection_requires_target`	Normalize selection aliases and report whether a target is required.
End-to-end builder	`build_features`	Return aligned `X`, `y`, metadata, feature provenance, and target provenance.

Runner-safe step builders are direct functions too: lag_step, rolling_step, moving_average_step, marx_step, transform_step, seasonal_lag_step, season_dummy_step, fourier_step, time_step, polynomial_step, interaction_step, scale_step, pca_step, sparse_pca_chen_rohe_step, varimax_step, group_pca_step, maf_step, hamilton_step, random_projection_step, nystroem_step, partial_least_squares_step, and sliced_inverse_regression_step.

Code Structure#

The public namespace stays macroforecast.feature_engineering, while the implementation is split by responsibility:

File	Responsibility
`targets.py`	`direct_target()`, `average_target()`, and `path_targets()`.
`transforms.py`	Direct pandas feature transforms: lags, rolling means, scaling, PCA, PLS, DFM-style factors, Chen-Rohe sparse component analysis, varimax rotation, grouped PCA, MAF, filter-to-feature wrappers, custom feature callables, and time features.
`feature_selection.py`	Shared fitted feature-selection algorithms used by direct selection callables and runner-safe `feature_spec()` method names.
`compose.py`	Reusable step builders and sequential feature composition.
`matrix.py`	Paper-style `X`, `F`, `MARX`, `MAF`, and `LEVEL` feature-matrix combinations.
`builder.py`	End-to-end `build_features()` alignment of `X`, `y`, and metadata.
`shared.py`	Internal normalization, metadata, fitting, and validation helpers.
`core.py`	Compatibility re-export only.

Public Classes And Types#

Symbol	Meaning
`FeatureInput`	Accepted feature-engineering input type.
`FeatureSet`	Output object returned by `build_features(...)`.
`FeatureSpec`	Runner-compatible feature-building contract.
`FittedFeatureBuilder`	Fitted feature-builder state used by the runner.
`FeatureSelectionResult`	Metadata-rich result for feature-selection helpers.
`select_features`	Generic feature-selection dispatcher.
`feature_selection_requires_target`	Return whether a feature-selection method requires a target.
`normalize_feature_selection_method`	Normalize feature-selection method aliases.
`pca_then_lags`	Convenience composition: PCA factors first, then lags.
`lags_then_pca`	Convenience composition: lag panel first, then PCA.
`moving_average_pca_lags`	Convenience composition for moving-average blocks, PCA, and lags.

FeatureSet#

macroforecast.feature_engineering.FeatureSet(
    X: pandas.DataFrame,
    y: pandas.DataFrame,
    metadata: dict,
    feature_metadata: pandas.DataFrame,
    target_metadata: pandas.DataFrame,
    target: str | None = None,
    targets: tuple[str, ...] = (),
    horizons: tuple[int, ...] = (),
    predictors: tuple[str, ...] = (),
)

Output Schema#

Field	Type	Meaning
`X`	`pandas.DataFrame`	Predictor matrix aligned on forecast-origin dates.
`y`	`pandas.DataFrame`	Direct horizon targets or path step targets aligned to `X`.
`metadata`	`dict`	Input metadata plus feature-engineering stage metadata.
`feature_metadata`	`pandas.DataFrame`	One row per generated feature with provenance columns.
`target_metadata`	`pandas.DataFrame`	One row per target column with horizon, transform, and formula provenance.
`target`, `targets`, `horizons`, `predictors`	scalar or tuple fields	Resolved study choices.

Methods#

Method	Input	Output	Meaning
`attach(stage, values)`	`stage: str`, `values: Mapping`	`FeatureSet`	Return a new object with one metadata stage added.

FeatureSet also supports tuple unpacking:

X, y, metadata = features

FeatureSelectionResult#

select_features(...) returns FeatureSelectionResult; direct selection wrappers return a selected-column DataFrame and store the same selection metadata on the returned frame.

Field	Meaning
`selected_columns`	Final selected columns in source order.
`scores`	Per-column score dictionary.
`method`	Canonical selection method.
`n_features`, `resolved_n_features`	Requested and resolved selected-feature counts.
`n_fit_rows`	Rows used by the selection fit.
`fit_policy`	Fit contract, such as target-aligned rows or column-only rows.
`target_required`	Whether the method requires a target.
`metadata`	Method-specific score and fit details.

Feature Boundary#

This stage is direct and pandas-first. It constructs target columns and ML-oriented feature transforms. Multiple transformations can be composed in sequence through plain Python callables, and higher-level orchestration can call the same functions later.

Function	Owns	Does not own yet
`direct_target()`	Direct-forecast target columns, including direct average targets.	Train/test split, recursive forecasting, inverse transforms.
`average_target()`	Explicit wrapper for direct average change/growth targets.	Model fitting.
`forward_average_target()`	Albacore/assemblage-named wrapper for future average aggregate targets.	Inflation-only semantics; works for any aggregate target.
`path_targets()`	Step-level targets for path-average forecasting.	Model-stage step fit/forecast and evaluation-stage forecast averaging.
`lag()`	Current and lagged predictor columns.	Model-specific lag search.
`mixed_frequency_lags()`	Exact-date lag blocks for native mixed-frequency panels.	Frequency conversion or model estimation.
`rolling_mean()`	Rolling-window means.	Fit-based filters or learned smoothers.
`moving_average_ladder()`	Multi-scale trailing moving-average block used before optional factor/PCA steps.	PCA/factor extraction itself.
`maf_features()`	Moving Average Factors from variable-specific lag panels.	Model fitting or choosing final feature combinations.
`hamilton_filter_features()`	Panel wrapper around `filters.hamilton_filter()` with explicit expanding or full-sample policy.	Model fitting, test windows, or choosing filter horizons.
`feature_matrix()`	Named `X`, `F`, `MARX`, `MAF`, and `LEVEL` feature-matrix combinations.	Loading or preprocessing the raw/level panel.
`scale_features()`	Fit-policy-aware z-score, min-max, or robust scaling.	Model fitting.
`pca_features()`	Fit-policy-aware PCA factors.	Forecast model fitting.
`sparse_pca_chen_rohe_features()`	Chen-Rohe sparse component analysis factors using an L1 loading budget.	Model fitting; runner-safe fitting should use `sparse_pca_chen_rohe_step()` inside `feature_spec()`.
`varimax_features()`	Orthogonal varimax rotation of already-created factor-score columns.	Factor extraction itself; usually call after `pca_features()` or a factor step.
`sliced_inverse_regression_features()`	Target-aware Sliced Inverse Regression factors.	Model fitting; runner-safe fitting should use `sliced_inverse_regression_step()` inside `feature_spec()`.
`partial_least_squares_features()`	Target-aware PLS factor scores.	Model fitting; runner-safe fitting should use `partial_least_squares_step()` inside `feature_spec()`.
`dfm_features()`	Static DFM approximation by standardized PCA.	State-space DFM estimation; use model callables for that.
`variance_selection()`, `correlation_selection()`, `lasso_selection()`, `lasso_path_selection()`, `rfe_selection()`, `boruta_selection()`, `stability_selection()`, `genetic_selection()`	Direct column selection by one explicit algorithm.	Model fitting; runner-safe fitting uses the same method names inside `feature_spec(..., steps=[...])`.
`asymmetric_trim_features()`	Per-period rank-space columns for asymmetric trimming weights.	Estimating the nonnegative rank weights.
`rank_space_features()`	Named generic rank-space/order-statistic primitive from the Albacore R `x.transformation` path.	Model fitting or learned rank weights.
`moving_average_changes()`	Convert one-period component changes to a trailing moving-average unit.	Choosing forecast windows or fitting weights.
`align_reference_weights()`	Align official/reference weights to a component column order.	Estimating those weights.
`weighted_aggregate()`	Apply fixed component weights to produce one aggregate.	Learning the weights; use `models.component_aggregation()` for that.
`wavelet_features()`	Panel wrapper around `filters.wavelet_filter()` returning causal rolling multi-resolution approximation/detail columns.	True DWT family-specific filtering.
`adaptive_ma_rf_features()`	Feature-wrapper form of `filters.albama()` that returns `{column}_albama` columns and stores full `AlbaMAResult` objects in attrs.	Forecast model fitting.
`group_pca()`	PCA factors within named column groups.	FAVAR-specific slow/fast construction, model estimation, or structural identification.
`custom_features()`	One direct user-supplied pandas feature transform.	Window-safe fitted state. Use `custom_step()` inside `feature_spec()` for runner use.
`compose_features()`	Sequential combinations such as `pca -> lag`, `lag -> pca`, `maf`, or `moving_average_ladder -> pca -> lag`.	Model fitting or evaluation.
`time_features()`	Trend, month, quarter, and year columns.	Public-holiday or trading-day calendars; the package targets monthly and quarterly macro panels.
`build_features()`	Aligned `X`, `y`, feature metadata, and feature-engineering metadata.	Model evaluation.

Fit-based transformations require a declared fit_policy. The default is fit_policy="expanding", which estimates transform parameters using only data available through each date. fit_policy="full_sample" is available for exploration or already-split training data. Public fitted transforms warn by default when full_sample is used; pass warn_full_sample=False only when the input panel is already training-only or the call is intentionally diagnostic.

direct_target#

macroforecast.feature_engineering.direct_target(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "level",
) -> pandas.DataFrame

Input#

Name	Type	Default	Choices
`data`	`PreprocessedData`, `DataSpec`, `DataBundle`, `(panel, metadata)`, or `DataFrame`	required	Canonical macroforecast input.
`metadata`	mapping or `None`	`None`	Extra metadata to merge into the input metadata.
`target`	`str` or `None`	from `data`	One target column.
`targets`	iterable or `None`	from `data`	Multiple target columns. Mutually exclusive with `target`.
`horizon`	positive `int` or `None`	from `data`, then `1`	One forecast horizon.
`horizons`	positive int/iterable or `None`	from `data`, then `(1,)`	Multiple forecast horizons. Mutually exclusive with `horizon`.
`transform`	`str`	`"level"`	`"level"`, `"value"`, `"change"`, `"growth"`, `"log_growth"`, `"average_value"`, `"average_change"`, `"average_growth"`, `"average_log_growth"`; common aliases include `"future_level"`, `"future_value"`, `"identity"`, `"diff"`, `"pct_change"`, `"simple_growth"`, `"log_change"`, `"log_diff"`, `"avg_value"`, `"avg_change"`, `"avg_growth"`, and `"direct_average_growth"`.

Output#

Returns a pandas.DataFrame indexed by date. Column names are {target}_{transform}_h{horizon}.

Transform	Formula aligned on row `t`
`"level"`	`x[t + h]`
`"value"`	`x[t + h]`; same formula as `level`, used when the series is already on the target’s forecasting scale.
`"change"`	`x[t + h] - x[t]`
`"growth"`	`x[t + h] / x[t] - 1`
`"log_growth"`	`log(x[t + h]) - log(x[t])`; non-positive pairs become missing.
`"average_value"`	Average of future values `x[t+1]` through `x[t+h]`; use this when `x` is already a one-period transformed forecast target.
`"average_change"`	Average of one-period changes from `t+1` through `t+h`.
`"average_growth"`	Average of one-period simple growth rates from `t+1` through `t+h`.
`"average_log_growth"`	Average of one-period log growth rates from `t+1` through `t+h`.

The final h rows are missing by construction because the future target is not observed.

The returned frame also carries attrs["macroforecast_target_metadata"]. Core columns are target_column, source, horizon, step, mode, transform, operation, formula, aggregation, and used_for_horizons.

average_target#

macroforecast.feature_engineering.average_target(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "change",
) -> pandas.DataFrame

average_target() is a readability wrapper for direct average targets. It returns the same output as:

mf.feature_engineering.direct_target(..., transform="average_change")
mf.feature_engineering.direct_target(..., transform="average_growth")
mf.feature_engineering.direct_target(..., transform="average_log_growth")
mf.feature_engineering.direct_target(..., transform="average_value")

This is the direct average approach: one final target column is created per requested horizon, and a later model can fit that column directly.

`transform`	Meaning
`"value"`	Average future values of an already transformed one-period target series.
`"change"`	Average one-period differences over the future path.
`"growth"`	Average one-period simple growth rates over the future path.
`"log_growth"`	Average one-period log growth rates over the future path.

forward_average_target#

macroforecast.feature_engineering.forward_average_target(
    data,
    *,
    target=None,
    targets=None,
    horizon=None,
    horizons=None,
    transform="change",
)

forward_average_target() is the named target helper for Albacore/assemblage-style supervised aggregation. It calls the same target logic as average_target(), but records source metadata pointing to Goulet Coulombe, Klieber, Barrette, and Goebel, Maximally Forward-Looking Core Inflation, and the R package assemblage. The helper is generic: the target can be any future aggregate, not only headline inflation.

Output: a DataFrame with columns such as headline_average_change_h12. The output stores attrs["macroforecast_target_metadata"] and marks source_method="assemblage_forward_target".

Assemblage Helper Primitives#

These helpers come from the Albacore/assemblage workflow but are intentionally split into generic callables. They can be attached to inflation components, state panels, sector panels, industry components, or any setting where current components are aggregated to forecast a future aggregate target.

Function	Input	Output	Albacore source cue
`rank_space_features(data, columns=None, prefix="rank_")`	Component panel.	`rank_1`, `rank_2`, … sorted low-to-high each date.	R `x.transformation`, rank path for Albacoreranks.
`moving_average_changes(data, window=3, method="compound_percent")`	One-period component changes.	`{column}_ma{window}` trailing change unit.	R `x.transformation`; month-over-month percent to 3m/12m compounded units.
`align_reference_weights(weights, columns, normalize=True)`	Mapping, `Series`, `DataFrame`, or sequence.	`Series` indexed by model columns.	R `weight.transformation`; official basket weights for Albacorecomps.
`weighted_aggregate(data, weights, columns=None)`	Component panel plus fixed weights.	One aggregate column.	Learned core measure after assemblage weights are estimated.

rank_space_features() and weighted_aggregate() do not estimate weights. Use macroforecast.models.rank_aggregation(), macroforecast.models.component_aggregation(), or the Albacore wrappers for supervised weight estimation.

path_targets#

macroforecast.feature_engineering.path_targets(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "change",
) -> pandas.DataFrame

path_targets() creates step-level future targets for path-average forecasting. For horizon=3, it returns step columns for t+1, t+2, and t+3. The model stage should fit and forecast each step target separately. The evaluation stage can then average the step forecasts for the final horizon. Use transform="value" when the supplied target column is already a one-period transformed object, such as the monthly growth/difference target in a FRED-MD replication.

path_y = mf.feature_engineering.path_targets(
    processed,
    target="INDPRO",
    horizon=3,
    transform="value",
)

Output columns are named {target}_{transform}_step{step}. Metadata includes metadata["path_target"]["columns_by_horizon"], which records which step columns should be averaged for each requested horizon.

macroforecast_target_metadata marks these rows with mode="path", operation="path_step_target", a non-null step, and aggregation="average_step_forecasts_in_evaluation". This records the intended later use without moving model fitting or forecast averaging into this stage.

lag#

macroforecast.feature_engineering.lag(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (1,),
    drop_missing: bool = False,
) -> pandas.DataFrame

Input#

Name	Type	Default	Choices
`data`	feature input	required	Canonical macroforecast input.
`columns`	iterable or `None`	all columns	Source columns to lag.
`lags`	int or iterable of ints	`(1,)`	Non-negative lags. `lags=3` expands to `1, 2, 3`; `lags=0` means current values only; pass `(0, 1, 3)` for exact lags including current values.
`drop_missing`	`bool`	`False`	Drop rows with any lag-induced missing values.

Output#

Returns a pandas.DataFrame with columns named {column}_lag{lag}.

mixed_frequency_lags#

macroforecast.feature_engineering.mixed_frequency_lags(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    anchor_dates: Iterable[object] | None = None,
    columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (0, 1, 2),
    frequency_by_column: Mapping[str, str] | None = None,
    target_frequency: str | None = None,
    anchor_position: str = "date",
    drop_missing: bool = False,
) -> pandas.DataFrame

Builds a lag matrix for MIDAS-style and other mixed-frequency regressions. Unlike lag(), lags are measured in each source column’s native frequency, using metadata["native_frequency_by_column"] from mf.data.set_frequencies() or mf.data.combine(..., frequency="native").

Lookup is period based, not timestamp-string based. A monthly source dated 2020-03-01 and the same source dated 2020-03-31 both map to the March 2020 source period. This prevents month-start/month-end conventions from silently breaking MIDAS lag construction.

Input#

Name	Type	Default	Meaning
`data`	feature input	required	Panel or bundle with a mixed-frequency contract.
`target`	`str` or `None`	input target if available	Column whose non-missing dates define anchors when `anchor_dates` is not supplied.
`anchor_dates`	iterable or `None`	target non-missing dates	Explicit rows to build features for.
`columns`	iterable or `None`	all non-target columns	Source columns to lag.
`lags`	int or iterable	`(0, 1, 2)`	Native-frequency lags. Pass an iterable for exact lags.
`frequency_by_column`	mapping or `None`	metadata map	Override native frequency by source column.
`target_frequency`	`str` or `None`	target metadata/inference	Frequency used when positioning anchor dates.
`anchor_position`	`str`	`"date"`	`"date"`, `"period_start"`, or `"period_end"`.
`drop_missing`	`bool`	`False`	Drop rows with missing lag values.

For FRED-QD-style quarterly targets dated at the first month of the quarter, use target_frequency="quarterly", anchor_position="period_end" to construct monthly lag blocks at the quarter-end month:

X_midas = mf.feature_engineering.mixed_frequency_lags(
    bundle,
    target="GDPC1",
    columns=["PAYEMS", "INDPRO"],
    lags=range(0, 12),
    target_frequency="quarterly",
    anchor_position="period_end",
)

The output columns are named {column}_lag{lag}, which is the grouping format expected by mf.models.midas_almon, mf.models.midas_beta, and mf.models.midas_step.

The returned DataFrame records metadata in two places:

Location	Meaning
`attrs["macroforecast_metadata"]["feature_engineering_mixed_frequency_lags"]`	Target, anchor dates, selected columns, exact lags, frequency map, anchor positioning, lookup calendar, and row counts before/after `drop_missing`.
`attrs["macroforecast_feature_metadata"]`	One row per generated lag feature, including source column, lag, native source frequency, anchor position, and lookup start/end dates.

rolling_mean#

macroforecast.feature_engineering.rolling_mean(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    windows: Iterable[int] | int = (3,),
    min_periods: int | None = None,
    shift: int = 0,
    drop_missing: bool = False,
) -> pandas.DataFrame

Input#

Name	Type	Default	Choices
`columns`	iterable or `None`	all columns	Source columns.
`windows`	positive int or iterable	`(3,)`	Rolling-window lengths.
`min_periods`	positive int or `None`	window length	Minimum observations required for a value.
`shift`	non-negative int	`0`	Shift source series before rolling. Use `1` for strictly lagged rolling means.
`drop_missing`	`bool`	`False`	Drop rows with window-induced missing values.

Output#

Returns a pandas.DataFrame with columns named {column}_roll{window}_mean. When shift > 0, names end in _lag{shift}.

moving_average_ladder#

macroforecast.feature_engineering.moving_average_ladder(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    windows: Iterable[int] | None = None,
    max_window: int = 12,
    min_periods: int | None = None,
    shift: int = 0,
    drop_missing: bool = False,
) -> pandas.DataFrame

Meaning#

moving_average_ladder() builds a stacked block of trailing moving averages at multiple horizons. With the default max_window=12, the implicit windows are 1, 2, 4, 8. Pass windows=(1, 2, 4, 8, 12) or any other explicit sequence when the endpoint should be included.

MARX in macroforecast#

Some papers describe this step as marx_features(P) or Moving Average Rotation of X (MARX). In macroforecast, the direct pandas form is the following explicit moving-average-ladder call:

MARX = mf.feature_engineering.moving_average_ladder(
    X,
    windows=range(1, P + 1),
    shift=1,
)

This means that, for each source series, the feature block contains increasing moving averages of lagged X: one-period lag, two-period average ending at t-1, three-period average ending at t-1, and so on through P. The shift=1 part is important because the MARX block uses lagged predictors, not the contemporaneous realization at the forecast date.

The runner-safe shorthand is marx_step(max_lag=P), used inside feature_spec(..., steps=[...]). It emits the same columns as the direct call, but lets forecasting.run() decide which rows are available for any fitted state through feature_policy.

The original author R snippet builds a VAR lag matrix ordered as lag 1 for all variables, lag 2 for all variables, and so on. Then each lag-l slot for a variable is replaced by the row average of that variable’s lag 1 through lag l columns. The direct call and marx_step(scale_lags=False) match that unscaled calculation. Through feature_matrix(..., specification="MARX", scale_marx=True) or marx_step(scale_lags=True), macroforecast also supports the optional R-code scaling step: z-score the lag matrix first using sample standard deviations, then apply the same increasing-lag averages. In feature_spec() mode, scale_lags=True fits those lag-matrix center/scale values only on the feature-fit panel and reuses them for validation/test rows.

This function is not PCA. It is the moving-average block used before optional factor extraction. Moving-average PCA should be represented as:

ma_block = mf.feature_engineering.moving_average_ladder(panel, windows=(1, 2, 4, 8, 12))
factors = mf.feature_engineering.pca_features(ma_block, fit_policy="expanding")

Keeping the moving-average block and PCA step separate matters because PCA is a fit-based transformation. Running PCA on the full sample before train/test or walk-forward boundaries would leak future information.

Input#

Name	Type	Default	Choices
`columns`	iterable or `None`	all columns	Source columns.
`windows`	iterable of positive ints or `None`	powers of two up to `max_window`	Exact moving-average windows.
`max_window`	positive int	`12`	Used only when `windows=None`; default creates `1, 2, 4, 8`.
`min_periods`	positive int or `None`	window length	Minimum observations required for a value.
`shift`	non-negative int	`0`	Shift source series before rolling. Use `1` for strictly lagged moving averages.
`drop_missing`	bool	`False`	Drop rows with window-induced missing values.

Output#

Returns a pandas.DataFrame with columns named {column}_ma{window}. When shift > 0, names end in _lag{shift}.

scale_features#

macroforecast.feature_engineering.scale_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    method: str = "zscore",
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

Name	Type	Default	Choices
`method`	str	`"zscore"`	`"zscore"`, `"minmax"`, `"robust"`; aliases: `"standard"`, `"standardize"`, `"min_max"`.
`fit_policy`	str	`"expanding"`	`"expanding"` or `"full_sample"`.
`min_train_size`	positive int or `None`	`5`	Minimum complete rows before emitting scaled values.
`drop_missing`	bool	`False`	Drop rows where scaling is unavailable.
`warn_full_sample`	bool	`True`	Warn when `fit_policy="full_sample"` is used.

pca_features#

macroforecast.feature_engineering.pca_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 1,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = True,
    prefix: str = "pc",
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

pca_features() returns columns named {prefix}1, {prefix}2, and so on. The default fit_policy="expanding" avoids full-sample leakage. Use fit_policy="full_sample" only after the input sample has already been split or for exploratory diagnostics. warn_full_sample=True emits a warning for that choice.

sparse_pca_chen_rohe_features#

macroforecast.feature_engineering.sparse_pca_chen_rohe_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 4,
    zeta: float = 0.0,
    max_iter: int = 200,
    var_innovations: bool = False,
    prefix: str | None = None,
    min_train_size: int | None = None,
    drop_missing: bool = False,
    random_state: int | None = 0,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

sparse_pca_chen_rohe_features() implements the legacy package’s Chen-Rohe-style Sparse Component Analysis (SCA) routine directly with NumPy. It is not sklearn.decomposition.SparsePCA. The transform centers the selected predictor panel, alternates over the score and loading matrices, and constrains the loading matrix with an L1 budget zeta.

The direct callable fits on all complete rows of the supplied input. It warns by default because that is a full-input fitted transform. For strict walk-forward forecasting, use sparse_pca_chen_rohe_step() inside feature_spec(...); the runner will fit the sparse loading matrix on the feature-fit panel and reuse the fixed loading matrix on validation/test rows.

Input#

Name	Type	Default	Meaning
`columns`	iterable or `None`	all columns	Predictor columns used to fit sparse components.
`n_components`	positive int	`4`	Requested number of sparse components. The resolved number is `min(n_components, complete_rows, selected_columns)`.
`zeta`	non-negative float	`0.0`	L1 loading-budget parameter. `zeta <= 0` uses the resolved component count, matching the legacy default. Smaller values create sparser loadings.
`max_iter`	positive int	`200`	Maximum alternating updates.
`var_innovations`	bool	`False`	If `True`, fit a VAR(1) on the sparse factor scores and return residual sparse macro-finance factors.
`prefix`	string or `None`	`"sca"` or `"scaf"`	Output prefix. Default is `"sca"`; with `var_innovations=True`, default is `"scaf"`.
`min_train_size`	positive int or `None`	`1`, or `3` when `var_innovations=True`	Minimum complete rows before emitting factors.
`drop_missing`	bool	`False`	Drop rows where sparse factors are unavailable.
`random_state`	int or `None`	`0`	Initialization seed for the alternating algorithm.
`warn_full_sample`	bool	`True`	Warn because the direct callable fits on all complete input rows.

Output#

Returns a pandas.DataFrame indexed by date, with columns such as sca1, sca2, or scaf1. Metadata is stored under attrs["macroforecast_metadata"]["feature_engineering_sparse_pca_chen_rohe"]. The stage records selected columns, requested/resolved components, zeta, resolved zeta, iteration count, final objective, VAR-innovation use, fit rows, and fit_policy="full_input_complete_rows".

macroforecast_feature_metadata records one row per factor with operation="sparse_pca_chen_rohe", the source columns, component index, and fit policy.

varimax_features#

macroforecast.feature_engineering.varimax_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    max_iter: int = 50,
    tol: float = 1e-7,
    prefix: str = "varimax",
    min_train_size: int | None = None,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

varimax_features() rotates a factor-score panel with an orthogonal varimax rotation. It should be applied to factor columns, not raw macro variables. A typical direct use is:

factors = mf.feature_engineering.pca_features(
    processed,
    columns=["INDPRO", "PAYEMS", "UNRATE"],
    n_components=3,
    fit_policy="full_sample",
    warn_full_sample=False,
)
rotated = mf.feature_engineering.varimax_features(factors, warn_full_sample=False)

The direct callable fits the rotation on all complete rows and warns by default. For strict walk-forward forecasting, use:

spec = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.pca_step(name="pc", n_components=3, include=False),
        mf.feature_engineering.varimax_step(name="rot", input="pc"),
    ],
)

Input#

Name	Type	Default	Meaning
`columns`	iterable or `None`	all columns	Factor-score columns to rotate.
`max_iter`	positive int	`50`	Maximum varimax iterations.
`tol`	non-negative float	`1e-7`	Convergence tolerance for the rotation objective.
`prefix`	string	`"varimax"`	Output prefix.
`min_train_size`	positive int or `None`	`1`	Minimum complete rows before emitting rotated factors.
`drop_missing`	bool	`False`	Drop rows where rotated factors are unavailable.
`warn_full_sample`	bool	`True`	Warn because the direct callable fits on all complete input rows.

Output#

Returns a pandas.DataFrame with columns such as varimax1, varimax2, and so on. Metadata is stored under metadata["feature_engineering_varimax"], and macroforecast_feature_metadata records operation="varimax", component index, source factor columns, and fit policy.

sliced_inverse_regression_features#

macroforecast.feature_engineering.sliced_inverse_regression_features(
    data,
    target: str | pandas.Series | None = None,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 3,
    n_slices: int = 10,
    scaling_policy: str = "scaled_pca",
    prefix: str = "sir",
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

sliced_inverse_regression_features() implements a target-aware SIR factor transform. It aligns the predictor panel with a target series, standardizes predictors, optionally applies predictive column scaling, slices observations by the target distribution, and projects the full panel onto the leading between-slice directions.

The direct callable fits on all target-aligned complete rows in the supplied input. For strict walk-forward forecasting, use sliced_inverse_regression_step() inside feature_spec(...); the runner then fits SIR directions only on the feature-fit panel and applies the fixed directions to validation/test rows.

Input#

Name	Type	Default	Choices
`target`	string, `Series`, or `None`	input target metadata	Target signal used for SIR slicing and optional predictive scaling.
`columns`	iterable or `None`	all non-target columns	Predictor columns.
`n_components`	positive int	`3`	Number of SIR factors to return. If the effective rank is smaller, remaining columns are zero-padded for stable shape.
`n_slices`	int	`10`	Target-distribution slices. Must be at least `2`; internally capped by aligned row count.
`scaling_policy`	string	`"scaled_pca"`	`"scaled_pca"`, `"marginal_R2"`, or `"none"`.
`prefix`	string	`"sir"`	Output prefix. Use `prefix="factor_"` for legacy-style names `factor_1`, `factor_2`, …
`drop_missing`	bool	`False`	Drop rows with missing predictor values after projection.
`warn_full_sample`	bool	`True`	Warn because the direct callable fits on all target-aligned complete rows.

Output#

Returns columns such as sir1, sir2, and so on. Metadata is stored under metadata["feature_engineering_sliced_inverse_regression"] and records the target, predictor columns, requested/resolved components, slices, scaling policy, fit row count, and fit_policy="full_input_target_aligned_rows".

Target-Aware Feature Steps#

feature_spec(..., steps=[...]) also supports target-aware fitted transforms. These steps use the resolved feature_spec() target during .fit(...), store a fixed fit state, and do not look at target values during .transform(...).

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.scale_step(name="scaled", include=False),
        mf.feature_engineering.partial_least_squares_step(
            name="pls",
            input="scaled",
            n_components=2,
            min_train_size=60,
        ),
    ],
)

Target-aware steps require exactly one resolved target column. In practice that means one target and one horizon for the step pipeline. If multiple targets or horizons are requested, fit raises before any model is run.

Step builder or method	Direct callable	Main options	Output
`partial_least_squares_step()`	`partial_least_squares_features()`	`n_components`, `columns`, `min_train_size`, `prefix`	`pls1`, `pls2`, …
`sliced_inverse_regression_step()`	`sliced_inverse_regression_features()`	`n_components`, `n_slices`, `scaling_policy`, `min_train_size`, `prefix`	`sir1`, `sir2`, …
`"variance_selection"`	`variance_selection()`	`n_features`; `columns`; `min_train_size`	Selected input columns.
`"correlation_selection"`	`correlation_selection()`	`n_features`; `columns`; `min_train_size`	Selected input columns.
`"lasso_selection"`	`lasso_selection()`	`n_features`; `alpha`; `min_train_size`	Selected input columns.
`"lasso_path_selection"`	`lasso_path_selection()`	`n_features`; `eps`; `n_alphas`; `normalize_features`; `positive`	Selected input columns.
`"rfe_selection"`	`rfe_selection()`	`n_features`; `estimator`; `step`; `use_cv`; `cv_folds`	Selected input columns.
`"boruta_selection"`	`boruta_selection()`	`n_features`; `n_estimators`; `max_iter`; `alpha`; `include_tentative`	Selected input columns.
`"stability_selection"`	`stability_selection()`	`n_features`; `n_subsamples`; `subsample_fraction`; `pi_threshold`; `base_estimator`	Selected input columns.
`"genetic_selection"`	`genetic_selection()`	`n_features`; `population_size`; `n_generations`; `crossover_prob`; `fitness_estimator`	Selected input columns.

Fit-state metadata records the resolved target column, selected source columns, requested/resolved component or feature count, fit row count, and fit_policy="fixed_fit_panel_target_aligned_rows" for target-dependent methods. For method="variance_selection", no target is used and the fit policy is fixed_fit_panel_columns.

Feature selection deliberately has no generic wrapper step. Use each algorithm name directly inside feature_spec():

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        {"name": "boruta", "method": "boruta_selection", "n_features": 2},
    ],
)

Custom Feature Functions#

custom_features() applies one user feature transform directly. It is useful when the input has already been split or when the transform has no fitted state.

macroforecast.feature_engineering.custom_features(
    data,
    func,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    name: str | None = None,
    **params,
) -> pandas.DataFrame

The callable receives:

func(source: pandas.DataFrame, *, metadata: dict, **params)

source is the selected predictor block. The callable may return a DataFrame, Series, or 1-D/2-D array-like object. The output must have the same row count as source or keep a DatetimeIndex. All output columns are coerced to numeric and validated as a macroforecast panel.

def spread_square(source, *, metadata=None, suffix="sq"):
    column = source.columns[0]
    return pd.DataFrame({f"{column}_{suffix}": source[column] ** 2}, index=source.index)

X_custom = mf.feature_engineering.custom_features(
    processed,
    spread_square,
    columns=["term_spread"],
    name="spread_square",
)

For strict runner use, prefer custom_step() inside feature_spec(...). The runner fits the step on the rows allowed by feature_policy and applies it to validation/test rows without leaking future information.

macroforecast.feature_engineering.custom_step(
    name,
    func=None,
    *,
    input="panel",
    include=True,
    columns=None,
    fit_func=None,
    transform_func=None,
    requires_target=False,
    min_train_size=None,
    prefix=None,
    drop_missing=False,
    **params,
) -> dict

Custom Step Modes#

Mode	Required callable	Fit-time call	Transform-time call
Stateless	`func`	none	`func(source, metadata=..., **params)`
Fitted transformer object	`fit_func` returning object with `.transform()`	`fit_func(source, target=..., metadata=..., **params)`	`state.transform(source)`
Separate fit/transform functions	`fit_func` and `transform_func`	`fit_func(source, target=..., metadata=..., **params)`	`transform_func(source, state=state, metadata=..., **params)`
State-aware callable	`fit_func` and `func`	`fit_func(source, target=..., metadata=..., **params)`	`func(source, state=state, metadata=..., **params)`

Set requires_target=True when the fitting callable needs the resolved feature_spec() target. This requires exactly one target and one horizon. The fitted state metadata stores callable names, selected columns, whether the target was used, fit row count, and output columns.

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.custom_step(
            "my_factor",
            fit_func=my_factor_fit,
            transform_func=my_factor_transform,
            columns=["PAYEMS", "UNRATE", "HOUST"],
            requires_target=True,
            prefix="myf",
            n_components=2,
        ),
    ],
)

group_pca#

macroforecast.feature_engineering.group_pca(
    data,
    *,
    groups: Mapping[str, Iterable[str]],
    metadata: Mapping[str, object] | None = None,
    n_components: int | Mapping[str, int] = 1,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = True,
    prefix: str | None = None,
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

group_pca() extracts PCA factors separately within named groups. It is a generic grouped factor transform. FAVAR-specific slow/fast grouping, observed-policy variables, VAR dynamics, identification, and IRFs belong to later model and evaluation stages.

factors = mf.feature_engineering.group_pca(
    processed,
    groups={
        "real_activity": ["INDPRO", "PAYEMS", "UNRATE"],
        "prices": ["CPIAUCSL", "PPIACO"],
    },
    n_components={"real_activity": 3, "prices": 2},
    fit_policy="expanding",
)

Output columns use the group name as the prefix by default:

real_activity1, real_activity2, real_activity3, prices1, prices2

group_pca_step() provides the same operation inside compose_features().

Supervised And Sparse Component Boundary#

Unsupervised group PCA belongs in feature_engineering because it only uses the predictor panel. PLS and SIR are target-aware and are available as runner-safe feature steps when the resolved feature_spec() target is single. Supervised PCA variants that fit a full predictive model still belong in macroforecast.models. Chen-Rohe sparse component analysis is unsupervised and is available as sparse_pca_chen_rohe_features() / sparse_pca_chen_rohe_step().

maf_features#

macroforecast.feature_engineering.maf_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    max_lag: int = 12,
    lags: Iterable[int] | None = None,
    n_components: int = 2,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = False,
    prefix: str = "maf",
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

maf_features() implements Moving Average Factors. For each selected variable x_k, it builds a variable-specific lag panel:

[x_k(t), x_k(t-1), ..., x_k(t-P)]

Then it extracts PCA components from that lag panel only. This is different from pca_features(), which runs PCA across all selected variables, and different from moving_average_pca_lags(), which runs PCA on a moving-average block.

The MAF implementation is intentionally limited to the construction described in the paper: variable-specific lag panels followed by PCA. The package does not assume undocumented author-code details beyond that description.

Validation status: MARX is tested against the author-supplied R-loop pattern. MAF is tested for the documented variable-specific lag-panel PCA contract, but there is no author-code benchmark in the package yet. If author MAF code becomes available, it should be added as a separate equivalence test before tightening the claim.

MAF = mf.feature_engineering.maf_features(
    X,
    max_lag=12,
    n_components=2,
    fit_policy="expanding",
)

With two input series, this returns columns like:

INDPRO_maf1, INDPRO_maf2, PAYEMS_maf1, PAYEMS_maf2

Input#

Name	Type	Default	Choices
`columns`	iterable or `None`	all columns	Source series for variable-specific lag panels.
`max_lag`	non-negative int	`12`	Used when `lags=None`; builds lags `0` through `max_lag`.
`lags`	iterable of non-negative ints or `None`	`None`	Exact lag set. Overrides `max_lag`.
`n_components`	positive int	`2`	Number of MAF components per source series.
`fit_policy`	str	`"expanding"`	`"expanding"` or `"full_sample"`.
`min_train_size`	positive int or `None`	`max(5, n_components + 1)`	Minimum complete rows before emitting PCA values.
`scale`	bool	`False`	Whether to z-score the lag columns before PCA. Default is `False` because lags of the same variable are already in the same unit.
`prefix`	str	`"maf"`	Component label used in output names.
`drop_missing`	bool	`False`	Drop rows where MAF values are unavailable.
`warn_full_sample`	bool	`True`	Warn when `fit_policy="full_sample"` is used.

Output#

Returns a pandas.DataFrame with one block per source series. Metadata is stored in metadata["feature_engineering_maf"], and macroforecast_feature_metadata records the source series for each component.

feature_matrix#

macroforecast.feature_engineering.feature_matrix(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    specification: str | Iterable[str] = "X",
    columns: Iterable[str] | None = None,
    level_data: feature input | None = None,
    level_columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (0,),
    max_lag: int = 12,
    n_factors: int = 8,
    n_maf_components: int = 2,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    include_current_factor: bool = True,
    scale_factors: bool = True,
    scale_marx: bool = False,
    scale_maf: bool = False,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

feature_matrix() builds named combinations used in macro-ML forecasting papers without requiring the user to hand-write compose_features() steps.

Block	Package implementation
`X`	`lag(data, lags=...)` on the supplied, usually preprocessed, panel.
`F`	PCA factors from the supplied panel, then lags of those factors.
`MARX`	`moving_average_ladder(data, windows=range(1, max_lag + 1), shift=1)`; use `marx_step()` for runner-safe windowed construction. With `scale_marx=True`, first z-score the full lag matrix with sample standard deviations and then average lag 1 through lag `l`.
`MAF`	`maf_features(data, max_lag=max_lag, n_components=n_maf_components)`.
`LEVEL` / `H`	`lag(level_data, lags=...)`; requires a separate `level_data` input.

specification can be a string such as "F-X-MARX" or an iterable such as ("F", "X", "MAF"). Output columns are prefixed by block, for example F__F1_lag0, X__INDPRO_lag1, MARX__INDPRO_ma3_lag1, or MAF__INDPRO_maf1.

include_current_factor=True ensures the F block includes current factors even when lags contains only positive values such as range(1, 13). Set it to False when the factor block should exactly follow the supplied lag set.

Paper-Style Specifications#

The paper-style feature families are handled directly by feature_matrix(). The parser accepts -, +, or _ separators.

Specification	Meaning
`"X"`	Lagged predictor panel.
`"F"`	PCA factors from the predictor panel, then factor lags.
`"F-X"`	Factor lags plus lagged predictors.
`"H"` or `"LEVEL"`	Lagged level variables from `level_data`.
`"X-H"`	Lagged predictors plus lagged level variables from `level_data`.
`"F-X-H"` or `"F-X-LEVEL"`	`F-X` plus lagged level variables from `level_data`.
`"F-X-MARX"`	`F-X` plus MARX increasing averages of lagged predictors.
`"F-X-MAF"`	`F-X` plus Moving Average Factors.
`"F-X-H-MARX"`	`F-X-H` plus MARX.
`"F-X-H-MAF"`	`F-X-H` plus MAF.

Specification	Requires `level_data`	Fitted transform	Main output blocks
`"X"`	No	No	`X__...`
`"F"`	No	PCA	`F__...`
`"F-X"`	No	PCA	`F__...`, `X__...`
`"H"` / `"LEVEL"`	Yes	No	`LEVEL__...`
`"X-H"`	Yes	No	`X__...`, `LEVEL__...`
`"F-X-H"`	Yes	PCA	`F__...`, `X__...`, `LEVEL__...`
`"F-X-MARX"`	No	PCA; optional MARX scaling	`F__...`, `X__...`, `MARX__...`
`"F-X-MAF"`	No	PCA and MAF PCA	`F__...`, `X__...`, `MAF__...`
`"F-X-H-MARX"`	Yes	PCA; optional MARX scaling	`F__...`, `X__...`, `LEVEL__...`, `MARX__...`
`"F-X-H-MAF"`	Yes	PCA and MAF PCA	`F__...`, `X__...`, `LEVEL__...`, `MAF__...`

Examples:

FX = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X",
    lags=range(0, 13),
    n_factors=8,
    fit_policy="expanding",
)

FXH = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-H",
    level_data=raw_bundle,
    lags=range(0, 13),
    n_factors=8,
)

FXMARX = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    scale_marx=False,
)

FXMAF = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MAF",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    n_maf_components=2,
)

Input#

Name	Type	Default	Choices
`specification`	string or iterable	`"X"`	Blocks `X`, `F`, `MARX`, `MAF`, `LEVEL`/`H`; strings can use `-`, `+`, or `_` separators.
`columns`	iterable or `None`	all columns	Source columns from the preprocessed panel.
`level_data`	feature input or `None`	`None`	Required when `specification` includes `LEVEL` or `H`.
`level_columns`	iterable or `None`	`columns`	Level-data columns to include.
`lags`	int or iterable	`(0,)`	Lag set for `X`, `F`, and `LEVEL`.
`max_lag`	positive int	`12`	Maximum lag for `MARX` and `MAF`.
`n_factors`	positive int	`8`	Number of PCA factors for `F`.
`n_maf_components`	positive int	`2`	MAF components per source variable.
`fit_policy`	str	`"expanding"`	`"expanding"` or `"full_sample"` for fitted transforms.
`min_train_size`	positive int or `None`	transform-specific	Minimum complete rows for fitted transforms.
`include_current_factor`	bool	`True`	Force lag 0 in the `F` block even when `lags` excludes it.
`scale_factors`	bool	`True`	Z-score variables before PCA in the `F` block.
`scale_marx`	bool	`False`	Match the optional author R-code scaling step for `MARX`.
`scale_maf`	bool	`False`	Z-score variable-specific MAF lag panels before PCA.
`drop_missing`	bool	`False`	Drop rows with missing feature values.
`warn_full_sample`	bool	`True`	Warn when any fitted block uses `fit_policy="full_sample"`.

Z = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    fit_policy="expanding",
    drop_missing=True,
)

Use level_data= when the combination includes level variables:

Z = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-LEVEL",
    level_data=raw_bundle,
    lags=range(0, 13),
)

compose_features#

macroforecast.feature_engineering.compose_features(
    data,
    steps,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    include_original: bool = False,
    drop_missing: bool = False,
) -> pandas.DataFrame

steps is an ordered list of mappings. Each step has:

Key	Meaning
`name`	Step name. Later steps can reference this name.
`method`	One of `"lag"`, `"rolling_mean"`, `"moving_average_ladder"`, `"marx"`, `"transform"`, `"seasonal_lag"`, `"season_dummy"`, `"fourier"`, `"polynomial"`, `"interaction"`, `"maf"`, `"scale"`, `"pca"`, `"sparse_pca_chen_rohe"`, `"varimax"`, `"group_pca"`, `"time"`.
`input`	`"panel"` by default, or a previous step name.
`include`	Whether this step’s output is included in final `X`; default `True`.
other keys	Method-specific parameters such as `lags`, `windows`, `n_components`, `fit_policy`, `warn_full_sample`, or `columns`.

Examples:

# PCA, then lags of the PCA factors.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "pc", "method": "pca", "columns": ["PAYEMS", "INDPRO"], "n_components": 2, "include": False},
        {"name": "pc_lags", "method": "lag", "input": "pc", "lags": [1, 2, 3]},
    ],
)

# Lags first, then PCA on the lag block.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "lag_block", "method": "lag", "lags": [0, 1, 2, 3], "include": False},
        {"name": "lag_pc", "method": "pca", "input": "lag_block", "n_components": 4},
    ],
)

# Moving-average ladder, PCA, then lags of the factor.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "ma", "method": "moving_average_ladder", "windows": [1, 2, 4, 8, 12], "include": False},
        {"name": "ma_pc", "method": "pca", "input": "ma", "n_components": 4, "include": False},
        {"name": "ma_pc_lags", "method": "lag", "input": "ma_pc", "lags": [1, 2]},
    ],
)

# MAF as a direct block inside a composed feature matrix.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "maf", "method": "maf", "max_lag": 12, "n_components": 2},
    ],
)

# MARX shorthand: increasing averages of lagged predictors.
X = mf.feature_engineering.compose_features(
    processed,
    [
        mf.feature_engineering.marx_step(max_lag=12, scale_lags=False),
    ],
)

# Extra deterministic transforms can be composed the same way.
X = mf.feature_engineering.compose_features(
    processed,
    [
        mf.feature_engineering.transform_step(name="log_ip", transform="log", columns=["INDPRO"], include=False),
        mf.feature_engineering.lag_step(name="log_ip_lag", input="log_ip", lags=[1, 2, 3]),
        mf.feature_engineering.interaction_step(name="cross", columns=["PAYEMS", "HOUST"]),
    ],
)

time_features#

macroforecast.feature_engineering.time_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    trend: bool = True,
    month: bool = False,
    quarter: bool = False,
    year: bool = False,
) -> pandas.DataFrame

Input And Output#

Option	Output columns
`trend=True`	`trend`, starting at `1.0`.
`month=True`	`month_01` through `month_12`.
`quarter=True`	`quarter_1` through `quarter_4`.
`year=True`	`year`.

Additional Transform Helpers#

These helpers are feature-engineering transforms, not preprocessing t-codes. Use them when the model feature set needs extra ML-oriented columns after the canonical panel has already been cleaned.

Function	Main options	Output
`transform_features(data, transform=...)`	`transform`: `"log"`, `"diff"`, `"log_diff"`, `"pct_change"`, `"cumsum"`; `periods`; `columns`; `drop_missing`	`{column}_{transform}` columns.
`log_features`, `diff_features`, `log_diff_features`, `pct_change_features`, `cumsum_features`	Thin named wrappers around `transform_features`.	Same as above.
`seasonal_lag(data, season_length=12, lags=...)`	Seasonal step length and seasonal lag count.	`{column}_seasonlag{actual_lag}`.
`season_dummy(data, frequency="auto")`	`"month"` or `"quarter"`, optional `drop_first`.	Month or quarter dummies.
`fourier_features(data, period=12, order=2)`	Seasonal period and harmonic order.	Sine/cosine seasonal terms.
`polynomial_features(data, degree=2)`	`degree`, `include_bias`, `interaction_only`.	Named polynomial expansion columns.
`interaction_features(data, order=2)`	Exact-order pure interaction expansion without lower-order terms or powers.	`interaction_{col1}__{col2}` style columns.
`hp_filter_features(data, lamb=129600.0)`	HP `lambda`, `component`: `"cycle"`, `"trend"`, or `"both"`; `warn_full_sample=True`.	HP cycle/trend columns.
`hamilton_filter_features(data, h=8, p=4)`	Hamilton horizon `h`, regressor count `p`, `component`, `fit_policy`: `"expanding"` or `"full_sample"`, and missing policy.	`{column}_hamilton_cycle` and/or `{column}_hamilton_trend`.
`savitzky_golay_features(data, window_length=5, polyorder=2)`	Centered filter window, polynomial order, derivative; `warn_full_sample=True`.	Smoothed columns.
`wavelet_features(data, n_levels=3)`	Causal rolling approximation/detail levels; `wavelet` name is recorded for compatibility.	`{column}_wA{level}`, `{column}_wD{level}`.
`adaptive_ma_rf_features(data, sided="two", sample_fraction=0.6)`	Feature wrapper around `filters.albama`; `sided="two"` warns, `sided="one"` uses expanding one-sided fits. Full `AlbaMAResult` objects are stored in `attrs["macroforecast_feature_weight_results"]`.	`{column}_albama`.
`asymmetric_trim_features(data)`	Sorts each row’s selected columns in ascending order.	`rank_1`, `rank_2`, …
`partial_least_squares_features(data, target=..., n_components=...)`	Target-aware PLSRegression scores; warns by default.	`pls1`, `pls2`, …
`dfm_features(data, n_factors=...)`	Static DFM approximation by standardized PCA; warns by default.	`dfm1`, `dfm2`, …
`variance_selection(data, n_features=...)`	Select by sample variance; no target required.	Subset of original columns.
`correlation_selection(data, target=..., n_features=...)`	Select by absolute target correlation.	Subset of original columns.
`lasso_selection(data, target=..., alpha=...)`	Select by absolute lasso coefficient.	Subset of original columns.
`lasso_path_selection(data, target=..., eps=..., n_alphas=...)`	Select by lasso-path inclusion frequency.	Subset of original columns.
`rfe_selection(data, target=..., estimator=...)`	Select by recursive feature elimination.	Subset of original columns.
`boruta_selection(data, target=..., n_estimators=..., max_iter=...)`	Select by Boruta-style shadow-feature tests.	Subset of original columns.
`stability_selection(data, target=..., n_subsamples=..., pi_threshold=...)`	Select by repeated sparse-model subsampling frequency.	Subset of original columns.
`genetic_selection(data, target=..., population_size=..., n_generations=...)`	Select by genetic subset search.	Subset of original columns.
`random_projection_features(data, n_components=...)`	Gaussian random projection; `warn_full_sample=True` by default.	`rp1`, `rp2`, …
`nystroem_features(data, kernel="rbf", n_components=...)`	Kernel approximation settings; `warn_full_sample=True` by default.	`nys1`, `nys2`, …

Filter-Backed Features#

macroforecast.filters owns one-series filter and smoother callables. macroforecast.feature_engineering owns panel wrappers that turn those outputs into feature columns:

Feature wrapper	Direct filter
`hp_filter_features()`	`filters.hp_filter()`
`hamilton_filter_features()`	`filters.hamilton_filter()`
`savitzky_golay_features()`	`filters.savitzky_golay()`
`wavelet_features()`	`filters.wavelet_filter()`
`adaptive_ma_rf_features()`	`filters.albama()`

For AlbaMA method details, R-code alignment, and weight extraction, see Filters. adaptive_ma_rf_features() stores full AlbaMAResult objects in attrs["macroforecast_feature_weight_results"] so feature_analysis.effective_window() and feature_analysis.recent_weight_share() can inspect learned weights.

features = mf.feature_engineering.adaptive_ma_rf_features(
    processed.panel,
    columns=["CPIAUCSL", "INDPRO"],
    sided="one",
)

albama_results = features.attrs["macroforecast_feature_weight_results"]

random_projection_features() and nystroem_features() fit on complete rows of the provided input and warn by default because the direct helpers are full-input fitted helpers. For strict origin-by-origin forecasting, use random_projection_step() and nystroem_step() inside feature_spec(); the runner fits the projection/kernel state on the feature-fit panel and reuses the fixed state for validation/test rows.

hamilton_filter_features() follows Hamilton’s regression form: y[t+h] = a + b_0 y[t] + ... + b_{p-1} y[t-p+1] + e[t+h]. The fitted value is stored as the trend and the residual as the cycle, both labeled at t+h. Defaults h=8, p=4 match the common quarterly setting; for monthly panels, h=24, p=12 is the usual analogue. The default fit_policy="expanding" estimates each row with only earlier completed Hamilton-regression rows. fit_policy="full_sample" reproduces the ordinary in-sample filter style and warns by default because it can use future information relative to a forecasting origin.

feature_spec#

macroforecast.feature_engineering.feature_spec(
    *,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    predictors: Literal["all"] | Iterable[str] | None = None,
    lags: Iterable[int] | int | None = (0, 1),
    target_lags: Iterable[int] | int | None = None,
    rolling_windows: Iterable[int] | int | None = None,
    rolling_min_periods: int | None = None,
    add_time: bool = False,
    time_trend: bool = True,
    time_month: bool = False,
    time_quarter: bool = False,
    time_year: bool = False,
    pca_components: int | None = None,
    pca_columns: Iterable[str] | None = None,
    pca_scale: bool = True,
    pca_prefix: str = "pc",
    steps: Iterable[Mapping[str, object]] | None = None,
    feature_steps: Iterable[Mapping[str, object]] | None = None,
    include_original: bool = False,
    target_transform: str = "level",
    target_mode: str = "direct",
    drop_missing: bool = True,
    metadata: Mapping[str, object] | None = None,
) -> FeatureSpec

feature_spec() is the runner-safe feature contract. It is fitted by forecasting.run() according to feature_policy, so stateful choices such as scaling, PCA, grouped PCA, and MAF are estimated on the allowed training/reference panel and reused when transforming validation/test rows.

Input#

Name	Type	Default	Meaning
`target`, `targets`	string/iterable or `None`	from input metadata	Target column or target columns.
`horizon`, `horizons`	positive int/iterable or `None`	from input, then `(1,)`	Forecast horizon choices.
`predictors`	`"all"`, iterable, or `None`	from input, then all non-target columns	Predictor columns.
`lags`	int, iterable, or `None`	`(0, 1)`	Predictor lags. `0` includes the current predictor value at the forecast origin. `None` disables ordinary predictor lags.
`target_lags`	int, iterable, or `None`	`None`	Explicit autoregressive target lags added to `X` while keeping target columns out of `predictors`. Use this for AR-X and recursive runner designs.
`rolling_windows`	positive int/iterable or `None`	`None`	Optional rolling means.
`rolling_min_periods`	positive int or `None`	window length	Minimum observations for rolling means.
`add_time`	bool	`False`	Add deterministic date features.
`time_trend`, `time_month`, `time_quarter`, `time_year`	bool	see signature	Which deterministic date features to add.
`pca_components`	positive int or `None`	`None`	Fit PCA on the allowed feature-fit panel and append fixed-loadings components.
`pca_columns`	iterable or `None`	predictors	Columns used for PCA.
`pca_scale`	bool	`True`	Standardize PCA inputs using the feature-fit panel.
`pca_prefix`	string	`"pc"`	PCA output prefix.
`steps`, `feature_steps`	iterable of step mappings or `None`	`None`	Fit-aware feature-step pipeline. Use public step builders for deterministic/fitted transforms: `lag_step()`, `rolling_step()`, `moving_average_step()`, `marx_step()`, `transform_step()`, `seasonal_lag_step()`, `season_dummy_step()`, `fourier_step()`, `time_step()`, `polynomial_step()`, `interaction_step()`, `scale_step()`, `pca_step()`, `sparse_pca_chen_rohe_step()`, `varimax_step()`, `group_pca_step()`, `maf_step()`, `hamilton_step()`, `random_projection_step()`, `nystroem_step()`, `partial_least_squares_step()`, and `sliced_inverse_regression_step()`. For selection, pass step mappings with `method` equal to one of `variance_selection`, `correlation_selection`, `lasso_selection`, `lasso_path_selection`, `rfe_selection`, `boruta_selection`, `stability_selection`, or `genetic_selection`. `steps` and `feature_steps` are aliases; provide only one.
`include_original`	bool	`False`	Include the original predictor panel as part of `X` when using `steps`.
`target_transform`	string	`"level"`	Same target choices as `direct_target()`.
`target_mode`	string	`"direct"`	`"direct"` for horizon-level targets; `"path"` for step-level path targets.
`drop_missing`	bool	`True`	Drop rows with missing selected `X` or `y` during fit rows. Test rows are transformed without dropping by the runner.
`metadata`	mapping or `None`	`None`	User metadata stored inside the feature spec record.

When steps are supplied, they replace the shortcut predictor options rolling_windows, add_time, and pca_components; use the corresponding step builders instead. The default lags=(0, 1) shortcut is also not used in step mode unless you explicitly add a lag_step(). target_lags is not a predictor shortcut; it is appended as a separate autoregressive target block after the step pipeline, so paper-style designs can combine steps=[...] with target_lags=range(0, 13).

Output#

Returns FeatureSpec. Important methods:

Method	Output	Meaning
`.fit(data)`	`FittedFeatureBuilder`	Fits reusable feature state for PCA/scaling/grouped PCA/MAF steps on the supplied panel.
`.fit_transform(data)`	`FeatureSet`	Fits and transforms the same panel.
`.to_dict()`	`dict`	JSON-ready feature choices for result metadata.
`.to_metadata()`	`dict`	Compact runner metadata.

Fit-Aware Step Pipeline#

Step pipelines let the runner refit feature transformations inside each forecasting window:

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "HOUST", "S&P 500"],
    steps=[
        mf.feature_engineering.scale_step(name="scaled", include=False),
        mf.feature_engineering.pca_step(
            name="pc",
            input="scaled",
            n_components=3,
            min_train_size=60,
            include=False,
        ),
        mf.feature_engineering.lag_step(name="pc_lag", input="pc", lags=range(0, 13)),
    ],
)

Each step has a name, method, input, and include flag. input="panel" reads the original predictor panel; input="<step name>" reads a prior step; input="target_panel" reads the resolved target columns. The last input is explicitly opt-in: predictors still reject target overlap, but paper designs that require target-derived feature blocks such as MARX_y or MAF_y can construct them without treating the target as an ordinary predictor. If include=False, the step is an intermediate fitted transformation and its output is not included in the final X, but its metadata is still recorded.

Example target-derived MARX block:

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=3,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.marx_step(name="MARX_X", max_lag=12),
        mf.feature_engineering.marx_step(
            name="MARX_y",
            input="target_panel",
            columns=["INDPRO"],
            max_lag=12,
        ),
    ],
    target_lags=range(0, 13),
)

Stateful step builders are interpreted as fixed-fit transformations inside FeatureSpec: the runner’s feature_policy determines which rows are used to fit the step. Any fit_policy value inherited from reusable step builders is ignored in feature_spec() mode because the runner owns the temporal fit policy.

Step builder	Runner-safe behavior
`lag_step()`	Deterministic lag transform.
`rolling_step()`	Deterministic rolling mean transform.
`moving_average_step()`	Deterministic moving-average ladder.
`marx_step()`	MARX increasing lag averages; with `scale_lags=True`, fits lag-matrix center/scale on the feature-fit panel and reuses fixed parameters.
`transform_step()`	Deterministic column transform: `log`, `diff`, `log_diff`, `pct_change`, or `cumsum`.
`seasonal_lag_step()`	Deterministic seasonal lag such as 12-month or 4-quarter lag blocks.
`season_dummy_step()`	Deterministic month or quarter date dummies from the index.
`fourier_step()`	Deterministic Fourier seasonal terms from the index.
`time_step()`	Deterministic trend, month, quarter, and year columns from the index.
`polynomial_step()`	Deterministic polynomial expansion.
`interaction_step()`	Deterministic pure interaction terms.
`scale_step()`	Fits center/scale on the feature-fit panel, then applies fixed parameters.
`pca_step()`	Fits PCA loadings on the feature-fit panel, then applies fixed loadings.
`sparse_pca_chen_rohe_step()`	Fits Chen-Rohe sparse loadings on the feature-fit panel, then applies fixed loadings; optional `var_innovations=True` fits the VAR(1) residual mapping on the same feature-fit panel.
`varimax_step()`	Fits an orthogonal rotation on factor-score columns from the feature-fit panel, then applies the fixed rotation.
`group_pca_step()`	Fits separate PCA states inside named groups.
`maf_step()`	Fits variable-specific lag-panel PCA states for Moving Average Factors.
`hamilton_step()`	Fits Hamilton-regression beta on the feature-fit panel, then applies fixed beta to train/validation/test rows.
`random_projection_step()`	Fits a Gaussian random-projection transformer on the feature-fit panel and applies fixed components.
`nystroem_step()`	Fits Nystroem kernel-approximation landmarks on the feature-fit panel and applies fixed components.
`partial_least_squares_step()`	Fits PLS components against the single resolved target on the feature-fit panel and applies fixed weights.
`sliced_inverse_regression_step()`	Fits target-sliced directions on the feature-fit panel and applies fixed directions.
`variance_selection`, `correlation_selection`, `lasso_selection`, `lasso_path_selection`, `rfe_selection`, `boruta_selection`, `stability_selection`, `genetic_selection`	Select columns on the feature-fit panel and reuse the selected columns. Use these as `method` strings in step mappings, not as step-builder functions.

In feature_spec() mode, hamilton_step() ignores the reusable step’s fit_policy argument because the runner’s feature_policy owns the allowed fit rows. The fitted state records fit_policy="fixed_fit_panel". Runner-safe Hamilton currently requires missing="drop"; impute missing values in preprocessing before using it. The direct helper and compose_features() still support missing="interpolate" for one-shot exploratory construction.

Direct pandas functions and runner-safe step builders are intentionally paired:

Direct function	Runner-safe step	Fit state?	Typical use
`lag()`	`lag_step()`	No	Add current/lagged predictors.
`rolling_mean()`	`rolling_step()`	No	Add trailing rolling means.
`moving_average_ladder()`	`moving_average_step()`	No	Add multi-scale moving-average blocks.
`moving_average_ladder(..., shift=1)` / `feature_matrix(..., "MARX")`	`marx_step()`	Only when `scale_lags=True`	Add MARX increasing lag averages.
`transform_features()` and wrappers such as `log_features()` / `diff_features()`	`transform_step()`	No	Add ML-side transforms after preprocessing.
`seasonal_lag()`	`seasonal_lag_step()`	No	Add seasonal lag blocks.
`season_dummy()`	`season_dummy_step()`	No	Add calendar dummies.
`fourier_features()`	`fourier_step()`	No	Add deterministic seasonal Fourier terms.
`time_features()`	`time_step()` or `feature_spec(add_time=True, ...)`	No	Add deterministic trend/month/quarter/year terms.
`polynomial_features()`	`polynomial_step()`	No	Add nonlinear expansions.
`interaction_features()`	`interaction_step()`	No	Add cross-products.
`scale_features()`	`scale_step()`	Yes	Fit center/scale on allowed rows.
`pca_features()`	`pca_step()`	Yes	Fit PCA loadings on allowed rows.
`sparse_pca_chen_rohe_features()`	`sparse_pca_chen_rohe_step()`	Yes	Fit Chen-Rohe sparse component loadings on allowed rows.
`varimax_features()`	`varimax_step()`	Yes	Fit orthogonal factor rotation on allowed rows.
`group_pca()`	`group_pca_step()`	Yes	Fit separate PCA states by group.
`maf_features()`	`maf_step()`	Yes	Fit variable-specific lag-panel PCA states.
`hamilton_filter_features()`	`hamilton_step()`	Yes	Fit Hamilton-regression beta on allowed rows, then apply fixed beta.
`random_projection_features()`	`random_projection_step()`	Yes	Fit Gaussian random-projection state on allowed rows.
`nystroem_features()`	`nystroem_step()`	Yes	Fit Nystroem kernel landmarks on allowed rows.
`partial_least_squares_features()`	`partial_least_squares_step()`	Yes	Fit PLS scores against the resolved target on allowed rows.
`sliced_inverse_regression_features()`	`sliced_inverse_regression_step()`	Yes	Fit SIR directions against the resolved target on allowed rows.
`variance_selection()`	`{"method": "variance_selection", ...}`	Yes	Select columns by sample variance on allowed rows; no target required.
`correlation_selection()`	`{"method": "correlation_selection", ...}`	Yes	Select columns by target correlation on allowed rows.
`lasso_selection()`	`{"method": "lasso_selection", ...}`	Yes	Select columns by lasso coefficient magnitude on allowed rows.
`lasso_path_selection()`	`{"method": "lasso_path_selection", ...}`	Yes	Select columns by lasso-path inclusion frequency on allowed rows.
`rfe_selection()`	`{"method": "rfe_selection", ...}`	Yes	Select columns by recursive feature elimination on allowed rows.
`boruta_selection()`	`{"method": "boruta_selection", ...}`	Yes	Select columns by Boruta-style shadow-feature tests on allowed rows.
`stability_selection()`	`{"method": "stability_selection", ...}`	Yes	Select columns by sparse-model subsampling frequency on allowed rows.
`genetic_selection()`	`{"method": "genetic_selection", ...}`	Yes	Select columns by genetic subset search on allowed rows.

The remaining helpers remain callable but are intentionally not accepted as FeatureSpec step methods yet:

Helper	Why not a runner-safe step yet
`mixed_frequency_lags()`	It changes the date anchor and native-frequency lookup calendar. This belongs with mixed-frequency data/model design, not ordinary same-index feature steps.
`hp_filter_features()`	HP filtering is two-sided on the supplied sample. It remains direct-only and warns by default; use `hamilton_step()` for a runner-safe trend/cycle filter.
`savitzky_golay_features()`	The smoother uses a centered local window over the supplied sample. It remains direct-only and warns by default; use trailing `rolling_step()` when a past-only smoother is needed.

build_features() remains broader for one-shot construction, including feature_specification="F-X-MARX" and feature_specification="F-X-MAF". Use it when you want to materialize a complete FeatureSet first. Use feature_spec(..., steps=...) when the feature transformations themselves must be refit inside forecasting.run() according to the window design.

build_features#

macroforecast.feature_engineering.build_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    predictors: Literal["all"] | Iterable[str] | None = None,
    lags: Iterable[int] | int = (0, 1),
    target_lags: Iterable[int] | int | None = None,
    rolling_windows: Iterable[int] | int | None = None,
    rolling_min_periods: int | None = None,
    add_time: bool = False,
    time_trend: bool = True,
    time_month: bool = False,
    time_quarter: bool = False,
    time_year: bool = False,
    feature_steps: Iterable[Mapping[str, object]] | None = None,
    feature_specification: str | Iterable[str] | None = None,
    include_original: bool = False,
    level_data: feature input | None = None,
    max_lag: int = 12,
    n_factors: int = 8,
    n_maf_components: int = 2,
    feature_fit_policy: str = "expanding",
    feature_min_train_size: int | None = None,
    feature_warn_full_sample: bool = True,
    include_current_factor: bool = True,
    scale_factors: bool = True,
    scale_marx: bool = False,
    scale_maf: bool = False,
    target_transform: str = "level",
    target_mode: str = "direct",
    drop_missing: bool = True,
) -> FeatureSet

Input#

Name	Type	Default	Choices
`target`, `targets`	string/iterable or `None`	from `DataSpec`/`PreprocessedData`	Target column choices. One of them is required if the input does not already define targets.
`horizon`, `horizons`	positive int/iterable or `None`	from input, then `(1,)`	Forecast horizons.
`predictors`	`"all"`, iterable, or `None`	from input, then all non-target columns	Predictor columns. Target columns are rejected as predictors.
`lags`	int or iterable	`(0, 1)`	Current value plus lag one by default. `lags=3` means `1, 2, 3`; `lags=0` means current values only; pass exact iterables when needed.
`target_lags`	int, iterable, or `None`	`None`	Add autoregressive target-lag columns to `X` while still keeping targets out of ordinary predictors. `target_lags=range(0, 13)` means current target transform plus 12 past lags.
`rolling_windows`	positive int/iterable or `None`	`None`	Add rolling-mean features for each window.
`rolling_min_periods`	positive int or `None`	window length	Passed to `rolling_mean()`.
`add_time`	bool	`False`	Add deterministic date features.
`time_trend`, `time_month`, `time_quarter`, `time_year`	bool	`True`, `False`, `False`, `False`	Which date features to include when `add_time=True`.
`feature_steps`	iterable of mappings or `None`	`None`	If supplied, use `compose_features()` instead of the simple lag/rolling/time defaults. Mutually exclusive with `feature_specification`.
`feature_specification`	string/iterable or `None`	`None`	If supplied, use `feature_matrix()` blocks such as `"F-X-MARX"` or `"F-X-MAF"` instead of the simple lag/rolling/time defaults.
`include_original`	bool	`False`	Include original predictors when `feature_steps` is supplied.
`level_data`	feature input or `None`	`None`	Passed to `feature_matrix()` when `feature_specification` includes `LEVEL`/`H`.
`max_lag`	positive int	`12`	Passed to `feature_matrix()` for `MARX` and `MAF`.
`n_factors`	positive int	`8`	Number of `F` factors when `feature_specification` uses `F`.
`n_maf_components`	positive int	`2`	MAF components per source variable when `feature_specification` uses `MAF`.
`feature_fit_policy`	str	`"expanding"`	Fit policy passed to `feature_matrix()` fitted transforms.
`feature_min_train_size`	positive int or `None`	`None`	Minimum complete rows passed to `feature_matrix()` fitted transforms.
`feature_warn_full_sample`	bool	`True`	Warn when block-based fitted transforms use `feature_fit_policy="full_sample"`.
`include_current_factor`	bool	`True`	Force lag 0 for the `F` block.
`scale_factors`	bool	`True`	Scale variables before `F` PCA.
`scale_marx`	bool	`False`	Apply optional author R-code lag-matrix scaling for `MARX`.
`scale_maf`	bool	`False`	Scale MAF lag panels before PCA.
`target_transform`	str	`"level"`	Same choices as `direct_target(transform=...)`.
`target_mode`	str	`"direct"`	`"direct"` returns horizon-level target columns. `"path"` returns step-level target columns from `path_targets()`.
`drop_missing`	bool	`True`	Drop rows where any selected `X` or `y` column is missing.

target_mode="path" is a target-construction shortcut only. It does not fit or forecast one model per step; that belongs in the model stage. It also does not average forecasts; horizon-level forecast averaging belongs in evaluation. The returned FeatureSet.y contains step columns, and metadata records which step columns belong to each requested horizon.

Output#

Returns FeatureSet.

Field	Type	Meaning
`X`	`pandas.DataFrame`	Predictor matrix aligned on forecast origin dates.
`y`	`pandas.DataFrame`	Direct horizon targets or path step targets aligned to `X`.
`metadata`	`dict`	Input metadata plus a `feature_engineering` stage.
`feature_metadata`	`pandas.DataFrame`	Generated-feature provenance. Core columns are `feature`, `step`, `block`, `operation`, `source`, `parameter`, `lag`, `window`, `component`, `fit_policy`, `inputs`, and `included`.
`target_metadata`	`pandas.DataFrame`	Target-column provenance. Core columns are `target_column`, `source`, `horizon`, `step`, `mode`, `transform`, `operation`, `formula`, `aggregation`, and `used_for_horizons`.
`target`, `targets`, `horizons`, `predictors`	scalar/tuple fields	Resolved study choices.

FeatureSet supports tuple unpacking:

X, y, metadata = features

Metadata#

metadata["feature_engineering"] records:

Key	Meaning
`input_panel`	Shape, date range, columns, missing count, and inferred index frequency.
`predictors`, `targets`, `horizons`	Resolved study choices.
`target_transform`	Target formula choice.
`target_mode`	`"direct"` or `"path"`.
`path_target_columns_by_horizon`	Step columns to average later when `target_mode="path"`.
`lags`, `target_lags`, `rolling_windows`, `rolling_min_periods`	Predictor and autoregressive target-lag construction choices.
`feature_specification`	`feature_matrix()` block specification when used.
`feature_matrix`	`feature_matrix()` options when block-based features are used.
`feature_steps`	Ordered composition steps when `compose_features()` is used through `build_features()`.
`time`	Deterministic date-feature choices.
`drop_missing`	Whether rows with missing `X` or `y` values were removed.
`output`	Final row count, feature count, target count, and sample dates.

Feature Metadata#

Each feature-producing function attaches macroforecast_feature_metadata to the returned DataFrame. build_features() exposes the same table as FeatureSet.feature_metadata.

The table is normalized through a single schema helper. The first columns are always:

Column	Meaning
`feature`	Generated feature column name.
`step`	Producing step name when created through `compose_features()` or `feature_spec(..., steps=...)`; otherwise empty.
`block`	Paper-style block such as `X`, `F`, `MARX`, `MAF`, or `LEVEL` when created by `feature_matrix()`.
`operation`	Operation family, for example `lag`, `rolling_mean`, `marx`, `pca`, `pct_change`, `season_dummy`, or `interaction`.
`source`	Main source column, source group, or `date` for calendar features.
`parameter`	Compact parameter string such as `lag=1`, `window=3`, `component=1`, or `periods=1`.
`lag`, `window`, `component`	Parsed numeric fields when the feature name/operation carries them.
`fit_policy`	Fitting policy for stateful transforms. In `feature_spec()` this is `fixed_fit_panel` for fit-aware steps.
`inputs`	Comma-separated source columns used by the feature.
`included`	`True` when the feature is included in final `X`; `False` for intermediate pipeline steps.

Extra columns are preserved after the standard columns. For example, mixed_frequency_lags() adds source-frequency and lookup-calendar fields. The metadata frame also carries attrs["macroforecast_metadata_schema"] = {"kind": "feature_metadata", "version": 1, ...}.

features = mf.feature_engineering.build_features(
    processed,
    feature_specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
)

features.feature_metadata.loc[
    features.feature_metadata["feature"] == "MARX__INDPRO_ma3_lag1",
    ["block", "operation", "source", "window", "lag"],
]

This records that MARX__INDPRO_ma3_lag1 came from the MARX block, source series INDPRO, window 3, and lag 1. Intermediate compose_features() steps are also recorded; the included column marks whether a step output is part of the final X matrix.

Target Metadata#

Target-producing functions attach macroforecast_target_metadata to the returned target frame. build_features() exposes the same table as FeatureSet.target_metadata.

features = mf.feature_engineering.build_features(
    processed,
    target="INDPRO",
    horizons=[1, 3, 6],
    target_transform="growth",
)

features.target_metadata.loc[
    features.target_metadata["target_column"] == "INDPRO_growth_h3",
    ["source", "horizon", "mode", "transform", "formula"],
]

For direct targets, horizon is the forecast horizon and step is empty. For path targets, step identifies the future step and used_for_horizons records which requested horizons later consume that step forecast. This keeps the target construction, model-stage step fitting, and evaluation-stage averaging separate while preserving the contract in metadata.

Error Conditions#

Condition	Result
Input is not a canonical panel-like object	`TypeError`.
Target is missing and input has no target metadata	`ValueError`.
Target/predictor names are not in the panel	`ValueError`.
Predictors include target columns	`ValueError`.
Horizons, windows, or min periods are non-positive	`ValueError`.
Feature construction leaves no aligned rows	`ValueError`.