macroforecast.feature_engineering#

Purpose#

macroforecast.feature_engineering is the direct pandas surface for building forecast targets and model-ready feature matrices. It accepts the same direct Python inputs used by previous stages: PreprocessedData, DataSpec, DataBundle, (panel, metadata), or a canonical pandas.DataFrame.

For strict windowed forecasting, use feature_spec(...). The spec is fitted by macroforecast.forecasting.run(...) inside each train window and then transformed for the matching test rows. Individual functions such as lag(), rolling_mean(), pca_features(), and feature_matrix() remain callable one-shot helpers; runner composition belongs in macroforecast.forecasting.

The preferred flow is:

import macroforecast as mf

bundle = mf.data.load_fred_md()
data_spec = mf.data.spec(bundle, target="INDPRO", horizons=[1, 3, 6], predictors="all")
processed = mf.preprocessing.reprocess(data_spec)

features = mf.feature_engineering.build_features(
    processed,
    lags=(0, 1, 2, 3),
    rolling_windows=(3, 6),
    add_time=True,
)

X = features.X
y = features.y
metadata = features.metadata

build_features() emits a warning when it receives a canonical panel that does not carry metadata["preprocessing"]. This is allowed, but the default package workflow is data -> preprocessing -> feature engineering.

Common callable examples:

# Direct horizon targets: y[t+h].
y_direct = mf.feature_engineering.direct_target(processed, target="INDPRO", horizons=[1, 3, 6])

# Direct average target: one y column per requested horizon.
y_avg = mf.feature_engineering.average_target(processed, target="INDPRO", horizon=12, transform="growth")

# Path target: one y column per future step. Model fits/forecasts each step;
# evaluation averages the step forecasts.
y_path = mf.feature_engineering.path_targets(processed, target="INDPRO", horizon=12, transform="growth")

# Simple lagged predictors.
X_lag = mf.feature_engineering.lag(processed, columns=["PAYEMS", "INDPRO"], lags=range(0, 13))

Public Functions#

| Group | Functions | Purpose |
| --- | --- | --- |
| Target construction | direct_target, average_target, path_targets, forward_average_target | Build direct, average, or step-path forecast targets. forward_average_target is the Albacore/assemblage-named reusable helper. |
| Basic predictor transforms | lag, rolling_mean, moving_average_ladder, mixed_frequency_lags, seasonal_lag, season_dummy, time_features, fourier_features | Add lags, rolling blocks, mixed-frequency lag blocks, and deterministic date features. |
| ML-side value transforms | transform_features, log_features, diff_features, log_diff_features, pct_change_features, cumsum_features, scale_features | Add post-preprocessing transformations and scaling used as model features. |
| Nonlinear expansions | polynomial_features, interaction_features, wavelet_features, savitzky_golay_features, adaptive_ma_rf_features, asymmetric_trim_features, rank_space_features | Add nonlinear, smooth, rank, or multi-resolution feature blocks. rank_space_features is the generic order-statistic primitive used by Albacoreranks. |
| Supervised aggregation helpers | moving_average_changes, align_reference_weights, weighted_aggregate | Reusable component-to-aggregate helpers derived from the Albacore/assemblage R package. They are generic and can be used outside inflation. |
| Trend/cycle filter wrappers | hamilton_filter_features, hp_filter_features | Turn macroforecast.filters one-series filter outputs into panel feature columns with leakage warnings where needed. |
| Factor features | pca_features, dfm_features, group_pca, varimax_features, sparse_pca_chen_rohe_features, partial_least_squares_features, sliced_inverse_regression_features, random_projection_features, nystroem_features | Build fitted factor, supervised-factor, sparse-factor, rotation, random projection, and kernel approximation features. |
| Paper-style combinations | feature_matrix, maf_features, moving_average_pca_lags, pca_then_lags, lags_then_pca | Materialize named macro-ML blocks such as X, F, MARX, MAF, and compositions. |
| Composition | compose_features, custom_features, custom_step | Run sequential feature steps or user-supplied transforms. |
| Runner-safe specs | feature_spec plus step builders ending in _step | Store fit-aware feature construction for forecasting.run(...). |
| Feature selection | select_features, variance_selection, correlation_selection, lasso_selection, lasso_path_selection, rfe_selection, boruta_selection, stability_selection, genetic_selection | Select columns with variance, target association, sparse-model, wrapper, or search rules. |
| Selection utilities | normalize_feature_selection_method, feature_selection_requires_target | Normalize selection aliases and report whether a target is required. |
| End-to-end builder | build_features | Return aligned X, y, metadata, feature provenance, and target provenance. |

Runner-safe step builders are direct functions too: lag_step, rolling_step, moving_average_step, marx_step, transform_step, seasonal_lag_step, season_dummy_step, fourier_step, time_step, polynomial_step, interaction_step, scale_step, pca_step, sparse_pca_chen_rohe_step, varimax_step, group_pca_step, maf_step, hamilton_step, random_projection_step, nystroem_step, partial_least_squares_step, and sliced_inverse_regression_step.

Code Structure#

The public namespace stays macroforecast.feature_engineering, while the implementation is split by responsibility:

| File | Responsibility |
| --- | --- |
| targets.py | direct_target(), average_target(), and path_targets(). |
| transforms.py | Direct pandas feature transforms: lags, rolling means, scaling, PCA, PLS, DFM-style factors, Chen-Rohe sparse component analysis, varimax rotation, grouped PCA, MAF, filter-to-feature wrappers, custom feature callables, and time features. |
| feature_selection.py | Shared fitted feature-selection algorithms used by direct selection callables and runner-safe feature_spec() method names. |
| compose.py | Reusable step builders and sequential feature composition. |
| matrix.py | Paper-style X, F, MARX, MAF, and LEVEL feature-matrix combinations. |
| builder.py | End-to-end build_features() alignment of X, y, and metadata. |
| shared.py | Internal normalization, metadata, fitting, and validation helpers. |
| core.py | Compatibility re-export only. |

Public Classes And Types#

| Symbol | Meaning |
| --- | --- |
| FeatureInput | Accepted feature-engineering input type. |
| FeatureSet | Output object returned by build_features(...). |
| FeatureSpec | Runner-compatible feature-building contract. |
| FittedFeatureBuilder | Fitted feature-builder state used by the runner. |
| FeatureSelectionResult | Metadata-rich result for feature-selection helpers. |
| select_features | Generic feature-selection dispatcher. |
| feature_selection_requires_target | Return whether a feature-selection method requires a target. |
| normalize_feature_selection_method | Normalize feature-selection method aliases. |
| pca_then_lags | Convenience composition: PCA factors first, then lags. |
| lags_then_pca | Convenience composition: lag panel first, then PCA. |
| moving_average_pca_lags | Convenience composition for moving-average blocks, PCA, and lags. |

FeatureSet#

macroforecast.feature_engineering.FeatureSet(
    X: pandas.DataFrame,
    y: pandas.DataFrame,
    metadata: dict,
    feature_metadata: pandas.DataFrame,
    target_metadata: pandas.DataFrame,
    target: str | None = None,
    targets: tuple[str, ...] = (),
    horizons: tuple[int, ...] = (),
    predictors: tuple[str, ...] = (),
)

Output Schema#

| Field | Type | Meaning |
| --- | --- | --- |
| X | pandas.DataFrame | Predictor matrix aligned on forecast-origin dates. |
| y | pandas.DataFrame | Direct horizon targets or path step targets aligned to X. |
| metadata | dict | Input metadata plus feature-engineering stage metadata. |
| feature_metadata | pandas.DataFrame | One row per generated feature with provenance columns. |
| target_metadata | pandas.DataFrame | One row per target column with horizon, transform, and formula provenance. |
| target, targets, horizons, predictors | scalar or tuple fields | Resolved study choices. |

Methods#

| Method | Input | Output | Meaning |
| --- | --- | --- | --- |
| attach(stage, values) | stage: str, values: Mapping | FeatureSet | Return a new object with one metadata stage added. |

FeatureSet also supports tuple unpacking:

X, y, metadata = features
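
As a rough picture of how tuple unpacking and attach() can coexist on one object, here is a minimal sketch. MiniFeatureSet is a hypothetical stand-in with only three fields, not the package's FeatureSet, and the way it copies metadata is an assumption made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Iterator, Mapping

import pandas as pd


@dataclass(frozen=True)
class MiniFeatureSet:
    """Hypothetical stand-in for FeatureSet; not the package's class."""

    X: pd.DataFrame
    y: pd.DataFrame
    metadata: dict = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        # Yield the three core fields so `X, y, metadata = features` works.
        yield self.X
        yield self.y
        yield self.metadata

    def attach(self, stage: str, values: Mapping[str, Any]) -> "MiniFeatureSet":
        # Build a new object; the original metadata dict is left untouched.
        return replace(self, metadata={**self.metadata, stage: dict(values)})


index = pd.date_range("2020-01-01", periods=3, freq="MS")
features = MiniFeatureSet(
    X=pd.DataFrame({"INDPRO_lag1": [1.0, 2.0, 3.0]}, index=index),
    y=pd.DataFrame({"INDPRO_level_h1": [2.0, 3.0, None]}, index=index),
    metadata={"feature_engineering": {"lags": [1]}},
)

X, y, metadata = features
tagged = features.attach("notes", {"author": "example"})
```

Returning a new object from attach() keeps earlier references valid, which is convenient when several stages annotate the same feature set.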

FeatureSelectionResult#

select_features(...) returns FeatureSelectionResult; direct selection wrappers return a selected-column DataFrame and store the same selection metadata on the returned frame.

| Field | Meaning |
| --- | --- |
| selected_columns | Final selected columns in source order. |
| scores | Per-column score dictionary. |
| method | Canonical selection method. |
| n_features, resolved_n_features | Requested and resolved selected-feature counts. |
| n_fit_rows | Rows used by the selection fit. |
| fit_policy | Fit contract, such as target-aligned rows or column-only rows. |
| target_required | Whether the method requires a target. |
| metadata | Method-specific score and fit details. |

Feature Boundary#

This stage is direct and pandas-first. It constructs target columns and ML-oriented feature transforms. Multiple transformations can be composed in sequence through plain Python callables, and higher-level orchestration can call the same functions later.

| Function | Owns | Does not own yet |
| --- | --- | --- |
| direct_target() | Direct-forecast target columns, including direct average targets. | Train/test split, recursive forecasting, inverse transforms. |
| average_target() | Explicit wrapper for direct average change/growth targets. | Model fitting. |
| forward_average_target() | Albacore/assemblage-named wrapper for future average aggregate targets. | Inflation-only semantics; works for any aggregate target. |
| path_targets() | Step-level targets for path-average forecasting. | Model-stage step fit/forecast and evaluation-stage forecast averaging. |
| lag() | Current and lagged predictor columns. | Model-specific lag search. |
| mixed_frequency_lags() | Exact-date lag blocks for native mixed-frequency panels. | Frequency conversion or model estimation. |
| rolling_mean() | Rolling-window means. | Fit-based filters or learned smoothers. |
| moving_average_ladder() | Multi-scale trailing moving-average block used before optional factor/PCA steps. | PCA/factor extraction itself. |
| maf_features() | Moving Average Factors from variable-specific lag panels. | Model fitting or choosing final feature combinations. |
| hamilton_filter_features() | Panel wrapper around filters.hamilton_filter() with explicit expanding or full-sample policy. | Model fitting, test windows, or choosing filter horizons. |
| feature_matrix() | Named X, F, MARX, MAF, and LEVEL feature-matrix combinations. | Loading or preprocessing the raw/level panel. |
| scale_features() | Fit-policy-aware z-score, min-max, or robust scaling. | Model fitting. |
| pca_features() | Fit-policy-aware PCA factors. | Forecast model fitting. |
| sparse_pca_chen_rohe_features() | Chen-Rohe sparse component analysis factors using an L1 loading budget. | Model fitting; runner-safe fitting should use sparse_pca_chen_rohe_step() inside feature_spec(). |
| varimax_features() | Orthogonal varimax rotation of already-created factor-score columns. | Factor extraction itself; usually call after pca_features() or a factor step. |
| sliced_inverse_regression_features() | Target-aware Sliced Inverse Regression factors. | Model fitting; runner-safe fitting should use sliced_inverse_regression_step() inside feature_spec(). |
| partial_least_squares_features() | Target-aware PLS factor scores. | Model fitting; runner-safe fitting should use partial_least_squares_step() inside feature_spec(). |
| dfm_features() | Static DFM approximation by standardized PCA. | State-space DFM estimation; use model callables for that. |
| variance_selection(), correlation_selection(), lasso_selection(), lasso_path_selection(), rfe_selection(), boruta_selection(), stability_selection(), genetic_selection() | Direct column selection by one explicit algorithm. | Model fitting; runner-safe fitting uses the same method names inside feature_spec(..., steps=[...]). |
| asymmetric_trim_features() | Per-period rank-space columns for asymmetric trimming weights. | Estimating the nonnegative rank weights. |
| rank_space_features() | Named generic rank-space/order-statistic primitive from the Albacore R x.transformation path. | Model fitting or learned rank weights. |
| moving_average_changes() | Convert one-period component changes to a trailing moving-average unit. | Choosing forecast windows or fitting weights. |
| align_reference_weights() | Align official/reference weights to a component column order. | Estimating those weights. |
| weighted_aggregate() | Apply fixed component weights to produce one aggregate. | Learning the weights; use models.component_aggregation() for that. |
| wavelet_features() | Panel wrapper around filters.wavelet_filter() returning causal rolling multi-resolution approximation/detail columns. | True DWT family-specific filtering. |
| adaptive_ma_rf_features() | Feature-wrapper form of filters.albama() that returns {column}_albama columns and stores full AlbaMAResult objects in attrs. | Forecast model fitting. |
| group_pca() | PCA factors within named column groups. | FAVAR-specific slow/fast construction, model estimation, or structural identification. |
| custom_features() | One direct user-supplied pandas feature transform. | Window-safe fitted state. Use custom_step() inside feature_spec() for runner use. |
| compose_features() | Sequential combinations such as pca -> lag, lag -> pca, maf, or moving_average_ladder -> pca -> lag. | Model fitting or evaluation. |
| time_features() | Trend, month, quarter, and year columns. | Public-holiday or trading-day calendars; the package targets monthly and quarterly macro panels. |
| build_features() | Aligned X, y, feature metadata, and feature-engineering metadata. | Model evaluation. |

Fit-based transformations require a declared fit_policy. The default is fit_policy="expanding", which estimates transform parameters using only data available through each date. fit_policy="full_sample" is available for exploration or already-split training data. Public fitted transforms warn by default when full_sample is used; pass warn_full_sample=False only when the input panel is already training-only or the call is intentionally diagnostic.
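
The difference between the two fit policies can be illustrated with a z-score in plain pandas. This is a sketch under assumptions, not the package's code: expanding_zscore and full_sample_zscore are hypothetical helpers, and the sample standard deviation (ddof=1) and min_train_size=5 are illustrative choices.

```python
import pandas as pd


def expanding_zscore(series: pd.Series, min_train_size: int = 5) -> pd.Series:
    """Scale each value with the mean/std of observations up to that date."""
    mean = series.expanding(min_periods=min_train_size).mean()
    std = series.expanding(min_periods=min_train_size).std()
    return (series - mean) / std


def full_sample_zscore(series: pd.Series) -> pd.Series:
    """Scale every value with the mean/std of the whole input."""
    return (series - series.mean()) / series.std()


index = pd.date_range("2020-01-01", periods=8, freq="MS")
x = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 20.0], index=index, name="x")

z_expanding = expanding_zscore(x)
z_full = full_sample_zscore(x)

# Revise only the final observation: earlier expanding values are unchanged,
# while every full-sample value moves because the fit saw the future.
x_revised = x.copy()
x_revised.iloc[-1] = 8.0
z_expanding_revised = expanding_zscore(x_revised)
z_full_revised = full_sample_zscore(x_revised)
```

Comparing z_full with z_full_revised on the first seven rows shows the leakage that the full_sample warning is meant to flag.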

direct_target#

macroforecast.feature_engineering.direct_target(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "level",
) -> pandas.DataFrame

Input#

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| data | PreprocessedData, DataSpec, DataBundle, (panel, metadata), or DataFrame | required | Canonical macroforecast input. |
| metadata | mapping or None | None | Extra metadata to merge into the input metadata. |
| target | str or None | from data | One target column. |
| targets | iterable or None | from data | Multiple target columns. Mutually exclusive with target. |
| horizon | positive int or None | from data, then 1 | One forecast horizon. |
| horizons | positive int/iterable or None | from data, then (1,) | Multiple forecast horizons. Mutually exclusive with horizon. |
| transform | str | "level" | "level", "value", "change", "growth", "log_growth", "average_value", "average_change", "average_growth", "average_log_growth"; common aliases include "future_level", "future_value", "identity", "diff", "pct_change", "simple_growth", "log_change", "log_diff", "avg_value", "avg_change", "avg_growth", and "direct_average_growth". |

Output#

Returns a pandas.DataFrame indexed by date. Column names are {target}_{transform}_h{horizon}.

| Transform | Formula aligned on row t |
| --- | --- |
| "level" | x[t + h] |
| "value" | x[t + h]; same formula as level, used when the series is already on the target’s forecasting scale. |
| "change" | x[t + h] - x[t] |
| "growth" | x[t + h] / x[t] - 1 |
| "log_growth" | log(x[t + h]) - log(x[t]); non-positive pairs become missing. |
| "average_value" | Average of future values x[t+1] through x[t+h]; use this when x is already a one-period transformed forecast target. |
| "average_change" | Average of one-period changes from t+1 through t+h. |
| "average_growth" | Average of one-period simple growth rates from t+1 through t+h. |
| "average_log_growth" | Average of one-period log growth rates from t+1 through t+h. |

The final h rows are missing by construction because the future target is not observed.

The returned frame also carries attrs["macroforecast_target_metadata"]. Core columns are target_column, source, horizon, step, mode, transform, operation, formula, aggregation, and used_for_horizons.
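
The transform formulas above can be reproduced in plain pandas for a single series. The sketch below is an illustration under assumptions: direct_target_sketch is a hypothetical helper, it covers only a subset of the transforms, and it does not build the metadata that direct_target() attaches.

```python
import numpy as np
import pandas as pd


def direct_target_sketch(x: pd.Series, h: int, transform: str = "level") -> pd.Series:
    """Compute one direct target column aligned on the forecast-origin row t."""
    future = x.shift(-h)  # x[t + h] placed on row t
    if transform in ("level", "value"):
        out = future
    elif transform == "change":
        out = future - x
    elif transform == "growth":
        out = future / x - 1
    elif transform == "log_growth":
        # Non-positive pairs become missing instead of raising.
        valid = (x > 0) & (future > 0)
        out = np.log(future.where(valid)) - np.log(x.where(valid))
    elif transform == "average_growth":
        # Mean of the one-period growth rates dated t+1 through t+h.
        one_period = x / x.shift(1) - 1
        out = one_period.rolling(h).mean().shift(-h)
    else:
        raise ValueError(f"unsupported transform in this sketch: {transform}")
    return out.rename(f"{x.name}_{transform}_h{h}")


index = pd.date_range("2020-01-01", periods=4, freq="MS")
x = pd.Series([100.0, 110.0, 121.0, 133.1], index=index, name="INDPRO")

growth_h2 = direct_target_sketch(x, h=2, transform="growth")
average_growth_h2 = direct_target_sketch(x, h=2, transform="average_growth")
```

For this series, the two-period growth from the first row is about 0.21, while the average of the two one-period growth rates is about 0.10; the last two rows are missing in both columns.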

average_target#

macroforecast.feature_engineering.average_target(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "change",
) -> pandas.DataFrame

average_target() is a readability wrapper for direct average targets. Each transform value returns the same output as direct_target() called with the corresponding average_* transform:

mf.feature_engineering.direct_target(..., transform="average_change")
mf.feature_engineering.direct_target(..., transform="average_growth")
mf.feature_engineering.direct_target(..., transform="average_log_growth")
mf.feature_engineering.direct_target(..., transform="average_value")

This is the direct average approach: one final target column is created per requested horizon, and a later model can fit that column directly.

| transform | Meaning |
| --- | --- |
| "value" | Average future values of an already transformed one-period target series. |
| "change" | Average one-period differences over the future path. |
| "growth" | Average one-period simple growth rates over the future path. |
| "log_growth" | Average one-period log growth rates over the future path. |

forward_average_target#

macroforecast.feature_engineering.forward_average_target(
    data,
    *,
    target=None,
    targets=None,
    horizon=None,
    horizons=None,
    transform="change",
)

forward_average_target() is the named target helper for Albacore/assemblage-style supervised aggregation. It calls the same target logic as average_target(), but records source metadata pointing to Goulet Coulombe, Klieber, Barrette, and Goebel, Maximally Forward-Looking Core Inflation, and the R package assemblage. The helper is generic: the target can be any future aggregate, not only headline inflation.

Output: a DataFrame with columns such as headline_average_change_h12. The output stores attrs["macroforecast_target_metadata"] and marks source_method="assemblage_forward_target".

Assemblage Helper Primitives#

These helpers come from the Albacore/assemblage workflow but are intentionally split into generic callables. They can be attached to inflation components, state panels, sector panels, industry components, or any setting where current components are aggregated to forecast a future aggregate target.

| Function | Input | Output | Albacore source cue |
| --- | --- | --- | --- |
| rank_space_features(data, columns=None, prefix="rank_") | Component panel. | rank_1, rank_2, … sorted low-to-high each date. | R x.transformation, rank path for Albacoreranks. |
| moving_average_changes(data, window=3, method="compound_percent") | One-period component changes. | {column}_ma{window} trailing change unit. | R x.transformation; month-over-month percent to 3m/12m compounded units. |
| align_reference_weights(weights, columns, normalize=True) | Mapping, Series, DataFrame, or sequence. | Series indexed by model columns. | R weight.transformation; official basket weights for Albacorecomps. |
| weighted_aggregate(data, weights, columns=None) | Component panel plus fixed weights. | One aggregate column. | Learned core measure after assemblage weights are estimated. |

rank_space_features() and weighted_aggregate() do not estimate weights. Use macroforecast.models.rank_aggregation(), macroforecast.models.component_aggregation(), or the Albacore wrappers for supervised weight estimation.
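
To make the rank-space and fixed-weight ideas concrete, the following sketch rebuilds them with NumPy and pandas on a toy component panel. It is not the package implementation: the component names, weights, and the handling of column order are assumptions, and missing values are not treated.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2020-01-01", periods=3, freq="MS")
components = pd.DataFrame(
    {"food": [0.3, 0.1, 0.4], "energy": [-1.0, 2.0, 0.5], "rent": [0.2, 0.2, 0.3]},
    index=index,
)

# Rank space: sort each date's values low-to-high, so rank_1 is always the
# smallest component change regardless of which component produced it.
ranks = pd.DataFrame(
    np.sort(components.to_numpy(), axis=1),
    index=components.index,
    columns=[f"rank_{i}" for i in range(1, components.shape[1] + 1)],
)

# Reference weights arrive in their own order; align them to the panel's
# column order and normalize them to sum to one.
raw_weights = pd.Series({"rent": 40.0, "food": 15.0, "energy": 10.0})
weights = raw_weights.reindex(components.columns)
weights = weights / weights.sum()

# Fixed-weight aggregate: a weighted row sum, with no estimation involved.
aggregate = components.mul(weights, axis=1).sum(axis=1).rename("aggregate")
```

The rank columns lose component identity by design, which is why weights on ranks and weights on components are estimated by separate model callables.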

path_targets#

macroforecast.feature_engineering.path_targets(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    transform: str = "change",
) -> pandas.DataFrame

path_targets() creates step-level future targets for path-average forecasting. For horizon=3, it returns step columns for t+1, t+2, and t+3. The model stage should fit and forecast each step target separately. The evaluation stage can then average the step forecasts for the final horizon. Use transform="value" when the supplied target column is already a one-period transformed object, such as the monthly growth/difference target in a FRED-MD replication.

path_y = mf.feature_engineering.path_targets(
    processed,
    target="INDPRO",
    horizon=3,
    transform="value",
)

Output columns are named {target}_{transform}_step{step}. Metadata includes metadata["path_target"]["columns_by_horizon"], which records which step columns should be averaged for each requested horizon.

macroforecast_target_metadata marks these rows with mode="path", operation="path_step_target", a non-null step, and aggregation="average_step_forecasts_in_evaluation". This records the intended later use without moving model fitting or forecast averaging into this stage.
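
The relationship between path step targets and the direct average target can be checked in plain pandas. This is a sketch assuming transform="value" only; path_targets_sketch is a hypothetical helper and does not produce the metadata described above.

```python
import pandas as pd


def path_targets_sketch(x: pd.Series, horizon: int) -> pd.DataFrame:
    """One column per future step: step k holds x[t + k] on row t."""
    return pd.DataFrame(
        {f"{x.name}_value_step{step}": x.shift(-step) for step in range(1, horizon + 1)}
    )


index = pd.date_range("2020-01-01", periods=5, freq="MS")
x = pd.Series([0.2, 0.4, 0.6, 0.8, 1.0], index=index, name="INDPRO")

steps = path_targets_sketch(x, horizon=3)

# Averaging the realized step columns reproduces the direct "average_value"
# target, i.e. the mean of x[t+1] through x[t+3]. Rows without a complete
# future path stay missing.
path_average = steps.mean(axis=1, skipna=False)
direct_average = x.rolling(3).mean().shift(-3)
```

On realized data the two routes coincide; they differ in forecasting because the path approach fits one model per step and averages the forecasts afterwards.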

lag#

macroforecast.feature_engineering.lag(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (1,),
    drop_missing: bool = False,
) -> pandas.DataFrame

Input#

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| data | feature input | required | Canonical macroforecast input. |
| columns | iterable or None | all columns | Source columns to lag. |
| lags | int or iterable of ints | (1,) | Non-negative lags. lags=3 expands to 1, 2, 3; lags=0 means current values only; pass (0, 1, 3) for exact lags including current values. |
| drop_missing | bool | False | Drop rows with any lag-induced missing values. |

Output#

Returns a pandas.DataFrame with columns named {column}_lag{lag}.
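
The lag expansion rule and the column naming can be mimicked in a few lines of pandas. lag_sketch below is a hypothetical helper written for illustration; the column ordering it produces is an assumption and may differ from lag().

```python
import pandas as pd


def lag_sketch(panel: pd.DataFrame, lags=(1,)) -> pd.DataFrame:
    """Build {column}_lag{lag} columns with pandas shift."""
    # An int n expands to 1..n, except 0, which means current values only.
    if isinstance(lags, int):
        lags = (0,) if lags == 0 else tuple(range(1, lags + 1))
    blocks = {
        f"{column}_lag{lag}": panel[column].shift(lag)
        for column in panel.columns
        for lag in lags
    }
    return pd.DataFrame(blocks, index=panel.index)


index = pd.date_range("2020-01-01", periods=5, freq="MS")
panel = pd.DataFrame(
    {"PAYEMS": [10.0, 11.0, 12.0, 13.0, 14.0], "INDPRO": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=index,
)

lagged = lag_sketch(panel, lags=2)           # lags 1 and 2
exact = lag_sketch(panel, lags=(0, 1, 3))    # exact lags, including current values
```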

mixed_frequency_lags#

macroforecast.feature_engineering.mixed_frequency_lags(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    anchor_dates: Iterable[object] | None = None,
    columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (0, 1, 2),
    frequency_by_column: Mapping[str, str] | None = None,
    target_frequency: str | None = None,
    anchor_position: str = "date",
    drop_missing: bool = False,
) -> pandas.DataFrame

Builds a lag matrix for MIDAS-style and other mixed-frequency regressions. Unlike lag(), lags are measured in each source column’s native frequency, using metadata["native_frequency_by_column"] from mf.data.set_frequencies() or mf.data.combine(..., frequency="native").

Lookup is period based, not timestamp-string based. A monthly source dated 2020-03-01 and the same source dated 2020-03-31 both map to the March 2020 source period. This prevents month-start/month-end conventions from silently breaking MIDAS lag construction.
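
The period-based lookup can be pictured with pandas periods. The snippet only illustrates the idea with standard pandas calls; how mixed_frequency_lags() implements its lookup calendar internally is not shown here.

```python
import pandas as pd

month_start = pd.Timestamp("2020-03-01")
month_end = pd.Timestamp("2020-03-31")

# Both timestamps fall in the same monthly period...
same_period = month_start.to_period("M") == month_end.to_period("M")

# ...although the raw timestamps differ, so an exact-date join would miss.
same_timestamp = month_start == month_end

# Lag 1 in native monthly frequency is the previous period, whatever the day.
lag1_period = month_end.to_period("M") - 1
```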

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| data | feature input | required | Panel or bundle with a mixed-frequency contract. |
| target | str or None | input target if available | Column whose non-missing dates define anchors when anchor_dates is not supplied. |
| anchor_dates | iterable or None | target non-missing dates | Explicit rows to build features for. |
| columns | iterable or None | all non-target columns | Source columns to lag. |
| lags | int or iterable | (0, 1, 2) | Native-frequency lags. Pass an iterable for exact lags. |
| frequency_by_column | mapping or None | metadata map | Override native frequency by source column. |
| target_frequency | str or None | target metadata/inference | Frequency used when positioning anchor dates. |
| anchor_position | str | "date" | "date", "period_start", or "period_end". |
| drop_missing | bool | False | Drop rows with missing lag values. |

For FRED-QD-style quarterly targets dated at the first month of the quarter, use target_frequency="quarterly", anchor_position="period_end" to construct monthly lag blocks at the quarter-end month:

X_midas = mf.feature_engineering.mixed_frequency_lags(
    bundle,
    target="GDPC1",
    columns=["PAYEMS", "INDPRO"],
    lags=range(0, 12),
    target_frequency="quarterly",
    anchor_position="period_end",
)

The output columns are named {column}_lag{lag}, which is the grouping format expected by mf.models.midas_almon, mf.models.midas_beta, and mf.models.midas_step.

The returned DataFrame records metadata in two places:

| Location | Meaning |
| --- | --- |
| attrs["macroforecast_metadata"]["feature_engineering_mixed_frequency_lags"] | Target, anchor dates, selected columns, exact lags, frequency map, anchor positioning, lookup calendar, and row counts before/after drop_missing. |
| attrs["macroforecast_feature_metadata"] | One row per generated lag feature, including source column, lag, native source frequency, anchor position, and lookup start/end dates. |

rolling_mean#

macroforecast.feature_engineering.rolling_mean(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    windows: Iterable[int] | int = (3,),
    min_periods: int | None = None,
    shift: int = 0,
    drop_missing: bool = False,
) -> pandas.DataFrame

Input#

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| columns | iterable or None | all columns | Source columns. |
| windows | positive int or iterable | (3,) | Rolling-window lengths. |
| min_periods | positive int or None | window length | Minimum observations required for a value. |
| shift | non-negative int | 0 | Shift source series before rolling. Use 1 for strictly lagged rolling means. |
| drop_missing | bool | False | Drop rows with window-induced missing values. |

Output#

Returns a pandas.DataFrame with columns named {column}_roll{window}_mean. When shift > 0, names end in _lag{shift}.
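
The effect of shift on a rolling mean is easiest to see on a short series. rolling_mean_sketch is a hypothetical helper that follows the naming rule stated above; it assumes min_periods equals the window length and ignores drop_missing.

```python
import pandas as pd


def rolling_mean_sketch(panel: pd.DataFrame, windows=(3,), shift: int = 0) -> pd.DataFrame:
    """Shift each source column, then take trailing rolling means."""
    out = {}
    for column in panel.columns:
        source = panel[column].shift(shift)  # shift first, then roll
        for window in windows:
            name = f"{column}_roll{window}_mean"
            if shift > 0:
                name += f"_lag{shift}"
            out[name] = source.rolling(window, min_periods=window).mean()
    return pd.DataFrame(out, index=panel.index)


index = pd.date_range("2020-01-01", periods=5, freq="MS")
panel = pd.DataFrame({"INDPRO": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

current = rolling_mean_sketch(panel, windows=(3,))           # window ends at t
lagged = rolling_mean_sketch(panel, windows=(3,), shift=1)   # window ends at t-1
```

On the fourth row the contemporaneous mean averages 2, 3, 4, while the shifted version averages 1, 2, 3 and therefore never uses the value dated at the forecast origin.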

moving_average_ladder#

macroforecast.feature_engineering.moving_average_ladder(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    windows: Iterable[int] | None = None,
    max_window: int = 12,
    min_periods: int | None = None,
    shift: int = 0,
    drop_missing: bool = False,
) -> pandas.DataFrame

Meaning#

moving_average_ladder() builds a stacked block of trailing moving averages over several window lengths. With windows=None and the default max_window=12, the implicit windows are the powers of two up to max_window: 1, 2, 4, 8. Pass windows=(1, 2, 4, 8, 12) or any other explicit sequence when max_window itself should be included.
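
The default window rule (powers of two that do not exceed max_window) can be written as a short helper. default_ladder_windows is a hypothetical name used only for this sketch.

```python
def default_ladder_windows(max_window: int = 12) -> list[int]:
    """Return 1, 2, 4, ... up to the largest power of two <= max_window."""
    windows = []
    window = 1
    while window <= max_window:
        windows.append(window)
        window *= 2
    return windows


implicit = default_ladder_windows(12)  # 12 itself is not a power of two
```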

MARX in macroforecast#

Some papers describe this step as marx_features(P) or Moving Average Rotation of X (MARX). In macroforecast, the direct pandas form is the following explicit moving-average-ladder call:

MARX = mf.feature_engineering.moving_average_ladder(
    X,
    windows=range(1, P + 1),
    shift=1,
)

This means that, for each source series, the feature block contains increasing moving averages of lagged X: one-period lag, two-period average ending at t-1, three-period average ending at t-1, and so on through P. The shift=1 part is important because the MARX block uses lagged predictors, not the contemporaneous realization at the forecast date.

The runner-safe shorthand is marx_step(max_lag=P), used inside feature_spec(..., steps=[...]). It emits the same columns as the direct call, but lets forecasting.run() decide which rows are available for any fitted state through feature_policy.

The original author R snippet builds a VAR lag matrix ordered as lag 1 for all variables, lag 2 for all variables, and so on. Then each lag-l slot for a variable is replaced by the row average of that variable’s lag 1 through lag l columns. The direct call and marx_step(scale_lags=False) match that unscaled calculation. Through feature_matrix(..., specification="MARX", scale_marx=True) or marx_step(scale_lags=True), macroforecast also supports the optional R-code scaling step: z-score the lag matrix first using sample standard deviations, then apply the same increasing-lag averages. In feature_spec() mode, scale_lags=True fits those lag-matrix center/scale values only on the feature-fit panel and reuses them for validation/test rows.
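
The equivalence between the two descriptions of the unscaled MARX block (moving averages of the shifted series versus row averages of lag 1 through lag l) can be verified numerically. The sketch below uses one toy series and plain pandas; the variable names are illustrative and the optional scaling step is not covered.

```python
import numpy as np
import pandas as pd

P = 3
index = pd.date_range("2020-01-01", periods=8, freq="MS")
x = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0], index=index, name="x")

# Route 1: trailing moving averages of the series shifted by one period.
shifted = x.shift(1)
ladder = pd.DataFrame(
    {f"x_ma{window}_lag1": shifted.rolling(window).mean() for window in range(1, P + 1)}
)

# Route 2: build the lag 1..P columns, then replace the lag-l slot with the
# row average of lag 1 through lag l.
lag_matrix = pd.DataFrame({lag: x.shift(lag) for lag in range(1, P + 1)})
averaged = pd.DataFrame(
    {
        f"x_ma{lag}_lag1": lag_matrix[list(range(1, lag + 1))].mean(axis=1, skipna=False)
        for lag in range(1, P + 1)
    }
)

same_block = np.allclose(ladder.to_numpy(), averaged.to_numpy(), equal_nan=True)
```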

This function is not PCA. It is the moving-average block used before optional factor extraction. Moving-average PCA should be represented as:

ma_block = mf.feature_engineering.moving_average_ladder(panel, windows=(1, 2, 4, 8, 12))
factors = mf.feature_engineering.pca_features(ma_block, fit_policy="expanding")

Keeping the moving-average block and PCA step separate matters because PCA is a fit-based transformation. Running PCA on the full sample before train/test or walk-forward boundaries would leak future information.

Input#

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| columns | iterable or None | all columns | Source columns. |
| windows | iterable of positive ints or None | powers of two up to max_window | Exact moving-average windows. |
| max_window | positive int | 12 | Used only when windows=None; default creates 1, 2, 4, 8. |
| min_periods | positive int or None | window length | Minimum observations required for a value. |
| shift | non-negative int | 0 | Shift source series before rolling. Use 1 for strictly lagged moving averages. |
| drop_missing | bool | False | Drop rows with window-induced missing values. |

Output#

Returns a pandas.DataFrame with columns named {column}_ma{window}. When shift > 0, names end in _lag{shift}.

scale_features#

macroforecast.feature_engineering.scale_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    method: str = "zscore",
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| method | str | "zscore" | "zscore", "minmax", "robust"; aliases: "standard", "standardize", "min_max". |
| fit_policy | str | "expanding" | "expanding" or "full_sample". |
| min_train_size | positive int or None | 5 | Minimum complete rows before emitting scaled values. |
| drop_missing | bool | False | Drop rows where scaling is unavailable. |
| warn_full_sample | bool | True | Warn when fit_policy="full_sample" is used. |

pca_features#

macroforecast.feature_engineering.pca_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 1,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = True,
    prefix: str = "pc",
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

pca_features() returns columns named {prefix}1, {prefix}2, and so on. The default fit_policy="expanding" avoids full-sample leakage. Use fit_policy="full_sample" only after the input sample has already been split or for exploratory diagnostics. warn_full_sample=True emits a warning for that choice.
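
One way to picture an expanding PCA fit is to refit on the rows observed through each date and score only that date. The sketch below does this with a NumPy SVD. It is an illustration, not the package's algorithm: expanding_pca_sketch is a hypothetical helper, it assumes a complete panel with non-constant columns, and it makes no attempt to align component signs across dates.

```python
import numpy as np
import pandas as pd


def expanding_pca_sketch(
    panel: pd.DataFrame, n_components: int = 1, min_train_size: int = 5, prefix: str = "pc"
) -> pd.DataFrame:
    """Score each row with PCA loadings fitted on rows up to that date."""
    values = panel.to_numpy(dtype=float)
    scores = np.full((len(panel), n_components), np.nan)
    for t in range(min_train_size - 1, len(panel)):
        train = values[: t + 1]  # rows observed through date t
        mean = train.mean(axis=0)
        std = train.std(axis=0, ddof=1)
        z = (train - mean) / std
        # Principal axes are the right singular vectors of the scaled block.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        scores[t] = z[-1] @ vt[:n_components].T
    columns = [f"{prefix}{i}" for i in range(1, n_components + 1)]
    return pd.DataFrame(scores, index=panel.index, columns=columns)


rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=12, freq="MS")
panel = pd.DataFrame(rng.normal(size=(12, 3)), index=index, columns=["a", "b", "c"])

factors = expanding_pca_sketch(panel, n_components=2)

# Changing the final row must not alter any earlier factor value.
revised = panel.copy()
revised.iloc[-1] = revised.iloc[-1] + 10.0
revised_factors = expanding_pca_sketch(revised, n_components=2)
```

Refitting at every date is costly on long panels, which is one reason a runner-side spec that fits once per train window is the preferred route for strict forecasting.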

sparse_pca_chen_rohe_features#

macroforecast.feature_engineering.sparse_pca_chen_rohe_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 4,
    zeta: float = 0.0,
    max_iter: int = 200,
    var_innovations: bool = False,
    prefix: str | None = None,
    min_train_size: int | None = None,
    drop_missing: bool = False,
    random_state: int | None = 0,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

sparse_pca_chen_rohe_features() implements the legacy package’s Chen-Rohe-style Sparse Component Analysis (SCA) routine directly with NumPy. It is not sklearn.decomposition.SparsePCA. The transform centers the selected predictor panel, alternates over the score and loading matrices, and constrains the loading matrix with an L1 budget zeta.

The direct callable fits on all complete rows of the supplied input. It warns by default because that is a full-input fitted transform. For strict walk-forward forecasting, use sparse_pca_chen_rohe_step() inside feature_spec(...); the runner will fit the sparse loading matrix on the feature-fit panel and reuse the fixed loading matrix on validation/test rows.

Input#

Name

Type

Default

Meaning

columns

iterable or None

all columns

Predictor columns used to fit sparse components.

n_components

positive int

4

Requested number of sparse components. The resolved number is min(n_components, complete_rows, selected_columns).

zeta

non-negative float

0.0

L1 loading-budget parameter. zeta <= 0 falls back to the resolved component count, matching the legacy default. Smaller positive values create sparser loadings.

max_iter

positive int

200

Maximum alternating updates.

var_innovations

bool

False

If True, fit a VAR(1) on the sparse factor scores and return the VAR residuals (innovations) as sparse macro-finance factors.

prefix

string or None

"sca" or "scaf"

Output prefix. Default is "sca"; with var_innovations=True, default is "scaf".

min_train_size

positive int or None

1, or 3 when var_innovations=True

Minimum complete rows before emitting factors.

drop_missing

bool

False

Drop rows where sparse factors are unavailable.

random_state

int or None

0

Initialization seed for the alternating algorithm.

warn_full_sample

bool

True

Warn because the direct callable fits on all complete input rows.

Output#

Returns a pandas.DataFrame indexed by date, with columns such as sca1, sca2, or scaf1. Metadata is stored under attrs["macroforecast_metadata"]["feature_engineering_sparse_pca_chen_rohe"]. The stage records selected columns, requested/resolved components, zeta, resolved zeta, iteration count, final objective, VAR-innovation use, fit rows, and fit_policy="full_input_complete_rows".

macroforecast_feature_metadata records one row per factor with operation="sparse_pca_chen_rohe", the source columns, component index, and fit policy.

varimax_features#

macroforecast.feature_engineering.varimax_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    max_iter: int = 50,
    tol: float = 1e-7,
    prefix: str = "varimax",
    min_train_size: int | None = None,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

varimax_features() rotates a factor-score panel with an orthogonal varimax rotation. It should be applied to factor columns, not raw macro variables. A typical direct use is:

factors = mf.feature_engineering.pca_features(
    processed,
    columns=["INDPRO", "PAYEMS", "UNRATE"],
    n_components=3,
    fit_policy="full_sample",
    warn_full_sample=False,
)
rotated = mf.feature_engineering.varimax_features(factors, warn_full_sample=False)

The direct callable fits the rotation on all complete rows and warns by default. For strict walk-forward forecasting, use:

spec = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.pca_step(name="pc", n_components=3, include=False),
        mf.feature_engineering.varimax_step(name="rot", input="pc"),
    ],
)
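For readers who want to see the rotation itself, the following is a sketch of the classical varimax iteration applied to a rows-by-factors score matrix, assuming NumPy only. The function name varimax_rotate is hypothetical and the package may differ in normalization or stopping rule; max_iter and tol simply mirror the documented defaults.

```python
import numpy as np


def varimax_rotate(scores, max_iter=50, tol=1e-7):
    """Return (rotated_scores, rotation) for a rows-by-factors matrix."""
    A = np.asarray(scores, dtype=float)
    n_factors = A.shape[1]
    rotation = np.eye(n_factors)
    objective = 0.0
    for _ in range(max_iter):
        rotated = A @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = A.T @ (rotated ** 3 - rotated * (rotated ** 2).mean(axis=0))
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                    # nearest orthogonal matrix
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1.0 + tol):
            break
        objective = new_objective
    return A @ rotation, rotation
```

The rotation is orthogonal, so it redistributes variance across the factor columns without changing the space they span.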

Input#

Name

Type

Default

Meaning

columns

iterable or None

all columns

Factor-score columns to rotate.

max_iter

positive int

50

Maximum varimax iterations.

tol

non-negative float

1e-7

Convergence tolerance for the rotation objective.

prefix

string

"varimax"

Output prefix.

min_train_size

positive int or None

1

Minimum complete rows before emitting rotated factors.

drop_missing

bool

False

Drop rows where rotated factors are unavailable.

warn_full_sample

bool

True

Warn because the direct callable fits on all complete input rows.

Output#

Returns a pandas.DataFrame with columns such as varimax1, varimax2, and so on. Metadata is stored under metadata["feature_engineering_varimax"], and macroforecast_feature_metadata records operation="varimax", component index, source factor columns, and fit policy.

sliced_inverse_regression_features#

macroforecast.feature_engineering.sliced_inverse_regression_features(
    data,
    target: str | pandas.Series | None = None,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    n_components: int = 3,
    n_slices: int = 10,
    scaling_policy: str = "scaled_pca",
    prefix: str = "sir",
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

sliced_inverse_regression_features() implements a target-aware SIR factor transform. It aligns the predictor panel with a target series, standardizes predictors, optionally applies predictive column scaling, slices observations by the target distribution, and projects the full panel onto the leading between-slice directions.

The direct callable fits on all target-aligned complete rows in the supplied input. For strict walk-forward forecasting, use sliced_inverse_regression_step() inside feature_spec(...); the runner then fits SIR directions only on the feature-fit panel and applies the fixed directions to validation/test rows.
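The slicing and projection steps described above can be sketched as follows, assuming NumPy only. The name sir_directions is hypothetical. The sketch standardizes columns as the text describes, omits the optional scaling_policy step, and does not apply the full covariance whitening used in textbook SIR, so it should be read as an illustration rather than the package routine.

```python
import numpy as np


def sir_directions(X, y, n_components=2, n_slices=10):
    """Illustrative SIR: returns (scores, directions) fitted on all rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_rows, n_cols = X.shape
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0
    Z = (X - mean) / std                         # standardized predictors
    order = np.argsort(y, kind="stable")         # sort rows by target value
    slices = np.array_split(order, min(n_slices, n_rows))
    between = np.zeros((n_cols, n_cols))
    for rows in slices:
        slice_mean = Z[rows].mean(axis=0)
        between += (len(rows) / n_rows) * np.outer(slice_mean, slice_mean)
    eigvals, eigvecs = np.linalg.eigh(between)   # symmetric eigenproblem
    top = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, top]
    return Z @ directions, directions
```

In a walk-forward setting the directions (and the mean and standard deviation) would be estimated on the fit panel only and then reused unchanged on later rows.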

Input#

Name

Type

Default

Meaning

target

string, Series, or None

input target metadata

Target signal used for SIR slicing and optional predictive scaling.

columns

iterable or None

all non-target columns

Predictor columns.

n_components

positive int

3

Number of SIR factors to return. If the effective rank is smaller, remaining columns are zero-padded for stable shape.

n_slices

int

10

Target-distribution slices. Must be at least 2; internally capped by aligned row count.

scaling_policy

string

"scaled_pca"

"scaled_pca", "marginal_R2", or "none".

prefix

string

"sir"

Output prefix. Use prefix="factor_" for legacy-style names factor_1, factor_2, …

drop_missing

bool

False

Drop rows with missing predictor values after projection.

warn_full_sample

bool

True

Warn because the direct callable fits on all target-aligned complete rows.

Output#

Returns columns such as sir1, sir2, and so on. Metadata is stored under metadata["feature_engineering_sliced_inverse_regression"] and records the target, predictor columns, requested/resolved components, slices, scaling policy, fit row count, and fit_policy="full_input_target_aligned_rows".

Target-Aware Feature Steps#

feature_spec(..., steps=[...]) also supports target-aware fitted transforms. These steps use the resolved feature_spec() target during .fit(...), store a fixed fit state, and do not look at target values during .transform(...).

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.scale_step(name="scaled", include=False),
        mf.feature_engineering.partial_least_squares_step(
            name="pls",
            input="scaled",
            n_components=2,
            min_train_size=60,
        ),
    ],
)

Target-aware steps require exactly one resolved target column. In practice that means one target and one horizon for the step pipeline. If multiple targets or horizons are requested, fitting raises an error before any model is run.

Step builder or method

Direct callable

Main options

Output

partial_least_squares_step()

partial_least_squares_features()

n_components; columns; min_train_size; prefix

pls1, pls2, …

sliced_inverse_regression_step()

sliced_inverse_regression_features()

n_components; n_slices; scaling_policy; min_train_size; prefix

sir1, sir2, …

"variance_selection"

variance_selection()

n_features; columns; min_train_size

Selected input columns.

"correlation_selection"

correlation_selection()

n_features; columns; min_train_size

Selected input columns.

"lasso_selection"

lasso_selection()

n_features; alpha; min_train_size

Selected input columns.

"lasso_path_selection"

lasso_path_selection()

n_features; eps; n_alphas; normalize_features; positive

Selected input columns.

"rfe_selection"

rfe_selection()

n_features; estimator; step; use_cv; cv_folds

Selected input columns.

"boruta_selection"

boruta_selection()

n_features; n_estimators; max_iter; alpha; include_tentative

Selected input columns.

"stability_selection"

stability_selection()

n_features; n_subsamples; subsample_fraction; pi_threshold; base_estimator

Selected input columns.

"genetic_selection"

genetic_selection()

n_features; population_size; n_generations; crossover_prob; fitness_estimator

Selected input columns.

Fit-state metadata records the resolved target column, selected source columns, requested/resolved component or feature count, fit row count, and fit_policy="fixed_fit_panel_target_aligned_rows" for target-dependent methods. For method="variance_selection", no target is used and the recorded value is fit_policy="fixed_fit_panel_columns".

Feature selection deliberately has no generic wrapper step. Use each algorithm name directly inside feature_spec():

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        {"name": "boruta", "method": "boruta_selection", "n_features": 2},
    ],
)

Custom Feature Functions#

custom_features() applies one user feature transform directly. It is useful when the input has already been split or when the transform has no fitted state.

macroforecast.feature_engineering.custom_features(
    data,
    func,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    name: str | None = None,
    **params,
) -> pandas.DataFrame

The callable receives:

func(source: pandas.DataFrame, *, metadata: dict, **params)

source is the selected predictor block. The callable may return a DataFrame, Series, or 1-D/2-D array-like object. The output must have the same row count as source or keep a DatetimeIndex. All output columns are coerced to numeric and validated as a macroforecast panel.

import pandas as pd

def spread_square(source, *, metadata=None, suffix="sq"):
    # Square the first selected column and name the output with the suffix.
    column = source.columns[0]
    return pd.DataFrame({f"{column}_{suffix}": source[column] ** 2}, index=source.index)

X_custom = mf.feature_engineering.custom_features(
    processed,
    spread_square,
    columns=["term_spread"],
    name="spread_square",
)

For strict runner use, prefer custom_step() inside feature_spec(...). The runner fits the step on the rows allowed by feature_policy and applies it to validation/test rows without leaking future information.

macroforecast.feature_engineering.custom_step(
    name,
    func=None,
    *,
    input="panel",
    include=True,
    columns=None,
    fit_func=None,
    transform_func=None,
    requires_target=False,
    min_train_size=None,
    prefix=None,
    drop_missing=False,
    **params,
) -> dict

Custom Step Modes#

Mode

Required callable

Fit-time call

Transform-time call

Stateless

func

none

func(source, metadata=..., **params)

Fitted transformer object

fit_func returning object with .transform()

fit_func(source, target=..., metadata=..., **params)

state.transform(source)

Separate fit/transform functions

fit_func and transform_func

fit_func(source, target=..., metadata=..., **params)

transform_func(source, state=state, metadata=..., **params)

State-aware callable

fit_func and func

fit_func(source, target=..., metadata=..., **params)

func(source, state=state, metadata=..., **params)

Set requires_target=True when the fitting callable needs the resolved feature_spec() target. This requires exactly one target and one horizon. The fitted state metadata stores callable names, selected columns, whether the target was used, fit row count, and output columns.

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.custom_step(
            "my_factor",
            fit_func=my_factor_fit,
            transform_func=my_factor_transform,
            columns=["PAYEMS", "UNRATE", "HOUST"],
            requires_target=True,
            prefix="myf",
            n_components=2,
        ),
    ],
)
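The example above refers to my_factor_fit and my_factor_transform without defining them. A minimal sketch of what such a pair could look like follows, written against the call signatures listed in the Custom Step Modes table. Everything inside the functions is an assumption for illustration: the correlation-weighted PCA is just one possible target-aware factor, the state is an ordinary dictionary, and the myf output names are hard-coded because this page does not say whether prefix is forwarded to the callables.

```python
import numpy as np
import pandas as pd


def my_factor_fit(source, *, target=None, metadata=None, n_components=2, **params):
    """Fit step: learn centering, scaling, target weights, and loadings."""
    mean = source.mean()
    std = source.std(ddof=1).replace(0.0, 1.0)
    z = (source - mean) / std
    # Weight each standardized column by its absolute target correlation.
    weights = z.corrwith(target).abs().fillna(0.0)
    _, _, vt = np.linalg.svd((z * weights).to_numpy(), full_matrices=False)
    k = min(n_components, vt.shape[0])
    return {"mean": mean, "std": std, "weights": weights, "loadings": vt[:k].T}


def my_factor_transform(source, *, state, metadata=None, **params):
    """Transform step: reuse the fitted state; the target is never used."""
    z = (source - state["mean"]) / state["std"]
    scores = (z * state["weights"]).to_numpy() @ state["loadings"]
    columns = [f"myf{i + 1}" for i in range(scores.shape[1])]
    return pd.DataFrame(scores, index=source.index, columns=columns)
```

Keeping every fitted quantity inside the returned state is what lets the transform be applied to validation or test rows without refitting.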

group_pca#

macroforecast.feature_engineering.group_pca(
    data,
    *,
    groups: Mapping[str, Iterable[str]],
    metadata: Mapping[str, object] | None = None,
    n_components: int | Mapping[str, int] = 1,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = True,
    prefix: str | None = None,
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

group_pca() extracts PCA factors separately within named groups. It is a generic grouped factor transform. FAVAR-specific slow/fast grouping, observed-policy variables, VAR dynamics, identification, and IRFs belong to later model and evaluation stages.

factors = mf.feature_engineering.group_pca(
    processed,
    groups={
        "real_activity": ["INDPRO", "PAYEMS", "UNRATE"],
        "prices": ["CPIAUCSL", "PPIACO"],
    },
    n_components={"real_activity": 3, "prices": 2},
    fit_policy="expanding",
)

Output columns use the group name as the prefix by default:

real_activity1, real_activity2, real_activity3, prices1, prices2
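The grouping and naming logic can be sketched as below, assuming pandas and NumPy. The helper group_pca_sketch is hypothetical and fits each group's PCA on all supplied rows for brevity, which corresponds to a full-sample fit rather than the default expanding policy.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def group_pca_sketch(panel, groups, n_components=1):
    """Illustrative full-sample PCA per named group of columns."""
    blocks = []
    for name, columns in groups.items():
        block = panel[list(columns)]
        z = (block - block.mean()) / block.std(ddof=1)
        _, _, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
        # n_components may be one int for all groups or a per-group mapping.
        k = n_components[name] if isinstance(n_components, Mapping) else n_components
        scores = z.to_numpy() @ vt[:k].T
        names = [f"{name}{i + 1}" for i in range(k)]
        blocks.append(pd.DataFrame(scores, index=panel.index, columns=names))
    return pd.concat(blocks, axis=1)
```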

group_pca_step() provides the same operation inside compose_features().

Supervised And Sparse Component Boundary#

Unsupervised group PCA belongs in feature_engineering because it only uses the predictor panel. PLS and SIR are target-aware and are available as runner-safe feature steps when the resolved feature_spec() target is single. Supervised PCA variants that fit a full predictive model still belong in macroforecast.models. Chen-Rohe sparse component analysis is unsupervised and is available as sparse_pca_chen_rohe_features() / sparse_pca_chen_rohe_step().

maf_features#

macroforecast.feature_engineering.maf_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    max_lag: int = 12,
    lags: Iterable[int] | None = None,
    n_components: int = 2,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    scale: bool = False,
    prefix: str = "maf",
    drop_missing: bool = False,
    random_state: int | None = None,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

maf_features() implements Moving Average Factors. For each selected variable x_k, it builds a variable-specific lag panel:

[x_k(t), x_k(t-1), ..., x_k(t-P)]

Then it extracts PCA components from that lag panel only. This is different from pca_features(), which runs PCA across all selected variables, and different from moving_average_pca_lags(), which runs PCA on a moving-average block.
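For a single series, the lag-panel-then-PCA construction can be sketched as follows, assuming pandas and NumPy. The helper maf_sketch is hypothetical, fits on all complete rows (a full-sample fit), and leaves the lag columns unscaled in line with the documented scale=False default.

```python
import numpy as np
import pandas as pd


def maf_sketch(series, max_lag=12, n_components=2, prefix="maf"):
    """Illustrative MAF for one series: PCA on its own lags 0..max_lag."""
    lag_panel = pd.concat(
        {f"lag{lag}": series.shift(lag) for lag in range(max_lag + 1)}, axis=1
    ).dropna()
    centered = lag_panel - lag_panel.mean()
    _, _, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)
    scores = centered.to_numpy() @ vt[:n_components].T
    columns = [f"{series.name}_{prefix}{i + 1}" for i in range(n_components)]
    out = pd.DataFrame(scores, index=lag_panel.index, columns=columns)
    # Rows without a complete lag history stay missing.
    return out.reindex(series.index)
```

Applying the helper to each selected column and concatenating the results side by side gives one block of components per source series.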

The MAF implementation is intentionally limited to the construction described in the paper: variable-specific lag panels followed by PCA. The package does not assume undocumented author-code details beyond that description.

Validation status: MARX is tested against the author-supplied R-loop pattern. MAF is tested for the documented variable-specific lag-panel PCA contract, but there is no author-code benchmark in the package yet. If author MAF code becomes available, it should be added as a separate equivalence test before tightening the claim.

MAF = mf.feature_engineering.maf_features(
    X,
    max_lag=12,
    n_components=2,
    fit_policy="expanding",
)

With two input series, this returns columns like:

INDPRO_maf1, INDPRO_maf2, PAYEMS_maf1, PAYEMS_maf2

Input#

Name

Type

Default

Meaning

columns

iterable or None

all columns

Source series for variable-specific lag panels.

max_lag

non-negative int

12

Used when lags=None; builds lags 0 through max_lag.

lags

iterable of non-negative ints or None

None

Exact lag set. Overrides max_lag.

n_components

positive int

2

Number of MAF components per source series.

fit_policy

str

"expanding"

"expanding" or "full_sample".

min_train_size

positive int or None

max(5, n_components + 1)

Minimum complete rows before emitting PCA values.

scale

bool

False

Whether to z-score the lag columns before PCA. Default is False because lags of the same variable are already in the same unit.

prefix

str

"maf"

Component label used in output names.

drop_missing

bool

False

Drop rows where MAF values are unavailable.

warn_full_sample

bool

True

Warn when fit_policy="full_sample" is used.

Output#

Returns a pandas.DataFrame with one block per source series. Metadata is stored in metadata["feature_engineering_maf"], and macroforecast_feature_metadata records the source series for each component.

feature_matrix#

macroforecast.feature_engineering.feature_matrix(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    specification: str | Iterable[str] = "X",
    columns: Iterable[str] | None = None,
    level_data: feature input | None = None,
    level_columns: Iterable[str] | None = None,
    lags: Iterable[int] | int = (0,),
    max_lag: int = 12,
    n_factors: int = 8,
    n_maf_components: int = 2,
    fit_policy: str = "expanding",
    min_train_size: int | None = None,
    include_current_factor: bool = True,
    scale_factors: bool = True,
    scale_marx: bool = False,
    scale_maf: bool = False,
    drop_missing: bool = False,
    warn_full_sample: bool = True,
) -> pandas.DataFrame

feature_matrix() builds named combinations used in macro-ML forecasting papers without requiring the user to hand-write compose_features() steps.

Block

Package implementation

X

lag(data, lags=...) on the supplied, usually preprocessed, panel.

F

PCA factors from the supplied panel, then lags of those factors.

MARX

moving_average_ladder(data, windows=range(1, max_lag + 1), shift=1); use marx_step() for runner-safe windowed construction. With scale_marx=True, the full lag matrix is first z-scored with sample standard deviations, and then lags 1 through l are averaged for each window l.

MAF

maf_features(data, max_lag=max_lag, n_components=n_maf_components).

LEVEL / H

lag(level_data, lags=...); requires a separate level_data input.

specification can be a string such as "F-X-MARX" or an iterable such as ("F", "X", "MAF"). Output columns are prefixed by block, for example F__F1_lag0, X__INDPRO_lag1, MARX__INDPRO_ma3_lag1, or MAF__INDPRO_maf1.
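The MARX block in the table above averages lags 1 through l for each window l, which is why a column name such as MARX__INDPRO_ma3_lag1 carries both a window and a lag. A pandas sketch of the unscaled construction follows. The helper marx_sketch is hypothetical, omits the MARX__ block prefix, and does not cover the scale_marx=True branch.

```python
import pandas as pd


def marx_sketch(panel, max_lag=12):
    """Illustrative MARX: mean of lags 1..window for window = 1..max_lag."""
    out = {}
    for column in panel.columns:
        lagged = panel[column].shift(1)          # start the average at lag 1
        for window in range(1, max_lag + 1):
            out[f"{column}_ma{window}_lag1"] = lagged.rolling(window).mean()
    return pd.DataFrame(out, index=panel.index)
```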

include_current_factor=True ensures the F block includes current factors even when lags contains only positive values such as range(1, 13). Set it to False when the factor block should exactly follow the supplied lag set.
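The effect of include_current_factor on the factor lag set can be written as a small pure-Python helper. The function resolve_factor_lags is hypothetical and assumes lags is an iterable of non-negative integers.

```python
def resolve_factor_lags(lags, include_current_factor=True):
    """Return the sorted lag set used for the F block in this sketch."""
    resolved = sorted({int(lag) for lag in lags})
    if include_current_factor and 0 not in resolved:
        resolved.insert(0, 0)                    # force the current factor
    return resolved
```

With lags=range(1, 13), the helper returns lags 0 through 12 by default and lags 1 through 12 when include_current_factor=False.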

Paper-Style Specifications#

The paper-style feature families are handled directly by feature_matrix(). The parser accepts -, +, or _ separators.

Specification

Meaning

"X"

Lagged predictor panel.

"F"

PCA factors from the predictor panel, then factor lags.

"F-X"

Factor lags plus lagged predictors.

"H" or "LEVEL"

Lagged level variables from level_data.

"X-H"

Lagged predictors plus lagged level variables from level_data.

"F-X-H" or "F-X-LEVEL"

F-X plus lagged level variables from level_data.

"F-X-MARX"

F-X plus MARX increasing averages of lagged predictors.

"F-X-MAF"

F-X plus Moving Average Factors.

"F-X-H-MARX"

F-X-H plus MARX.

"F-X-H-MAF"

F-X-H plus MAF.

Specification

Requires level_data

Fitted transform

Main output blocks

"X"

No

No

X__...

"F"

No

PCA

F__...

"F-X"

No

PCA

F__..., X__...

"H" / "LEVEL"

Yes

No

LEVEL__...

"X-H"

Yes

No

X__..., LEVEL__...

"F-X-H"

Yes

PCA

F__..., X__..., LEVEL__...

"F-X-MARX"

No

PCA; optional MARX scaling

F__..., X__..., MARX__...

"F-X-MAF"

No

PCA and MAF PCA

F__..., X__..., MAF__...

"F-X-H-MARX"

Yes

PCA; optional MARX scaling

F__..., X__..., LEVEL__..., MARX__...

"F-X-H-MAF"

Yes

PCA and MAF PCA

F__..., X__..., LEVEL__..., MAF__...
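A specification string can be split into blocks with a few lines of standard-library Python. The helpers parse_specification and requires_level_data are hypothetical, and the alias of H to LEVEL and the error on unknown blocks are assumptions drawn from the tables above rather than from the package source.

```python
import re

_KNOWN_BLOCKS = ("X", "F", "MARX", "MAF", "LEVEL")
_ALIASES = {"H": "LEVEL"}


def parse_specification(specification):
    """Split a spec such as "F-X-MARX" into an ordered list of block names."""
    if isinstance(specification, str):
        tokens = re.split(r"[-+_]", specification)   # accepted separators
    else:
        tokens = list(specification)
    blocks = []
    for token in tokens:
        name = token.strip().upper()
        name = _ALIASES.get(name, name)
        if name not in _KNOWN_BLOCKS:
            raise ValueError(f"unknown feature block: {token!r}")
        if name not in blocks:
            blocks.append(name)
    return blocks


def requires_level_data(specification):
    """True when the specification includes the LEVEL (or H) block."""
    return "LEVEL" in parse_specification(specification)
```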

Examples:

FX = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X",
    lags=range(0, 13),
    n_factors=8,
    fit_policy="expanding",
)

FXH = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-H",
    level_data=raw_bundle,
    lags=range(0, 13),
    n_factors=8,
)

FXMARX = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    scale_marx=False,
)

FXMAF = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MAF",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    n_maf_components=2,
)

Input#

Name

Type

Default

Meaning

specification

string or iterable

"X"

Blocks X, F, MARX, MAF, LEVEL/H; strings can use -, +, or _ separators.

columns

iterable or None

all columns

Source columns from the preprocessed panel.

level_data

feature input or None

None

Required when specification includes LEVEL or H.

level_columns

iterable or None

columns

Level-data columns to include.

lags

int or iterable

(0,)

Lag set for X, F, and LEVEL.

max_lag

positive int

12

Maximum lag for MARX and MAF.

n_factors

positive int

8

Number of PCA factors for F.

n_maf_components

positive int

2

MAF components per source variable.

fit_policy

str

"expanding"

"expanding" or "full_sample" for fitted transforms.

min_train_size

positive int or None

transform-specific

Minimum complete rows for fitted transforms.

include_current_factor

bool

True

Force lag 0 in the F block even when lags excludes it.

scale_factors

bool

True

Z-score variables before PCA in the F block.

scale_marx

bool

False

Match the optional author R-code scaling step for MARX.

scale_maf

bool

False

Z-score variable-specific MAF lag panels before PCA.

drop_missing

bool

False

Drop rows with missing feature values.

warn_full_sample

bool

True

Warn when any fitted block uses fit_policy="full_sample".

Z = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
    fit_policy="expanding",
    drop_missing=True,
)

Use level_data= when the combination includes level variables:

Z = mf.feature_engineering.feature_matrix(
    processed,
    specification="F-LEVEL",
    level_data=raw_bundle,
    lags=range(0, 13),
)

compose_features#

macroforecast.feature_engineering.compose_features(
    data,
    steps,
    *,
    metadata: Mapping[str, object] | None = None,
    columns: Iterable[str] | None = None,
    include_original: bool = False,
    drop_missing: bool = False,
) -> pandas.DataFrame

steps is an ordered list of mappings. Each step has:

Key

Meaning

name

Step name. Later steps can reference this name.

method

One of "lag", "rolling_mean", "moving_average_ladder", "marx", "transform", "seasonal_lag", "season_dummy", "fourier", "polynomial", "interaction", "maf", "scale", "pca", "sparse_pca_chen_rohe", "varimax", "group_pca", "time".

input

"panel" by default, or a previous step name.

include

Whether this step’s output is included in final X; default True.

other keys

Method-specific parameters such as lags, windows, n_components, fit_policy, warn_full_sample, or columns.
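How name, input, and include interact can be illustrated with a toy resolver that supports only two methods. The function compose_sketch and the _ma{window} column suffix are hypothetical; the real compose_features() supports many more methods and records metadata that this sketch ignores.

```python
import pandas as pd


def _lag(frame, lags):
    return pd.concat(
        [frame.shift(lag).add_suffix(f"_lag{lag}") for lag in lags], axis=1
    )


def _rolling_mean(frame, windows):
    return pd.concat(
        [frame.rolling(w).mean().add_suffix(f"_ma{w}") for w in windows], axis=1
    )


_METHODS = {"lag": _lag, "rolling_mean": _rolling_mean}


def compose_sketch(panel, steps):
    """Run steps in order; later steps may read an earlier step by name."""
    outputs = {"panel": panel}
    included = []
    for step in steps:
        options = dict(step)
        name = options.pop("name")
        method = options.pop("method")
        source = outputs[options.pop("input", "panel")]
        include = options.pop("include", True)
        result = _METHODS[method](source, **options)
        outputs[name] = result                   # available to later steps
        if include:
            included.append(result)              # part of the final X
    return pd.concat(included, axis=1)
```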

Examples:

# PCA, then lags of the PCA factors.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "pc", "method": "pca", "columns": ["PAYEMS", "INDPRO"], "n_components": 2, "include": False},
        {"name": "pc_lags", "method": "lag", "input": "pc", "lags": [1, 2, 3]},
    ],
)

# Lags first, then PCA on the lag block.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "lag_block", "method": "lag", "lags": [0, 1, 2, 3], "include": False},
        {"name": "lag_pc", "method": "pca", "input": "lag_block", "n_components": 4},
    ],
)

# Moving-average ladder, PCA, then lags of the factor.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "ma", "method": "moving_average_ladder", "windows": [1, 2, 4, 8, 12], "include": False},
        {"name": "ma_pc", "method": "pca", "input": "ma", "n_components": 4, "include": False},
        {"name": "ma_pc_lags", "method": "lag", "input": "ma_pc", "lags": [1, 2]},
    ],
)

# MAF as a direct block inside a composed feature matrix.
X = mf.feature_engineering.compose_features(
    processed,
    [
        {"name": "maf", "method": "maf", "max_lag": 12, "n_components": 2},
    ],
)

# MARX shorthand: increasing averages of lagged predictors.
X = mf.feature_engineering.compose_features(
    processed,
    [
        mf.feature_engineering.marx_step(max_lag=12, scale_lags=False),
    ],
)

# Extra deterministic transforms can be composed the same way.
X = mf.feature_engineering.compose_features(
    processed,
    [
        mf.feature_engineering.transform_step(name="log_ip", transform="log", columns=["INDPRO"], include=False),
        mf.feature_engineering.lag_step(name="log_ip_lag", input="log_ip", lags=[1, 2, 3]),
        mf.feature_engineering.interaction_step(name="cross", columns=["PAYEMS", "HOUST"]),
    ],
)

time_features#

macroforecast.feature_engineering.time_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    trend: bool = True,
    month: bool = False,
    quarter: bool = False,
    year: bool = False,
) -> pandas.DataFrame

Input And Output#

Option

Output columns

trend=True

trend, starting at 1.0.

month=True

month_01 through month_12.

quarter=True

quarter_1 through quarter_4.

year=True

year.
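The output columns in the table above can be reproduced with a short pandas sketch. The helper time_features_sketch is hypothetical and takes a DatetimeIndex directly instead of the package's data inputs; encoding the dummies as floats is an assumption.

```python
import numpy as np
import pandas as pd


def time_features_sketch(index, trend=True, month=False, quarter=False, year=False):
    """Illustrative deterministic date features for a DatetimeIndex."""
    out = pd.DataFrame(index=index)
    if trend:
        out["trend"] = np.arange(1, len(index) + 1, dtype=float)
    if month:
        for m in range(1, 13):
            out[f"month_{m:02d}"] = np.asarray(index.month == m, dtype=float)
    if quarter:
        for q in range(1, 5):
            out[f"quarter_{q}"] = np.asarray(index.quarter == q, dtype=float)
    if year:
        out["year"] = np.asarray(index.year, dtype=float)
    return out
```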

Additional Transform Helpers#

These helpers are feature-engineering transforms, not preprocessing t-codes. Use them when the model feature set needs extra ML-oriented columns after the canonical panel has already been cleaned.

Function

Main options

Output

transform_features(data, transform=...)

transform: "log", "diff", "log_diff", "pct_change", "cumsum"; periods; columns; drop_missing

{column}_{transform} columns.

log_features, diff_features, log_diff_features, pct_change_features, cumsum_features

Thin named wrappers around transform_features.

Same as above.

seasonal_lag(data, season_length=12, lags=...)

Seasonal step length and seasonal lag count.

{column}_seasonlag{actual_lag}.

season_dummy(data, frequency="auto")

"month" or "quarter", optional drop_first.

Month or quarter dummies.

fourier_features(data, period=12, order=2)

Seasonal period and harmonic order.

Sine/cosine seasonal terms.

polynomial_features(data, degree=2)

degree, include_bias, interaction_only.

Named polynomial expansion columns.

interaction_features(data, order=2)

Exact-order pure interaction expansion without lower-order terms or powers.

interaction_{col1}__{col2} style columns.

hp_filter_features(data, lamb=129600.0)

HP lambda, component: "cycle", "trend", or "both"; warn_full_sample=True.

HP cycle/trend columns.

hamilton_filter_features(data, h=8, p=4)

Hamilton horizon h, regressor count p, component, fit_policy: "expanding" or "full_sample", and missing policy.

{column}_hamilton_cycle and/or {column}_hamilton_trend.

savitzky_golay_features(data, window_length=5, polyorder=2)

Centered filter window, polynomial order, derivative; warn_full_sample=True.

Smoothed columns.

wavelet_features(data, n_levels=3)

Causal rolling approximation/detail levels; wavelet name is recorded for compatibility.

{column}_wA{level}, {column}_wD{level}.

adaptive_ma_rf_features(data, sided="two", sample_fraction=0.6)

Feature wrapper around filters.albama; sided="two" warns, sided="one" uses expanding one-sided fits. Full AlbaMAResult objects are stored in attrs["macroforecast_feature_weight_results"].

{column}_albama.

asymmetric_trim_features(data)

Sorts each row’s selected columns in ascending order.

rank_1, rank_2, …

partial_least_squares_features(data, target=..., n_components=...)

Target-aware PLSRegression scores; warns by default.

pls1, pls2, …

dfm_features(data, n_factors=...)

Static DFM approximation by standardized PCA; warns by default.

dfm1, dfm2, …

variance_selection(data, n_features=...)

Select by sample variance; no target required.

Subset of original columns.

correlation_selection(data, target=..., n_features=...)

Select by absolute target correlation.

Subset of original columns.

lasso_selection(data, target=..., alpha=...)

Select by absolute lasso coefficient.

Subset of original columns.

lasso_path_selection(data, target=..., eps=..., n_alphas=...)

Select by lasso-path inclusion frequency.

Subset of original columns.

rfe_selection(data, target=..., estimator=...)

Select by recursive feature elimination.

Subset of original columns.

boruta_selection(data, target=..., n_estimators=..., max_iter=...)

Select by Boruta-style shadow-feature tests.

Subset of original columns.

stability_selection(data, target=..., n_subsamples=..., pi_threshold=...)

Select by repeated sparse-model subsampling frequency.

Subset of original columns.

genetic_selection(data, target=..., population_size=..., n_generations=...)

Select by genetic subset search.

Subset of original columns.

random_projection_features(data, n_components=...)

Gaussian random projection; warn_full_sample=True by default.

rp1, rp2, …

nystroem_features(data, kernel="rbf", n_components=...)

Kernel approximation settings; warn_full_sample=True by default.

nys1, nys2, …

Filter-Backed Features#

macroforecast.filters owns one-series filter and smoother callables. macroforecast.feature_engineering owns panel wrappers that turn those outputs into feature columns:

Feature wrapper

Direct filter

hp_filter_features()

filters.hp_filter()

hamilton_filter_features()

filters.hamilton_filter()

savitzky_golay_features()

filters.savitzky_golay()

wavelet_features()

filters.wavelet_filter()

adaptive_ma_rf_features()

filters.albama()

For AlbaMA method details, R-code alignment, and weight extraction, see Filters. adaptive_ma_rf_features() stores full AlbaMAResult objects in attrs["macroforecast_feature_weight_results"] so feature_analysis.effective_window() and feature_analysis.recent_weight_share() can inspect learned weights.

features = mf.feature_engineering.adaptive_ma_rf_features(
    processed.panel,
    columns=["CPIAUCSL", "INDPRO"],
    sided="one",
)

albama_results = features.attrs["macroforecast_feature_weight_results"]

random_projection_features() and nystroem_features() fit on complete rows of the provided input and warn by default because the direct helpers are full-input fitted helpers. For strict origin-by-origin forecasting, use random_projection_step() and nystroem_step() inside feature_spec(); the runner fits the projection/kernel state on the feature-fit panel and reuses the fixed state for validation/test rows.

hamilton_filter_features() follows Hamilton’s regression form: y[t+h] = a + b_0 y[t] + ... + b_{p-1} y[t-p+1] + e[t+h]. The fitted value is stored as the trend and the residual as the cycle, both labeled at t+h. Defaults h=8, p=4 match the common quarterly setting; for monthly panels, h=24, p=12 is the usual analogue. The default fit_policy="expanding" estimates each row with only earlier completed Hamilton-regression rows. fit_policy="full_sample" reproduces the ordinary in-sample filter style and warns by default because it can use future information relative to a forecasting origin.
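The regression form above can be sketched in NumPy for a single series. The helper hamilton_filter_sketch is hypothetical and estimates one regression on all available rows, so it corresponds to the full-sample style rather than the default expanding policy.

```python
import numpy as np


def hamilton_filter_sketch(y, h=8, p=4):
    """Illustrative full-sample Hamilton filter; returns (trend, cycle)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    trend = np.full(n, np.nan)
    cycle = np.full(n, np.nan)
    # Row s = t + h needs y[t], ..., y[t - p + 1], so s starts at h + p - 1.
    rows = np.arange(h + p - 1, n)
    regressors = [np.ones(len(rows))] + [y[rows - h - j] for j in range(p)]
    X = np.column_stack(regressors)
    beta, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
    trend[rows] = X @ beta                       # fitted value, labeled at t + h
    cycle[rows] = y[rows] - trend[rows]          # residual, labeled at t + h
    return trend, cycle
```

The first h + p - 1 rows stay missing because the regression has no complete set of regressors for them.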

feature_spec#

macroforecast.feature_engineering.feature_spec(
    *,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    predictors: Literal["all"] | Iterable[str] | None = None,
    lags: Iterable[int] | int | None = (0, 1),
    target_lags: Iterable[int] | int | None = None,
    rolling_windows: Iterable[int] | int | None = None,
    rolling_min_periods: int | None = None,
    add_time: bool = False,
    time_trend: bool = True,
    time_month: bool = False,
    time_quarter: bool = False,
    time_year: bool = False,
    pca_components: int | None = None,
    pca_columns: Iterable[str] | None = None,
    pca_scale: bool = True,
    pca_prefix: str = "pc",
    steps: Iterable[Mapping[str, object]] | None = None,
    feature_steps: Iterable[Mapping[str, object]] | None = None,
    include_original: bool = False,
    target_transform: str = "level",
    target_mode: str = "direct",
    drop_missing: bool = True,
    metadata: Mapping[str, object] | None = None,
) -> FeatureSpec

feature_spec() is the runner-safe feature contract. It is fitted by forecasting.run() according to feature_policy, so stateful choices such as scaling, PCA, grouped PCA, and MAF are estimated on the allowed training/reference panel and reused when transforming validation/test rows.

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| target, targets | string/iterable or None | from input metadata | Target column or target columns. |
| horizon, horizons | positive int/iterable or None | from input, then (1,) | Forecast horizon choices. |
| predictors | "all", iterable, or None | from input, then all non-target columns | Predictor columns. |
| lags | int, iterable, or None | (0, 1) | Predictor lags. 0 includes the current predictor value at the forecast origin. None disables ordinary predictor lags. |
| target_lags | int, iterable, or None | None | Explicit autoregressive target lags added to X while keeping target columns out of predictors. Use this for AR-X and recursive runner designs. |
| rolling_windows | positive int/iterable or None | None | Optional rolling means. |
| rolling_min_periods | positive int or None | window length | Minimum observations for rolling means. |
| add_time | bool | False | Add deterministic date features. |
| time_trend, time_month, time_quarter, time_year | bool | see signature | Which deterministic date features to add. |
| pca_components | positive int or None | None | Fit PCA on the allowed feature-fit panel and append fixed-loadings components. |
| pca_columns | iterable or None | predictors | Columns used for PCA. |
| pca_scale | bool | True | Standardize PCA inputs using the feature-fit panel. |
| pca_prefix | string | "pc" | PCA output prefix. |
| steps, feature_steps | iterable of step mappings or None | None | Fit-aware feature-step pipeline. Use public step builders for deterministic/fitted transforms: lag_step(), rolling_step(), moving_average_step(), marx_step(), transform_step(), seasonal_lag_step(), season_dummy_step(), fourier_step(), time_step(), polynomial_step(), interaction_step(), scale_step(), pca_step(), sparse_pca_chen_rohe_step(), varimax_step(), group_pca_step(), maf_step(), hamilton_step(), random_projection_step(), nystroem_step(), partial_least_squares_step(), and sliced_inverse_regression_step(). For selection, pass step mappings with method equal to one of variance_selection, correlation_selection, lasso_selection, lasso_path_selection, rfe_selection, boruta_selection, stability_selection, or genetic_selection. steps and feature_steps are aliases; provide only one. |
| include_original | bool | False | Include the original predictor panel as part of X when using steps. |
| target_transform | string | "level" | Same target choices as direct_target(). |
| target_mode | string | "direct" | "direct" for horizon-level targets; "path" for step-level path targets. |
| drop_missing | bool | True | Drop rows with missing selected X or y during fit rows. Test rows are transformed without dropping by the runner. |
| metadata | mapping or None | None | User metadata stored inside the feature spec record. |

When steps are supplied, they replace the shortcut predictor options rolling_windows, add_time, and pca_components; use the corresponding step builders instead. The default lags=(0, 1) shortcut is also not used in step mode unless you explicitly add a lag_step(). target_lags is not a predictor shortcut; it is appended as a separate autoregressive target block after the step pipeline, so paper-style designs can combine steps=[...] with target_lags=range(0, 13).

Output#

Returns FeatureSpec. Important methods:

| Method | Output | Meaning |
| --- | --- | --- |
| .fit(data) | FittedFeatureBuilder | Fits reusable feature state for PCA/scaling/grouped PCA/MAF steps on the supplied panel. |
| .fit_transform(data) | FeatureSet | Fits and transforms the same panel. |
| .to_dict() | dict | JSON-ready feature choices for result metadata. |
| .to_metadata() | dict | Compact runner metadata. |

Fit-Aware Step Pipeline#

Step pipelines let the runner refit feature transformations inside each forecasting window:

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=1,
    predictors=["PAYEMS", "HOUST", "S&P 500"],
    steps=[
        mf.feature_engineering.scale_step(name="scaled", include=False),
        mf.feature_engineering.pca_step(
            name="pc",
            input="scaled",
            n_components=3,
            min_train_size=60,
            include=False,
        ),
        mf.feature_engineering.lag_step(name="pc_lag", input="pc", lags=range(0, 13)),
    ],
)

Each step has a name, method, input, and include flag. input="panel" reads the original predictor panel; input="<step name>" reads a prior step; input="target_panel" reads the resolved target columns. The last input is explicitly opt-in: predictors still reject target overlap, but paper designs that require target-derived feature blocks such as MARX_y or MAF_y can construct them without treating the target as an ordinary predictor. If include=False, the step is an intermediate fitted transformation and its output is not included in the final X, but its metadata is still recorded.
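
The routing rules for input and include can be sketched with a toy resolver. Everything here is hypothetical: run_steps, the "func" key, the prefix naming, and the assumed default include=True are illustration devices, and real step mappings carry a method string rather than a callable.

```python
import pandas as pd


def run_steps(panel, target_panel, steps):
    """Toy resolver: each step reads a named input and may be hidden from X."""
    outputs = {"panel": panel, "target_panel": target_panel}
    included = []
    for step in steps:
        source = outputs[step.get("input", "panel")]
        result = step["func"](source).add_prefix(f"{step['name']}__")
        outputs[step["name"]] = result       # later steps can read this by name
        if step.get("include", True):        # assumed default for the sketch
            included.append(result)
    return pd.concat(included, axis=1)


panel = pd.DataFrame({"PAYEMS": [1.0, 2.0, 3.0, 4.0]})
target_panel = pd.DataFrame({"INDPRO": [10.0, 11.0, 12.0, 13.0]})

steps = [
    # Intermediate step: used downstream but kept out of the final X.
    {"name": "centered", "input": "panel", "include": False,
     "func": lambda frame: frame - frame.mean()},
    # Reads the prior step by name.
    {"name": "centered_lag1", "input": "centered",
     "func": lambda frame: frame.shift(1)},
    # Explicit opt-in to the target panel.
    {"name": "y_lag1", "input": "target_panel",
     "func": lambda frame: frame.shift(1)},
]

X = run_steps(panel, target_panel, steps)
```

The intermediate "centered" output never reaches X, while both downstream blocks do.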

Example target-derived MARX block:

features = mf.feature_engineering.feature_spec(
    target="INDPRO",
    horizon=3,
    predictors=["PAYEMS", "UNRATE", "HOUST"],
    steps=[
        mf.feature_engineering.marx_step(name="MARX_X", max_lag=12),
        mf.feature_engineering.marx_step(
            name="MARX_y",
            input="target_panel",
            columns=["INDPRO"],
            max_lag=12,
        ),
    ],
    target_lags=range(0, 13),
)
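
As a rough picture of what an increasing-lag-average block contains, the sketch below builds averages over lags 1..p for p = 1..max_lag with pandas. It assumes that each MARX column is the mean of the first p lags (suggested by the pairing with moving_average_ladder(..., shift=1)); marx_block and its column names are hypothetical and the library's exact construction may differ.

```python
import pandas as pd


def marx_block(series: pd.Series, max_lag: int) -> pd.DataFrame:
    """Hypothetical helper: column p holds mean(x[t-1], ..., x[t-p])."""
    shifted = series.shift(1)  # start at lag 1 so the current value is excluded
    columns = {
        f"{series.name}_ma{p}_lag1": shifted.rolling(p).mean()
        for p in range(1, max_lag + 1)
    }
    return pd.DataFrame(columns)


x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="INDPRO")
block = marx_block(x, max_lag=3)
```

At row 3 the three columns average the last one, two, and three lagged values respectively, so the block is a rotation of the ordinary lag matrix rather than new information.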

Stateful step builders are interpreted as fixed-fit transformations inside FeatureSpec: the runner’s feature_policy determines which rows are used to fit the step. Any fit_policy value inherited from reusable step builders is ignored in feature_spec() mode because the runner owns the temporal fit policy.
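
A fixed-fit transformation can be sketched with PCA via NumPy's SVD: loadings come from the fit rows only and are then applied unchanged to later rows. The helpers fit_pca and apply_pca and the simulated panel are assumptions for illustration, not the package's internals.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int) -> dict:
    """Estimate center and loadings on the fit rows only (hypothetical helper)."""
    center = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    return {"center": center, "loadings": vt[:n_components].T}


def apply_pca(state: dict, rows: np.ndarray) -> np.ndarray:
    """Project rows with the fixed center and loadings."""
    return (rows - state["center"]) @ state["loadings"]


rng = np.random.default_rng(1)
panel = rng.normal(size=(40, 5))
train, test = panel[:30], panel[30:]

state = fit_pca(train, n_components=2)
train_pc = apply_pca(state, train)
test_pc = apply_pca(state, test)  # no refit on test rows
```

Because the state is frozen after fitting, the component definitions cannot shift in response to validation or test observations.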

| Step builder | Runner-safe behavior |
| --- | --- |
| lag_step() | Deterministic lag transform. |
| rolling_step() | Deterministic rolling mean transform. |
| moving_average_step() | Deterministic moving-average ladder. |
| marx_step() | MARX increasing lag averages; with scale_lags=True, fits lag-matrix center/scale on the feature-fit panel and reuses fixed parameters. |
| transform_step() | Deterministic column transform: log, diff, log_diff, pct_change, or cumsum. |
| seasonal_lag_step() | Deterministic seasonal lag such as 12-month or 4-quarter lag blocks. |
| season_dummy_step() | Deterministic month or quarter date dummies from the index. |
| fourier_step() | Deterministic Fourier seasonal terms from the index. |
| time_step() | Deterministic trend, month, quarter, and year columns from the index. |
| polynomial_step() | Deterministic polynomial expansion. |
| interaction_step() | Deterministic pure interaction terms. |
| scale_step() | Fits center/scale on the feature-fit panel, then applies fixed parameters. |
| pca_step() | Fits PCA loadings on the feature-fit panel, then applies fixed loadings. |
| sparse_pca_chen_rohe_step() | Fits Chen-Rohe sparse loadings on the feature-fit panel, then applies fixed loadings; optional var_innovations=True fits the VAR(1) residual mapping on the same feature-fit panel. |
| varimax_step() | Fits an orthogonal rotation on factor-score columns from the feature-fit panel, then applies the fixed rotation. |
| group_pca_step() | Fits separate PCA states inside named groups. |
| maf_step() | Fits variable-specific lag-panel PCA states for Moving Average Factors. |
| hamilton_step() | Fits Hamilton-regression beta on the feature-fit panel, then applies fixed beta to train/validation/test rows. |
| random_projection_step() | Fits a Gaussian random-projection transformer on the feature-fit panel and applies fixed components. |
| nystroem_step() | Fits Nystroem kernel-approximation landmarks on the feature-fit panel and applies fixed components. |
| partial_least_squares_step() | Fits PLS components against the single resolved target on the feature-fit panel and applies fixed weights. |
| sliced_inverse_regression_step() | Fits target-sliced directions on the feature-fit panel and applies fixed directions. |
| variance_selection, correlation_selection, lasso_selection, lasso_path_selection, rfe_selection, boruta_selection, stability_selection, genetic_selection | Select columns on the feature-fit panel and reuse the selected columns. Use these as method strings in step mappings, not as step-builder functions. |

In feature_spec() mode, hamilton_step() ignores the reusable step’s fit_policy argument because the runner’s feature_policy owns the allowed fit rows. The fitted state records fit_policy="fixed_fit_panel". Runner-safe Hamilton currently requires missing="drop"; impute missing values in preprocessing before using it. The direct helper and compose_features() still support missing="interpolate" for one-shot exploratory construction.

Direct pandas functions and runner-safe step builders are intentionally paired:

| Direct function | Runner-safe step | Fit state? | Typical use |
| --- | --- | --- | --- |
| lag() | lag_step() | No | Add current/lagged predictors. |
| rolling_mean() | rolling_step() | No | Add trailing rolling means. |
| moving_average_ladder() | moving_average_step() | No | Add multi-scale moving-average blocks. |
| moving_average_ladder(..., shift=1) / feature_matrix(..., "MARX") | marx_step() | Only when scale_lags=True | Add MARX increasing lag averages. |
| transform_features() and wrappers such as log_features() / diff_features() | transform_step() | No | Add ML-side transforms after preprocessing. |
| seasonal_lag() | seasonal_lag_step() | No | Add seasonal lag blocks. |
| season_dummy() | season_dummy_step() | No | Add calendar dummies. |
| fourier_features() | fourier_step() | No | Add deterministic seasonal Fourier terms. |
| time_features() | time_step() or feature_spec(add_time=True, ...) | No | Add deterministic trend/month/quarter/year terms. |
| polynomial_features() | polynomial_step() | No | Add nonlinear expansions. |
| interaction_features() | interaction_step() | No | Add cross-products. |
| scale_features() | scale_step() | Yes | Fit center/scale on allowed rows. |
| pca_features() | pca_step() | Yes | Fit PCA loadings on allowed rows. |
| sparse_pca_chen_rohe_features() | sparse_pca_chen_rohe_step() | Yes | Fit Chen-Rohe sparse component loadings on allowed rows. |
| varimax_features() | varimax_step() | Yes | Fit orthogonal factor rotation on allowed rows. |
| group_pca() | group_pca_step() | Yes | Fit separate PCA states by group. |
| maf_features() | maf_step() | Yes | Fit variable-specific lag-panel PCA states. |
| hamilton_filter_features() | hamilton_step() | Yes | Fit Hamilton-regression beta on allowed rows, then apply fixed beta. |
| random_projection_features() | random_projection_step() | Yes | Fit Gaussian random-projection state on allowed rows. |
| nystroem_features() | nystroem_step() | Yes | Fit Nystroem kernel landmarks on allowed rows. |
| partial_least_squares_features() | partial_least_squares_step() | Yes | Fit PLS scores against the resolved target on allowed rows. |
| sliced_inverse_regression_features() | sliced_inverse_regression_step() | Yes | Fit SIR directions against the resolved target on allowed rows. |
| variance_selection() | {"method": "variance_selection", ...} | Yes | Select columns by sample variance on allowed rows; no target required. |
| correlation_selection() | {"method": "correlation_selection", ...} | Yes | Select columns by target correlation on allowed rows. |
| lasso_selection() | {"method": "lasso_selection", ...} | Yes | Select columns by lasso coefficient magnitude on allowed rows. |
| lasso_path_selection() | {"method": "lasso_path_selection", ...} | Yes | Select columns by lasso-path inclusion frequency on allowed rows. |
| rfe_selection() | {"method": "rfe_selection", ...} | Yes | Select columns by recursive feature elimination on allowed rows. |
| boruta_selection() | {"method": "boruta_selection", ...} | Yes | Select columns by Boruta-style shadow-feature tests on allowed rows. |
| stability_selection() | {"method": "stability_selection", ...} | Yes | Select columns by sparse-model subsampling frequency on allowed rows. |
| genetic_selection() | {"method": "genetic_selection", ...} | Yes | Select columns by genetic subset search on allowed rows. |

The following helpers remain callable but are intentionally not yet accepted as FeatureSpec step methods:

| Helper | Why not a runner-safe step yet |
| --- | --- |
| mixed_frequency_lags() | It changes the date anchor and native-frequency lookup calendar. This belongs with mixed-frequency data/model design, not ordinary same-index feature steps. |
| hp_filter_features() | HP filtering is two-sided on the supplied sample. It remains direct-only and warns by default; use hamilton_step() for a runner-safe trend/cycle filter. |
| savitzky_golay_features() | The smoother uses a centered local window over the supplied sample. It remains direct-only and warns by default; use trailing rolling_step() when a past-only smoother is needed. |
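
The concern about centered windows is easy to see with pandas rolling means. The sketch below uses a plain rolling mean as a stand-in for a centered smoother (it is not Savitzky-Golay or HP filtering); the toy series is invented.

```python
import pandas as pd

# A single spike at position 4; everything else is zero.
x = pd.Series([0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0])

# Centered window: the value at position 3 averages positions 2, 3, and 4,
# so it already reflects the spike that occurs one step later.
centered = x.rolling(3, center=True).mean()

# Trailing window: the value at position 3 averages positions 1, 2, and 3,
# so the spike only shows up from position 4 onward.
trailing = x.rolling(3).mean()
```

At a forecast origin of position 3, the centered smoother has leaked the future spike into the feature, while the trailing one has not.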

build_features() remains broader for one-shot construction, including feature_specification="F-X-MARX" and feature_specification="F-X-MAF". Use it when you want to materialize a complete FeatureSet first. Use feature_spec(..., steps=...) when the feature transformations themselves must be refit inside forecasting.run() according to the window design.

build_features#

macroforecast.feature_engineering.build_features(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizon: int | None = None,
    horizons: Iterable[int] | int | None = None,
    predictors: Literal["all"] | Iterable[str] | None = None,
    lags: Iterable[int] | int = (0, 1),
    target_lags: Iterable[int] | int | None = None,
    rolling_windows: Iterable[int] | int | None = None,
    rolling_min_periods: int | None = None,
    add_time: bool = False,
    time_trend: bool = True,
    time_month: bool = False,
    time_quarter: bool = False,
    time_year: bool = False,
    feature_steps: Iterable[Mapping[str, object]] | None = None,
    feature_specification: str | Iterable[str] | None = None,
    include_original: bool = False,
    level_data: feature input | None = None,
    max_lag: int = 12,
    n_factors: int = 8,
    n_maf_components: int = 2,
    feature_fit_policy: str = "expanding",
    feature_min_train_size: int | None = None,
    feature_warn_full_sample: bool = True,
    include_current_factor: bool = True,
    scale_factors: bool = True,
    scale_marx: bool = False,
    scale_maf: bool = False,
    target_transform: str = "level",
    target_mode: str = "direct",
    drop_missing: bool = True,
) -> FeatureSet

Input#

| Name | Type | Default | Choices |
| --- | --- | --- | --- |
| target, targets | string/iterable or None | from DataSpec/PreprocessedData | Target column choices. One of them is required if the input does not already define targets. |
| horizon, horizons | positive int/iterable or None | from input, then (1,) | Forecast horizons. |
| predictors | "all", iterable, or None | from input, then all non-target columns | Predictor columns. Target columns are rejected as predictors. |
| lags | int or iterable | (0, 1) | Current value plus lag one by default. lags=3 means 1, 2, 3; lags=0 means current values only; pass exact iterables when needed. |
| target_lags | int, iterable, or None | None | Add autoregressive target-lag columns to X while still keeping targets out of ordinary predictors. target_lags=range(0, 13) means current target transform plus 12 past lags. |
| rolling_windows | positive int/iterable or None | None | Add rolling-mean features for each window. |
| rolling_min_periods | positive int or None | window length | Passed to rolling_mean(). |
| add_time | bool | False | Add deterministic date features. |
| time_trend, time_month, time_quarter, time_year | bool | True, False, False, False | Which date features to include when add_time=True. |
| feature_steps | iterable of mappings or None | None | If supplied, use compose_features() instead of the simple lag/rolling/time defaults. Mutually exclusive with feature_specification. |
| feature_specification | string/iterable or None | None | If supplied, use feature_matrix() blocks such as "F-X-MARX" or "F-X-MAF" instead of the simple lag/rolling/time defaults. |
| include_original | bool | False | Include original predictors when feature_steps is supplied. |
| level_data | feature input or None | None | Passed to feature_matrix() when feature_specification includes LEVEL/H. |
| max_lag | positive int | 12 | Passed to feature_matrix() for MARX and MAF. |
| n_factors | positive int | 8 | Number of F factors when feature_specification uses F. |
| n_maf_components | positive int | 2 | MAF components per source variable when feature_specification uses MAF. |
| feature_fit_policy | str | "expanding" | Fit policy passed to feature_matrix() fitted transforms. |
| feature_min_train_size | positive int or None | None | Minimum complete rows passed to feature_matrix() fitted transforms. |
| feature_warn_full_sample | bool | True | Warn when block-based fitted transforms use feature_fit_policy="full_sample". |
| include_current_factor | bool | True | Force lag 0 for the F block. |
| scale_factors | bool | True | Scale variables before F PCA. |
| scale_marx | bool | False | Apply optional author R-code lag-matrix scaling for MARX. |
| scale_maf | bool | False | Scale MAF lag panels before PCA. |
| target_transform | str | "level" | Same choices as direct_target(transform=...). |
| target_mode | str | "direct" | "direct" returns horizon-level target columns. "path" returns step-level target columns from path_targets(). |
| drop_missing | bool | True | Drop rows where any selected X or y column is missing. |
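
The integer shorthand for lags described in the table can be written out as a small normalizer. This is a sketch of the stated rule only (lags=3 means 1, 2, 3; lags=0 means current values only); normalize_lags is a hypothetical name and the package may validate inputs differently.

```python
def normalize_lags(lags):
    """Hypothetical helper: expand the lags shorthand into an explicit tuple."""
    if isinstance(lags, int):
        if lags < 0:
            raise ValueError("lags must be non-negative")
        # 0 keeps only the current value; n expands to lags 1..n.
        return (0,) if lags == 0 else tuple(range(1, lags + 1))
    # Iterables are taken as the exact lag list.
    return tuple(int(lag) for lag in lags)


expanded = normalize_lags(3)
current_only = normalize_lags(0)
default = normalize_lags((0, 1))
paper_style = normalize_lags(range(0, 13))
```

Note that an integer never includes lag 0 unless it is 0 itself, so passing an explicit iterable such as range(0, 13) is the way to get the current value together with past lags.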

target_mode="path" is a target-construction shortcut only. It does not fit or forecast one model per step; that belongs in the model stage. It also does not average forecasts; horizon-level forecast averaging belongs in evaluation. The returned FeatureSet.y contains step columns, and metadata records which step columns belong to each requested horizon.
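
The separation described above can be sketched as follows: target construction only produces one column per future step plus a mapping from each requested horizon to its step columns. The helper path_targets_sketch, the step column names, and the use of untransformed levels are all assumptions for illustration.

```python
import pandas as pd


def path_targets_sketch(y: pd.Series, horizons):
    """Hypothetical helper: build y[t+s] for every step s up to max(horizons)."""
    max_h = max(horizons)
    steps = pd.DataFrame(
        {f"{y.name}_step{s}": y.shift(-s) for s in range(1, max_h + 1)}
    )
    # Record which step columns a later stage would average for each horizon.
    by_horizon = {
        h: [f"{y.name}_step{s}" for s in range(1, h + 1)] for h in horizons
    }
    return steps, by_horizon


y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="INDPRO")
steps, by_horizon = path_targets_sketch(y, horizons=[1, 3])
```

No model is fitted and nothing is averaged here; by_horizon is only bookkeeping that a later evaluation stage could consume.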

Output#

Returns FeatureSet.

| Field | Type | Meaning |
| --- | --- | --- |
| X | pandas.DataFrame | Predictor matrix aligned on forecast origin dates. |
| y | pandas.DataFrame | Direct horizon targets or path step targets aligned to X. |
| metadata | dict | Input metadata plus a feature_engineering stage. |
| feature_metadata | pandas.DataFrame | Generated-feature provenance. Core columns are feature, step, block, operation, source, parameter, lag, window, component, fit_policy, inputs, and included. |
| target_metadata | pandas.DataFrame | Target-column provenance. Core columns are target_column, source, horizon, step, mode, transform, operation, formula, aggregation, and used_for_horizons. |
| target, targets, horizons, predictors | scalar/tuple fields | Resolved study choices. |

FeatureSet supports tuple unpacking:

X, y, metadata = features
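
One common way to get this kind of three-way unpacking from a richer container is to define __iter__ on a dataclass. The class below is a hypothetical stand-in, not the actual FeatureSet definition.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class FeatureSetSketch:
    """Hypothetical container: more fields than the three it unpacks to."""
    X: Any
    y: Any
    metadata: dict = field(default_factory=dict)
    feature_metadata: Any = None

    def __iter__(self) -> Iterator[Any]:
        # Only X, y, and metadata take part in tuple unpacking.
        yield self.X
        yield self.y
        yield self.metadata


sketch = FeatureSetSketch(X=[[1.0, 2.0]], y=[3.0], metadata={"stage": "demo"})
X_part, y_part, meta_part = sketch
```

The extra provenance fields stay reachable by attribute while the short unpacking form keeps working.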

Metadata#

metadata["feature_engineering"] records:

| Key | Meaning |
| --- | --- |
| input_panel | Shape, date range, columns, missing count, and inferred index frequency. |
| predictors, targets, horizons | Resolved study choices. |
| target_transform | Target formula choice. |
| target_mode | "direct" or "path". |
| path_target_columns_by_horizon | Step columns to average later when target_mode="path". |
| lags, target_lags, rolling_windows, rolling_min_periods | Predictor and autoregressive target-lag construction choices. |
| feature_specification | feature_matrix() block specification when used. |
| feature_matrix | feature_matrix() options when block-based features are used. |
| feature_steps | Ordered composition steps when compose_features() is used through build_features(). |
| time | Deterministic date-feature choices. |
| drop_missing | Whether rows with missing X or y values were removed. |
| output | Final row count, feature count, target count, and sample dates. |

Feature Metadata#

Each feature-producing function attaches macroforecast_feature_metadata to the returned DataFrame. build_features() exposes the same table as FeatureSet.feature_metadata.

The table is normalized through a single schema helper. The first columns are always:

| Column | Meaning |
| --- | --- |
| feature | Generated feature column name. |
| step | Producing step name when created through compose_features() or feature_spec(..., steps=...); otherwise empty. |
| block | Paper-style block such as X, F, MARX, MAF, or LEVEL when created by feature_matrix(). |
| operation | Operation family, for example lag, rolling_mean, marx, pca, pct_change, season_dummy, or interaction. |
| source | Main source column, source group, or date for calendar features. |
| parameter | Compact parameter string such as lag=1, window=3, component=1, or periods=1. |
| lag, window, component | Parsed numeric fields when the feature name/operation carries them. |
| fit_policy | Fitting policy for stateful transforms. In feature_spec() this is fixed_fit_panel for fit-aware steps. |
| inputs | Comma-separated source columns used by the feature. |
| included | True when the feature is included in final X; False for intermediate pipeline steps. |

Extra columns are preserved after the standard columns. For example, mixed_frequency_lags() adds source-frequency and lookup-calendar fields. The metadata frame also carries attrs["macroforecast_metadata_schema"] = {"kind": "feature_metadata", "version": 1, ...}.

features = mf.feature_engineering.build_features(
    processed,
    feature_specification="F-X-MARX",
    lags=range(0, 13),
    max_lag=12,
    n_factors=8,
)

features.feature_metadata.loc[
    features.feature_metadata["feature"] == "MARX__INDPRO_ma3_lag1",
    ["block", "operation", "source", "window", "lag"],
]

This records that MARX__INDPRO_ma3_lag1 came from the MARX block, source series INDPRO, window 3, and lag 1. Intermediate compose_features() steps are also recorded; the included column marks whether a step output is part of the final X matrix.
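
To make the parsed fields concrete, the sketch below extracts block, source, window, and lag from a feature name with a regular expression. The naming pattern is inferred from the single example MARX__INDPRO_ma3_lag1; the helper parse_feature_name is hypothetical and the package's own schema helper is not shown here.

```python
import re

# Assumed pattern: <BLOCK>__<source>_ma<window>_lag<lag>
FEATURE_NAME = re.compile(
    r"^(?P<block>[A-Z]+)__(?P<source>.+)_ma(?P<window>\d+)_lag(?P<lag>\d+)$"
)


def parse_feature_name(name: str) -> dict:
    """Hypothetical helper: split a block-style feature name into fields."""
    match = FEATURE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized feature name: {name!r}")
    fields = match.groupdict()
    fields["window"] = int(fields["window"])
    fields["lag"] = int(fields["lag"])
    return fields


parsed = parse_feature_name("MARX__INDPRO_ma3_lag1")
```

In practice the metadata table already carries these fields, so parsing names should only be needed when a frame has lost its attached metadata.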

Target Metadata#

Target-producing functions attach macroforecast_target_metadata to the returned target frame. build_features() exposes the same table as FeatureSet.target_metadata.

features = mf.feature_engineering.build_features(
    processed,
    target="INDPRO",
    horizons=[1, 3, 6],
    target_transform="growth",
)

features.target_metadata.loc[
    features.target_metadata["target_column"] == "INDPRO_growth_h3",
    ["source", "horizon", "mode", "transform", "formula"],
]

For direct targets, horizon is the forecast horizon and step is empty. For path targets, step identifies the future step and used_for_horizons records which requested horizons later consume that step forecast. This keeps the target construction, model-stage step fitting, and evaluation-stage averaging separate while preserving the contract in metadata.
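
A direct level target and a matching provenance table can be sketched in a few lines of pandas. The helper direct_level_targets, the "_level_h" column suffix, and the reduced set of metadata columns are assumptions for illustration; only the y[t+h] alignment is taken from the text.

```python
import pandas as pd


def direct_level_targets(y: pd.Series, horizons):
    """Hypothetical helper: y[t+h] aligned on the forecast origin t."""
    columns = {f"{y.name}_level_h{h}": y.shift(-h) for h in horizons}
    targets = pd.DataFrame(columns)
    meta = pd.DataFrame(
        {
            "target_column": list(columns),
            "source": y.name,
            "horizon": list(horizons),
            "mode": "direct",
            "transform": "level",
        }
    )
    return targets, meta


y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="INDPRO")
targets, target_meta = direct_level_targets(y, horizons=[1, 3])
row = target_meta.loc[target_meta["target_column"] == "INDPRO_level_h3"]
```

The last h origins have no observed target for horizon h, which is why drop_missing=True removes them from the aligned sample.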

Error Conditions#

| Condition | Result |
| --- | --- |
| Input is not a canonical panel-like object | TypeError. |
| Target is missing and input has no target metadata | ValueError. |
| Target/predictor names are not in the panel | ValueError. |
| Predictors include target columns | ValueError. |
| Horizons, windows, or min periods are non-positive | ValueError. |
| Feature construction leaves no aligned rows | ValueError. |
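
Several of these checks can be sketched as a plain validation function. This is a hypothetical illustration of the listed conditions, not the package's validation code; validate_choices and raises_value_error are invented names and the messages are placeholders.

```python
def validate_choices(columns, targets, predictors, horizons):
    """Hypothetical helper mirroring three of the ValueError conditions."""
    missing = [name for name in [*targets, *predictors] if name not in columns]
    if missing:
        raise ValueError(f"columns not in panel: {missing}")
    overlap = sorted(set(targets) & set(predictors))
    if overlap:
        raise ValueError(f"predictors include target columns: {overlap}")
    if any(h <= 0 for h in horizons):
        raise ValueError("horizons must be positive")


def raises_value_error(**kwargs) -> bool:
    """Return True when validate_choices rejects the given choices."""
    try:
        validate_choices(**kwargs)
    except ValueError:
        return True
    return False


panel_columns = ["INDPRO", "PAYEMS", "HOUST"]
ok = raises_value_error(
    columns=panel_columns, targets=["INDPRO"], predictors=["PAYEMS"], horizons=[1, 3]
)
overlap = raises_value_error(
    columns=panel_columns, targets=["INDPRO"], predictors=["INDPRO"], horizons=[1]
)
bad_horizon = raises_value_error(
    columns=panel_columns, targets=["INDPRO"], predictors=["PAYEMS"], horizons=[0]
)
unknown = raises_value_error(
    columns=panel_columns, targets=["INDPRO"], predictors=["UNRATE"], horizons=[1]
)
```

Checking names before overlap keeps the error message focused on the first problem a user would need to fix.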