macroforecast.data#

Purpose#

macroforecast.data is the data entry point for the package. It loads official or user-supplied data, normalizes it to one pandas panel contract, and attaches source metadata. It also creates run-level data specifications and combines national FRED-MD/FRED-QD data with state-level FRED-SD panels.

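This module does not apply stationarity transforms, outlier rules, imputation, feature engineering, model fitting, or evaluation. Those steps happen in later stages. Loaders and data-policy helpers return a DataBundle, and spec(...) returns a DataSpec; a few inspection helpers return plain pandas or Python objects.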

The usual flow is:

import macroforecast as mf

bundle = mf.data.load_fred_md()

data_spec = mf.data.spec(
    bundle,
    target="INDPRO",
    horizons=[1, 3, 6, 12],
    start="1960-01",
    end="2024-12",
    predictors="all",
)

mf.data.spec(...) is not a wrapper that runs data loading, preprocessing, feature engineering, or modeling. It is a small contract builder for the already-loaded panel. It validates the requested target, horizons, sample window, and predictor set; subsets the panel to those columns and dates; expands predictors="all" to concrete non-target columns; and records the choices in metadata. Later callable stages can consume the same DataSpec without guessing which columns or horizons the run intended to use.
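The contract-building steps described above can be pictured with plain pandas. This is a minimal sketch rather than the package's implementation: `build_spec_sketch`, its return dictionary, and the toy panel values are hypothetical names and data used only for illustration.

```python
import pandas as pd


def build_spec_sketch(panel, target, horizons, start=None, end=None, predictors="all"):
    """Hypothetical stand-in for the checks described above (not mf.data.spec)."""
    if target not in panel.columns:
        raise ValueError(f"unknown target: {target!r}")
    if not horizons or any(int(h) <= 0 for h in horizons):
        raise ValueError("horizons must be positive integers")
    # Expand predictors="all" to the concrete non-target columns.
    if isinstance(predictors, str) and predictors == "all":
        predictors = [c for c in panel.columns if c != target]
    missing = [c for c in predictors if c not in panel.columns]
    if missing:
        raise ValueError(f"unknown predictors: {missing}")
    # Subset to the sample window and the selected columns.
    window = panel.loc[start:end, [target, *predictors]]
    choices = {
        "target": target,
        "horizons": tuple(int(h) for h in horizons),
        "start": start,
        "end": end,
        "predictors": tuple(predictors),
    }
    return window, choices


# Toy monthly panel with made-up values.
panel = pd.DataFrame(
    {"INDPRO": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "UNRATE": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="MS", name="date"),
)
window, choices = build_spec_sketch(panel, "INDPRO", [1, 3], start="2020-02", end="2020-05")
```

Recording the resolved predictor tuple, rather than the literal "all", is what lets later stages reuse the same choices without re-deriving them from the panel.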

Public Functions#

| Function | Purpose | Output |
| --- | --- | --- |
| load_fred_md | Load official FRED-MD current or vintage data. | DataBundle |
| load_fred_qd | Load official FRED-QD current or vintage data. | DataBundle |
| load_fred_sd | Load official FRED-SD state-level panel data. | DataBundle |
| load_fred_md_sd | Load and combine FRED-MD with FRED-SD. | DataBundle |
| load_fred_qd_sd | Load and combine FRED-QD with FRED-SD. | DataBundle |
| load_custom_csv | Load a user CSV into the canonical panel contract. | DataBundle |
| load_custom_parquet | Load a user Parquet file into the canonical panel contract. | DataBundle |
| custom_dataset | Build a custom dataset from an in-memory DataFrame. | DataBundle |
| combine | Concatenate loaded bundles and optionally align frequency. | DataBundle |
| list_vintages | Generate supported monthly vintage labels for a dataset. | list[str] |
| as_panel | Normalize a DataFrame to the canonical panel contract. | pandas.DataFrame |
| validate_panel | Validate the canonical panel contract. | None |
| panel_info | Summarize panel shape, dates, missingness, and frequency. | dict |
| metadata | Extract explicit package metadata from data-like input. | dict |
| attach_metadata | Merge one metadata stage into an existing metadata dictionary. | dict |
| set_frequencies | Attach column-level native/output frequency metadata. | DataBundle |
| spec | Attach target, horizon, sample, and predictor choices. | DataSpec |
| align_frequency | Keep, filter, or align panel columns to a common frequency. | DataBundle |
| chow_lin_disaggregate | Disaggregate low-frequency series with a high-frequency indicator. | pandas.Series |
| infer_frequencies | Read or infer native frequency by column. | (dict[str, str], str) |
| frequency_hardening_issues | Report columns with weak frequency classification. | list[dict] |
| availability_lag | Delay selected columns to encode release availability. | DataBundle |
| same_period_predictors | Allow, lag, drop, or reject same-period predictors in a DataSpec. | DataSpec |
| define_regime | Attach a binary regime definition to metadata, optionally as a column. | DataBundle |

Public Classes And Types#

| Symbol | Meaning |
| --- | --- |
| DataBundle | Canonical panel plus metadata returned by loaders and data-policy helpers. |
| DataSpec | Canonical panel plus target, horizon, sample, and predictor choices for a run. |
| RegimeDirection | Stored threshold direction type: "above", "below", "equal", or "not_equal". |
| SamePeriodPolicy | Stored same-period predictor policy type: "allow", "lag", "drop", or "forbid". |

Canonical Panel#

Every public loader returns a DataBundle.

panel = bundle.panel
metadata = bundle.metadata

DataBundle also supports tuple unpacking:

panel, metadata = mf.data.load_fred_md()

Panel Contract#

| Property | Required Value |
| --- | --- |
| Type | pandas.DataFrame |
| Index | pandas.DatetimeIndex |
| Index name | "date" |
| Sort order | ascending date order |
| Duplicate dates | not allowed |
| Columns | variable IDs |
| Values | numeric values or NaN |
| Empty panel | not allowed |
| Infinite values | not allowed |

Metadata is explicit on DataBundle.metadata. The panel also carries panel.attrs["macroforecast_metadata"] for pandas-native handoff. FRED-MD and FRED-QD transform codes are attached to panel.attrs["macroforecast_transform_codes"]; preprocessing is responsible for using them.

Panel normalization is strict by default. Invalid date values, non-numeric cells that would be coerced to NaN, duplicate dates, empty panels, and infinite values raise errors. When a caller deliberately sets strict=False, lossy normalization is allowed but recorded in panel.attrs["macroforecast_panel_report"] and metadata["panel"] when the panel is returned inside a DataBundle.

macroforecast_panel_report contains:

| Key | Meaning |
| --- | --- |
| contract | Panel contract version, currently macroforecast_panel_v1. |
| strict | Whether lossy date/numeric coercion was rejected. |
| input_rows, output_rows | Row count before and after panel normalization. |
| input_columns, output_columns | Column names before and after selection/renaming. |
| date_source | Date source used: a column name or "index". |
| invalid_date_rows_dropped | Number of invalid date rows dropped when strict=False. |
| numeric_coercion | Count and examples of non-numeric cells coerced to NaN when strict=False. |
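The difference between strict and permissive normalization can be sketched with `pandas.to_numeric`. This is only an illustration under stated assumptions: `coerce_numeric_sketch` is a hypothetical helper, and the nested `{"count": ...}` layout of the report entry is assumed, since the source only says the entry holds a count and examples.

```python
import pandas as pd


def coerce_numeric_sketch(frame, strict=True):
    """Hypothetical illustration of strict versus permissive numeric coercion."""
    coerced = frame.apply(pd.to_numeric, errors="coerce")
    # Cells that were present in the input but became NaN are lossy coercions.
    lossy = coerced.isna() & frame.notna()
    n_lossy = int(lossy.to_numpy().sum())
    if strict and n_lossy:
        raise ValueError(f"{n_lossy} non-numeric cell(s) would be coerced to NaN")
    # Assumed report layout; the real report also carries examples.
    report = {"strict": strict, "numeric_coercion": {"count": n_lossy}}
    return coerced, report


raw = pd.DataFrame({"x": ["1.5", "n/a", "3.0"], "y": [1, 2, 3]})
clean, report = coerce_numeric_sketch(raw, strict=False)
```

With `strict=True` the same input raises instead of returning a panel, which mirrors the default behavior described above.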

Metadata Contract#

Every loader writes a metadata dictionary with these common keys.

| Key | Type | Meaning |
| --- | --- | --- |
| dataset | str | Dataset identifier such as fred_md, fred_qd, fred_sd, fred_md+fred_sd, or fred_qd+fred_sd. |
| frequency | str | Loader-level frequency label: monthly, quarterly, weekly, annual, mixed, unknown, or the chosen combined frequency. |
| version_mode | str | current, vintage, or mixed for combined inputs with different modes. |
| vintage | str or None | Requested vintage label in YYYY-MM form, or None for current data. |
| data_through | str or None | Last date present in the loaded panel, formatted as YYYY-MM. |
| support_tier | str | stable for official loaders, provisional for user-supplied files. |
| parse_notes | tuple[str, ...] | Loader notes, including discouraged frequency alignments for combined datasets. |
| artifact | dict or None | Raw-file provenance for single-source loads; combined bundles use None. |
| transform_codes | dict[str, int] | Official FRED-MD/FRED-QD t-codes when available. FRED-SD has no official t-code map. |

Combined bundles add:

| Key | Type | Meaning |
| --- | --- | --- |
| source_family | str | Combined-source label currently set to "combined". |
| combined_sources | list[dict] | Full metadata dictionaries from the source bundles. |
| source_by_column | dict[str, str] | Source dataset for each output column. |
| native_frequency_by_column | dict[str, str] | Original frequency for each output column before alignment. |
| native_frequency_counts | dict[str, int] | Count of columns by original frequency. |
| date_anchor_by_column | dict[str, str] | FRED-SD date-anchor map for state columns when available. |
| date_anchor_counts | dict[str, int] | Count of FRED-SD date-anchor patterns when available. |
| output_frequency_by_column | dict[str, str] | Frequency represented in the returned panel for each output column. |
| output_frequency_counts | dict[str, int] | Count of columns by returned-panel frequency. |
| frequency_conversion_warnings | list[dict] | Records of monthly-to-quarterly or quarterly-to-monthly conversions. |
| alignment | dict | Chosen target frequency, alignment rules, and source-level alignment summaries. |

Public metadata helpers and policy types:

| Symbol | Meaning |
| --- | --- |
| attach_metadata | Return metadata with one stage key merged in a pandas-safe way. Used by loaders, preprocessing, analysis, and runner outputs. |
| RegimeDirection | Stored threshold direction type for define_regime(...): "above", "below", "equal", or "not_equal". |
| SamePeriodPolicy | Stored same-period predictor policy type for same_period_predictors(...): "allow", "lag", "drop", or "forbid". |

DataBundle#

macroforecast.data.DataBundle(
    panel: pandas.DataFrame,
    metadata: dict,
)

Output#

| Field | Type | Meaning |
| --- | --- | --- |
| panel | pandas.DataFrame | Canonical date-indexed data panel. |
| metadata | dict | Source, vintage, artifact, frequency, and transform-code metadata. |

Methods#

| Method | Input | Output | Meaning |
| --- | --- | --- | --- |
| attach(stage, values) | stage: str, values: Mapping | DataBundle | Return a new bundle with one metadata stage added. |

Preprocessing outputs can use the same metadata-attachment pattern.
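The tuple unpacking and the non-mutating attach(stage, values) behavior described above can be sketched with a frozen dataclass. `BundleSketch` is a hypothetical stand-in, not the package's DataBundle class, and the list used as a panel is only a placeholder for a DataFrame.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Mapping


@dataclass(frozen=True)
class BundleSketch:
    """Hypothetical stand-in for DataBundle; not the package's class."""

    panel: Any
    metadata: dict = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        # Enables `panel, metadata = bundle`.
        yield self.panel
        yield self.metadata

    def attach(self, stage: str, values: Mapping[str, Any]) -> "BundleSketch":
        # Build a new metadata dict so the original bundle is left untouched.
        merged = {**self.metadata, stage: dict(values)}
        return BundleSketch(self.panel, merged)


bundle = BundleSketch(panel=[[1.0, 2.0]], metadata={"dataset": "custom"})
tagged = bundle.attach("preprocessing", {"impute": "mean"})
panel, metadata = tagged
```

Returning a new object from attach keeps earlier stages reproducible: a bundle handed to one stage cannot be altered by a later one.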

DataSpec#

macroforecast.data.DataSpec(
    panel: pandas.DataFrame,
    metadata: dict,
    target: str | None,
    targets: tuple[str, ...],
    horizons: tuple[int, ...],
    start: str | None = None,
    end: str | None = None,
    predictors: "all" | tuple[str, ...] = "all",
)

DataSpec is the output of spec(...). It keeps the canonical panel and metadata together with the target, horizons, sample window, and predictor selection for a run.

Output#

| Field | Type | Meaning |
| --- | --- | --- |
| panel | pandas.DataFrame | Canonical date-indexed data panel after sample and column selection. |
| metadata | dict | Source metadata plus a data_spec stage. |
| target | str or None | Single target column when target= was used. |
| targets | tuple[str, ...] | Active target columns. |
| horizons | tuple[int, ...] | Positive forecast horizons. |
| start, end | str or None | Normalized sample bounds. |
| predictors | tuple[str, ...] | Concrete non-target predictor columns. |

Methods#

| Method | Input | Output | Meaning |
| --- | --- | --- | --- |
| attach(stage, values) | stage: str, values: Mapping | DataSpec | Return a new spec with one metadata stage added. |

DataSpec also supports tuple unpacking:

panel, metadata = data_spec

load_fred_md#

Load FRED-MD and return DataBundle.

macroforecast.data.load_fred_md(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| vintage | str or None | None | Vintage in YYYY-MM form. None loads current. |
| force | bool | False | Re-download or re-copy even if cache exists. |
| cache_root | path-like or None | None | Raw cache root. |
| local_source | path-like or None | None | Local CSV source instead of download. |
| local_zip_source | path-like or None | None | Optional local historical zip override. Without it, vintage requests automatically download the official FRED-MD historical archive and extract the requested CSV. |

Output#

Returns DataBundle with a monthly FRED-MD panel and metadata. The official CSV transform row is parsed into metadata["transform_codes"] and panel.attrs["macroforecast_transform_codes"].

See FRED-MD for dataset-specific details. See FRED-MD + FRED-SD for the combined monthly national/state loader.

load_fred_qd#

Load FRED-QD and return DataBundle.

macroforecast.data.load_fred_qd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| vintage | str or None | None | Vintage in YYYY-MM form. None loads current. |
| force | bool | False | Re-download or re-copy even if cache exists. |
| cache_root | path-like or None | None | Raw cache root. |
| local_source | path-like or None | None | Local CSV source instead of download. |
| local_zip_source | path-like or None | None | Optional local historical zip override. Without it, vintage requests automatically download the official FRED-QD historical archive and extract the requested CSV. |

Output#

Returns DataBundle with a quarterly FRED-QD panel and metadata. The official CSV transform row is parsed into metadata["transform_codes"] and panel.attrs["macroforecast_transform_codes"].

See FRED-QD for dataset-specific details. See FRED-QD + FRED-SD for the combined quarterly national/state loader.

load_fred_sd#

Load FRED-SD and return DataBundle.

macroforecast.data.load_fred_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| states | list[str] or None | None | Optional state subset. |
| variables | list[str] or None | None | Optional FRED-SD variable subset. |

FRED-SD columns are wide variable-state IDs such as UR_CA. The loader also adds panel.attrs["macrocast_reports"]["fred_sd_series_metadata"], which records each column’s state, FRED-SD variable, observed date range, non-missing count, native frequency, and date-anchor pattern inferred from the official series workbook. The same frequency and date-anchor maps are exposed in metadata["native_frequency_by_column"], metadata["native_frequency_counts"], metadata["date_anchor_by_column"], metadata["date_anchor_counts"], and metadata["state_summary"].

For vintage="YYYY-MM", FRED-SD uses the official by-series workbook path. It tries series-YYYY-MM.xlsx first and then falls back to the official by-series zip archive containing that workbook. There is no local_zip_source parameter for FRED-SD because local overrides are supplied as local_source= with either an official workbook or a canonical wide CSV.

See FRED-SD for mixed-frequency state-series details and t-code limitations. See FRED-MD + FRED-SD and FRED-QD + FRED-SD for combined-loader behavior.

load_fred_md_sd#

Load FRED-MD and FRED-SD, align them to one panel, and return DataBundle.

macroforecast.data.load_fred_md_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_fred_md_source: str | pathlib.Path | None = None,
    local_fred_sd_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
    frequency: str = "monthly",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Purpose#

Use this when the outcome or main state panel is monthly and national macroeconomic controls should come from FRED-MD. This is the recommended combined dataset for monthly state analysis.

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| vintage | str or None | None | Vintage label shared across FRED-MD and FRED-SD. |
| force | bool | False | Re-download or re-copy raw sources. |
| cache_root | path-like or None | None | Raw cache root used by both loaders. |
| local_fred_md_source | path-like or None | None | Local FRED-MD CSV source. |
| local_fred_sd_source | path-like or None | None | Local FRED-SD workbook or CSV source. |
| states | list[str] or None | None | FRED-SD state subset. |
| variables | list[str] or None | None | FRED-SD variable subset. |
| frequency | str | "monthly" | "monthly", "quarterly", or "native". Quarterly is supported but not recommended for this loader. |
| quarterly_to_monthly | str | "repeat_within_quarter" | Rule used if an included FRED-SD series is quarterly and the target panel is monthly. |
| monthly_to_quarterly | str | "quarterly_average" | Rule used only when frequency="quarterly". |

Output#

Returns a combined DataBundle with:

  • metadata["dataset"] == "fred_md+fred_sd"

  • metadata["source_family"] == "combined"

  • metadata["frequency"] == frequency

  • FRED-MD official t-codes in metadata["transform_codes"]

  • FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]

  • FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]

  • any frequency conversions recorded in metadata["frequency_conversion_warnings"]

If a quarterly FRED-SD series is included in a monthly panel, the function emits a UserWarning and records the conversion. The default quarterly_to_monthly="repeat_within_quarter" assigns the quarterly value to each month inside the quarter.
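The repeat_within_quarter rule described above amounts to looking up each month's quarter and copying that quarter's value. The following is a minimal pandas sketch of that idea, not the loader's code; the series values are made up, and NQGSP_CA is used only as an example column name.

```python
import pandas as pd

# Two quarterly observations indexed by calendar quarter (made-up values).
quarterly = pd.Series(
    [100.0, 104.0],
    index=pd.PeriodIndex(["2023Q1", "2023Q2"], freq="Q"),
    name="NQGSP_CA",
)

# Monthly grid covering the same two quarters.
months = pd.period_range("2023-01", "2023-06", freq="M")

# Map each month to its quarter and copy that quarter's value.
monthly = pd.Series(
    quarterly.reindex(months.asfreq("Q")).to_numpy(),
    index=months.to_timestamp(),
    name=quarterly.name,
)
```

Every month of a quarter receives the same value, including months that precede the quarterly release date. That may be one reason the conversion is surfaced as a warning and recorded in metadata.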

load_fred_qd_sd#

Load FRED-QD and FRED-SD, align them to one panel, and return DataBundle.

macroforecast.data.load_fred_qd_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_fred_qd_source: str | pathlib.Path | None = None,
    local_fred_sd_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
    frequency: str = "quarterly",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Purpose#

Use this when the target or outcome is quarterly and national controls should come from FRED-QD. This is the recommended combined dataset for quarterly state-level analysis.

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| vintage | str or None | None | Vintage label shared across FRED-QD and FRED-SD. |
| force | bool | False | Re-download or re-copy raw sources. |
| cache_root | path-like or None | None | Raw cache root used by both loaders. |
| local_fred_qd_source | path-like or None | None | Local FRED-QD CSV source. |
| local_fred_sd_source | path-like or None | None | Local FRED-SD workbook or CSV source. |
| states | list[str] or None | None | FRED-SD state subset. |
| variables | list[str] or None | None | FRED-SD variable subset. |
| frequency | str | "quarterly" | "quarterly", "monthly", or "native". Monthly is supported but not recommended for this loader. |
| quarterly_to_monthly | str | "repeat_within_quarter" | Rule used only when frequency="monthly". |
| monthly_to_quarterly | str | "quarterly_average" | Rule used if an included FRED-SD series is monthly and the target panel is quarterly. |

Output#

Returns a combined DataBundle with:

  • metadata["dataset"] == "fred_qd+fred_sd"

  • metadata["source_family"] == "combined"

  • metadata["frequency"] == frequency

  • FRED-QD official t-codes in metadata["transform_codes"]

  • FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]

  • FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]

  • any frequency conversions recorded in metadata["frequency_conversion_warnings"]

If a monthly FRED-SD series is included in a quarterly panel, the function emits a UserWarning and records the conversion. The default monthly_to_quarterly="quarterly_average" averages monthly observations inside each quarter.

combine#

Combine already-loaded DataBundle objects into one canonical panel.

macroforecast.data.combine(
    *bundles,
    dataset: str | None = None,
    frequency: str = "native",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Input#

| Name | Type | Default | Meaning or choices |
| --- | --- | --- | --- |
| *bundles | DataBundle | required | Two or more bundles to concatenate by date index. |
| dataset | str or None | joined source names | Output dataset label. |
| frequency | str | "native" | "native", "monthly", or "quarterly". |
| quarterly_to_monthly | str | "repeat_within_quarter" | "repeat_within_quarter", "quarter_end_ffill", "linear_interpolation". |
| monthly_to_quarterly | str | "quarterly_average" | "quarterly_average", "quarterly_endpoint", "quarterly_sum". |

With frequency="native" or frequency="mixed", no monthly/quarterly conversion is applied. The returned panel keeps each source column on its native observation dates and records metadata["frequency"] == "mixed". Quarterly columns therefore appear as sparse columns on the union date index when they are combined with monthly columns. Downstream mixed-frequency models should read metadata["native_frequency_by_column"] rather than infer frequency from the overall index.
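The sparse layout described above is what a plain column-wise concatenation on the union date index produces. This is a hedged pandas sketch rather than the combine() implementation; the values are made up, and stamping quarterly observations at the first month of the quarter is an assumed date anchor.

```python
import pandas as pd

monthly = pd.DataFrame(
    {"UR_CA": [4.0, 4.1, 4.2, 4.3, 4.4, 4.5]},
    index=pd.date_range("2023-01-01", periods=6, freq="MS", name="date"),
)
quarterly = pd.DataFrame(
    {"NQGSP_CA": [100.0, 104.0]},
    # Assumed anchor: first month of each quarter.
    index=pd.DatetimeIndex(["2023-01-01", "2023-04-01"], name="date"),
)

# Column-wise concatenation on the union of both date indexes.
mixed = pd.concat([monthly, quarterly], axis=1).sort_index()

# Frequency has to come from explicit metadata, not from the shared index.
native_frequency_by_column = {"UR_CA": "monthly", "NQGSP_CA": "quarterly"}
```

The shared index looks monthly, so inferring frequency from it would misclassify the quarterly column. That is why the per-column map matters for mixed-frequency models.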

Frequency Conversion Rules#

| Direction | Rule | Meaning |
| --- | --- | --- |
| quarterly to monthly | repeat_within_quarter | Assign the quarterly value to each month in that quarter. |
| quarterly to monthly | quarter_end_ffill | Place the quarterly value at quarter end and forward-fill after it is observed. |
| quarterly to monthly | linear_interpolation | Interpolate between observed quarter-end values on the monthly grid. |
| monthly to quarterly | quarterly_average | Average monthly observations in the quarter. |
| monthly to quarterly | quarterly_endpoint | Use the last monthly observation in the quarter. |
| monthly to quarterly | quarterly_sum | Sum monthly observations in the quarter. |
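The three monthly-to-quarterly rules listed above correspond to simple group-wise reductions. The sketch below shows them in plain pandas on made-up values; it illustrates the arithmetic only and does not claim to match how the package implements the rules.

```python
import pandas as pd

monthly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2023-01-01", periods=6, freq="MS", name="date"),
)

# Group months by the calendar quarter they fall in.
by_quarter = monthly.groupby(monthly.index.to_period("Q"))

quarterly_average = by_quarter.mean()   # mean of the months in each quarter
quarterly_endpoint = by_quarter.last()  # last monthly observation in each quarter
quarterly_sum = by_quarter.sum()        # total over the months in each quarter
```

Averages suit rates and index levels, endpoints suit stock variables, and sums suit flows measured per month. Which one is appropriate depends on the series.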

Combined monthly/quarterly output supports only source columns identified as monthly or quarterly. If a source contains weekly, annual, irregular, or unknown-frequency columns, combine() raises ValueError. Use frequency="native" to inspect the mixed panel first, then call mf.data.align_frequency() explicitly if those columns should enter a common monthly or quarterly design.

Output#

Returns DataBundle. The panel is a column-wise concatenation after frequency alignment. Duplicate output column names raise ValueError.

For mixed outputs, the key metadata fields are:

| Key | Meaning |
| --- | --- |
| metadata["frequency"] | "mixed". |
| metadata["native_frequency_by_column"] | Native source frequency for each column. |
| metadata["native_frequency_counts"] | Counts of native source frequencies. |
| metadata["date_anchor_by_column"] | FRED-SD date-anchor map when available. |
| metadata["date_anchor_counts"] | Counts of FRED-SD date-anchor patterns when available. |
| metadata["output_frequency_by_column"] | Returned-panel frequency for each column; equal to native frequency in native mode. |
| metadata["alignment"]["frequency"] | "native" when no conversion was applied. |

Frequency Conversion Warnings#

When combine() changes a source column’s native frequency, it emits UserWarning and records the same information in metadata["frequency_conversion_warnings"].

Each record has:

| Key | Type | Meaning |
| --- | --- | --- |
| dataset | str | Source dataset whose columns were converted. |
| from_frequency | str | Native frequency before alignment. |
| to_frequency | str | Combined panel frequency. |
| rule | str | Alignment rule used. |
| variables | list[str] | Variable-level names, e.g. ["NQGSP"] for NQGSP_CA. |
| columns | list[str] | Exact converted panel columns. |
| n_columns | int | Number of converted columns. |

Example warning:

fred_sd monthly variables were aligned to quarterly using quarterly_average:
UR, ICLAIMS (102 columns).
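A conversion record and its warning message can be assembled with the standard library alone. The sketch below is illustrative: `conversion_record_sketch` is a hypothetical helper, and deriving the variable name by splitting off the trailing state suffix is an assumption based on column IDs such as UR_CA.

```python
import warnings


def conversion_record_sketch(dataset, from_frequency, to_frequency, rule, columns):
    """Hypothetical builder for one frequency-conversion record."""
    # Assumed naming: the variable is the part before the state suffix ("UR" in "UR_CA").
    variables = sorted({column.rsplit("_", 1)[0] for column in columns})
    return {
        "dataset": dataset,
        "from_frequency": from_frequency,
        "to_frequency": to_frequency,
        "rule": rule,
        "variables": variables,
        "columns": list(columns),
        "n_columns": len(columns),
    }


record = conversion_record_sketch(
    "fred_sd", "monthly", "quarterly", "quarterly_average", ["UR_CA", "UR_TX", "ICLAIMS_CA"]
)
message = (
    f"{record['dataset']} {record['from_frequency']} variables were aligned to "
    f"{record['to_frequency']} using {record['rule']}: "
    f"{', '.join(record['variables'])} ({record['n_columns']} columns)."
)

# Capture the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(message, UserWarning)
```

Keeping the same facts in both the warning text and a structured record means an interactive user sees the conversion while downstream code can still check it programmatically.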

load_custom_csv#

Load a user CSV and normalize it to the canonical panel contract.

macroforecast.data.load_custom_csv(
    path,
    *,
    date: str | None = None,
    date_col: str | int | None = None,
    columns: Iterable[str] | None = None,
    series_columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    default_frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    cache_root: str | pathlib.Path | None = None,
    strict: bool = True,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| path | path-like | required | CSV file path. |
| date | str or None | None | Date column. If omitted, uses a DatetimeIndex or parses the first column. |
| date_col | str, int, or None | None | Alias for date; integer values select the date column by zero-based position. |
| columns | iterable or None | None | Columns to keep before renaming. |
| series_columns | iterable or None | None | Alias for columns; use this name when thinking in panel series IDs. |
| rename | mapping or None | None | Column rename map. |
| dataset | str | "custom" | Metadata dataset label. |
| frequency | str | "unknown" | Metadata frequency label. |
| frequency_by_column | mapping or None | None | Optional final-column frequency map, e.g. {"PAYEMS": "monthly", "GDPC1": "quarterly"}. |
| default_frequency | str or None | None | Fill frequency for columns omitted from frequency_by_column. |
| metadata | mapping or None | None | User metadata to attach. |
| transform_codes | mapping or None | None | Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming. |
| cache_root | path-like or None | None | If supplied, append a raw-manifest entry under this cache root. Custom loaders do not write the default manifest unless this is supplied. |
| strict | bool | True | Reject invalid date rows and non-numeric cells instead of silently coercing them. Set False only when you want a permissive load with a panel report. |

Output#

Returns a DataBundle. The normalized panel is available as bundle.panel and metadata as bundle.metadata. If transform_codes is provided, it is stored in both bundle.metadata["transform_codes"] and bundle.panel.attrs["macroforecast_transform_codes"], so mf.preprocessing.reprocess(bundle) can use the codes automatically.

Custom loaders also store the strict-normalization report at bundle.metadata["panel"]. With strict=True, malformed dates or non-numeric cells raise RawParseError wrapping the underlying validation error. With strict=False, those lossy operations are allowed and counted.

If frequency_by_column is provided, custom loaders call set_frequencies(...) internally and write the same mixed-frequency metadata contract used by official combined bundles. The keys must match final loaded column names after selection and renaming.

Example:

bundle = mf.data.load_custom_csv(
    "panel.csv",
    date_col="DATE",
    series_columns=["INDPRO", "spread"],
    frequency="monthly",
    transform_codes={"INDPRO": 5, "spread": 2},
)

processed = mf.preprocessing.reprocess(bundle)

load_custom_parquet#

Load a user Parquet file with the same normalization contract as load_custom_csv.

macroforecast.data.load_custom_parquet(
    path,
    *,
    date: str | None = None,
    date_col: str | int | None = None,
    columns: Iterable[str] | None = None,
    series_columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    default_frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    cache_root: str | pathlib.Path | None = None,
    strict: bool = True,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| path | path-like | required | Parquet file path. |
| date | str or None | None | Date column. If omitted, uses a DatetimeIndex or parses the first column. |
| date_col | str, int, or None | None | Alias for date; integer values select the date column by zero-based position. |
| columns | iterable or None | None | Columns to keep before renaming. |
| series_columns | iterable or None | None | Alias for columns. |
| rename | mapping or None | None | Column rename map. |
| dataset | str | "custom" | Metadata dataset label. |
| frequency | str | "unknown" | Metadata frequency label. |
| frequency_by_column | mapping or None | None | Optional final-column frequency map. |
| default_frequency | str or None | None | Fill frequency for columns omitted from frequency_by_column. |
| metadata | mapping or None | None | User metadata to attach. |
| transform_codes | mapping or None | None | Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming. |
| cache_root | path-like or None | None | If supplied, append a raw-manifest entry under this cache root. |
| strict | bool | True | Reject invalid date rows and non-numeric cells instead of silently coercing them. |

Output#

Returns a DataBundle with the same canonical panel, metadata, transform-code, strict-normalization, and optional mixed-frequency contract as load_custom_csv.

custom_dataset#

Build a custom DataBundle from an in-memory pandas DataFrame.

Use custom_dataset() when the data are already in Python memory and should enter the same contract as load_fred_md(), load_fred_qd(), load_fred_sd(), load_custom_csv(), and load_custom_parquet().

macroforecast.data.custom_dataset(
    frame,
    *,
    date: str | None = None,
    columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    source_family: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    metadata: Mapping[str, object] | None = None,
    strict: bool = True,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| frame | pandas.DataFrame | required | Raw or already canonical panel. |
| date | str or None | None | Date column. If omitted, the input must have a DatetimeIndex or a parseable first column. |
| columns | iterable or None | None | Columns to keep before renaming. |
| rename | mapping or None | None | Rename retained columns after selection. |
| dataset | str | "custom" | Dataset label stored in metadata. |
| source_family | str | "custom" | Source-family label stored in metadata. |
| frequency | str | "unknown" | Loader-level frequency label. |
| frequency_by_column | mapping or None | None | Optional column-level frequency map for mixed-frequency panels. |
| transform_codes | mapping or None | None | Optional t-code map. Keys must match final panel columns. |
| metadata | mapping or None | None | User metadata merged before package metadata is attached. |
| strict | bool | True | Reject lossy date or numeric coercion. |

Output#

Returns DataBundle. The panel is canonical and the metadata includes dataset, source_family, frequency, optional transform_codes, optional column-level frequency metadata, and a custom_dataset stage.

bundle = mf.data.custom_dataset(
    frame,
    date="date",
    dataset="bank_panel",
    frequency="monthly",
    transform_codes={"loan_growth": 1, "spread": 2},
)

processed = mf.preprocessing.reprocess(
    bundle,
    transform="custom",
    impute="mean",
)

as_panel#

Normalize an existing pandas DataFrame.

macroforecast.data.as_panel(
    frame,
    *,
    date: str | None = None,
    columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    metadata: Mapping[str, object] | None = None,
    strict: bool = True,
) -> pandas.DataFrame

as_panel returns a canonical panel. It raises if the date column is missing, dates are duplicated, the output is empty, infinite values are present, or any retained column cannot be represented as numeric values or NaN.

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| frame | pandas.DataFrame | required | Raw or already canonical panel. |
| date | str or None | None | Date column. If omitted and the index is not a DatetimeIndex, the first column is parsed as dates. |
| columns | iterable or None | None | Columns to keep before renaming. |
| rename | mapping or None | None | Rename retained columns after selection. |
| metadata | mapping or None | None | Metadata attached under panel.attrs["macroforecast_metadata"]. |
| strict | bool | True | Reject lossy date/numeric coercion. False permits it and records a panel report. |

Output#

Returns a pandas.DataFrame with a DatetimeIndex named "date", ascending dates, numeric columns, and attrs containing macroforecast_panel_report.

validate_panel#

Validate the canonical panel contract.

macroforecast.data.validate_panel(panel) -> None

Raises TypeError or ValueError when the panel is not canonical.

panel_info#

Return a compact panel summary.

macroforecast.data.panel_info(bundle_or_panel) -> dict

Output keys include n_rows, n_columns, start, end, columns, missing_values, frequency, and index_frequency. If the input carries metadata, frequency uses the metadata label such as "mixed" while index_frequency reports the pandas-inferred date-index frequency. Combined data also include compact native/output frequency counts.
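Most of the summary keys listed above can be derived directly from a canonical panel. The following is a rough pandas sketch with a hypothetical `panel_info_sketch` helper; the YYYY-MM formatting of start and end is an assumption, and the metadata-driven frequency key is left out because it depends on the bundle's metadata rather than on the panel.

```python
import pandas as pd


def panel_info_sketch(panel):
    """Hypothetical summary with some of the keys listed above (not mf.data.panel_info)."""
    return {
        "n_rows": int(panel.shape[0]),
        "n_columns": int(panel.shape[1]),
        "start": panel.index.min().strftime("%Y-%m"),  # assumed format
        "end": panel.index.max().strftime("%Y-%m"),    # assumed format
        "columns": list(panel.columns),
        "missing_values": int(panel.isna().sum().sum()),
        # pandas needs at least three dates to infer a frequency.
        "index_frequency": pd.infer_freq(panel.index) if len(panel) >= 3 else None,
    }


panel = pd.DataFrame(
    {"a": [1.0, None, 3.0], "b": [1.0, 2.0, 3.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="MS", name="date"),
)
info = panel_info_sketch(panel)
```

The inferred index frequency describes only the date grid, which is why a mixed panel still needs its metadata label to report "mixed".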

set_frequencies#

Attach a column-level frequency contract to an existing panel or bundle.

macroforecast.data.set_frequencies(
    data,
    frequency_by_column,
    *,
    default_frequency: str | None = None,
    output_frequency_by_column: Mapping[str, str] | None = None,
    frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
) -> DataBundle

Input#

| Name | Type | Default | Meaning |
| --- | --- | --- | --- |
| data | DataBundle, DataSpec, (panel, metadata), or DataFrame | required | Canonical panel input. |
| frequency_by_column | mapping | required | Native frequency for each final panel column. |
| default_frequency | str or None | None | Fill omitted columns with one frequency. |
| output_frequency_by_column | mapping or None | None | Returned-panel frequency for each column; defaults to native frequency. |
| frequency | str or None | None | Overall metadata label. Defaults to the unique native frequency or "mixed". |
| metadata | mapping or None | None | Extra metadata to merge before writing frequency fields. |

Allowed column frequencies are monthly, quarterly, weekly, annual, irregular, and unknown, with short aliases such as m, q, and w. For mixed-frequency DFM models, monthly and quarterly columns are the relevant contract.

Output#

Returns a DataBundle with:

Metadata key

Meaning

frequency

Overall label, usually "mixed" when multiple native frequencies are present.

native_frequency_by_column

Native frequency for each column.

native_frequency_counts

Counts by native frequency.

output_frequency_by_column

Frequency represented in the returned panel for each column.

output_frequency_counts

Counts by output frequency.
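
The frequency fields in this table can be derived from a column mapping with plain Python. The sketch below is illustrative only: the ALIASES dictionary covers just the three short aliases named above, and the column names are example values.

```python
from collections import Counter

# Only the aliases named on this page; the full alias table is not shown here.
ALIASES = {"m": "monthly", "q": "quarterly", "w": "weekly"}

frequency_by_column = {"INDPRO": "m", "UNRATE": "monthly", "GDPC1": "q"}

# Normalize aliases to full labels, then count columns per native frequency.
native = {col: ALIASES.get(freq, freq) for col, freq in frequency_by_column.items()}
native_counts = dict(Counter(native.values()))

# One shared frequency becomes the overall label; otherwise the label is "mixed".
unique = set(native.values())
overall = unique.pop() if len(unique) == 1 else "mixed"
```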

metadata#

Return explicit metadata from a DataBundle, DataSpec, (panel, metadata) tuple, or DataFrame.

macroforecast.data.metadata(obj) -> dict

Input#

Name

Type

Meaning

obj

DataBundle, DataSpec, (panel, metadata), or DataFrame

Object carrying package metadata.

Output#

Returns a shallow copy of the metadata dictionary. Mutating the returned object does not mutate the original bundle or panel attrs.
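
The copy semantics can be illustrated with plain dictionaries. A shallow copy isolates top-level keys, while nested objects remain shared; that is a general property of shallow copies in Python rather than something this page states about the library. The keys below are example values.

```python
import copy

original = {"dataset": "fred_md", "data_spec": {"target": "INDPRO"}}

snapshot = copy.copy(original)  # shallow copy: new outer dict, shared inner objects
snapshot["dataset"] = "custom"  # top-level change stays local to the copy
snapshot["data_spec"]["target"] = "CPIAUCSL"  # nested change is visible through both

isolated = copy.deepcopy(original)  # deep copy when nested values will be edited
isolated["data_spec"]["target"] = "UNRATE"
```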

attach_metadata#

Merge one metadata stage into an existing metadata dictionary.

macroforecast.data.attach_metadata(
    metadata,
    stage: str,
    values,
) -> dict

Input#

Name

Type

Meaning

metadata

mapping

Existing metadata dictionary.

stage

str

Non-empty stage key to write, such as "data_spec" or "data_frequency_alignment".

values

mapping

Stage payload to copy under stage.

Output#

Returns a new dictionary. Existing metadata is copied, then values is copied under the requested stage. attach_metadata() does not mutate its input.
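
The copy-then-write behavior can be sketched in a few lines. The helper attach_stage below is a hypothetical stand-in written for illustration; it is not the library function and may differ from it in validation details.

```python
from collections.abc import Mapping


def attach_stage(metadata: Mapping, stage: str, values: Mapping) -> dict:
    """Hypothetical stand-in that mirrors the documented copy-then-write behavior."""
    if not isinstance(stage, str) or not stage:
        raise ValueError("stage must be a non-empty string")
    merged = dict(metadata)       # copy the existing metadata
    merged[stage] = dict(values)  # copy the payload under the stage key
    return merged


base = {"dataset": "fred_md"}
updated = attach_stage(base, "data_spec", {"target": "INDPRO", "horizons": (1, 3)})
```

After the call, base is unchanged and updated carries both the original keys and the new stage.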

spec#

Attach run-level data choices to a bundle or panel. This function creates a DataSpec; it does not execute downstream pipeline steps.

macroforecast.data.spec(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizons: Iterable[int] | int | None = None,
    start: str | None = None,
    end: str | None = None,
    predictors: "all" | Iterable[str] = "all",
) -> DataSpec

Input#

Name

Type

Default

Meaning

data

DataBundle, DataSpec, (panel, metadata), or DataFrame

required

Canonical data input.

metadata

mapping or None

None

Extra metadata to merge.

target

str | None

None

Single target column.

targets

iterable or None

None

Multiple target columns.

horizons

iterable, int, or None

None (derived from the metadata frequency; see Default Horizons)

Forecast horizons.

start

str | None

None

Start date. Accepts YYYY, YYYY-MM, or YYYY-MM-DD.

end

str | None

None

End date. Accepts YYYY, YYYY-MM, or YYYY-MM-DD.

predictors

"all" | iterable

"all"

Predictor columns to keep. "all" expands to all non-target columns. Explicit predictor lists may be empty for target-only or autoregressive designs, and may not include target columns.

Default Horizons#

Metadata frequency

Default horizons

monthly

(1, 3, 6, 12)

quarterly

(1, 2, 4, 8)

other or unknown

(1,)
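
The table above amounts to a lookup with a fallback. A minimal sketch, assuming the frequency label arrives as a plain string; the function name default_horizons is hypothetical:

```python
from __future__ import annotations

DEFAULT_HORIZONS = {
    "monthly": (1, 3, 6, 12),
    "quarterly": (1, 2, 4, 8),
}


def default_horizons(frequency: str | None) -> tuple[int, ...]:
    """Return the documented default; any other label falls back to (1,)."""
    return DEFAULT_HORIZONS.get(frequency, (1,))
```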

Output#

Returns DataSpec. Its metadata contains a data_spec entry with the chosen target, targets, horizons, sample dates, expanded predictor list, and panel summary. This expansion is deliberate: downstream model stages should consume a concrete non-target predictor list, not infer from the full panel and risk target leakage.
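
The predictor expansion and sample window can be pictured with plain pandas. This is a sketch of the idea only, using a toy panel; it does not call the library and does not reproduce its validation or metadata.

```python
import pandas as pd

panel = pd.DataFrame(
    {
        "INDPRO": [1.0, 2.0, 3.0, 4.0],
        "UNRATE": [5.0, 5.1, 5.2, 5.3],
        "CPIAUCSL": [9.0, 9.1, 9.2, 9.3],
    },
    index=pd.date_range("2020-01-01", periods=4, freq="MS", name="date"),
)

targets = ["INDPRO"]
predictors = "all"

# Expand "all" to every column that is not a target, preserving panel order.
if predictors == "all":
    predictor_list = [c for c in panel.columns if c not in targets]
else:
    predictor_list = list(predictors)
    overlap = sorted(set(predictor_list) & set(targets))
    if overlap:
        raise ValueError(f"predictors may not include targets: {overlap}")

# Apply the sample window, then keep only target and predictor columns.
window = panel.loc["2020-02":"2020-03", targets + predictor_list]
```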

What It Does And Does Not Do#

Action

Done by mf.data.spec(...)?

Validate the canonical panel contract

Yes

Validate target and predictor columns

Yes

Expand predictors="all" to all non-target columns

Yes

Apply start and end sample bounds

Yes

Attach metadata["data_spec"]

Yes

Load raw data

No

Transform, clean, impute, or standardize values

No

Create forecast targets or lagged predictors

No

Fit models or run evaluation

No

Data Policy Helpers#

These functions are direct Python replacements for the old data-policy axes. They do not parse YAML and do not fit models.

align_frequency#

macroforecast.data.align_frequency(
    data,
    *,
    method: str = "keep",
    quarterly_to_monthly: str = "repeat_within_quarter",
    weekly_to_monthly: str = "mean",
    monthly_to_quarterly: str = "quarterly_average",
    weekly_to_quarterly: str = "mean",
    chow_lin_indicator: str | Mapping[str, str] | None = None,
    chow_lin_aggregation: str = "mean",
    chow_lin_rho: float | None = None,
    chow_lin_rho_method: str = "fixed",
) -> DataBundle

Keeps, filters, or aligns a panel to a common data frequency. This belongs in macroforecast.data because it changes the calendar and column-level frequency contract before preprocessing or feature engineering.

Input

Default

Choices

method

"keep"

"keep", "monthly", "quarterly", "drop_non_monthly", "drop_non_quarterly"

quarterly_to_monthly

"repeat_within_quarter"

"repeat_within_quarter", "step_backward", "step_forward", "quarter_end_ffill", "linear_interpolation", "chow_lin"

weekly_to_monthly

"mean"

"mean", "last", "sum"

monthly_to_quarterly

"quarterly_average"

"quarterly_average", "quarterly_endpoint", "quarterly_sum"

weekly_to_quarterly

"mean"

"mean", "last", "sum"

chow_lin_indicator

None

Indicator column name, or mapping from quarterly column to indicator column, used only when quarterly_to_monthly="chow_lin".

chow_lin_aggregation

"mean"

"mean" or "sum"; the low-frequency aggregation to conserve.

chow_lin_rho

None

Fixed AR(1) residual correlation. If supplied, must be inside (-1, 1).

chow_lin_rho_method

"fixed"

"fixed", "min_chi_squared", or "max_likelihood".

Output is a DataBundle. Metadata records data_frequency_alignment, native_frequency_by_column, output_frequency_by_column, and frequency counts. Frequency detection uses native_frequency_by_column first, then FRED-SD series reports, then observed-date spacing.

monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="repeat_within_quarter",
)

For quarterly-to-monthly alignment, step_backward is accepted as an alias for repeat_within_quarter; the latter is the clearer spelling. Use quarter_end_ffill when values should only become available from the quarter-end month forward.
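
The difference between the two options can be shown on a toy series with pandas periods. This sketch reflects one reading of the option names (repeat the quarter's value in all three months, versus place it on the quarter's last month and carry it forward); it is not the library's code.

```python
import pandas as pd

quarterly = pd.Series([10.0, 20.0], index=pd.PeriodIndex(["2020Q1", "2020Q2"], freq="Q"))
months = pd.period_range("2020-01", "2020-06", freq="M")

# repeat_within_quarter: every month takes the value of the quarter it belongs to.
repeated = pd.Series(quarterly.reindex(months.asfreq("Q")).to_numpy(), index=months)

# quarter_end_ffill: place each value on the quarter's last month, then carry it forward.
at_quarter_end = pd.Series(
    quarterly.to_numpy(), index=quarterly.index.asfreq("M", how="end")
)
ffilled = at_quarter_end.reindex(months).ffill()
```

Under this reading, January and February 2020 stay missing in ffilled because the first quarter's value only appears in March.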

Use quarterly_to_monthly="chow_lin" when a quarterly series should be regression-disaggregated with a monthly indicator:

monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="chow_lin",
    chow_lin_indicator={"GDPC1": "INDPRO"},
    chow_lin_aggregation="mean",
)

This preserves the supplied quarterly observations when the output is re-aggregated by the declared chow_lin_aggregation. The function records the indicator and rho choices in metadata["data_frequency_alignment"].

chow_lin_disaggregate#

macroforecast.data.chow_lin_disaggregate(
    low_frequency,
    indicator,
    *,
    aggregation: str = "mean",
    rho: float | None = None,
    rho_method: str = "fixed",
) -> pandas.Series

Direct Chow-Lin disaggregation, for example from quarterly to monthly. low_frequency is a low-frequency Series, and indicator is a higher-frequency Series or a DataFrame, of which only the first column is used. The returned series is indexed like the indicator and conserves low_frequency under aggregation="mean" or aggregation="sum".

rho_method="fixed" uses rho when supplied and 0.0 otherwise. "min_chi_squared" and "max_likelihood" estimate rho over a bounded grid.
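
For orientation, the textbook Chow-Lin estimator with a fixed rho fits in a short NumPy function. This is a sketch under stated assumptions, not the library routine: the name chow_lin_sketch, the n_per parameter, and the use of an intercept plus a single indicator as regressors are all choices made for illustration.

```python
import numpy as np


def chow_lin_sketch(y_low, x_high, n_per=3, aggregation="mean", rho=0.0):
    """Textbook Chow-Lin with a fixed AR(1) rho; a sketch, not the library routine."""
    y_low = np.asarray(y_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    n_low, n_high = len(y_low), len(x_high)
    if n_high != n_low * n_per:
        raise ValueError("indicator must cover n_per periods per low-frequency value")
    if aggregation not in ("mean", "sum"):
        raise ValueError('aggregation must be "mean" or "sum"')
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must be inside (-1, 1)")

    # Aggregation matrix C maps high-frequency values to low-frequency means or sums.
    weight = 1.0 / n_per if aggregation == "mean" else 1.0
    C = np.kron(np.eye(n_low), np.full((1, n_per), weight))

    # Regressors: intercept plus indicator. AR(1) covariance up to a scale factor.
    X = np.column_stack([np.ones(n_high), x_high])
    lags = np.abs(np.subtract.outer(np.arange(n_high), np.arange(n_high)))
    V = rho ** lags

    # GLS on the aggregated model, then distribute residuals back to high frequency.
    X_low = C @ X
    V_low_inv = np.linalg.inv(C @ V @ C.T)
    beta = np.linalg.solve(X_low.T @ V_low_inv @ X_low, X_low.T @ V_low_inv @ y_low)
    residual_low = y_low - X_low @ beta
    return X @ beta + V @ C.T @ V_low_inv @ residual_low


indicator = np.array([1.0, 1.2, 1.1, 1.4, 1.5, 1.3, 1.8, 1.9, 2.0, 1.7, 1.6, 1.9])
quarterly = np.array([100.0, 104.0, 110.0, 107.0])
monthly = chow_lin_sketch(quarterly, indicator, rho=0.5)
```

Averaging monthly within each quarter recovers quarterly, which is the conservation property described above.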

infer_frequencies#

macroforecast.data.infer_frequencies(data) -> tuple[dict[str, str], str]

infer_frequencies() returns (frequency_by_column, source). The source is "native_frequency_by_column", "fred_sd_series_metadata", or "observed_dates".

frequency_hardening_issues#

macroforecast.data.frequency_hardening_issues(
    frequencies,
) -> list[dict]

Reports columns classified as unknown, irregular, or annual before a caller aligns frequencies. This is useful before forcing a mixed panel to monthly or quarterly frequency.

Output key

Meaning

frequency

Weak frequency class.

columns

Columns assigned to that class.

n_columns

Number of affected columns.
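
The report structure can be sketched as a grouping over a column-to-frequency mapping. The helper below is hypothetical and assumes frequencies is a plain dict of column name to frequency label; the library's accepted input types may be broader.

```python
WEAK_CLASSES = ("unknown", "irregular", "annual")


def weak_frequency_report(frequencies: dict) -> list:
    """Group columns by weak frequency class, mirroring the documented output keys."""
    report = []
    for label in WEAK_CLASSES:
        columns = [col for col, freq in frequencies.items() if freq == label]
        if columns:
            report.append({"frequency": label, "columns": columns, "n_columns": len(columns)})
    return report


issues = weak_frequency_report(
    {"INDPRO": "monthly", "POP": "annual", "X1": "unknown", "X2": "unknown"}
)
```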

availability_lag#

macroforecast.data.availability_lag(
    data,
    *,
    lags: int | Mapping[str, int] = 1,
    columns: Iterable[str] | None = None,
    drop_missing: bool = False,
) -> DataBundle

Positive lags delay predictor availability. lags=1 means the value dated t-1 is the latest available value on row t. Pass a mapping for column-specific release lags.
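
In pandas terms, a release lag is a per-column shift. The following sketch shows the idea on a toy panel with column-specific lags; it is an illustration rather than the library's implementation.

```python
import pandas as pd

panel = pd.DataFrame(
    {"INDPRO": [100.0, 101.0, 102.0], "UNRATE": [4.0, 4.1, 4.2]},
    index=pd.date_range("2020-01-01", periods=3, freq="MS", name="date"),
)

lags = {"INDPRO": 1, "UNRATE": 2}  # column-specific release lags, in rows

lagged = panel.copy()
for column, lag in lags.items():
    lagged[column] = panel[column].shift(lag)  # row t now holds the value dated t - lag

complete = lagged.dropna()  # comparable in spirit to drop_missing=True
```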

same_period_predictors#

macroforecast.data.same_period_predictors(
    data_spec,
    *,
    policy: "allow" | "lag" | "drop" | "forbid" = "allow",
    lag: int = 1,
    columns: Iterable[str] | None = None,
    drop_missing: bool = False,
) -> DataSpec

allow records the choice, lag shifts selected predictors, drop removes them from the active predictor set, and forbid raises if such predictors are present. Targets are never shifted by this helper.
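
The four policies can be pictured as a small dispatch. The helper apply_policy is hypothetical and returns a (panel, active_predictors) pair for illustration; in particular, whether drop also removes columns from the panel is not stated here, so the sketch leaves the panel intact.

```python
import pandas as pd


def apply_policy(panel: pd.DataFrame, predictors: list, policy: str = "allow", lag: int = 1):
    """Hypothetical helper: returns (panel, active_predictors) under one policy."""
    if policy == "allow":
        return panel, list(predictors)
    if policy == "lag":
        shifted = panel.copy()
        shifted[predictors] = panel[predictors].shift(lag)  # targets stay untouched
        return shifted, list(predictors)
    if policy == "drop":
        return panel, []  # predictors leave the active set only
    if policy == "forbid":
        if predictors:
            raise ValueError(f"same-period predictors present: {predictors}")
        return panel, []
    raise ValueError(f"unknown policy: {policy!r}")


panel = pd.DataFrame(
    {"INDPRO": [100.0, 101.0, 102.0], "UNRATE": [4.0, 4.1, 4.2]},
    index=pd.date_range("2020-01-01", periods=3, freq="MS", name="date"),
)
lagged_panel, active = apply_policy(panel, ["UNRATE"], policy="lag")
```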

define_regime#

macroforecast.data.define_regime(
    data,
    *,
    name: str = "regime",
    column: str | None = None,
    threshold: float | None = None,
    direction: "above" | "below" | "equal" | "not_equal" = "above",
    dates: Iterable[str | pandas.Timestamp] | None = None,
    values: Sequence[bool | int | float] | pandas.Series | None = None,
    append: bool = False,
    output_column: str | None = None,
) -> DataBundle

Exactly one regime source is required: threshold rule, explicit dates, or an aligned vector/Series. The regime is stored in metadata["regimes"]; set append=True to also add a numeric indicator column to the panel.
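
A threshold rule reduces to a boolean comparison on one column. The sketch below assumes strict inequalities for "above" and "below", which this page does not specify, and uses a hypothetical output column name.

```python
import pandas as pd

panel = pd.DataFrame(
    {"UNRATE": [3.9, 4.5, 6.1, 5.8]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS", name="date"),
)

threshold, direction = 5.0, "above"
compare = {
    "above": panel["UNRATE"] > threshold,  # strictness is an assumption
    "below": panel["UNRATE"] < threshold,
    "equal": panel["UNRATE"] == threshold,
    "not_equal": panel["UNRATE"] != threshold,
}
regime = compare[direction]

# Comparable in spirit to append=True: store the regime as a numeric 0/1 column.
with_indicator = panel.assign(high_unemployment=regime.astype(float))
```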

Vintage Helpers#

list_vintages#

Generate monthly vintage labels for a supported dataset.

macroforecast.data.list_vintages(
    dataset: str,
    start: str | None = None,
    end: str | None = None,
) -> list[str]

Input#

Name

Type

Default

Meaning

dataset

str

required

One of fred_md, fred_qd, fred_sd, fred_md+fred_sd, or fred_qd+fred_sd.

start

str | None

first supported vintage

Start vintage in YYYY-MM form.

end

str | None

required

End vintage in YYYY-MM form.

Output#

Returns candidate monthly vintage labels. Pass the label you select to load_fred_md, load_fred_qd, or load_fred_sd through vintage=.

end is required because the function does not inspect remote availability.
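
Because the labels are generated rather than discovered, the output is equivalent in spirit to a monthly period range. A minimal pandas sketch, with a hypothetical function name and no check against the first supported vintage of any dataset:

```python
import pandas as pd


def monthly_labels(start: str, end: str) -> list:
    """Generate YYYY-MM labels between two months, inclusive."""
    periods = pd.period_range(start=start, end=end, freq="M")
    return [str(p) for p in periods]


labels = monthly_labels("2023-11", "2024-02")
```

A label produced this way is only a candidate; the corresponding vintage file may not exist at the source.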

Official Source Pages#