macroforecast.data

Purpose

macroforecast.data is the data entry point for the package. It loads official or user-supplied data, normalizes it to one pandas panel contract, and attaches source metadata. It also creates run-level data specifications and combines national FRED-MD/FRED-QD data with state-level FRED-SD panels.

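This module does not apply stationarity transforms, outlier rules, imputation, feature engineering, model fitting, or evaluation; those steps belong to later stages. Loaders and data-policy helpers return a DataBundle, and spec(...) returns a DataSpec. A few inspection utilities return plain Python or pandas objects instead, as listed in the Output column of the function table.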

The usual flow is:

import macroforecast as mf

bundle = mf.data.load_fred_md()

data_spec = mf.data.spec(
    bundle,
    target="INDPRO",
    horizons=[1, 3, 6, 12],
    start="1960-01",
    end="2024-12",
    predictors="all",
)

mf.data.spec(...) is not a wrapper that runs data loading, preprocessing, feature engineering, or modeling. It is a small contract builder for an already-loaded panel: it validates the requested target, horizons, sample window, and predictor set; subsets the panel to those columns and dates; expands predictors="all" to the concrete non-target columns; and records these choices in metadata. Later stages can then consume the same DataSpec without guessing which columns or horizons the run intended to use.
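The sketch below shows the target, horizon and predictor checks in plain Python. `SpecSketch` and `build_spec_sketch` are hypothetical names and are not part of macroforecast. The sketch works only on a list of column names. It leaves out the date-window subsetting and metadata recording that spec(...) also performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSketch:
    # Hypothetical stand-in for the choices a DataSpec records; not the package class.
    target: str
    horizons: tuple[int, ...]
    predictors: tuple[str, ...]


def build_spec_sketch(columns, target, horizons, predictors="all"):
    """Validate run choices against panel columns and expand predictors="all"."""
    columns = list(columns)
    if target not in columns:
        raise ValueError(f"unknown target: {target!r}")
    horizons = tuple(int(h) for h in horizons)
    if not horizons or any(h <= 0 for h in horizons):
        raise ValueError("horizons must be positive integers")
    if predictors == "all":
        # Expand "all" to every non-target column, keeping panel order.
        chosen = tuple(c for c in columns if c != target)
    else:
        chosen = tuple(predictors)
        unknown = [c for c in chosen if c not in columns]
        if unknown:
            raise ValueError(f"unknown predictors: {unknown}")
        # Assumption: an explicit list containing the target is rejected, not silently trimmed.
        if target in chosen:
            raise ValueError("the target cannot also be a predictor")
    return SpecSketch(target=target, horizons=horizons, predictors=chosen)


spec = build_spec_sketch(["INDPRO", "RPI", "UNRATE"], "INDPRO", [1, 3, 6, 12])
print(spec.predictors)  # ('RPI', 'UNRATE')
```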

Public Functions

Function	Purpose	Output
`load_fred_md`	Load official FRED-MD current or vintage data.	`DataBundle`
`load_fred_qd`	Load official FRED-QD current or vintage data.	`DataBundle`
`load_fred_sd`	Load official FRED-SD state-level panel data.	`DataBundle`
`load_fred_md_sd`	Load and combine FRED-MD with FRED-SD.	`DataBundle`
`load_fred_qd_sd`	Load and combine FRED-QD with FRED-SD.	`DataBundle`
`load_custom_csv`	Load a user CSV into the canonical panel contract.	`DataBundle`
`load_custom_parquet`	Load a user Parquet file into the canonical panel contract.	`DataBundle`
`custom_dataset`	Build a custom dataset from an in-memory `DataFrame`.	`DataBundle`
`combine`	Concatenate loaded bundles and optionally align frequency.	`DataBundle`
`list_vintages`	Generate supported monthly vintage labels for a dataset.	`list[str]`
`as_panel`	Normalize a `DataFrame` to the canonical panel contract.	`pandas.DataFrame`
`validate_panel`	Validate the canonical panel contract.	`None`
`panel_info`	Summarize panel shape, dates, missingness, and frequency.	`dict`
`metadata`	Extract explicit package metadata from data-like input.	`dict`
`attach_metadata`	Merge one metadata stage into an existing metadata dictionary.	`dict`
`set_frequencies`	Attach column-level native/output frequency metadata.	`DataBundle`
`spec`	Attach target, horizon, sample, and predictor choices.	`DataSpec`
`align_frequency`	Keep, filter, or align panel columns to a common frequency.	`DataBundle`
`chow_lin_disaggregate`	Disaggregate low-frequency series with a high-frequency indicator.	`pandas.Series`
`infer_frequencies`	Read or infer native frequency by column.	`(dict[str, str], str)`
`frequency_hardening_issues`	Report columns with weak frequency classification.	`list[dict]`
`availability_lag`	Delay selected columns to encode release availability.	`DataBundle`
`same_period_predictors`	Allow, lag, drop, or reject same-period predictors in a `DataSpec`.	`DataSpec`
`define_regime`	Attach a binary regime definition to metadata, optionally as a column.	`DataBundle`

Public Classes And Types

Symbol	Meaning
`DataBundle`	Canonical panel plus metadata returned by loaders and data-policy helpers.
`DataSpec`	Canonical panel plus target, horizon, sample, and predictor choices for a run.
`RegimeDirection`	Stored threshold direction type: `"above"`, `"below"`, `"equal"`, or `"not_equal"`.
`SamePeriodPolicy`	Stored same-period predictor policy type: `"allow"`, `"lag"`, `"drop"`, or `"forbid"`.

Canonical Panel

Every public loader returns a DataBundle.

panel = bundle.panel
metadata = bundle.metadata

DataBundle also supports tuple unpacking:

panel, metadata = mf.data.load_fred_md()
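Tuple unpacking works for any object whose `__iter__` yields its fields in order. The class below is a hypothetical illustration of that mechanism. It is not the package's DataBundle implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UnpackableBundle:
    # Hypothetical container; the real DataBundle may be implemented differently.
    panel: object
    metadata: dict = field(default_factory=dict)

    def __iter__(self):
        # Yielding the fields in order is what makes `panel, metadata = bundle` work.
        yield self.panel
        yield self.metadata


bundle = UnpackableBundle(panel=[1.0, 2.0], metadata={"dataset": "fred_md"})
panel, metadata = bundle
print(metadata["dataset"])  # fred_md
```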

Panel Contract

Property	Required Value
Type	`pandas.DataFrame`
Index	`pandas.DatetimeIndex`
Index name	`"date"`
Sort order	ascending date order
Duplicate dates	not allowed
Columns	variable IDs
Values	numeric values or `NaN`
Empty panel	not allowed
Infinite values	not allowed

Metadata is stored explicitly on DataBundle.metadata. For pandas-native handoff, the panel also carries it in panel.attrs["macroforecast_metadata"]. FRED-MD and FRED-QD transform codes are attached to panel.attrs["macroforecast_transform_codes"]. This module only records the codes; preprocessing is responsible for applying them.

Panel normalization is strict by default: invalid date values, non-numeric cells that would be coerced to NaN, duplicate dates, empty panels, and infinite values all raise errors. When a caller deliberately sets strict=False, lossy date and numeric coercion is allowed but recorded in panel.attrs["macroforecast_panel_report"]. When the panel is returned inside a DataBundle, the same report also appears in metadata["panel"].

macroforecast_panel_report contains:

Key	Meaning
`contract`	Panel contract version, currently `macroforecast_panel_v1`.
`strict`	Whether lossy date/numeric coercion was rejected.
`input_rows`, `output_rows`	Row count before and after panel normalization.
`input_columns`, `output_columns`	Column names before and after selection/renaming.
`date_source`	Date source used: a column name or `"index"`.
`invalid_date_rows_dropped`	Number of invalid date rows dropped when `strict=False`.
`numeric_coercion`	Count and examples of non-numeric cells coerced to `NaN` when `strict=False`.
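The strict/permissive split for numeric cells can be sketched with `pandas.to_numeric(errors="coerce")`. Cells that had a value before coercion but come back as NaN are the lossy ones. `coerce_numeric` and its `count`/`examples` report keys are hypothetical. They were chosen only to mirror the `numeric_coercion` entry above, and the package's actual report layout may differ.

```python
import pandas as pd


def coerce_numeric(frame: pd.DataFrame, strict: bool = True):
    """Coerce every column to numbers; raise on lossy cells (strict) or count them."""
    report = {"count": 0, "examples": []}
    converted = {}
    for col in frame.columns:
        values = pd.to_numeric(frame[col], errors="coerce")
        # A cell is lossy if it had a value before coercion but is NaN afterwards.
        lossy = values.isna() & frame[col].notna()
        if lossy.any():
            if strict:
                raise ValueError(f"non-numeric cells in column {col!r}")
            report["count"] += int(lossy.sum())
            report["examples"].extend(frame.loc[lossy, col].astype(str).head(3).tolist())
        converted[col] = values
    return pd.DataFrame(converted, index=frame.index), report


raw = pd.DataFrame({"x": ["1.5", "n/a", "2"], "y": [1, 2, 3]})
clean, report = coerce_numeric(raw, strict=False)
print(report)  # {'count': 1, 'examples': ['n/a']}
```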

Metadata Contract

Every loader writes a metadata dictionary with these common keys.

Key	Type	Meaning
`dataset`	`str`	Dataset identifier such as `fred_md`, `fred_qd`, `fred_sd`, `fred_md+fred_sd`, or `fred_qd+fred_sd`.
`frequency`	`str`	Loader-level frequency label: `monthly`, `quarterly`, `weekly`, `annual`, `mixed`, `unknown`, or the chosen combined frequency.
`version_mode`	`str`	`current`, `vintage`, or `mixed` for combined inputs with different modes.
`vintage`	`str` or `None`	Requested vintage label in `YYYY-MM` form, or `None` for current data.
`data_through`	`str` or `None`	Last date present in the loaded panel, formatted as `YYYY-MM`.
`support_tier`	`str`	`stable` for official loaders, `provisional` for user-supplied files.
`parse_notes`	`tuple[str, ...]`	Loader notes, including discouraged frequency alignments for combined datasets.
`artifact`	`dict` or `None`	Raw-file provenance for single-source loads; combined bundles use `None`.
`transform_codes`	`dict[str, int]`	Official FRED-MD/FRED-QD t-codes when available. FRED-SD has no official t-code map.

Combined bundles add:

Key	Type	Meaning
`source_family`	`str`	Combined-source label currently set to `"combined"`.
`combined_sources`	`list[dict]`	Full metadata dictionaries from the source bundles.
`source_by_column`	`dict[str, str]`	Source dataset for each output column.
`native_frequency_by_column`	`dict[str, str]`	Original frequency for each output column before alignment.
`native_frequency_counts`	`dict[str, int]`	Count of columns by original frequency.
`date_anchor_by_column`	`dict[str, str]`	FRED-SD date-anchor map for state columns when available.
`date_anchor_counts`	`dict[str, int]`	Count of FRED-SD date-anchor patterns when available.
`output_frequency_by_column`	`dict[str, str]`	Frequency represented in the returned panel for each output column.
`output_frequency_counts`	`dict[str, int]`	Count of columns by returned-panel frequency.
`frequency_conversion_warnings`	`list[dict]`	Records of monthly-to-quarterly or quarterly-to-monthly conversions.
`alignment`	`dict`	Chosen target frequency, alignment rules, and source-level alignment summaries.
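Each `*_counts` field is a tally of the matching `*_by_column` map. The sketch below computes one with `collections.Counter`, assuming plain per-frequency counts. The column names are examples only.

```python
from collections import Counter

native_frequency_by_column = {
    "INDPRO": "monthly",
    "UR_CA": "monthly",
    "NQGSP_CA": "quarterly",
}

# Count columns per native frequency, as in native_frequency_counts.
native_frequency_counts = dict(Counter(native_frequency_by_column.values()))
print(native_frequency_counts)  # {'monthly': 2, 'quarterly': 1}
```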

Public metadata helpers and policy types:

Symbol	Meaning
`attach_metadata`	Return metadata with one stage key merged in a pandas-safe way. Used by loaders, preprocessing, analysis, and runner outputs.
`RegimeDirection`	Stored threshold direction type for `define_regime(...)`: `"above"`, `"below"`, `"equal"`, or `"not_equal"`.
`SamePeriodPolicy`	Stored same-period predictor policy type for `same_period_predictors(...)`: `"allow"`, `"lag"`, `"drop"`, or `"forbid"`.

DataBundle

macroforecast.data.DataBundle(
    panel: pandas.DataFrame,
    metadata: dict,
)

Output

Field	Type	Meaning
`panel`	`pandas.DataFrame`	Canonical date-indexed data panel.
`metadata`	`dict`	Source, vintage, artifact, frequency, and transform-code metadata.

Methods

Method	Input	Output	Meaning
`attach(stage, values)`	`stage: str`, `values: Mapping`	`DataBundle`	Return a new bundle with one metadata stage added.

Preprocessing outputs can use the same metadata-attachment pattern.
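"Return a new bundle with one metadata stage added" can be pictured as copy-on-write on the metadata dictionary. `StagedBundle` is a hypothetical stand-in for DataBundle. This page does not say whether the real attach replaces or merges an existing stage key, so the sketch simply replaces it.

```python
from collections.abc import Mapping
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class StagedBundle:
    # Hypothetical stand-in for DataBundle, used only to illustrate copy-on-attach.
    panel: Any
    metadata: dict

    def attach(self, stage: str, values: Mapping[str, Any]) -> "StagedBundle":
        if not isinstance(stage, str) or not stage:
            raise ValueError("stage must be a non-empty string")
        new_metadata = dict(self.metadata)  # new top-level dict; the original stays untouched
        new_metadata[stage] = dict(values)  # assumption: the stage key is replaced wholesale
        return replace(self, metadata=new_metadata)


base = StagedBundle(panel=None, metadata={"dataset": "fred_md"})
tagged = base.attach("notes", {"reviewed": True})
print("notes" in base.metadata, tagged.metadata["notes"])  # False {'reviewed': True}
```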

DataSpec

macroforecast.data.DataSpec(
    panel: pandas.DataFrame,
    metadata: dict,
    target: str | None,
    targets: tuple[str, ...],
    horizons: tuple[int, ...],
    start: str | None = None,
    end: str | None = None,
    predictors: Literal["all"] | tuple[str, ...] = "all",
)

DataSpec is the output of spec(...). It keeps the canonical panel and metadata together with the target, horizons, sample window, and predictor selection for a run.

Output

Field	Type	Meaning
`panel`	`pandas.DataFrame`	Canonical date-indexed data panel after sample and column selection.
`metadata`	`dict`	Source metadata plus a `data_spec` stage.
`target`	`str` or `None`	Single target column when `target=` was used.
`targets`	`tuple[str, ...]`	Active target columns.
`horizons`	`tuple[int, ...]`	Positive forecast horizons.
`start`, `end`	`str` or `None`	Normalized sample bounds.
`predictors`	`tuple[str, ...]`	Concrete non-target predictor columns.

Methods

Method	Input	Output	Meaning
`attach(stage, values)`	`stage: str`, `values: Mapping`	`DataSpec`	Return a new spec with one metadata stage added.

DataSpec also supports tuple unpacking:

panel, metadata = data_spec

load_fred_md

Load FRED-MD and return DataBundle.

macroforecast.data.load_fred_md(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle

Input

Name	Type	Default	Meaning
`vintage`	`str \| None`	`None`	Vintage in `YYYY-MM` form. `None` loads current.
`force`	`bool`	`False`	Re-download or re-copy even if cache exists.
`cache_root`	path-like or `None`	`None`	Raw cache root.
`local_source`	path-like or `None`	`None`	Local CSV source instead of download.
`local_zip_source`	path-like or `None`	`None`	Optional local historical zip override. Without it, vintage requests automatically download the official FRED-MD historical archive and extract the requested CSV.
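Vintage labels use YYYY-MM form. The sketch below checks that form and enumerates consecutive monthly labels. `is_vintage_label` and `monthly_labels` are hypothetical helpers. They do not know which vintages the official archives actually contain; list_vintages covers that.

```python
import re

# Four-digit year, a hyphen, then a two-digit month from 01 to 12.
_VINTAGE = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def is_vintage_label(label: str) -> bool:
    """True for labels in YYYY-MM form with a valid month."""
    return bool(_VINTAGE.fullmatch(label))


def monthly_labels(first: str, last: str) -> list[str]:
    """Enumerate YYYY-MM labels from first to last, inclusive."""
    if not (is_vintage_label(first) and is_vintage_label(last)):
        raise ValueError("labels must look like YYYY-MM")
    year, month = map(int, first.split("-"))
    end = tuple(map(int, last.split("-")))
    labels = []
    while (year, month) <= end:
        labels.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return labels


print(monthly_labels("2023-11", "2024-02"))  # ['2023-11', '2023-12', '2024-01', '2024-02']
```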

Output

Returns DataBundle with a monthly FRED-MD panel and metadata. The official CSV transform row is parsed into metadata["transform_codes"] and panel.attrs["macroforecast_transform_codes"].
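The sketch below turns the t-code row of the official CSV into a `{column: t-code}` map using only the standard library. It assumes a FRED-MD-style layout in which the first cell of the t-code row starts with "Transform". The row is located by that label rather than by position. The observations are made up, and the package's own parser is not shown here.

```python
import csv
import io

# Illustrative excerpt in the FRED-MD layout: a header row, then a row of t-codes.
# The observation values are made up; only the layout matters here.
raw = """sasdate,RPI,INDPRO,FEDFUNDS
Transform:,5,5,2
1/1/2020,100.0,101.0,1.55
2/1/2020,100.5,101.3,1.58
"""

rows = list(csv.reader(io.StringIO(raw)))
header = rows[0]

# Locate the t-code row by its label instead of a fixed position.
transform_row = next(r for r in rows[1:] if r and r[0].strip().lower().startswith("transform"))

# Map each series column (skipping the date column) to an integer t-code.
transform_codes = {name: int(code) for name, code in zip(header[1:], transform_row[1:])}
print(transform_codes)  # {'RPI': 5, 'INDPRO': 5, 'FEDFUNDS': 2}
```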

See FRED-MD for dataset-specific details. See FRED-MD + FRED-SD for the combined monthly national/state loader.

load_fred_qd

Load FRED-QD and return DataBundle.

macroforecast.data.load_fred_qd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle

Input

Name	Type	Default	Meaning
`vintage`	`str \| None`	`None`	Vintage in `YYYY-MM` form. `None` loads current.
`force`	`bool`	`False`	Re-download or re-copy even if cache exists.
`cache_root`	path-like or `None`	`None`	Raw cache root.
`local_source`	path-like or `None`	`None`	Local CSV source instead of download.
`local_zip_source`	path-like or `None`	`None`	Optional local historical zip override. Without it, vintage requests automatically download the official FRED-QD historical archive and extract the requested CSV.

Output

Returns DataBundle with a quarterly FRED-QD panel and metadata. The official CSV transform row is parsed into metadata["transform_codes"] and panel.attrs["macroforecast_transform_codes"].

See FRED-QD for dataset-specific details. See FRED-QD + FRED-SD for the combined quarterly national/state loader.

load_fred_sd

Load FRED-SD and return DataBundle.

macroforecast.data.load_fred_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
) -> DataBundle

Input

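Name	Type	Default	Meaning
`vintage`	`str \| None`	`None`	Vintage in `YYYY-MM` form. `None` loads current.
`force`	`bool`	`False`	Re-download or re-copy even if cache exists.
`cache_root`	path-like or `None`	`None`	Raw cache root.
`local_source`	path-like or `None`	`None`	Local official FRED-SD workbook or canonical wide CSV instead of download.
`states`	`list[str] \| None`	`None`	Optional state subset.
`variables`	`list[str] \| None`	`None`	Optional FRED-SD variable subset.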

FRED-SD columns are wide variable-state IDs such as UR_CA. The loader also adds panel.attrs["macrocast_reports"]["fred_sd_series_metadata"], which records each column’s state, FRED-SD variable, observed date range, non-missing count, native frequency, and date-anchor pattern inferred from the official series workbook. The same frequency and date-anchor maps are exposed in metadata["native_frequency_by_column"], metadata["native_frequency_counts"], metadata["date_anchor_by_column"], metadata["date_anchor_counts"], and metadata["state_summary"].
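Each column ID combines a variable and a state, so a loaded panel can be regrouped by either dimension. The hypothetical helper below splits an ID on the last underscore. It assumes the state is the two-letter code after that underscore.

```python
from collections import defaultdict


def split_state_column(column: str) -> tuple[str, str]:
    """Split a wide variable-state ID such as "UR_CA" into ("UR", "CA")."""
    variable, sep, state = column.rpartition("_")
    # Assumption: the state is a two-letter code after the last underscore.
    if not sep or not variable or len(state) != 2:
        raise ValueError(f"not a variable-state ID: {column!r}")
    return variable, state


columns = ["UR_CA", "UR_TX", "NQGSP_CA"]
by_state = defaultdict(list)
for col in columns:
    variable, state = split_state_column(col)
    by_state[state].append(variable)

print(dict(by_state))  # {'CA': ['UR', 'NQGSP'], 'TX': ['UR']}
```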

For vintage="YYYY-MM", FRED-SD uses the official by-series workbook path. It tries series-YYYY-MM.xlsx first and then falls back to the official by-series zip archive containing that workbook. There is no local_zip_source parameter for FRED-SD because local overrides are supplied as local_source= with either an official workbook or a canonical wide CSV.

See FRED-SD for mixed-frequency state-series details and t-code limitations. See FRED-MD + FRED-SD and FRED-QD + FRED-SD for combined-loader behavior.

load_fred_md_sd

Load FRED-MD and FRED-SD, align them to one panel, and return DataBundle.

macroforecast.data.load_fred_md_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_fred_md_source: str | pathlib.Path | None = None,
    local_fred_sd_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
    frequency: str = "monthly",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Purpose

Use this when the outcome or main state panel is monthly and national macroeconomic controls should come from FRED-MD. This is the recommended combined dataset for monthly state analysis.

Input

Name	Type	Default	Meaning
`vintage`	`str \| None`	`None`	Vintage label shared across FRED-MD and FRED-SD.
`force`	`bool`	`False`	Re-download or re-copy raw sources.
`cache_root`	path-like or `None`	`None`	Raw cache root used by both loaders.
`local_fred_md_source`	path-like or `None`	`None`	Local FRED-MD CSV source.
`local_fred_sd_source`	path-like or `None`	`None`	Local FRED-SD workbook or CSV source.
`states`	`list[str] \| None`	`None`	FRED-SD state subset.
`variables`	`list[str] \| None`	`None`	FRED-SD variable subset.
`frequency`	`str`	`"monthly"`	`"monthly"`, `"quarterly"`, or `"native"`. Quarterly is supported but not recommended for this loader.
`quarterly_to_monthly`	`str`	`"repeat_within_quarter"`	Rule used if an included FRED-SD series is quarterly and the target panel is monthly.
`monthly_to_quarterly`	`str`	`"quarterly_average"`	Rule used only when `frequency="quarterly"`.

Output

Returns a combined DataBundle with:

metadata["dataset"] == "fred_md+fred_sd"
metadata["source_family"] == "combined"
metadata["frequency"] == frequency
FRED-MD official t-codes in metadata["transform_codes"]
FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]
FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]
any frequency conversions recorded in metadata["frequency_conversion_warnings"]

If a quarterly FRED-SD series is included in a monthly panel, the function emits a UserWarning and records the conversion. The default quarterly_to_monthly="repeat_within_quarter" assigns the quarterly value to each month inside the quarter.

load_fred_qd_sd

Load FRED-QD and FRED-SD, align them to one panel, and return DataBundle.

macroforecast.data.load_fred_qd_sd(
    vintage: str | None = None,
    *,
    force: bool = False,
    cache_root: str | pathlib.Path | None = None,
    local_fred_qd_source: str | pathlib.Path | None = None,
    local_fred_sd_source: str | pathlib.Path | None = None,
    states: list[str] | None = None,
    variables: list[str] | None = None,
    frequency: str = "quarterly",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Purpose

Use this when the target or outcome is quarterly and national controls should come from FRED-QD. This is the recommended combined dataset for quarterly state-level analysis.

Input

Name	Type	Default	Meaning
`vintage`	`str \| None`	`None`	Vintage label shared across FRED-QD and FRED-SD.
`force`	`bool`	`False`	Re-download or re-copy raw sources.
`cache_root`	path-like or `None`	`None`	Raw cache root used by both loaders.
`local_fred_qd_source`	path-like or `None`	`None`	Local FRED-QD CSV source.
`local_fred_sd_source`	path-like or `None`	`None`	Local FRED-SD workbook or CSV source.
`states`	`list[str] \| None`	`None`	FRED-SD state subset.
`variables`	`list[str] \| None`	`None`	FRED-SD variable subset.
`frequency`	`str`	`"quarterly"`	`"quarterly"`, `"monthly"`, or `"native"`. Monthly is supported but not recommended for this loader.
`quarterly_to_monthly`	`str`	`"repeat_within_quarter"`	Rule used only when `frequency="monthly"`.
`monthly_to_quarterly`	`str`	`"quarterly_average"`	Rule used if an included FRED-SD series is monthly and the target panel is quarterly.

Output

Returns a combined DataBundle with:

metadata["dataset"] == "fred_qd+fred_sd"
metadata["source_family"] == "combined"
metadata["frequency"] == frequency
FRED-QD official t-codes in metadata["transform_codes"]
FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]
FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]
any frequency conversions recorded in metadata["frequency_conversion_warnings"]

If a monthly FRED-SD series is included in a quarterly panel, the function emits a UserWarning and records the conversion. The default monthly_to_quarterly="quarterly_average" averages monthly observations inside each quarter.

combine

Combine already-loaded DataBundle objects into one canonical panel.

macroforecast.data.combine(
    *bundles,
    dataset: str | None = None,
    frequency: str = "native",
    quarterly_to_monthly: str = "repeat_within_quarter",
    monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle

Input

Name	Type	Default	Meaning
`*bundles`	`DataBundle`	required	Two or more bundles to concatenate by date index.
`dataset`	`str` or `None`	joined source names	Output dataset label.
`frequency`	`str`	`"native"`	`"native"` or `"mixed"` (both skip conversion), `"monthly"`, or `"quarterly"`.
`quarterly_to_monthly`	`str`	`"repeat_within_quarter"`	One of `"repeat_within_quarter"`, `"quarter_end_ffill"`, or `"linear_interpolation"`.
`monthly_to_quarterly`	`str`	`"quarterly_average"`	One of `"quarterly_average"`, `"quarterly_endpoint"`, or `"quarterly_sum"`.

With frequency="native" or frequency="mixed", no monthly/quarterly conversion is applied. The returned panel keeps each source column on its native observation dates and records metadata["frequency"] == "mixed". Quarterly columns therefore appear as sparse columns on the union date index when they are combined with monthly columns. Downstream mixed-frequency models should read metadata["native_frequency_by_column"] rather than infer frequency from the overall index.
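The native-mode layout can be reproduced with an outer join on the date index. In this pandas sketch, the quarterly dates are placed on the third month of each quarter purely for illustration. Real date anchors depend on the source and, for FRED-SD columns, are recorded in metadata["date_anchor_by_column"].

```python
import pandas as pd

monthly = pd.DataFrame(
    {"INDPRO": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="MS", name="date"),
)
quarterly = pd.DataFrame(
    {"GDPC1": [10.0, 11.0]},
    index=pd.DatetimeIndex(["2020-03-01", "2020-06-01"], name="date"),
)

# Outer-join on the date index: no conversion, quarterly values stay on their own dates,
# so the quarterly column is sparse (NaN) on the other months of the union index.
mixed = pd.concat([monthly, quarterly], axis=1).sort_index()
print(mixed)
```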

Frequency Conversion Rules

Direction	Rule	Meaning
quarterly to monthly	`repeat_within_quarter`	Assign the quarterly value to each month in that quarter.
quarterly to monthly	`quarter_end_ffill`	Place the quarterly value at quarter end and forward-fill after it is observed.
quarterly to monthly	`linear_interpolation`	Interpolate between observed quarter-end values on the monthly grid.
monthly to quarterly	`quarterly_average`	Average monthly observations in the quarter.
monthly to quarterly	`quarterly_endpoint`	Use the last monthly observation in the quarter.
monthly to quarterly	`quarterly_sum`	Sum monthly observations in the quarter.
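The rules in the table can be expressed in a few lines of plain Python. This sketch represents series as dictionaries keyed by (year, month) or (year, quarter) and assumes complete, gap-free quarters. The package may handle edge cases differently, such as partial quarters or months before the first quarter end. The function names are hypothetical.

```python
from statistics import mean


def quarter_of(year: int, month: int) -> tuple[int, int]:
    return year, (month - 1) // 3 + 1


def monthly_to_quarterly(series: dict, rule: str) -> dict:
    """{(year, month): value} -> {(year, quarter): value}, assuming complete quarters."""
    groups: dict = {}
    for key in sorted(series):
        groups.setdefault(quarter_of(*key), []).append(series[key])
    reducers = {
        "quarterly_average": mean,
        "quarterly_endpoint": lambda values: values[-1],  # last month in the quarter
        "quarterly_sum": sum,
    }
    if rule not in reducers:
        raise ValueError(f"unknown rule: {rule!r}")
    return {q: reducers[rule](values) for q, values in groups.items()}


def quarterly_to_monthly(series: dict, rule: str) -> dict:
    """{(year, quarter): value} -> {(year, month): value or None}, assuming no gaps."""
    quarters = sorted(series)
    months = [(y, 3 * (q - 1) + k) for y, q in quarters for k in (1, 2, 3)]
    if rule == "repeat_within_quarter":
        return {(y, m): series[quarter_of(y, m)] for y, m in months}
    # The other two rules anchor each value at its quarter-end month (3, 6, 9, 12).
    anchored = {(y, 3 * q): series[(y, q)] for y, q in quarters}
    if rule == "quarter_end_ffill":
        out, last = {}, None
        for key in months:
            last = anchored.get(key, last)  # None until the first quarter end is observed
            out[key] = last
        return out
    if rule == "linear_interpolation":
        out = {key: anchored.get(key) for key in months}
        ends = sorted(anchored)
        for (ya, ma), (yb, mb) in zip(ends, ends[1:]):
            va, vb = anchored[(ya, ma)], anchored[(yb, mb)]
            start = 12 * ya + (ma - 1)  # absolute month index of the earlier quarter end
            for step in (1, 2):  # the two months strictly between consecutive quarter ends
                y, m0 = divmod(start + step, 12)
                out[(y, m0 + 1)] = va + (vb - va) * step / 3
        return out
    raise ValueError(f"unknown rule: {rule!r}")


print(quarterly_to_monthly({(2020, 1): 10.0, (2020, 2): 13.0}, "linear_interpolation"))
```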

Combined monthly/quarterly output supports only source columns identified as monthly or quarterly. If a source contains weekly, annual, irregular, or unknown-frequency columns, combine() raises ValueError. Use frequency="native" to inspect the mixed panel first, then call mf.data.align_frequency() explicitly if those columns should enter a common monthly or quarterly design.
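This restriction can be expressed as a pre-check on the native frequency map. `check_alignable` is a hypothetical helper that illustrates the rule. It is not the package's code path.

```python
CONVERTIBLE = {"monthly", "quarterly"}


def check_alignable(native_frequency_by_column: dict, target_frequency: str) -> None:
    """Raise ValueError if a monthly/quarterly design would include unsupported columns."""
    if target_frequency not in CONVERTIBLE:
        return  # "native"/"mixed" output keeps every column as-is
    unsupported = {
        col: freq for col, freq in native_frequency_by_column.items() if freq not in CONVERTIBLE
    }
    if unsupported:
        raise ValueError(
            f"cannot align to {target_frequency}: unsupported native frequencies {unsupported}"
        )


check_alignable({"INDPRO": "monthly", "GDPC1": "quarterly"}, "monthly")  # passes silently
```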

Output

Returns DataBundle. The panel is a column-wise concatenation after frequency alignment. Duplicate output column names raise ValueError.

For mixed outputs, the key metadata fields are:

Key	Meaning
`metadata["frequency"]`	`"mixed"`.
`metadata["native_frequency_by_column"]`	Native source frequency for each column.
`metadata["native_frequency_counts"]`	Counts of native source frequencies.
`metadata["date_anchor_by_column"]`	FRED-SD date-anchor map when available.
`metadata["date_anchor_counts"]`	Counts of FRED-SD date-anchor patterns when available.
`metadata["output_frequency_by_column"]`	Returned-panel frequency for each column; equal to native frequency in native mode.
`metadata["alignment"]["frequency"]`	`"native"` when no conversion was applied.

Frequency Conversion Warnings

When combine() changes a source column’s native frequency, it emits UserWarning and records the same information in metadata["frequency_conversion_warnings"].

Each record has:

Key	Type	Meaning
`dataset`	`str`	Source dataset whose columns were converted.
`from_frequency`	`str`	Native frequency before alignment.
`to_frequency`	`str`	Combined panel frequency.
`rule`	`str`	Alignment rule used.
`variables`	`list[str]`	Variable-level names, e.g. `["NQGSP"]` for `NQGSP_CA`.
`columns`	`list[str]`	Exact converted panel columns.
`n_columns`	`int`	Number of converted columns.

Example warning:

fred_sd monthly variables were aligned to quarterly using quarterly_average:
UR, ICLAIMS (102 columns).
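The sketch below shows one way to assemble a conversion record and its UserWarning. It assumes the variable name is the part of each column ID before the last underscore. The helper names are hypothetical. The message text follows the example warning above.

```python
import warnings


def conversion_record(dataset, columns, from_frequency, to_frequency, rule):
    """Build a dict with the documented frequency_conversion_warnings keys."""
    # Assumption: the variable name is everything before the last underscore ("UR_CA" -> "UR").
    variables = list(dict.fromkeys(col.rpartition("_")[0] or col for col in columns))
    return {
        "dataset": dataset,
        "from_frequency": from_frequency,
        "to_frequency": to_frequency,
        "rule": rule,
        "variables": variables,
        "columns": list(columns),
        "n_columns": len(columns),
    }


def emit_conversion_warning(record: dict) -> str:
    """Emit a UserWarning whose text follows the example message format."""
    message = (
        f"{record['dataset']} {record['from_frequency']} variables were aligned to "
        f"{record['to_frequency']} using {record['rule']}: "
        f"{', '.join(record['variables'])} ({record['n_columns']} columns)."
    )
    warnings.warn(message, UserWarning, stacklevel=2)
    return message


record = conversion_record(
    "fred_sd", ["UR_CA", "UR_TX", "ICLAIMS_CA"], "monthly", "quarterly", "quarterly_average"
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emit_conversion_warning(record)
print(caught[0].message)
```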

load_custom_csv

Load a user CSV and normalize it to the canonical panel contract.

macroforecast.data.load_custom_csv(
    path,
    *,
    date: str | None = None,
    date_col: str | int | None = None,
    columns: Iterable[str] | None = None,
    series_columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    default_frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    cache_root: str | pathlib.Path | None = None,
    strict: bool = True,
) -> DataBundle

Input

Name	Type	Default	Meaning
`path`	path-like	required	CSV file path.
`date`	`str \| None`	`None`	Date column. If omitted, uses a DatetimeIndex or parses the first column.
`date_col`	`str \| int \| None`	`None`	Alias for `date`; integer values select the date column by zero-based position.
`columns`	iterable or `None`	`None`	Columns to keep before renaming.
`series_columns`	iterable or `None`	`None`	Alias for `columns`; use this name when thinking in panel series IDs.
`rename`	mapping or `None`	`None`	Column rename map.
`dataset`	`str`	`"custom"`	Metadata dataset label.
`frequency`	`str`	`"unknown"`	Metadata frequency label.
`frequency_by_column`	mapping or `None`	`None`	Optional final-column frequency map, e.g. `{"PAYEMS": "monthly", "GDPC1": "quarterly"}`.
`default_frequency`	`str` or `None`	`None`	Fill frequency for columns omitted from `frequency_by_column`.
`metadata`	mapping or `None`	`None`	User metadata to attach.
`transform_codes`	mapping or `None`	`None`	Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming.
`cache_root`	path-like or `None`	`None`	If supplied, append a raw-manifest entry under this cache root. Custom loaders do not write the default manifest unless this is supplied.
`strict`	`bool`	`True`	Reject invalid date rows and non-numeric cells instead of silently coercing them. Set `False` only when you want a permissive load with a panel report.

Output

Returns a DataBundle. The normalized panel is available as bundle.panel and metadata as bundle.metadata. If transform_codes is provided, it is stored in both bundle.metadata["transform_codes"] and bundle.panel.attrs["macroforecast_transform_codes"], so mf.preprocessing.reprocess(bundle) can use the codes automatically.

Custom loaders also store the strict-normalization report at bundle.metadata["panel"]. With strict=True, malformed dates or non-numeric cells raise RawParseError wrapping the underlying validation error. With strict=False, those lossy operations are allowed and counted.

If frequency_by_column is provided, custom loaders call set_frequencies(...) internally and write the same mixed-frequency metadata contract used by official combined bundles. The keys must match final loaded column names after selection and renaming.
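Mapping keys refer to final column names. Selection therefore happens before renaming, and key checks come last. The helpers below are a hypothetical sketch of that order.

```python
def final_columns(raw_columns, columns=None, rename=None):
    """Select columns first, then rename them (the order the loader docs describe)."""
    raw_columns = list(raw_columns)
    kept = list(columns) if columns is not None else raw_columns
    missing = [c for c in kept if c not in raw_columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    rename = dict(rename or {})
    return [rename.get(c, c) for c in kept]


def check_keys(name, mapping, final):
    """Reject mapping keys (t-codes, frequencies) that are not final column names."""
    unknown = sorted(set(mapping) - set(final))
    if unknown:
        raise ValueError(f"{name} keys do not match final columns: {unknown}")


final = final_columns(
    ["ip_index", "term_spread", "unused"],
    columns=["ip_index", "term_spread"],
    rename={"ip_index": "INDPRO", "term_spread": "spread"},
)
check_keys("transform_codes", {"INDPRO": 5, "spread": 2}, final)  # passes
print(final)  # ['INDPRO', 'spread']
```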

Example:

bundle = mf.data.load_custom_csv(
    "panel.csv",
    date_col="DATE",
    series_columns=["INDPRO", "spread"],
    frequency="monthly",
    transform_codes={"INDPRO": 5, "spread": 2},
)

processed = mf.preprocessing.reprocess(bundle)

load_custom_parquet

Load a user Parquet file with the same normalization contract as load_custom_csv.

macroforecast.data.load_custom_parquet(
    path,
    *,
    date: str | None = None,
    date_col: str | int | None = None,
    columns: Iterable[str] | None = None,
    series_columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    default_frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    cache_root: str | pathlib.Path | None = None,
    strict: bool = True,
) -> DataBundle

Input

Name	Type	Default	Meaning
`path`	path-like	required	Parquet file path.
`date`	`str \| None`	`None`	Date column. If omitted, uses a `DatetimeIndex` or parses the first column.
`date_col`	`str \| int \| None`	`None`	Alias for `date`; integer values select the date column by zero-based position.
`columns`	iterable or `None`	`None`	Columns to keep before renaming.
`series_columns`	iterable or `None`	`None`	Alias for `columns`.
`rename`	mapping or `None`	`None`	Column rename map.
`dataset`	`str`	`"custom"`	Metadata dataset label.
`frequency`	`str`	`"unknown"`	Metadata frequency label.
`frequency_by_column`	mapping or `None`	`None`	Optional final-column frequency map.
`default_frequency`	`str` or `None`	`None`	Fill frequency for columns omitted from `frequency_by_column`.
`metadata`	mapping or `None`	`None`	User metadata to attach.
`transform_codes`	mapping or `None`	`None`	Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming.
`cache_root`	path-like or `None`	`None`	If supplied, append a raw-manifest entry under this cache root.
`strict`	`bool`	`True`	Reject invalid date rows and non-numeric cells instead of silently coercing them.

Output

Returns a DataBundle with the same canonical panel, metadata, transform-code, strict-normalization, and optional mixed-frequency contract as load_custom_csv.

custom_dataset

Build a custom DataBundle from an in-memory pandas DataFrame.

Use custom_dataset() when the data are already in Python memory and should enter the same contract as load_fred_md(), load_fred_qd(), load_fred_sd(), load_custom_csv(), and load_custom_parquet().

macroforecast.data.custom_dataset(
    frame,
    *,
    date: str | None = None,
    columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    dataset: str = "custom",
    source_family: str = "custom",
    frequency: str = "unknown",
    frequency_by_column: Mapping[str, str] | None = None,
    transform_codes: Mapping[str, int] | None = None,
    metadata: Mapping[str, object] | None = None,
    strict: bool = True,
) -> DataBundle

Input

Name	Type	Default	Meaning
`frame`	`pandas.DataFrame`	required	Raw or already canonical panel.
`date`	`str` or `None`	`None`	Date column. If omitted, the input must have a `DatetimeIndex` or a parseable first column.
`columns`	iterable or `None`	`None`	Columns to keep before renaming.
`rename`	mapping or `None`	`None`	Rename retained columns after selection.
`dataset`	`str`	`"custom"`	Dataset label stored in metadata.
`source_family`	`str`	`"custom"`	Source-family label stored in metadata.
`frequency`	`str`	`"unknown"`	Loader-level frequency label.
`frequency_by_column`	mapping or `None`	`None`	Optional column-level frequency map for mixed-frequency panels.
`transform_codes`	mapping or `None`	`None`	Optional t-code map. Keys must match final panel columns.
`metadata`	mapping or `None`	`None`	User metadata merged before package metadata is attached.
`strict`	`bool`	`True`	Reject lossy date or numeric coercion.

Output

Returns DataBundle. The panel is canonical and the metadata includes dataset, source_family, frequency, optional transform_codes, optional column-level frequency metadata, and a custom_dataset stage.

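import pandas as pd
import macroforecast as mf

# Illustrative in-memory data: a date column plus numeric series.
frame = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-02-01", "2020-03-01"],
        "loan_growth": [0.4, 0.1, -0.2],
        "spread": [1.2, 1.3, 1.1],
    }
)

bundle = mf.data.custom_dataset(
    frame,
    date="date",
    dataset="bank_panel",
    frequency="monthly",
    transform_codes={"loan_growth": 1, "spread": 2},
)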

processed = mf.preprocessing.reprocess(
    bundle,
    transform="custom",
    impute="mean",
)

as_panel

Normalize an existing pandas DataFrame.

macroforecast.data.as_panel(
    frame,
    *,
    date: str | None = None,
    columns: Iterable[str] | None = None,
    rename: Mapping[str, str] | None = None,
    metadata: Mapping[str, object] | None = None,
    strict: bool = True,
) -> pandas.DataFrame

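as_panel returns a canonical panel. It raises if the date column is missing, dates are duplicated, the output is empty, or infinite values are present. With the default strict=True, it also raises when dates cannot be parsed or a retained column contains cells that cannot be represented as numeric values or NaN. With strict=False, invalid date rows are dropped and non-numeric cells are coerced to NaN, and both are counted in the panel report.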

Input

Name	Type	Default	Meaning
`frame`	`pandas.DataFrame`	required	Raw or already canonical panel.
`date`	`str \| None`	`None`	Date column. If omitted and the index is not a `DatetimeIndex`, the first column is parsed as dates.
`columns`	iterable or `None`	`None`	Columns to keep before renaming.
`rename`	mapping or `None`	`None`	Rename retained columns after selection.
`metadata`	mapping or `None`	`None`	Metadata attached under `panel.attrs["macroforecast_metadata"]`.
`strict`	`bool`	`True`	Reject lossy date/numeric coercion. `False` permits it and records a panel report.

Output

Returns a pandas.DataFrame with DatetimeIndex named "date", ascending dates, numeric columns, and attrs containing macroforecast_panel_report.
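The sketch below condenses the strict normalization path in pandas: parse dates, index and sort by them, name the index "date", and coerce values to numbers. `normalize_strict` is hypothetical. It omits the infinite-value check and the panel report that as_panel also handles.

```python
import pandas as pd


def normalize_strict(frame: pd.DataFrame, date: str) -> pd.DataFrame:
    """Parse dates, index and sort by them, and coerce values strictly to numbers."""
    out = frame.copy()
    out[date] = pd.to_datetime(out[date], errors="raise")  # unparseable dates raise
    out = out.set_index(date).sort_index()
    out.index.name = "date"
    if out.index.has_duplicates:
        raise ValueError("duplicate dates")
    if out.empty:
        raise ValueError("empty panel")
    return out.apply(pd.to_numeric)  # errors="raise" by default: non-numeric cells raise


raw = pd.DataFrame({"obs": ["2020-02-01", "2020-01-01"], "x": ["2", "1"]})
panel = normalize_strict(raw, "obs")
print(panel)
```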

validate_panel

Validate the canonical panel contract.

macroforecast.data.validate_panel(panel) -> None

Raises TypeError or ValueError when the panel is not canonical.

panel_info

Return a compact panel summary.

macroforecast.data.panel_info(bundle_or_panel) -> dict

Output keys include n_rows, n_columns, start, end, columns, missing_values, frequency, and index_frequency. If the input carries metadata, frequency uses the metadata label such as "mixed" while index_frequency reports the pandas-inferred date-index frequency. Combined data also include compact native/output frequency counts.
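Most of these keys are one-liners on a DataFrame. The sketch below is hypothetical. It formats start and end as YYYY-MM, which is an assumption, and it omits the metadata-based `frequency` key. `pandas.infer_freq` needs at least three dates, so shorter indexes get `None`.

```python
import pandas as pd


def panel_summary(panel: pd.DataFrame) -> dict:
    """Compute a few of the documented panel_info keys for a bare DataFrame."""
    return {
        "n_rows": int(panel.shape[0]),
        "n_columns": int(panel.shape[1]),
        "start": panel.index.min().strftime("%Y-%m"),
        "end": panel.index.max().strftime("%Y-%m"),
        "columns": list(panel.columns),
        "missing_values": int(panel.isna().sum().sum()),
        # pandas needs at least three dates to infer a frequency; it returns None if none fits.
        "index_frequency": pd.infer_freq(panel.index) if len(panel.index) >= 3 else None,
    }


idx = pd.date_range("2020-01-01", periods=3, freq="MS", name="date")
info = panel_summary(pd.DataFrame({"A": [1.0, None, 3.0], "B": [1.0, 2.0, 3.0]}, index=idx))
print(info)
```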

set_frequencies

Attach a column-level frequency contract to an existing panel or bundle.

macroforecast.data.set_frequencies(
    data,
    frequency_by_column,
    *,
    default_frequency: str | None = None,
    output_frequency_by_column: Mapping[str, str] | None = None,
    frequency: str | None = None,
    metadata: Mapping[str, object] | None = None,
) -> DataBundle

Input#

Name	Type	Default	Meaning
`data`	`DataBundle`, `DataSpec`, `(panel, metadata)`, or `DataFrame`	required	Canonical panel input.
`frequency_by_column`	mapping	required	Native frequency for each final panel column.
`default_frequency`	`str` or `None`	`None`	Fill omitted columns with one frequency.
`output_frequency_by_column`	mapping or `None`	`None`	Returned-panel frequency for each column; defaults to native frequency.
`frequency`	`str` or `None`	`None`	Overall metadata label. Defaults to the unique native frequency or `"mixed"`.
`metadata`	mapping or `None`	`None`	Extra metadata to merge before writing frequency fields.

Allowed column frequencies are monthly, quarterly, weekly, annual, irregular, and unknown; short aliases such as m, q, and w are also accepted. For mixed-frequency DFM models, the monthly and quarterly labels are the ones that matter.

Output#

Returns a DataBundle with:

Metadata key	Meaning
`frequency`	Overall label, usually `"mixed"` when multiple native frequencies are present.
`native_frequency_by_column`	Native frequency for each column.
`native_frequency_counts`	Counts by native frequency.
`output_frequency_by_column`	Frequency represented in the returned panel for each column.
`output_frequency_counts`	Counts by output frequency.
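The metadata fields above follow from a simple bookkeeping rule. Here is a minimal sketch, assuming only the m, q, and w aliases named in the text; `frequency_contract` is a hypothetical helper that builds the metadata dictionary without touching any panel.

```python
from collections import Counter

ALIASES = {"m": "monthly", "q": "quarterly", "w": "weekly"}
ALLOWED = {"monthly", "quarterly", "weekly", "annual", "irregular", "unknown"}


def frequency_contract(columns, frequency_by_column, default_frequency=None,
                       output_frequency_by_column=None):
    """Hypothetical sketch of the frequency metadata written by set_frequencies."""
    def normalize(label):
        label = ALIASES.get(label.lower(), label.lower())
        if label not in ALLOWED:
            raise ValueError(f"unsupported frequency: {label!r}")
        return label

    native = {}
    for column in columns:
        label = frequency_by_column.get(column, default_frequency)
        if label is None:
            raise ValueError(f"no frequency for column {column!r}")
        native[column] = normalize(label)
    # Output frequency defaults to the native frequency.
    overrides = output_frequency_by_column or {}
    output = {c: normalize(overrides.get(c, native[c])) for c in columns}
    distinct = set(native.values())
    return {
        "frequency": next(iter(distinct)) if len(distinct) == 1 else "mixed",
        "native_frequency_by_column": native,
        "native_frequency_counts": dict(Counter(native.values())),
        "output_frequency_by_column": output,
        "output_frequency_counts": dict(Counter(output.values())),
    }
```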

metadata#

Return explicit metadata from a DataBundle, DataSpec, (panel, metadata) tuple, or DataFrame.

macroforecast.data.metadata(obj) -> dict

Input#

Name	Type	Meaning
`obj`	`DataBundle`, `DataSpec`, `(panel, metadata)`, or `DataFrame`	Object carrying package metadata.

Output#

Returns a shallow copy of the metadata dictionary. Adding, replacing, or removing top-level keys in the returned dictionary does not change the original bundle or panel attrs.

attach_metadata#

Merge one metadata stage into an existing metadata dictionary.

macroforecast.data.attach_metadata(
    metadata,
    stage: str,
    values,
) -> dict

Input#

Name	Type	Meaning
`metadata`	mapping	Existing metadata dictionary.
`stage`	`str`	Non-empty stage key to write, such as `"data_spec"` or `"data_frequency_alignment"`.
`values`	mapping	Stage payload to copy under `stage`.

Output#

Returns a new dictionary. Existing metadata is copied, then values is copied under the requested stage. attach_metadata() does not mutate its input.
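The copy-then-write behavior can be sketched in a few lines of plain Python. `attach_stage` is a hypothetical stand-in; it assumes that writing to an existing stage key replaces the old payload, which the text does not state.

```python
def attach_stage(metadata, stage, values):
    """Hypothetical sketch: return a new dict with a copy of `values` under `stage`."""
    if not isinstance(stage, str) or not stage:
        raise ValueError("stage must be a non-empty string")
    merged = dict(metadata)       # copy the existing top level; the input is untouched
    merged[stage] = dict(values)  # copy the stage payload too (replaces any old entry)
    return merged


original = {"source": "fred_md"}
updated = attach_stage(original, "data_spec", {"target": "INDPRO"})
```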

spec#

Attach run-level data choices to a bundle or panel. This function creates a DataSpec; it does not execute downstream pipeline steps.

macroforecast.data.spec(
    data,
    *,
    metadata: Mapping[str, object] | None = None,
    target: str | None = None,
    targets: Iterable[str] | None = None,
    horizons: Iterable[int] | int | None = None,
    start: str | None = None,
    end: str | None = None,
    predictors: "all" | Iterable[str] = "all",
) -> DataSpec

Input#

Name	Type	Default	Meaning
`data`	`DataBundle`, `DataSpec`, `(panel, metadata)`, or `DataFrame`	required	Canonical data input.
`metadata`	mapping or `None`	`None`	Extra metadata to merge.
`target`	`str | None`	`None`	Single target column.
`targets`	iterable or `None`	`None`	Multiple target columns.
`horizons`	iterable, int, or `None`	derived from frequency	Forecast horizons.
`start`	`str | None`	`None`	Start date. Accepts `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`.
`end`	`str | None`	`None`	End date. Accepts `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`.
`predictors`	`"all" | iterable`	`"all"`	Predictor columns to keep. `"all"` expands to all non-target columns. Explicit predictor lists may be empty for target-only or autoregressive designs, but must not include target columns.

Default Horizons#

Metadata frequency	Default horizons
`monthly`	`(1, 3, 6, 12)`
`quarterly`	`(1, 2, 4, 8)`
other or unknown	`(1,)`

Output#

Returns a DataSpec. Its metadata contains a data_spec entry recording the chosen target, targets, horizons, sample dates, expanded predictor list, and panel summary. The expansion is deliberate: downstream model stages should consume a concrete non-target predictor list rather than infer one from the full panel and risk target leakage.
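The target, predictor, and default-horizon rules can be sketched as pure column bookkeeping. This is a minimal sketch under the rules stated in the tables above; `resolve_spec` is a hypothetical helper that ignores date windows and metadata.

```python
DEFAULT_HORIZONS = {"monthly": (1, 3, 6, 12), "quarterly": (1, 2, 4, 8)}


def resolve_spec(columns, frequency, targets, predictors="all", horizons=None):
    """Hypothetical sketch of target/predictor/horizon resolution."""
    targets = list(targets)
    missing = [t for t in targets if t not in columns]
    if missing:
        raise ValueError(f"unknown target columns: {missing}")
    if predictors == "all":
        # Expand to a concrete list so no stage sees the target as a predictor.
        chosen = [c for c in columns if c not in targets]
    else:
        chosen = list(predictors)  # may be empty for target-only designs
        overlap = sorted(set(chosen) & set(targets))
        if overlap:
            raise ValueError(f"predictors must not include targets: {overlap}")
        unknown = [c for c in chosen if c not in columns]
        if unknown:
            raise ValueError(f"unknown predictor columns: {unknown}")
    if horizons is None:
        horizons = DEFAULT_HORIZONS.get(frequency, (1,))
    elif isinstance(horizons, int):
        horizons = (horizons,)
    return {"targets": targets, "predictors": chosen, "horizons": tuple(horizons)}


spec = resolve_spec(["INDPRO", "UNRATE", "CPIAUCSL"], "monthly", ["INDPRO"])
```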

What It Does And Does Not Do#

Action	Done by `mf.data.spec(...)`?
Validate the canonical panel contract	Yes
Validate target and predictor columns	Yes
Expand `predictors="all"` to all non-target columns	Yes
Apply `start` and `end` sample bounds	Yes
Attach `metadata["data_spec"]`	Yes
Load raw data	No
Transform, clean, impute, or standardize values	No
Create forecast targets or lagged predictors	No
Fit models or run evaluation	No

Data Policy Helpers#

These functions are direct Python replacements for the old data-policy axes. They do not parse YAML and do not fit models.

align_frequency#

macroforecast.data.align_frequency(
    data,
    *,
    method: str = "keep",
    quarterly_to_monthly: str = "repeat_within_quarter",
    weekly_to_monthly: str = "mean",
    monthly_to_quarterly: str = "quarterly_average",
    weekly_to_quarterly: str = "mean",
    chow_lin_indicator: str | Mapping[str, str] | None = None,
    chow_lin_aggregation: str = "mean",
    chow_lin_rho: float | None = None,
    chow_lin_rho_method: str = "fixed",
) -> DataBundle

Keeps, filters, or aligns a panel to a common data frequency. This belongs in macroforecast.data because it changes the calendar and column-level frequency contract before preprocessing or feature engineering.

Input	Default	Choices
`method`	`"keep"`	`"keep"`, `"monthly"`, `"quarterly"`, `"drop_non_monthly"`, `"drop_non_quarterly"`
`quarterly_to_monthly`	`"repeat_within_quarter"`	`"repeat_within_quarter"`, `"step_backward"`, `"step_forward"`, `"quarter_end_ffill"`, `"linear_interpolation"`, `"chow_lin"`
`weekly_to_monthly`	`"mean"`	`"mean"`, `"last"`, `"sum"`
`monthly_to_quarterly`	`"quarterly_average"`	`"quarterly_average"`, `"quarterly_endpoint"`, `"quarterly_sum"`
`weekly_to_quarterly`	`"mean"`	`"mean"`, `"last"`, `"sum"`
`chow_lin_indicator`	`None`	Indicator column name, or mapping from quarterly column to indicator column, used only when `quarterly_to_monthly="chow_lin"`.
`chow_lin_aggregation`	`"mean"`	`"mean"` or `"sum"`; the low-frequency aggregation to conserve.
`chow_lin_rho`	`None`	Fixed AR(1) residual correlation. If supplied, must be inside `(-1, 1)`.
`chow_lin_rho_method`	`"fixed"`	`"fixed"`, `"min_chi_squared"`, or `"max_likelihood"`.

Output is a DataBundle. Metadata records data_frequency_alignment, native_frequency_by_column, output_frequency_by_column, and frequency counts. Frequency detection uses native_frequency_by_column first, then FRED-SD series reports, then observed-date spacing.
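The monthly_to_quarterly choices in the table correspond to familiar pandas group-by aggregations. This is a minimal sketch, assuming a monthly Series with a DatetimeIndex and complete quarters; `monthly_to_quarterly` here is a hypothetical helper, not the package's implementation.

```python
import pandas as pd


def monthly_to_quarterly(monthly: pd.Series, method: str = "quarterly_average") -> pd.Series:
    """Hypothetical sketch of the three monthly_to_quarterly choices."""
    grouped = monthly.groupby(monthly.index.to_period("Q"))
    if method == "quarterly_average":
        return grouped.mean()
    if method == "quarterly_endpoint":
        # Last observation row in each quarter, kept even if it is NaN.
        return grouped.agg(lambda s: s.iloc[-1])
    if method == "quarterly_sum":
        return grouped.sum()
    raise ValueError(f"unknown method: {method!r}")


months = pd.date_range("2000-01-01", periods=6, freq="MS")
ip = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=months)
```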

monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="repeat_within_quarter",
)

For quarterly-to-monthly alignment, step_backward is accepted as an alias for repeat_within_quarter; the latter is the clearer spelling. Use quarter_end_ffill when values should only become available from the quarter-end month forward.
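The difference between repeat_within_quarter and quarter_end_ffill can be sketched with pandas periods. This is a hedged illustration, not the package's code: `quarterly_to_monthly` is hypothetical, and mapping observations by calendar quarter (so that quarter-start and quarter-end dating give the same result) is an assumption of the sketch.

```python
import numpy as np
import pandas as pd


def quarterly_to_monthly(quarterly, months, method="repeat_within_quarter"):
    """Hypothetical sketch of two quarterly-to-monthly rules."""
    if method == "step_backward":  # documented alias
        method = "repeat_within_quarter"
    # Key each quarterly value by its calendar quarter, whatever month it is dated in.
    by_quarter = pd.Series(quarterly.to_numpy(), index=quarterly.index.to_period("Q"))
    values = by_quarter.reindex(months.to_period("Q")).to_numpy()
    if method == "repeat_within_quarter":
        # All three months of a quarter show that quarter's value.
        return pd.Series(values, index=months)
    if method == "quarter_end_ffill":
        # The value appears in the quarter-end month and is carried forward.
        at_quarter_end = np.where(months.month % 3 == 0, values, np.nan)
        return pd.Series(at_quarter_end, index=months).ffill()
    raise ValueError(f"unsupported method: {method!r}")


months = pd.date_range("2000-01-01", "2000-06-01", freq="MS")
gdp = pd.Series([100.0, 104.0], index=pd.to_datetime(["2000-03-01", "2000-06-01"]))
```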

Use quarterly_to_monthly="chow_lin" when a quarterly series should be regression-disaggregated with a monthly indicator:

monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="chow_lin",
    chow_lin_indicator={"GDPC1": "INDPRO"},
    chow_lin_aggregation="mean",
)

This preserves the supplied quarterly observations when the output is re-aggregated by the declared chow_lin_aggregation. The function records the indicator and rho choices in metadata["data_frequency_alignment"].

chow_lin_disaggregate#

macroforecast.data.chow_lin_disaggregate(
    low_frequency,
    indicator,
    *,
    aggregation: str = "mean",
    rho: float | None = None,
    rho_method: str = "fixed",
) -> pandas.Series

Applies Chow-Lin regression-based temporal disaggregation directly, for example from quarterly to monthly. low_frequency is a low-frequency Series, and indicator is a higher-frequency Series or a DataFrame whose single (or first) column is used. The returned series is indexed like the indicator and conserves low_frequency under aggregation="mean" or aggregation="sum".

rho_method="fixed" uses rho when supplied and 0.0 otherwise. "min_chi_squared" and "max_likelihood" estimate rho over a bounded grid.
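The conservation property follows from the textbook Chow-Lin GLS formulas. The sketch below implements that textbook version with NumPy under stated assumptions, not macroforecast's code. It uses a fixed rho, an intercept plus one indicator as regressors, and an indicator that covers exactly k high-frequency periods per low-frequency period with no extrapolation; `chow_lin` is a hypothetical name.

```python
import numpy as np


def chow_lin(y_low, x_high, rho=0.0, aggregation="mean"):
    """Textbook Chow-Lin GLS disaggregation with a fixed AR(1) rho (sketch)."""
    y_low = np.asarray(y_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    n = y_low.size
    if n == 0 or x_high.size % n != 0:
        raise ValueError("indicator length must be a multiple of the low-frequency length")
    k = x_high.size // n  # high-frequency periods per low-frequency period, e.g. 3
    if aggregation not in ("mean", "sum"):
        raise ValueError("aggregation must be 'mean' or 'sum'")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie inside (-1, 1)")
    # Aggregation matrix C (n x n*k): averages or sums each block of k periods.
    weight = 1.0 / k if aggregation == "mean" else 1.0
    C = np.kron(np.eye(n), np.full((1, k), weight))
    # Regressors: intercept plus the indicator (an assumption of this sketch).
    X = np.column_stack([np.ones(n * k), x_high])
    # AR(1) residual covariance, up to a scale factor that cancels below.
    lags = np.abs(np.subtract.outer(np.arange(n * k), np.arange(n * k)))
    V = rho ** lags / (1.0 - rho ** 2)
    CX = C @ X
    W = np.linalg.inv(C @ V @ C.T)
    beta = np.linalg.solve(CX.T @ W @ CX, CX.T @ W @ y_low)
    # GLS fit plus the distributed low-frequency residual, so C @ result == y_low.
    return X @ beta + V @ C.T @ W @ (y_low - CX @ beta)
```

Because C V C' times W is the identity, applying C to the result returns y_low exactly (up to floating-point error). This is the re-aggregation guarantee described for chow_lin_aggregation.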

infer_frequencies#

macroforecast.data.infer_frequencies(data) -> tuple[dict[str, str], str]

infer_frequencies() returns (frequency_by_column, source). The source is "native_frequency_by_column", "fred_sd_series_metadata", or "observed_dates".
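The three-step fallback can be sketched as follows. The argument names `native` and `series_reports` are hypothetical stand-ins for the metadata the package reads. The median-gap thresholds in `classify_spacing` are illustrative assumptions, not the package's rules.

```python
import pandas as pd


def classify_spacing(dates):
    """Hypothetical rule: classify by the median day gap between observed dates."""
    gaps = pd.Series(sorted(dates)).diff().dropna().dt.days
    if gaps.empty:
        return "unknown"
    median = gaps.median()
    if 6 <= median <= 8:
        return "weekly"
    if 28 <= median <= 31:
        return "monthly"
    if 89 <= median <= 92:
        return "quarterly"
    if 365 <= median <= 366:
        return "annual"
    return "irregular"


def infer_column_frequencies(panel, native=None, series_reports=None):
    """Hypothetical sketch of the native -> FRED-SD report -> observed-dates chain."""
    if native:
        return dict(native), "native_frequency_by_column"
    if series_reports:
        return dict(series_reports), "fred_sd_series_metadata"
    # Classify each column by where it actually has observations.
    observed = {c: classify_spacing(panel[c].dropna().index) for c in panel.columns}
    return observed, "observed_dates"


dates = pd.date_range("2000-01-01", periods=12, freq="MS")
panel = pd.DataFrame({"INDPRO": range(12), "GDPC1": [1.0, None, None] * 4}, index=dates)
```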

frequency_hardening_issues#

macroforecast.data.frequency_hardening_issues(
    frequencies,
) -> list[dict]

Reports columns classified as unknown, irregular, or annual. Check this report before aligning frequencies, especially before forcing a mixed panel to monthly or quarterly frequency.

Output key	Meaning
`frequency`	Weak frequency class.
`columns`	Columns assigned to that class.
`n_columns`	Number of affected columns.

availability_lag#

macroforecast.data.availability_lag(
    data,
    *,
    lags: int | Mapping[str, int] = 1,
    columns: Iterable[str] | None = None,
    drop_missing: bool = False,
) -> DataBundle

Positive lags delay predictor availability. lags=1 means the value dated t-1 is the latest available value on row t. Pass a mapping for column-specific release lags.
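A release lag amounts to a per-column row shift. This is a minimal sketch, assuming a regularly spaced panel so that one row equals one period. `apply_release_lags` is hypothetical, and it omits `drop_missing`.

```python
from collections.abc import Mapping

import pandas as pd


def apply_release_lags(panel, lags=1, columns=None):
    """Hypothetical sketch: row t of a lagged column holds the value dated t - lag."""
    if columns is None:
        columns = list(lags) if isinstance(lags, Mapping) else list(panel.columns)
    out = panel.copy()
    for column in columns:
        lag = lags.get(column, 0) if isinstance(lags, Mapping) else lags
        if lag < 0:
            raise ValueError(f"negative lag for {column!r}")
        out[column] = panel[column].shift(lag)  # leading rows become NaN
    return out


dates = pd.date_range("2000-01-01", periods=4, freq="MS")
panel = pd.DataFrame({"CPI": [1.0, 2.0, 3.0, 4.0], "UNRATE": [5.0, 6.0, 7.0, 8.0]}, index=dates)
lagged = apply_release_lags(panel, {"CPI": 1, "UNRATE": 2})
```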

same_period_predictors#

macroforecast.data.same_period_predictors(
    data_spec,
    *,
    policy: "allow" | "lag" | "drop" | "forbid" = "allow",
    lag: int = 1,
    columns: Iterable[str] | None = None,
    drop_missing: bool = False,
) -> DataSpec

policy="allow" records the choice, "lag" shifts the selected predictors by lag periods, "drop" removes them from the active predictor set, and "forbid" raises if such predictors are present. Targets are never shifted by this helper.
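The four policies can be sketched as a small dispatch over a panel and a predictor list. This is a hedged illustration: `same_period_policy` and its `(panel, active_predictors)` return shape are hypothetical, not the DataSpec-based API.

```python
import pandas as pd


def same_period_policy(panel, targets, predictors, policy="allow", lag=1, columns=None):
    """Hypothetical sketch of the four policies; returns (panel, active predictors)."""
    candidates = list(predictors) if columns is None else list(columns)
    # Only non-target predictors are ever affected.
    selected = [c for c in candidates if c in predictors and c not in targets]
    if policy == "allow":
        return panel, list(predictors)
    if policy == "forbid":
        if selected:
            raise ValueError(f"same-period predictors present: {selected}")
        return panel, list(predictors)
    if policy == "drop":
        return panel, [p for p in predictors if p not in selected]
    if policy == "lag":
        out = panel.copy()
        for column in selected:
            out[column] = panel[column].shift(lag)
        return out, list(predictors)
    raise ValueError(f"unknown policy: {policy!r}")


dates = pd.date_range("2000-01-01", periods=3, freq="MS")
panel = pd.DataFrame({"INDPRO": [1.0, 2.0, 3.0], "UNRATE": [4.0, 5.0, 6.0]}, index=dates)
shifted, active = same_period_policy(panel, ["INDPRO"], ["UNRATE"], policy="lag")
```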

define_regime#

macroforecast.data.define_regime(
    data,
    *,
    name: str = "regime",
    column: str | None = None,
    threshold: float | None = None,
    direction: "above" | "below" | "equal" | "not_equal" = "above",
    dates: Iterable[str | pandas.Timestamp] | None = None,
    values: Sequence[bool | int | float] | pandas.Series | None = None,
    append: bool = False,
    output_column: str | None = None,
) -> DataBundle

Exactly one regime source is required: threshold rule, explicit dates, or an aligned vector/Series. The regime is stored in metadata["regimes"]; set append=True to also add a numeric indicator column to the panel.
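The exactly-one-source rule and the three source types can be sketched with pandas. `regime_indicator` is hypothetical and returns only the 0/1 indicator, without metadata or `append`. Treating "above" as strictly greater than the threshold is an assumption.

```python
import numpy as np
import pandas as pd


def regime_indicator(panel, column=None, threshold=None, direction="above",
                     dates=None, values=None):
    """Hypothetical sketch of a 0/1 regime indicator from exactly one source."""
    if sum(s is not None for s in (threshold, dates, values)) != 1:
        raise ValueError("supply exactly one of threshold, dates, or values")
    if threshold is not None:
        series = panel[column]
        compare = {"above": series.gt, "below": series.lt,
                   "equal": series.eq, "not_equal": series.ne}
        flag = compare[direction](threshold)
    elif dates is not None:
        flag = pd.Series(panel.index.isin(pd.to_datetime(list(dates))), index=panel.index)
    else:
        # An aligned vector: same length and order as the panel rows.
        flag = pd.Series(np.asarray(values), index=panel.index).astype(bool)
    return flag.astype(int)


dates = pd.date_range("2000-01-01", periods=4, freq="MS")
panel = pd.DataFrame({"UNRATE": [4.0, 6.5, 7.0, 5.0]}, index=dates)
high_unemployment = regime_indicator(panel, column="UNRATE", threshold=6.0)
```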

Vintage Helpers#

list_vintages#

Generate monthly vintage labels for a supported dataset.

macroforecast.data.list_vintages(
    dataset: str,
    start: str | None = None,
    end: str | None = None,
) -> list[str]

Input#

Name	Type	Default	Meaning
`dataset`	`str`	required	One of `fred_md`, `fred_qd`, `fred_sd`, `fred_md+fred_sd`, or `fred_qd+fred_sd`.
`start`	`str | None`	first supported vintage	Start vintage in `YYYY-MM` form.
`end`	`str | None`	required	End vintage in `YYYY-MM` form.

Output#

Returns candidate monthly vintage labels. Pass the selected label to load_fred_md, load_fred_qd, or load_fred_sd through the vintage= argument.

end is required because the function does not inspect remote availability.
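Because no remote lookup happens, generating the labels is pure calendar arithmetic. A minimal sketch that enumerates inclusive `YYYY-MM` labels; `monthly_labels` is a hypothetical helper that ignores dataset-specific first vintages.

```python
def monthly_labels(start, end):
    """Hypothetical sketch: inclusive YYYY-MM labels from start to end."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    # Count months from year 0 so the range is a simple integer range.
    first = start_year * 12 + start_month - 1
    last = end_year * 12 + end_month - 1
    if last < first:
        raise ValueError("end must not precede start")
    return [f"{i // 12:04d}-{i % 12 + 1:02d}" for i in range(first, last + 1)]


labels = monthly_labels("2023-11", "2024-02")
```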

Official Source Pages#

FRED-MD and FRED-QD source page: https://www.stlouisfed.org/research/economists/mccracken/fred-databases
FRED-SD source page: https://www.stlouisfed.org/research/economists/owyang/fred-sd