macroforecast.data#
Purpose#
macroforecast.data is the data entry point for the package. It loads official
or user-supplied data, normalizes it to one pandas panel contract, and attaches
source metadata. It also creates run-level data specifications and combines
national FRED-MD/FRED-QD data with state-level FRED-SD panels.
This module does not apply stationarity transforms, outlier rules, imputation,
feature engineering, model fitting, or evaluation. Those steps happen later.
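Loaders and data-policy helpers return a DataBundle, and spec(...) returns a DataSpec. A few utilities, such as as_panel() and list_vintages(), return plain pandas or Python objects.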
The usual flow is:
import macroforecast as mf

bundle = mf.data.load_fred_md()
data_spec = mf.data.spec(
    bundle,
    target="INDPRO",
    horizons=[1, 3, 6, 12],
    start="1960-01",
    end="2024-12",
    predictors="all",
)
mf.data.spec(...) is not a wrapper that runs data loading, preprocessing,
feature engineering, or modeling. It is a small contract builder for the
already-loaded panel. It validates the requested target, horizons, sample
window, and predictor set; subsets the panel to those columns and dates; expands
predictors="all" to concrete non-target columns; and records the choices in
metadata. Later callable stages can consume the same DataSpec without
guessing which columns or horizons the run intended to use.
Public Functions#
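| Function | Purpose | Output |
|---|---|---|
| load_fred_md | Load official FRED-MD current or vintage data. | DataBundle |
| load_fred_qd | Load official FRED-QD current or vintage data. | DataBundle |
| load_fred_sd | Load official FRED-SD state-level panel data. | DataBundle |
| load_fred_md_sd | Load and combine FRED-MD with FRED-SD. | DataBundle |
| load_fred_qd_sd | Load and combine FRED-QD with FRED-SD. | DataBundle |
| load_custom_csv | Load a user CSV into the canonical panel contract. | DataBundle |
| load_custom_parquet | Load a user Parquet file into the canonical panel contract. | DataBundle |
| custom_dataset | Build a custom dataset from an in-memory pandas DataFrame. | DataBundle |
| combine | Concatenate loaded bundles and optionally align frequency. | DataBundle |
| list_vintages | Generate supported monthly vintage labels for a dataset. | list[str] |
| as_panel | Normalize a pandas DataFrame. | pandas.DataFrame |
| validate_panel | Validate the canonical panel contract. | None |
| panel_info | Summarize panel shape, dates, missingness, and frequency. | dict |
| metadata | Extract explicit package metadata from data-like input. | dict |
| attach_metadata | Merge one metadata stage into an existing metadata dictionary. | dict |
| set_frequencies | Attach column-level native/output frequency metadata. | DataBundle |
| spec | Attach target, horizon, sample, and predictor choices. | DataSpec |
| align_frequency | Keep, filter, or align panel columns to a common frequency. | DataBundle |
| chow_lin_disaggregate | Disaggregate low-frequency series with a high-frequency indicator. | pandas.Series |
| infer_frequencies | Read or infer native frequency by column. | tuple[dict[str, str], str] |
| frequency_hardening_issues | Report columns with weak frequency classification. | list[dict] |
| availability_lag | Delay selected columns to encode release availability. | DataBundle |
| same_period_predictors | Allow, lag, drop, or reject same-period predictors in a DataSpec. | DataSpec |
| define_regime | Attach a binary regime definition to metadata, optionally as a column. | DataBundle |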
Public Classes And Types#
Symbol |
Meaning |
|---|---|
|
Canonical panel plus metadata returned by loaders and data-policy helpers. |
|
Canonical panel plus target, horizon, sample, and predictor choices for a run. |
|
Stored threshold direction type: |
|
Stored same-period predictor policy type: |
Canonical Panel#
Every public loader returns a DataBundle.
panel = bundle.panel
metadata = bundle.metadata
DataBundle also supports tuple unpacking:
panel, metadata = mf.data.load_fred_md()
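Tuple unpacking of this kind is usually provided by an `__iter__` method that yields the fields in order. The sketch below shows the general pattern with a hypothetical `BundleSketch` class; it is not the package's implementation, and the plain list used as the panel is only a stand-in.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class BundleSketch:
    """Hypothetical stand-in for a panel-plus-metadata container."""

    panel: Any
    metadata: dict = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        # Yielding the two fields in order is what makes
        # `panel, metadata = bundle` work.
        yield self.panel
        yield self.metadata


bundle = BundleSketch(panel=[[1.0], [2.0]], metadata={"dataset": "demo"})
panel, metadata = bundle  # same objects as bundle.panel and bundle.metadata
```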
Panel Contract#
| Property | Required Value |
|---|---|
| Type | pandas.DataFrame |
| Index | DatetimeIndex |
| Index name | "date" |
| Sort order | ascending date order |
| Duplicate dates | not allowed |
| Columns | variable IDs |
| Values | numeric values or NaN |
| Empty panel | not allowed |
| Infinite values | not allowed |
Metadata is explicit on DataBundle.metadata. The panel also carries
panel.attrs["macroforecast_metadata"] for pandas-native handoff. FRED-MD and
FRED-QD transform codes are attached to
panel.attrs["macroforecast_transform_codes"]; preprocessing is responsible
for using them.
Panel normalization is strict by default. Invalid date values, non-numeric
cells that would be coerced to NaN, duplicate dates, empty panels, and
infinite values raise errors. When a caller deliberately sets strict=False,
lossy normalization is allowed but recorded in
panel.attrs["macroforecast_panel_report"] and
metadata["panel"] when the panel is returned inside a DataBundle.
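The difference between strict and lossy normalization can be illustrated with plain pandas. This is a minimal sketch, assuming a hypothetical `coerce_numeric` helper and illustrative report keys; the package's own report layout is described in the table that follows and may differ.

```python
import pandas as pd


def coerce_numeric(frame: pd.DataFrame, strict: bool = True):
    """Hypothetical helper: coerce cells to numbers and report what was lost."""
    coerced = frame.apply(pd.to_numeric, errors="coerce")
    # A cell is "lossy" if it held a value before coercion but is NaN after.
    lossy = frame.notna() & coerced.isna()
    n_lossy = int(lossy.to_numpy().sum())
    if strict and n_lossy:
        raise ValueError(f"{n_lossy} non-numeric cell(s) would be coerced to NaN")
    # Illustrative report keys, not the package's actual ones.
    report = {"strict": strict, "non_numeric_coerced": n_lossy}
    return coerced, report


raw = pd.DataFrame({"x": ["1.5", "n/a", "3"], "y": [1, 2, 3]})
clean, report = coerce_numeric(raw, strict=False)  # allowed, but counted
```

With `strict=True` the same input raises instead of silently producing a NaN.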
macroforecast_panel_report contains:
Key |
Meaning |
|---|---|
|
Panel contract version, currently |
|
Whether lossy date/numeric coercion was rejected. |
|
Row count before and after panel normalization. |
|
Column names before and after selection/renaming. |
|
Date source used: a column name or |
|
Number of invalid date rows dropped when |
|
Count and examples of non-numeric cells coerced to |
Metadata Contract#
Every loader writes a metadata dictionary with these common keys.
Key |
Type |
Meaning |
|---|---|---|
|
|
Dataset identifier such as |
|
|
Loader-level frequency label: |
|
|
|
|
|
Requested vintage label in |
|
|
Last date present in the loaded panel, formatted as |
|
|
|
|
|
Loader notes, including discouraged frequency alignments for combined datasets. |
|
|
Raw-file provenance for single-source loads; combined bundles use |
|
|
Official FRED-MD/FRED-QD t-codes when available. FRED-SD has no official t-code map. |
Combined bundles add:
Key |
Type |
Meaning |
|---|---|---|
|
|
Combined-source label currently set to |
|
|
Full metadata dictionaries from the source bundles. |
|
|
Source dataset for each output column. |
|
|
Original frequency for each output column before alignment. |
|
|
Count of columns by original frequency. |
|
|
FRED-SD date-anchor map for state columns when available. |
|
|
Count of FRED-SD date-anchor patterns when available. |
|
|
Frequency represented in the returned panel for each output column. |
|
|
Count of columns by returned-panel frequency. |
|
|
Records of monthly-to-quarterly or quarterly-to-monthly conversions. |
|
|
Chosen target frequency, alignment rules, and source-level alignment summaries. |
Public metadata helpers and policy types:
Symbol |
Meaning |
|---|---|
|
Return metadata with one stage key merged in a pandas-safe way. Used by loaders, preprocessing, analysis, and runner outputs. |
|
Stored threshold direction type for |
|
Stored same-period predictor policy type for |
DataBundle#
macroforecast.data.DataBundle(
panel: pandas.DataFrame,
metadata: dict,
)
Output#
| Field | Type | Meaning |
|---|---|---|
| panel | pandas.DataFrame | Canonical date-indexed data panel. |
| metadata | dict | Source, vintage, artifact, frequency, and transform-code metadata. |
Methods#
Method |
Input |
Output |
Meaning |
|---|---|---|---|
|
|
|
Return a new bundle with one metadata stage added. |
Preprocessing outputs can use the same metadata-attachment pattern.
DataSpec#
macroforecast.data.DataSpec(
panel: pandas.DataFrame,
metadata: dict,
target: str | None,
targets: tuple[str, ...],
horizons: tuple[int, ...],
start: str | None = None,
end: str | None = None,
predictors: "all" | tuple[str, ...] = "all",
)
DataSpec is the output of spec(...). It keeps the canonical panel and
metadata together with the target, horizons, sample window, and predictor
selection for a run.
Output#
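| Field | Type | Meaning |
|---|---|---|
| panel | pandas.DataFrame | Canonical date-indexed data panel after sample and column selection. |
| metadata | dict | Source metadata plus a data_spec entry. |
| target | str or None | Single target column when … |
| targets | tuple[str, ...] | Active target columns. |
| horizons | tuple[int, ...] | Positive forecast horizons. |
| start, end | str or None | Normalized sample bounds. |
| predictors | tuple[str, ...] | Concrete non-target predictor columns. |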
Methods#
Method |
Input |
Output |
Meaning |
|---|---|---|---|
|
|
|
Return a new spec with one metadata stage added. |
DataSpec also supports tuple unpacking:
panel, metadata = data_spec
load_fred_md#
Load FRED-MD and return DataBundle.
macroforecast.data.load_fred_md(
vintage: str | None = None,
*,
force: bool = False,
cache_root: str | pathlib.Path | None = None,
local_source: str | pathlib.Path | None = None,
local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle
Input#
| Name | Type | Default | Meaning |
|---|---|---|---|
| vintage | str or None | None | Vintage label in YYYY-MM form. |
| force | bool | False | Re-download or re-copy even if cache exists. |
| cache_root | path-like or None | None | Raw cache root. |
| local_source | path-like or None | None | Local CSV source instead of download. |
| local_zip_source | path-like or None | None | Optional local historical zip override. Without it, vintage requests automatically download the official FRED-MD historical archive and extract the requested CSV. |
Output#
Returns DataBundle with a monthly FRED-MD panel and metadata. The official
CSV transform row is parsed into metadata["transform_codes"] and
panel.attrs["macroforecast_transform_codes"].
See FRED-MD for dataset-specific details. See FRED-MD + FRED-SD for the combined monthly national/state loader.
load_fred_qd#
Load FRED-QD and return DataBundle.
macroforecast.data.load_fred_qd(
vintage: str | None = None,
*,
force: bool = False,
cache_root: str | pathlib.Path | None = None,
local_source: str | pathlib.Path | None = None,
local_zip_source: str | pathlib.Path | None = None,
) -> DataBundle
Input#
| Name | Type | Default | Meaning |
|---|---|---|---|
| vintage | str or None | None | Vintage label in YYYY-MM form. |
| force | bool | False | Re-download or re-copy even if cache exists. |
| cache_root | path-like or None | None | Raw cache root. |
| local_source | path-like or None | None | Local CSV source instead of download. |
| local_zip_source | path-like or None | None | Optional local historical zip override. Without it, vintage requests automatically download the official FRED-QD historical archive and extract the requested CSV. |
Output#
Returns a quarterly canonical panel. The official CSV transform row is parsed
into metadata["transform_codes"] and
panel.attrs["macroforecast_transform_codes"].
See FRED-QD for dataset-specific details. See FRED-QD + FRED-SD for the combined quarterly national/state loader.
load_fred_sd#
Load FRED-SD and return DataBundle.
macroforecast.data.load_fred_sd(
vintage: str | None = None,
*,
force: bool = False,
cache_root: str | pathlib.Path | None = None,
local_source: str | pathlib.Path | None = None,
states: list[str] | None = None,
variables: list[str] | None = None,
) -> DataBundle
Input#
| Name | Type | Default | Meaning |
|---|---|---|---|
| states | list[str] or None | None | Optional state subset. |
| variables | list[str] or None | None | Optional FRED-SD variable subset. |
FRED-SD columns are wide variable-state IDs such as UR_CA. The loader also
adds panel.attrs["macrocast_reports"]["fred_sd_series_metadata"], which
records each column’s state, FRED-SD variable, observed date range, non-missing
count, native frequency, and date-anchor pattern inferred from the official
series workbook. The same frequency and date-anchor maps are exposed in
metadata["native_frequency_by_column"],
metadata["native_frequency_counts"], metadata["date_anchor_by_column"],
metadata["date_anchor_counts"], and metadata["state_summary"].
For vintage="YYYY-MM", FRED-SD uses the official by-series workbook path. It
tries series-YYYY-MM.xlsx first and then falls back to the official
by-series zip archive containing that workbook. There is no
local_zip_source parameter for FRED-SD because local overrides are supplied as
local_source= with either an official workbook or a canonical wide CSV.
See FRED-SD for mixed-frequency state-series details and t-code limitations. See FRED-MD + FRED-SD and FRED-QD + FRED-SD for combined-loader behavior.
load_fred_md_sd#
Load FRED-MD and FRED-SD, align them to one panel, and return DataBundle.
macroforecast.data.load_fred_md_sd(
vintage: str | None = None,
*,
force: bool = False,
cache_root: str | pathlib.Path | None = None,
local_fred_md_source: str | pathlib.Path | None = None,
local_fred_sd_source: str | pathlib.Path | None = None,
states: list[str] | None = None,
variables: list[str] | None = None,
frequency: str = "monthly",
quarterly_to_monthly: str = "repeat_within_quarter",
monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle
Purpose#
Use this when the outcome or main state panel is monthly and national macroeconomic controls should come from FRED-MD. This is the recommended combined dataset for monthly state analysis.
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
|
Vintage label shared across FRED-MD and FRED-SD. |
|
|
|
Re-download or re-copy raw sources. |
|
path-like or |
|
Raw cache root used by both loaders. |
|
path-like or |
|
Local FRED-MD CSV source. |
|
path-like or |
|
Local FRED-SD workbook or CSV source. |
|
|
|
FRED-SD state subset. |
|
|
|
FRED-SD variable subset. |
|
|
|
|
|
|
|
Rule used if an included FRED-SD series is quarterly and the target panel is monthly. |
|
|
|
Rule used only when |
Output#
Returns a combined DataBundle with:
- metadata["dataset"] == "fred_md+fred_sd"
- metadata["source_family"] == "combined"
- metadata["frequency"] == frequency
- FRED-MD official t-codes in metadata["transform_codes"]
- FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]
- FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]
- any frequency conversions recorded in metadata["frequency_conversion_warnings"]
If a quarterly FRED-SD series is included in a monthly panel, the function
emits a UserWarning and records the conversion. The default
quarterly_to_monthly="repeat_within_quarter" assigns the quarterly value to
each month inside the quarter.
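The repeat-within-quarter rule can be reproduced with plain pandas periods. The following is a minimal sketch of the idea, not the loader's own code; the series values are made up for illustration.

```python
import pandas as pd

# Two quarterly observations, indexed by quarter.
quarterly = pd.Series(
    [10.0, 20.0],
    index=pd.PeriodIndex(["2020Q1", "2020Q2"], freq="Q"),
)

# Monthly grid covering the same two quarters.
months = pd.period_range("2020-01", "2020-06", freq="M")

# Map each month to its quarter, then copy that quarter's value.
monthly = pd.Series(
    quarterly.reindex(months.asfreq("Q")).to_numpy(),
    index=months,
)
```

Every month inherits the full quarterly value, so a later re-aggregation by averaging returns the original quarterly figures.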
load_fred_qd_sd#
Load FRED-QD and FRED-SD, align them to one panel, and return DataBundle.
macroforecast.data.load_fred_qd_sd(
vintage: str | None = None,
*,
force: bool = False,
cache_root: str | pathlib.Path | None = None,
local_fred_qd_source: str | pathlib.Path | None = None,
local_fred_sd_source: str | pathlib.Path | None = None,
states: list[str] | None = None,
variables: list[str] | None = None,
frequency: str = "quarterly",
quarterly_to_monthly: str = "repeat_within_quarter",
monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle
Purpose#
Use this when the target or outcome is quarterly and national controls should come from FRED-QD. This is the recommended combined dataset for quarterly state-level analysis.
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
|
Vintage label shared across FRED-QD and FRED-SD. |
|
|
|
Re-download or re-copy raw sources. |
|
path-like or |
|
Raw cache root used by both loaders. |
|
path-like or |
|
Local FRED-QD CSV source. |
|
path-like or |
|
Local FRED-SD workbook or CSV source. |
|
|
|
FRED-SD state subset. |
|
|
|
FRED-SD variable subset. |
|
|
|
|
|
|
|
Rule used only when |
|
|
|
Rule used if an included FRED-SD series is monthly and the target panel is quarterly. |
Output#
Returns a combined DataBundle with:
- metadata["dataset"] == "fred_qd+fred_sd"
- metadata["source_family"] == "combined"
- metadata["frequency"] == frequency
- FRED-QD official t-codes in metadata["transform_codes"]
- FRED-SD series metadata preserved in panel.attrs["macrocast_reports"]
- FRED-SD source-frequency and date-anchor maps in metadata["native_frequency_by_column"] and metadata["date_anchor_by_column"]
- any frequency conversions recorded in metadata["frequency_conversion_warnings"]
If a monthly FRED-SD series is included in a quarterly panel, the function
emits a UserWarning and records the conversion. The default
monthly_to_quarterly="quarterly_average" averages monthly observations inside
each quarter.
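Averaging months within a quarter is a one-line group-by in pandas. The sketch below uses made-up values and also shows the last-observation and sum variants for comparison; it illustrates the arithmetic only and is not taken from the package.

```python
import pandas as pd

monthly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.period_range("2020-01", periods=6, freq="M"),
)

# Group months by the quarter they fall in.
by_quarter = monthly.groupby(monthly.index.asfreq("Q"))

quarterly_mean = by_quarter.mean()  # average of the three months
quarterly_last = by_quarter.last()  # last month in the quarter
quarterly_sum = by_quarter.sum()    # total over the quarter
```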
combine#
Combine already-loaded DataBundle objects into one canonical panel.
macroforecast.data.combine(
*bundles,
dataset: str | None = None,
frequency: str = "native",
quarterly_to_monthly: str = "repeat_within_quarter",
monthly_to_quarterly: str = "quarterly_average",
) -> DataBundle
Input#
Name |
Type |
Default |
Choices |
|---|---|---|---|
|
|
required |
Two or more bundles to concatenate by date index. |
|
|
joined source names |
Output dataset label. |
|
|
|
|
|
|
|
|
|
|
|
|
With frequency="native" or frequency="mixed", no monthly/quarterly
conversion is applied. The returned panel keeps each source column on its
native observation dates and records metadata["frequency"] == "mixed".
Quarterly columns therefore appear as sparse columns on the union date index
when they are combined with monthly columns. Downstream mixed-frequency models
should read metadata["native_frequency_by_column"] rather than infer
frequency from the overall index.
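The sparse layout of a native mixed panel can be seen with a small pandas example. This is only a sketch: the column names are hypothetical, and the quarterly value is anchored at the last month of the quarter as an assumption (the actual anchor depends on the source's date-anchor pattern).

```python
import pandas as pd

monthly = pd.DataFrame(
    {"m_series": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
quarterly = pd.DataFrame(
    {"q_series": [100.0]},
    index=pd.to_datetime(["2020-03-01"]),  # assumed anchor date
)

# Column-wise concatenation on the union of dates; no values are converted.
mixed = pd.concat([monthly, quarterly], axis=1).sort_index()

# The index alone looks monthly, so per-column frequency travels as metadata.
native_frequency_by_column = {"m_series": "monthly", "q_series": "quarterly"}
```

The quarterly column is NaN on the two months where it has no observation, which is why the overall index is a poor guide to each column's frequency.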
Frequency Conversion Rules#
| Direction | Rule | Meaning |
|---|---|---|
| quarterly to monthly | repeat_within_quarter | Assign the quarterly value to each month in that quarter. |
| quarterly to monthly | quarter_end_ffill | Place the quarterly value at quarter end and forward-fill after it is observed. |
| quarterly to monthly | | Interpolate between observed quarter-end values on the monthly grid. |
| monthly to quarterly | quarterly_average | Average monthly observations in the quarter. |
| monthly to quarterly | | Use the last monthly observation in the quarter. |
| monthly to quarterly | | Sum monthly observations in the quarter. |
Combined monthly/quarterly output supports only source columns identified as
monthly or quarterly. If a source contains weekly, annual, irregular, or
unknown-frequency columns, combine() raises ValueError. Use
frequency="native" to inspect the mixed panel first, then call
mf.data.align_frequency() explicitly if those columns should enter a common
monthly or quarterly design.
Output#
Returns DataBundle. The panel is a column-wise concatenation after frequency
alignment. Duplicate output column names raise ValueError.
For mixed outputs, the key metadata fields are:
Key |
Meaning |
|---|---|
|
|
|
Native source frequency for each column. |
|
Counts of native source frequencies. |
|
FRED-SD date-anchor map when available. |
|
Counts of FRED-SD date-anchor patterns when available. |
|
Returned-panel frequency for each column; equal to native frequency in native mode. |
|
|
Frequency Conversion Warnings#
When combine() changes a source column’s native frequency, it emits
UserWarning and records the same information in
metadata["frequency_conversion_warnings"].
Each record has:
Key |
Type |
Meaning |
|---|---|---|
|
|
Source dataset whose columns were converted. |
|
|
Native frequency before alignment. |
|
|
Combined panel frequency. |
|
|
Alignment rule used. |
|
|
Variable-level names, e.g. |
|
|
Exact converted panel columns. |
|
|
Number of converted columns. |
Example warning:
fred_sd monthly variables were aligned to quarterly using quarterly_average:
UR, ICLAIMS (102 columns).
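The warn-and-record pattern can be sketched with the standard `warnings` module. The `warn_conversion` helper and the record key names below are hypothetical and chosen for illustration; only the message layout follows the example warning.

```python
import warnings


def warn_conversion(source, native, target, rule, variables, columns):
    """Hypothetical helper: warn about a conversion and return a record of it."""
    message = (
        f"{source} {native} variables were aligned to {target} using {rule}: "
        f"{', '.join(variables)} ({len(columns)} columns)."
    )
    warnings.warn(message, UserWarning, stacklevel=2)
    # Illustrative key names only; the package's record keys may differ.
    return {
        "source": source,
        "native_frequency": native,
        "target_frequency": target,
        "rule": rule,
        "variables": list(variables),
        "columns": list(columns),
        "n_columns": len(columns),
    }


# Capture the warning so it can be inspected alongside the record.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    record = warn_conversion(
        "fred_sd", "monthly", "quarterly", "quarterly_average",
        variables=["UR", "ICLAIMS"],
        columns=["UR_CA", "UR_TX", "ICLAIMS_CA"],
    )
```

Keeping the record next to the warning means the conversion stays auditable even when warnings are filtered out at run time.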
load_custom_csv#
Load a user CSV and normalize it to the canonical panel contract.
macroforecast.data.load_custom_csv(
path,
*,
date: str | None = None,
date_col: str | int | None = None,
columns: Iterable[str] | None = None,
series_columns: Iterable[str] | None = None,
rename: Mapping[str, str] | None = None,
dataset: str = "custom",
frequency: str = "unknown",
frequency_by_column: Mapping[str, str] | None = None,
default_frequency: str | None = None,
metadata: Mapping[str, object] | None = None,
transform_codes: Mapping[str, int] | None = None,
cache_root: str | pathlib.Path | None = None,
strict: bool = True,
) -> DataBundle
Input#
| Name | Type | Default | Meaning |
|---|---|---|---|
| path | path-like | required | CSV file path. |
| date | str or None | None | Date column. If omitted, uses a DatetimeIndex or parses the first column. |
| date_col | str, int, or None | None | Alias for date. |
| columns | iterable or None | None | Columns to keep before renaming. |
| series_columns | iterable or None | None | Alias for columns. |
| rename | mapping or None | None | Column rename map. |
| dataset | str | "custom" | Metadata dataset label. |
| frequency | str | "unknown" | Metadata frequency label. |
| frequency_by_column | mapping or None | None | Optional final-column frequency map. |
| default_frequency | str or None | None | Fill frequency for columns omitted from frequency_by_column. |
| metadata | mapping or None | None | User metadata to attach. |
| transform_codes | mapping or None | None | Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming. |
| cache_root | path-like or None | None | If supplied, append a raw-manifest entry under this cache root. Custom loaders do not write the default manifest unless this is supplied. |
| strict | bool | True | Reject invalid date rows and non-numeric cells instead of silently coercing them. Set strict=False to allow lossy coercion, which is then counted in the panel report. |
Output#
Returns a DataBundle. The normalized panel is available as bundle.panel and
metadata as bundle.metadata. If transform_codes is provided, it is stored in
both bundle.metadata["transform_codes"] and
bundle.panel.attrs["macroforecast_transform_codes"], so
mf.preprocessing.reprocess(bundle) can use the codes automatically.
Custom loaders also store the strict-normalization report at
bundle.metadata["panel"]. With strict=True, malformed dates or non-numeric
cells raise RawParseError wrapping the underlying validation error. With
strict=False, those lossy operations are allowed and counted.
If frequency_by_column is provided, custom loaders call
set_frequencies(...) internally and write the same mixed-frequency metadata
contract used by official combined bundles. The keys must match final loaded
column names after selection and renaming.
Example:
bundle = mf.data.load_custom_csv(
    "panel.csv",
    date_col="DATE",
    series_columns=["INDPRO", "spread"],
    frequency="monthly",
    transform_codes={"INDPRO": 5, "spread": 2},
)
processed = mf.preprocessing.reprocess(bundle)
load_custom_parquet#
Load a user Parquet file with the same normalization contract as
load_custom_csv.
macroforecast.data.load_custom_parquet(
path,
*,
date: str | None = None,
date_col: str | int | None = None,
columns: Iterable[str] | None = None,
series_columns: Iterable[str] | None = None,
rename: Mapping[str, str] | None = None,
dataset: str = "custom",
frequency: str = "unknown",
frequency_by_column: Mapping[str, str] | None = None,
default_frequency: str | None = None,
metadata: Mapping[str, object] | None = None,
transform_codes: Mapping[str, int] | None = None,
cache_root: str | pathlib.Path | None = None,
strict: bool = True,
) -> DataBundle
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
path-like |
required |
Parquet file path. |
|
|
|
Date column. If omitted, uses a |
|
|
|
Alias for |
|
iterable or |
|
Columns to keep before renaming. |
|
iterable or |
|
Alias for |
|
mapping or |
|
Column rename map. |
|
|
|
Metadata dataset label. |
|
|
|
Metadata frequency label. |
|
mapping or |
|
Optional final-column frequency map. |
|
|
|
Fill frequency for columns omitted from |
|
mapping or |
|
User metadata to attach. |
|
mapping or |
|
Optional McCracken-Ng t-code map. Keys must match final loaded series columns after selection and renaming. |
|
path-like or |
|
If supplied, append a raw-manifest entry under this cache root. |
|
|
|
Reject invalid date rows and non-numeric cells instead of silently coercing them. |
Output#
Returns a DataBundle with the same canonical panel, metadata, transform-code,
strict-normalization, and optional mixed-frequency contract as
load_custom_csv.
custom_dataset#
Build a custom DataBundle from an in-memory pandas DataFrame.
Use custom_dataset() when the data are already in Python memory and should
enter the same contract as load_fred_md(), load_fred_qd(),
load_fred_sd(), load_custom_csv(), and load_custom_parquet().
macroforecast.data.custom_dataset(
frame,
*,
date: str | None = None,
columns: Iterable[str] | None = None,
rename: Mapping[str, str] | None = None,
dataset: str = "custom",
source_family: str = "custom",
frequency: str = "unknown",
frequency_by_column: Mapping[str, str] | None = None,
transform_codes: Mapping[str, int] | None = None,
metadata: Mapping[str, object] | None = None,
strict: bool = True,
) -> DataBundle
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
required |
Raw or already canonical panel. |
|
|
|
Date column. If omitted, the input must have a |
|
iterable or |
|
Columns to keep before renaming. |
|
mapping or |
|
Rename retained columns after selection. |
|
|
|
Dataset label stored in metadata. |
|
|
|
Source-family label stored in metadata. |
|
|
|
Loader-level frequency label. |
|
mapping or |
|
Optional column-level frequency map for mixed-frequency panels. |
|
mapping or |
|
Optional t-code map. Keys must match final panel columns. |
|
mapping or |
|
User metadata merged before package metadata is attached. |
|
|
|
Reject lossy date or numeric coercion. |
Output#
Returns DataBundle. The panel is canonical and the metadata includes
dataset, source_family, frequency, optional transform_codes, optional
column-level frequency metadata, and a custom_dataset stage.
bundle = mf.data.custom_dataset(
    frame,
    date="date",
    dataset="bank_panel",
    frequency="monthly",
    transform_codes={"loan_growth": 1, "spread": 2},
)
processed = mf.preprocessing.reprocess(
    bundle,
    transform="custom",
    impute="mean",
)
as_panel#
Normalize an existing pandas DataFrame.
macroforecast.data.as_panel(
frame,
*,
date: str | None = None,
columns: Iterable[str] | None = None,
rename: Mapping[str, str] | None = None,
metadata: Mapping[str, object] | None = None,
strict: bool = True,
) -> pandas.DataFrame
as_panel returns a canonical panel. It raises if the date column is missing,
dates are duplicated, the output is empty, infinite values are present, or any
retained column cannot be represented as numeric values or NaN.
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
required |
Raw or already canonical panel. |
|
|
|
Date column. If omitted and the index is not a |
|
iterable or |
|
Columns to keep before renaming. |
|
mapping or |
|
Rename retained columns after selection. |
|
mapping or |
|
Metadata attached under |
|
|
|
Reject lossy date/numeric coercion. |
Output#
Returns a pandas.DataFrame with DatetimeIndex named "date", ascending
dates, numeric columns, and attrs containing macroforecast_panel_report.
validate_panel#
Validate the canonical panel contract.
macroforecast.data.validate_panel(panel) -> None
Raises TypeError or ValueError when the panel is not canonical.
panel_info#
Return a compact panel summary.
macroforecast.data.panel_info(bundle_or_panel) -> dict
Output keys include n_rows, n_columns, start, end, columns,
missing_values, frequency, and index_frequency. If the input carries
metadata, frequency uses the metadata label such as "mixed" while
index_frequency reports the pandas-inferred date-index frequency. Combined
data also include compact native/output frequency counts.
set_frequencies#
Attach a column-level frequency contract to an existing panel or bundle.
macroforecast.data.set_frequencies(
data,
frequency_by_column,
*,
default_frequency: str | None = None,
output_frequency_by_column: Mapping[str, str] | None = None,
frequency: str | None = None,
metadata: Mapping[str, object] | None = None,
) -> DataBundle
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
required |
Canonical panel input. |
|
mapping |
required |
Native frequency for each final panel column. |
|
|
|
Fill omitted columns with one frequency. |
|
mapping or |
|
Returned-panel frequency for each column; defaults to native frequency. |
|
|
|
Overall metadata label. Defaults to the unique native frequency or |
|
mapping or |
|
Extra metadata to merge before writing frequency fields. |
Allowed column frequencies are monthly, quarterly, weekly, annual,
irregular, and unknown, with short aliases such as m, q, and w.
For mixed-frequency DFM models, monthly and quarterly columns are the relevant
contract.
Output#
Returns a DataBundle with:
Metadata key |
Meaning |
|---|---|
|
Overall label, usually |
|
Native frequency for each column. |
|
Counts by native frequency. |
|
Frequency represented in the returned panel for each column. |
|
Counts by output frequency. |
metadata#
Return explicit metadata from a DataBundle, DataSpec, (panel, metadata)
tuple, or DataFrame.
macroforecast.data.metadata(obj) -> dict
Input#
Name |
Type |
Meaning |
|---|---|---|
|
|
Object carrying package metadata. |
Output#
Returns a shallow copy of the metadata dictionary. Adding, replacing, or deleting top-level keys on the returned dictionary does not affect the original bundle or panel attrs; nested values are shared, as with any shallow copy.
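What a shallow copy does and does not protect can be shown with a plain dictionary. This is general Python behavior demonstrated on made-up metadata, not a call into the package; whether the package copies nested values more deeply is not stated here.

```python
original = {"dataset": "demo", "notes": ["loaded"]}

# dict.copy() is a shallow copy: a new top-level dict, shared nested objects.
returned = original.copy()

returned["dataset"] = "changed"     # top-level rebinding: original unaffected
returned["notes"].append("edited")  # nested mutation: visible through both
```

Callers who need to edit nested entries can take a `copy.deepcopy` of the returned dictionary first.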
attach_metadata#
Merge one metadata stage into an existing metadata dictionary.
macroforecast.data.attach_metadata(
metadata,
stage: str,
values,
) -> dict
Input#
Name |
Type |
Meaning |
|---|---|---|
|
mapping |
Existing metadata dictionary. |
|
|
Non-empty stage key to write, such as |
|
mapping |
Stage payload to copy under |
Output#
Returns a new dictionary. Existing metadata is copied, then values is copied
under the requested stage. attach_metadata() does not mutate its input.
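The copy-then-merge behavior described above can be sketched in a few lines. The `attach_stage` function below is a hypothetical re-implementation for illustration only, using shallow copies; it is not the package's code.

```python
from typing import Any, Mapping


def attach_stage(
    metadata: Mapping[str, Any],
    stage: str,
    values: Mapping[str, Any],
) -> dict:
    """Hypothetical sketch: return new metadata with one stage key added."""
    if not stage:
        raise ValueError("stage must be a non-empty string")
    merged = dict(metadata)       # copy the existing metadata
    merged[stage] = dict(values)  # copy the payload under the stage key
    return merged


base = {"dataset": "demo"}
out = attach_stage(base, "preprocessing", {"impute": "mean"})
```

Because the input is never modified, each pipeline stage can add its own entry without affecting objects held by earlier stages.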
spec#
Attach run-level data choices to a bundle or panel. This function creates a
DataSpec; it does not execute downstream pipeline steps.
macroforecast.data.spec(
data,
*,
metadata: Mapping[str, object] | None = None,
target: str | None = None,
targets: Iterable[str] | None = None,
horizons: Iterable[int] | int | None = None,
start: str | None = None,
end: str | None = None,
predictors: "all" | Iterable[str] = "all",
) -> DataSpec
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
required |
Canonical data input. |
|
mapping or |
|
Extra metadata to merge. |
|
|
|
Single target column. |
|
iterable or |
|
Multiple target columns. |
|
iterable, int, or |
derived |
Forecast horizons. |
|
|
|
Start date. Accepts |
|
|
|
End date. Accepts |
|
|
|
Predictor columns to keep. |
Default Horizons#
Metadata frequency |
Default horizons |
|---|---|
|
|
|
|
other or unknown |
|
Output#
Returns DataSpec. Its metadata contains a data_spec entry with the chosen
target, targets, horizons, sample dates, expanded predictor list, and panel
summary. This expansion is deliberate: downstream model stages should consume a
concrete non-target predictor list, not infer from the full panel and risk
target leakage.
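The predictor expansion can be pictured as a small pure function. The sketch below uses a hypothetical `expand_predictors` helper; dropping any requested predictor that is also a target is an assumption made here, and the package may reject that case instead.

```python
def expand_predictors(columns, targets, predictors="all"):
    """Hypothetical helper: resolve a predictor request to concrete columns."""
    columns = list(columns)
    targets = list(targets)
    missing = [c for c in targets if c not in columns]
    if missing:
        raise ValueError(f"unknown target column(s): {missing}")
    if predictors == "all":
        # Every column except the targets, in panel order.
        return tuple(c for c in columns if c not in targets)
    requested = list(predictors)
    unknown = [c for c in requested if c not in columns]
    if unknown:
        raise ValueError(f"unknown predictor column(s): {unknown}")
    # Assumption: targets are silently excluded from an explicit list.
    return tuple(c for c in requested if c not in targets)


panel_columns = ["INDPRO", "UNRATE", "CPIAUCSL"]
expanded = expand_predictors(panel_columns, targets=["INDPRO"])
```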
What It Does And Does Not Do#
| Action | Done by spec() |
|---|---|
| Validate the canonical panel contract | Yes |
| Validate target and predictor columns | Yes |
| Expand predictors="all" | Yes |
| Apply start and end sample bounds | Yes |
| Attach metadata["data_spec"] | Yes |
| Load raw data | No |
| Transform, clean, impute, or standardize values | No |
| Create forecast targets or lagged predictors | No |
| Fit models or run evaluation | No |
Data Policy Helpers#
These functions are direct Python replacements for the old data-policy axes. They do not parse YAML and do not fit models.
align_frequency#
macroforecast.data.align_frequency(
data,
*,
method: str = "keep",
quarterly_to_monthly: str = "repeat_within_quarter",
weekly_to_monthly: str = "mean",
monthly_to_quarterly: str = "quarterly_average",
weekly_to_quarterly: str = "mean",
chow_lin_indicator: str | Mapping[str, str] | None = None,
chow_lin_aggregation: str = "mean",
chow_lin_rho: float | None = None,
chow_lin_rho_method: str = "fixed",
) -> DataBundle
Keeps, filters, or aligns a panel to a common data frequency. This belongs in
macroforecast.data because it changes the calendar and column-level frequency
contract before preprocessing or feature engineering.
Input |
Default |
Choices |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Indicator column name, or mapping from quarterly column to indicator column, used only when |
|
|
|
|
|
Fixed AR(1) residual correlation. If supplied, must be inside |
|
|
|
Output is a DataBundle. Metadata records data_frequency_alignment,
native_frequency_by_column, output_frequency_by_column, and frequency
counts. Frequency detection uses native_frequency_by_column first, then
FRED-SD series reports, then observed-date spacing.
monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="repeat_within_quarter",
)
For quarterly-to-monthly alignment, step_backward is accepted as an alias for
repeat_within_quarter; the latter is the clearer spelling. Use
quarter_end_ffill when values should only become available from the
quarter-end month forward.
Use quarterly_to_monthly="chow_lin" when a quarterly series should be
regression-disaggregated with a monthly indicator:
monthly = mf.data.align_frequency(
    mixed_bundle,
    method="monthly",
    quarterly_to_monthly="chow_lin",
    chow_lin_indicator={"GDPC1": "INDPRO"},
    chow_lin_aggregation="mean",
)
This preserves the supplied quarterly observations when the output is
re-aggregated by the declared chow_lin_aggregation. The function records the
indicator and rho choices in metadata["data_frequency_alignment"].
chow_lin_disaggregate#
macroforecast.data.chow_lin_disaggregate(
low_frequency,
indicator,
*,
aggregation: str = "mean",
rho: float | None = None,
rho_method: str = "fixed",
) -> pandas.Series
Runs a Chow-Lin disaggregation directly, for example from quarterly to monthly.
low_frequency is a low-frequency Series, and indicator is a higher-frequency
Series or a DataFrame whose single or first column is used. The returned series
is indexed like the indicator and reproduces low_frequency when re-aggregated
under aggregation="mean" or aggregation="sum".
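The conservation property can be checked with a compact NumPy version of the textbook Chow-Lin estimator for a fixed rho. This is a sketch under stated assumptions (an intercept plus one indicator, exactly `k` high-frequency periods per low-frequency value, hypothetical function name `chow_lin_fixed_rho`); it is not the package's implementation and does not estimate rho.

```python
import numpy as np


def chow_lin_fixed_rho(y_low, x_high, rho=0.0, aggregation="mean", k=3):
    """Sketch of Chow-Lin with fixed AR(1) rho; k periods per low-freq value."""
    y_low = np.asarray(y_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    n_low, n_high = len(y_low), len(x_high)
    if n_high != k * n_low:
        raise ValueError("indicator must cover exactly k periods per value")
    weight = 1.0 / k if aggregation == "mean" else 1.0
    # C maps a high-frequency vector to its low-frequency aggregate.
    C = np.kron(np.eye(n_low), np.full((1, k), weight))
    X = np.column_stack([np.ones(n_high), x_high])  # intercept + indicator
    # AR(1) covariance up to scale: V[i, j] = rho ** |i - j|.
    steps = np.arange(n_high)
    V = rho ** np.abs(np.subtract.outer(steps, steps))
    CVC_inv = np.linalg.inv(C @ V @ C.T)
    X_low = C @ X
    # GLS coefficients estimated on the aggregated regression.
    beta = np.linalg.solve(X_low.T @ CVC_inv @ X_low, X_low.T @ CVC_inv @ y_low)
    residual_low = y_low - X_low @ beta
    # Regression fit plus low-frequency residuals distributed across months.
    return X @ beta + V @ C.T @ CVC_inv @ residual_low


quarterly = np.array([10.0, 12.0, 11.0, 15.0])
indicator = np.linspace(1.0, 12.0, 12)
monthly = chow_lin_fixed_rho(quarterly, indicator, rho=0.5, aggregation="mean")
```

Averaging `monthly` in blocks of three recovers `quarterly` up to floating-point error, which is the conservation property described above.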
rho_method="fixed" uses rho when supplied and 0.0 otherwise.
"min_chi_squared" and "max_likelihood" estimate rho over a bounded grid.
infer_frequencies#
macroforecast.data.infer_frequencies(data) -> tuple[dict[str, str], str]
infer_frequencies() returns (frequency_by_column, source). The source is
"native_frequency_by_column", "fred_sd_series_metadata", or
"observed_dates".
frequency_hardening_issues#
macroforecast.data.frequency_hardening_issues(
frequencies,
) -> list[dict]
Reports columns classified as unknown, irregular, or annual before a
caller aligns frequencies. This is useful before forcing a mixed panel to
monthly or quarterly frequency.
Output key |
Meaning |
|---|---|
|
Weak frequency class. |
|
Columns assigned to that class. |
|
Number of affected columns. |
availability_lag#
macroforecast.data.availability_lag(
data,
*,
lags: int | Mapping[str, int] = 1,
columns: Iterable[str] | None = None,
drop_missing: bool = False,
) -> DataBundle
Positive lags delay predictor availability. lags=1 means the value dated
t-1 is the latest available value on row t. Pass a mapping for
column-specific release lags.
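The effect of a release lag is the same as a pandas `shift`. The sketch below applies column-specific lags to a made-up panel; it illustrates the row alignment only and does not call the package.

```python
import pandas as pd

panel = pd.DataFrame(
    {"fast": [1.0, 2.0, 3.0], "slow": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Column-specific release lags: "slow" is published one period late.
lags = {"fast": 0, "slow": 1}

available = panel.copy()
for column, lag in lags.items():
    # shift(lag) moves the value dated t - lag onto row t.
    available[column] = panel[column].shift(lag)
```

The first row of a lagged column becomes NaN, which is the case an option such as drop_missing would address.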
same_period_predictors#
macroforecast.data.same_period_predictors(
data_spec,
*,
policy: "allow" | "lag" | "drop" | "forbid" = "allow",
lag: int = 1,
columns: Iterable[str] | None = None,
drop_missing: bool = False,
) -> DataSpec
allow records the choice, lag shifts selected predictors, drop removes
them from the active predictor set, and forbid raises if such predictors are
present. Targets are never shifted by this helper.
define_regime#
macroforecast.data.define_regime(
data,
*,
name: str = "regime",
column: str | None = None,
threshold: float | None = None,
direction: "above" | "below" | "equal" | "not_equal" = "above",
dates: Iterable[str | pandas.Timestamp] | None = None,
values: Sequence[bool | int | float] | pandas.Series | None = None,
append: bool = False,
output_column: str | None = None,
) -> DataBundle
Exactly one regime source is required: threshold rule, explicit dates, or an
aligned vector/Series. The regime is stored in metadata["regimes"]; set
append=True to also add a numeric indicator column to the panel.
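A threshold rule with the four direction labels can be sketched with the standard `operator` module. The `threshold_regime` helper is hypothetical, and treating "above" and "below" as strict inequalities is an assumption; the package may handle ties differently.

```python
import operator

# The four direction labels from the signature, mapped to comparisons.
COMPARISONS = {
    "above": operator.gt,      # assumed strict
    "below": operator.lt,      # assumed strict
    "equal": operator.eq,
    "not_equal": operator.ne,
}


def threshold_regime(values, threshold, direction="above"):
    """Hypothetical helper: 1 where the rule holds, 0 elsewhere."""
    compare = COMPARISONS[direction]
    return [int(compare(value, threshold)) for value in values]


unrate = [3.5, 6.2, 8.1, 4.9]
high_unemployment = threshold_regime(unrate, threshold=6.0, direction="above")
```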
Vintage Helpers#
list_vintages#
Generate monthly vintage labels for a supported dataset.
macroforecast.data.list_vintages(
dataset: str,
start: str | None = None,
end: str | None = None,
) -> list[str]
Input#
Name |
Type |
Default |
Meaning |
|---|---|---|---|
|
|
required |
One of |
|
|
first supported vintage |
Start vintage in |
|
|
required |
End vintage in |
Output#
Returns candidate monthly vintage labels. The selected vintage is passed to
load_fred_md, load_fred_qd, or load_fred_sd through vintage=.
end must be supplied, even though the signature shows a None default, because the function does not inspect remote availability.
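Generating candidate labels needs only month arithmetic. The sketch below uses a hypothetical `monthly_labels` helper and assumes YYYY-MM labels with both bounds inclusive; like the documented function, it does not check whether a vintage actually exists upstream.

```python
def monthly_labels(start: str, end: str) -> list[str]:
    """Hypothetical helper: every YYYY-MM label from start to end inclusive."""
    start_year, start_month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    # Count months from year 0 so stepping across year ends is plain addition.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    if last < first:
        raise ValueError("end must not be earlier than start")
    return [f"{i // 12:04d}-{i % 12 + 1:02d}" for i in range(first, last + 1)]


labels = monthly_labels("2023-11", "2024-02")
```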
Official Source Pages#
FRED-MD and FRED-QD source page: https://www.stlouisfed.org/research/economists/mccracken/fred-databases
FRED-SD source page: https://www.stlouisfed.org/research/economists/owyang/fred-sd