custom_dataset

Use custom data functions when the panel does not come from FRED-MD, FRED-QD, or FRED-SD. The output must still be a canonical DataBundle: a date-indexed numeric panel plus metadata that later stages can read.

Function Choices

| Function | Input | Output | Use case |
| --- | --- | --- | --- |
| mf.data.custom_dataset(...) | in-memory DataFrame | DataBundle | User code already loaded the data. |
| mf.data.load_custom_csv(...) | CSV path | DataBundle | File source is CSV. |
| mf.data.load_custom_parquet(...) | Parquet path | DataBundle | File source is Parquet. |

custom_dataset

mf.data.custom_dataset(
    data,
    *,
    date=None,
    columns=None,
    dataset="custom",
    frequency=None,
    transform_codes=None,
    metadata=None,
) -> mf.data.DataBundle

Input

| Name | Type | Meaning |
| --- | --- | --- |
| data | pandas.DataFrame | User panel. It can contain a date column or already have a DatetimeIndex. |
| date | str or None | Date column to move into the index. Use None when the index already holds dates. |
| columns | sequence or None | Optional subset of variables, applied after date handling. |
| dataset | str | Dataset label stored in metadata. |
| frequency | str or mapping or None | Panel frequency, such as monthly or quarterly, or a per-column frequency mapping. |
| transform_codes | mapping or None | Optional FRED-style transformation codes, keyed by column. |
| metadata | mapping or None | User metadata to merge into the bundle metadata. |

Output

| Field | Contract |
| --- | --- |
| bundle.panel | Numeric DataFrame, sorted DatetimeIndex named date, no duplicate dates. |
| bundle.metadata["dataset"] | Dataset name. |
| bundle.metadata["frequency"] | Dataset-level or column-level frequency information when supplied. |
| bundle.metadata["transform_codes"] | Transform-code metadata when supplied. |
| bundle.metadata["panel_normalization"] | Normalization report for date/index/column conversion. |
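The bundle.panel contract can be spot-checked with plain pandas. The following is a minimal sketch and does not use the library's own validation; check_panel_contract is a hypothetical helper introduced here for illustration only.

```python
import pandas as pd
from pandas.api import types as pdt


def check_panel_contract(panel: pd.DataFrame) -> list[str]:
    """Return contract violations; an empty list means the panel passes.

    Hypothetical helper, not part of the documented API.
    """
    problems = []
    index = panel.index
    if not isinstance(index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    if index.name != "date":
        problems.append("index is not named 'date'")
    if not index.is_monotonic_increasing:
        problems.append("index is not sorted")
    if index.has_duplicates:
        problems.append("index contains duplicate dates")
    non_numeric = [
        str(col) for col in panel.columns if not pdt.is_numeric_dtype(panel[col])
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# A panel that satisfies the contract.
good = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0], "x": [0.1, 0.2, 0.3]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-02-01", "2020-03-01"], name="date"),
)
# A panel with an unnamed index, a repeated date, and a text column.
bad = pd.DataFrame(
    {"target": [1.0, 2.0], "label": ["a", "b"]},
    index=pd.DatetimeIndex(["2020-02-01", "2020-02-01"]),
)

good_report = check_panel_contract(good)
bad_report = check_panel_contract(bad)
print(good_report)
print(bad_report)
```

In practice the same check could be run against bundle.panel after loading, assuming the field is an ordinary pandas DataFrame as the table states.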

Flow

bundle = mf.data.custom_dataset(
    frame,
    date="date",
    dataset="local_macro",
    frequency="monthly",
    transform_codes={"target": 1, "x": 1},
)

processed = mf.preprocessing.reprocess(bundle, transform="none")
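The frame variable in the flow above is not defined on this page. As an illustration, an input of the expected shape might be built as follows. The values are made up, and the only assumption is that the column names match the date and transform_codes arguments used in the call.

```python
import pandas as pd

# Twelve monthly observations dated at month start (illustrative values only).
dates = pd.date_range("2020-01-01", periods=12, freq="MS")

frame = pd.DataFrame(
    {
        "date": dates,  # moved into the index via date="date"
        "target": [100.0 + i for i in range(12)],
        "x": [0.5 * i for i in range(12)],
    }
)

print(frame.dtypes)
print(frame.head(3))
```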

File Loaders

mf.data.load_custom_csv(path, *, date, columns=None, dataset="custom", ...)
mf.data.load_custom_parquet(path, *, date, columns=None, dataset="custom", ...)

The file loaders normalize the panel to the same contract as custom_dataset(). Use them when the file path itself should appear in the loader metadata.
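For orientation, the sketch below shows the kind of CSV layout the date argument refers to, read with plain pandas rather than the loaders. The file contents are invented, and the explicit index handling is only an approximation of the normalization described on this page.

```python
import io

import pandas as pd

# Stand-in for a file on disk; rows are deliberately out of order.
csv_text = """date,target,x
2020-03-01,1.3,0.7
2020-01-01,1.1,0.5
2020-02-01,1.2,0.6
"""

# Parse the date column, move it into the index, and sort chronologically.
raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
panel = raw.set_index("date").sort_index()

print(panel)
```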

Validation

| Problem | Behavior |
| --- | --- |
| Missing date column | Raises a loader/normalization error. |
| Duplicate dates | Raises unless permissive mode is explicitly requested by the loader. |
| Non-numeric selected variables | Coerced or reported by panel normalization. |
| Missing frequency metadata | Allowed, but frequency-aware downstream logic may need an explicit set_frequencies(...) call. |
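The first three problems in the table can be detected before a frame is handed to the library. The following pandas-only sketch collects them into a report instead of raising. The preflight function is a hypothetical helper, and its report format is not the library's panel_normalization report.

```python
import pandas as pd


def preflight(frame: pd.DataFrame, date: str, columns: list[str]) -> dict:
    """Collect date and dtype issues without raising (hypothetical helper)."""
    report = {
        "missing_date_column": date not in frame.columns,
        "duplicate_dates": [],
        "non_numeric": {},
    }
    if not report["missing_date_column"]:
        parsed = pd.to_datetime(frame[date])
        # keep=False marks every occurrence of a repeated date.
        dupes = parsed[parsed.duplicated(keep=False)]
        report["duplicate_dates"] = sorted(dupes.dt.strftime("%Y-%m-%d").unique())
    for col in columns:
        coerced = pd.to_numeric(frame[col], errors="coerce")
        # Count values that were present but could not be parsed as numbers.
        bad = int((coerced.isna() & frame[col].notna()).sum())
        if bad:
            report["non_numeric"][col] = bad
    return report


frame = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-02-01", "2020-02-01"],
        "target": [1.0, 2.0, 2.5],
        "x": ["0.1", "n/a", "0.3"],
    }
)
report = preflight(frame, date="date", columns=["target", "x"])
print(report)
```

Collecting issues rather than failing on the first one is a design choice made for this sketch; it lets a user fix the date column and the dtypes in a single pass before loading.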