# custom_dataset

[Back to custom extensions](index.md)

Use custom data functions when the panel does not come from FRED-MD, FRED-QD,
or FRED-SD. The output must still be a canonical `DataBundle`: a date-indexed
numeric panel plus metadata that later stages can read.

## Function Choices

| Function | Input | Output | Use case |
| --- | --- | --- | --- |
| `mf.data.custom_dataset(...)` | in-memory `DataFrame` | `DataBundle` | The data is already loaded into a `DataFrame`. |
| `mf.data.load_custom_csv(...)` | CSV path | `DataBundle` | The source is a CSV file. |
| `mf.data.load_custom_parquet(...)` | Parquet path | `DataBundle` | The source is a Parquet file. |
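The choice in the table depends only on where the data lives. As a minimal sketch, the hypothetical helper `choose_loader` below (not part of `mf`) returns the name of the matching function. It assumes that anything that is not a path is an in-memory `DataFrame`, and that only the `.csv` and `.parquet` suffixes are recognized.

```python
from pathlib import Path


# Hypothetical helper, not part of mf: maps a data source to a loader name.
def choose_loader(source) -> str:
    """Return the name of the mf.data function suited to `source`."""
    if not isinstance(source, (str, Path)):
        # Assumption: anything that is not a path is an in-memory DataFrame.
        return "custom_dataset"
    suffix = Path(source).suffix.lower()
    if suffix == ".csv":
        return "load_custom_csv"
    if suffix == ".parquet":
        return "load_custom_parquet"
    raise ValueError(f"no custom loader for {suffix!r} files")
```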

## custom_dataset

```python
mf.data.custom_dataset(
    data,
    *,
    date=None,
    columns=None,
    dataset="custom",
    frequency=None,
    transform_codes=None,
    metadata=None,
) -> mf.data.DataBundle
```

### Input

| Name | Type | Meaning |
| --- | --- | --- |
| `data` | `pandas.DataFrame` | User panel. It can contain a date column or already have a `DatetimeIndex`. |
| `date` | str or `None` | Name of the date column to move into the index. Use `None` when `data` already has a `DatetimeIndex`. |
| `columns` | sequence or `None` | Optional subset of variable columns to keep, selected after the date column has been moved into the index. |
| `dataset` | str | Dataset label stored in metadata. |
| `frequency` | str or mapping or `None` | Either a dataset-level frequency such as `"monthly"` or `"quarterly"`, or a mapping of per-column frequencies. |
| `transform_codes` | mapping or `None` | Optional FRED-style transformation code metadata by column. |
| `metadata` | mapping or `None` | User metadata to merge into the bundle metadata. |
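To show how `date` and `columns` interact, here is a minimal pandas sketch using a hypothetical `prepare_input` function (not part of `mf`). It assumes that the date column becomes a `DatetimeIndex` named `date` and that the column subset is applied afterwards. The real `custom_dataset()` may do more, or behave differently, internally.

```python
import pandas as pd


# Hypothetical helper, not part of mf: mirrors the `date` and `columns` inputs.
def prepare_input(data: pd.DataFrame, date=None, columns=None) -> pd.DataFrame:
    frame = data.copy()
    if date is not None:
        if date not in frame.columns:
            raise KeyError(f"date column {date!r} not found")
        # Remove the date column and use it as a DatetimeIndex named "date".
        frame.index = pd.DatetimeIndex(pd.to_datetime(frame.pop(date)), name="date")
    elif isinstance(frame.index, pd.DatetimeIndex):
        # The index already holds dates; only give it the canonical name.
        frame.index = frame.index.rename("date")
    else:
        raise TypeError("pass `date=` or supply a DatetimeIndex")
    if columns is not None:
        # Subset variables only after the date column has left the columns.
        frame = frame[list(columns)]
    return frame
```

Both input shapes end up with the same index, so `date=None` is only valid when the dates are already in the index.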

### Output

| Field | Contract |
| --- | --- |
| `bundle.panel` | Numeric `DataFrame`, sorted `DatetimeIndex` named `date`, no duplicate dates. |
| `bundle.metadata["dataset"]` | Dataset name. |
| `bundle.metadata["frequency"]` | Dataset-level or column-level frequency information when supplied. |
| `bundle.metadata["transform_codes"]` | Transform-code metadata when supplied. |
| `bundle.metadata["panel_normalization"]` | Normalization report for date/index/column conversion. |
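The `bundle.panel` contract can be checked directly with pandas. The function `check_panel_contract` below is a hypothetical sketch, not part of `mf`. It returns a list of violations of the rules in the table, so `check_panel_contract(bundle.panel)` should return an empty list for a conforming bundle.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


# Hypothetical checker, not part of mf: tests the `bundle.panel` contract.
def check_panel_contract(panel: pd.DataFrame) -> list:
    """Return a list of contract violations (empty when the panel conforms)."""
    problems = []
    index = panel.index
    if not isinstance(index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    if index.name != "date":
        problems.append("index is not named 'date'")
    # Non-strict check: equal neighbours count as sorted; duplicates are flagged separately.
    if not index.is_monotonic_increasing:
        problems.append("dates are not sorted")
    if index.has_duplicates:
        problems.append("duplicate dates")
    non_numeric = [c for c in panel.columns if not is_numeric_dtype(panel[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems
```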

### Flow

```python
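import pandas as pd

# Toy monthly panel; `mf` is assumed to be imported already.
frame = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=3, freq="MS"),
    "target": [1.0, 1.2, 0.9],
    "x": [0.5, 0.4, 0.6],
})

bundle = mf.data.custom_dataset(
    frame,
    date="date",
    dataset="local_macro",
    frequency="monthly",
    transform_codes={"target": 1, "x": 1},
)

processed = mf.preprocessing.reprocess(bundle, transform="none")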
```
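In the flow above, `transform_codes={"target": 1, "x": 1}` records code 1 for both columns. Under the FRED-MD transformation-code convention, code 1 leaves a series in levels, code 2 takes a first difference, code 4 takes logs, and code 5 takes log first differences. The function `apply_tcode` below is a hypothetical sketch of that convention, covering only those four codes. It is not how `mf` applies the codes, and it says nothing about what `reprocess(..., transform="none")` does with them.

```python
import numpy as np
import pandas as pd


# Hypothetical sketch of FRED-MD codes 1, 2, 4 and 5; not mf's implementation.
def apply_tcode(series: pd.Series, code: int) -> pd.Series:
    if code == 1:
        return series                  # level, no transformation
    if code == 2:
        return series.diff()           # first difference
    if code == 4:
        return np.log(series)          # log level
    if code == 5:
        return np.log(series).diff()   # log first difference
    raise ValueError(f"tcode {code} is not covered by this sketch")
```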

## File Loaders

```python
mf.data.load_custom_csv(path, *, date, columns=None, dataset="custom", ...)
mf.data.load_custom_parquet(path, *, date, columns=None, dataset="custom", ...)
```

The file loaders read the file and return a `DataBundle` that satisfies the same panel contract as `custom_dataset()`.
Use them instead of reading the file yourself when the file path should be recorded in the loader metadata.
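As a rough picture of what a CSV loader adds on top of `custom_dataset()`, the hypothetical `load_csv_sketch` below (not the real `load_custom_csv()`) reads the file with pandas, builds a sorted date index, and records the path. The `source_path` metadata key is an assumption for illustration. The real loader returns a `DataBundle`, not a `(panel, metadata)` tuple.

```python
import pandas as pd


# Hypothetical sketch, not mf's load_custom_csv(); "source_path" is an assumed key.
def load_csv_sketch(path, *, date, columns=None, dataset="custom"):
    # Parse the date column and use it as the index in one step.
    frame = pd.read_csv(path, parse_dates=[date], index_col=date)
    frame.index.name = "date"
    frame = frame.sort_index()
    if columns is not None:
        frame = frame[list(columns)]
    metadata = {"dataset": dataset, "source_path": str(path)}
    return frame, metadata
```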

## Validation

| Problem | Behavior |
| --- | --- |
| Missing date column | Raises a loader or normalization error. |
| Duplicate dates | Raises an error unless the loader is explicitly called in a permissive mode. |
| Non-numeric selected variables | Coerced to numeric or reported by panel normalization. |
| Missing frequency metadata | Allowed, but frequency-aware downstream steps may need an explicit `set_frequencies(...)` call. |
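The first three rows of the table can be sketched in pandas as follows. `validate_panel` and its `allow_duplicates` flag are hypothetical. The flag stands in for the loader's permissive mode, whose real name is not given here, and the exception types are assumptions. Non-numeric values are coerced with `pd.to_numeric(..., errors="coerce")`, and the number of values that failed to parse is reported per column.

```python
import pandas as pd


# Hypothetical sketch of the validation rules; `allow_duplicates` is an assumed name.
def validate_panel(frame: pd.DataFrame, date: str, *, allow_duplicates=False):
    if date not in frame.columns:
        raise KeyError(f"missing date column {date!r}")
    panel = frame.drop(columns=[date])
    panel.index = pd.DatetimeIndex(pd.to_datetime(frame[date]), name="date")
    if panel.index.has_duplicates and not allow_duplicates:
        raise ValueError("duplicate dates in panel")
    report = {}
    for column in panel.columns:
        converted = pd.to_numeric(panel[column], errors="coerce")
        # Count values that were present but could not be parsed as numbers.
        failed = int((converted.isna() & panel[column].notna()).sum())
        if failed:
            report[column] = failed
        panel[column] = converted.to_numpy()
    return panel.sort_index(), report
```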