# Bring Your Own Data

macroforecast works with any time-series panel you supply. This guide covers
monthly and quarterly CSV / Parquet files.

If you prefer the official FRED-MD or FRED-QD panels, start with
[FRED Datasets in Recipes](../recipe_api/fred_datasets.md)
instead.

> **FRED-MD/QD format note**: the raw FRED CSV files include a `Transform:`
> header row above the data. Your custom CSV must **not** include that row --
> it is an artefact of the official FRED format and is stripped automatically
> only when `dataset=fred_md` / `fred_qd` uses the built-in adapter. Custom
> CSV files are plain panels: date index + numeric columns only.
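
If your panel started life as a FRED-style export, the `Transform:` row can be dropped before the file is handed to macroforecast. The following is a minimal standard-library sketch. The helper name `strip_transform_row` is hypothetical (it is not part of macroforecast), and it assumes the row can be recognised by a first cell that begins with `Transform`.

```python
import csv
import io


def strip_transform_row(csv_text: str) -> str:
    """Return csv_text without any row whose first cell starts with 'Transform'."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Skip blank lines and the FRED-style transform-code row.
        if not row or row[0].strip().lower().startswith("transform"):
            continue
        writer.writerow(row)
    return out.getvalue()


# Illustrative input: a plain panel with a stray Transform: row left in.
raw = "date,my_target,x1\nTransform:,5,2\n1990-01-01,1.23,0.45\n1990-02-01,1.31,0.47\n"
cleaned = strip_transform_row(raw)
print(cleaned)
```

The transform codes are discarded here; if you still need them, record them separately before stripping the row.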

## When to use this guide

Use this guide when you have:

- A proprietary indicator panel (e.g., firm-level surveys, regional prices).
- A monthly or quarterly series not available in FRED.
- A country-specific macro panel.

If you have a few additional series you want to **add on top of** the official
FRED panel, see [Merging with FRED-MD or FRED-QD](#merging-with-fred-md-or-fred-qd).

## File format contract

### Monthly CSV

```text
date,my_target,x1,x2
1990-01-01,1.23,0.45,2.10
1990-02-01,1.31,0.47,2.05
1990-03-01,1.29,0.46,1.99
```

Rules:

- First column: date, parseable by pandas (`YYYY-MM-DD` is the safest format;
  `YYYY-MM` also works when the day is not meaningful).
- Remaining columns: numeric. Non-numeric cells are coerced to `NaN`;
  columns that are entirely `NaN` are dropped silently.
- No `Transform:` row. No multi-level headers. No trailing metadata rows.
- The column you name as `target` in the recipe must be present.
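
The rules above can be checked before a run. Below is a rough pre-flight sketch in plain pandas. `check_panel` is a hypothetical helper, not a macroforecast function, and it only approximates the loader behaviour described in this guide (date-first column, numeric coercion, all-`NaN` columns dropped, target present).

```python
import io

import pandas as pd


def check_panel(csv_source, target: str) -> pd.DataFrame:
    """Approximate the file-format contract with plain pandas (illustrative only)."""
    df = pd.read_csv(csv_source)
    date_col = df.columns[0]
    # First column must parse as dates; pandas raises if it does not.
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.set_index(date_col)
    # Non-numeric cells become NaN.
    df = df.apply(pd.to_numeric, errors="coerce")
    # Report columns with no numeric values at all, then drop them.
    all_nan = [c for c in df.columns if df[c].isna().all()]
    if all_nan:
        print(f"columns that would be dropped (all NaN): {all_nan}")
    df = df.drop(columns=all_nan)
    # The target must still be present (names are case-sensitive).
    if target not in df.columns:
        raise KeyError(f"target {target!r} not found in {list(df.columns)}")
    return df


# Illustrative input with one bad cell and one text-only column.
sample = io.StringIO(
    "date,my_target,x1,notes\n"
    "1990-01-01,1.23,0.45,n/a\n"
    "1990-02-01,1.31,bad,n/a\n"
)
panel = check_panel(sample, target="my_target")
print(panel)
```

Running a check like this makes the otherwise silent column drops visible before they reach the recipe.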

### Quarterly CSV

Same rules. Use `YYYY-01-01`, `YYYY-04-01`, `YYYY-07-01`, `YYYY-10-01` as
quarterly date stamps, or any convention pandas parses to quarterly periods.
The recipe axis `frequency: quarterly` tells the runtime to interpret the
dates as quarterly.
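
If your source labels observations by quarter (for example `1990Q1`), the quarter-start stamps listed above can be produced with pandas before writing the CSV. This is a small sketch with made-up labels; it uses only `pandas.PeriodIndex` and `to_timestamp`, and says nothing about how the runtime itself parses dates.

```python
import pandas as pd

# Quarter labels as they might appear in a source file (illustrative values).
labels = ["1990Q1", "1990Q2", "1990Q3", "1990Q4"]

# Convert each quarter to the timestamp of its first day.
stamps = pd.PeriodIndex(labels, freq="Q").to_timestamp(how="start")
date_column = [d.strftime("%Y-%m-%d") for d in stamps]
print(date_column)

# Sanity check for an existing index: every stamp falls on a quarter start.
is_quarter_start = all(d.month in (1, 4, 7, 10) and d.day == 1 for d in stamps)
print(is_quarter_start)
```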

### Parquet

Same schema as CSV. The dates may be stored either as a `DatetimeIndex` or
as the first column of the file. Column-name and numeric-typing rules are
identical.

## Running with your own data

### Option A: YAML recipe (recommended)

Set `custom_source_policy: custom_panel_only` and point `custom_source_path`
at your file. The runtime infers CSV vs Parquet from the file extension
(`.csv` -> CSV loader; `.parquet` or `.pq` -> Parquet loader).
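
The extension rule can be mirrored in a few lines if you want to validate paths before launching a study. This is a sketch of the mapping as stated in this guide, not the runtime's code: `guess_loader` is a hypothetical name, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path

# Extension-to-loader mapping as described in this guide.
_LOADERS = {".csv": "csv", ".parquet": "parquet", ".pq": "parquet"}


def guess_loader(path: str) -> str:
    """Return 'csv' or 'parquet' based on the final file extension."""
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    try:
        return _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension {suffix!r} for {path}") from None


csv_kind = guess_loader("data/my_monthly_panel.csv")
pq_kind = guess_loader("data/my_panel.pq")
print(csv_kind, pq_kind)
```

In this sketch only the final suffix counts, so a compressed name such as `panel.csv.gz` is rejected; decompress such files first if the runtime treats them the same way.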

**Monthly example**

```yaml
0_meta:
  fixed_axes:
    failure_policy: fail_fast
    reproducibility_mode: seeded_reproducible

1_data:
  fixed_axes:
    custom_source_policy: custom_panel_only
    dataset: fred_md          # labels the panel as "monthly" in the runtime
    frequency: monthly
    horizon_set: custom_list
  leaf_config:
    target: my_target
    target_horizons: [1, 3, 6]
    custom_source_path: data/my_monthly_panel.csv
    sample_start_date: "1990-01"
    sample_end_date: "2019-12"

2_preprocessing:
  fixed_axes:
    transform_policy: no_transform
    outlier_policy: none
    imputation_policy: none_propagate
    frame_edge_policy: keep_unbalanced

3_feature_engineering:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: predictors}}}
    - {id: src_y, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: target}}}
    - {id: lag_x, type: step, op: lag, params: {n_lag: 1}, inputs: [src_X]}
    - {id: y_h, type: step, op: target_construction, params: {mode: point_forecast, method: direct, horizon: 1}, inputs: [src_y]}
  sinks:
    l3_features_v1: {X_final: lag_x, y_final: y_h}
    l3_metadata_v1: auto

4_forecasting_model:
  nodes:
    - {id: src_X, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: X_final}}}
    - {id: src_y, type: source, selector: {layer_ref: l3, sink_name: l3_features_v1, subset: {component: y_final}}}
    - id: fit_ridge
      type: step
      op: fit_model
      params: {family: ridge, alpha: 1.0, min_train_size: 24, forecast_strategy: direct,
               training_start_rule: expanding, refit_policy: every_origin, search_algorithm: none}
      inputs: [src_X, src_y]
    - {id: predict_ridge, type: step, op: predict, inputs: [fit_ridge, src_X]}
  sinks:
    l4_forecasts_v1: predict_ridge
    l4_model_artifacts_v1: fit_ridge
    l4_training_metadata_v1: auto

5_evaluation:
  fixed_axes:
    primary_metric: mse
    point_metrics: [mse, rmse, mae]

8_output:
  fixed_axes:
    saved_objects: [forecasts, metrics, ranking]
  leaf_config:
    output_directory: ./output/my_study/
```

Run it:

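```python
import macroforecast as mf

# Execute the recipe; outputs are written under the given directory.
result = mf.run("my_study.yaml", output_directory="output/my_study/")

# Inspect the sink hashes recorded for the first cell of the study.
print(result.cells[0].sink_hashes)
```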

**Quarterly example**

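Change the two frequency-related lines (and point `custom_source_path` at
your quarterly file):

```yaml
    dataset: fred_qd          # labels the panel as "quarterly"
    frequency: quarterly
```

Everything else stays the same. The quarterly panel follows the same
date-index format rules as the monthly one. Keep `dataset` and `frequency`
in agreement so that the runtime treats the dates as quarterly.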

### Option B: Python helper functions

`mf.load_custom_csv` and `mf.load_custom_parquet` load your file and return
a `RawLoadResult` you can inspect before running a full study.

```python
import macroforecast as mf

# Monthly panel
result = mf.load_custom_csv("data/my_monthly_panel.csv", dataset="fred_md")
print(result.data.head())           # pandas DataFrame, date index
print(result.dataset_metadata)      # frequency, data_through, etc.

# Quarterly panel
result_q = mf.load_custom_csv("data/my_quarterly_panel.csv", dataset="fred_qd")

# Parquet
result_pq = mf.load_custom_parquet("data/my_panel.parquet", dataset="fred_md")
```

`dataset` must be one of `"fred_md"` (monthly), `"fred_qd"` (quarterly), or
`"fred_sd"` (state-level monthly). It labels the schema downstream -- it does
not require your columns to match FRED mnemonics.

These helper functions are for inspection only. To run a full study, use the
YAML recipe path (Option A).
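
Once the panel is loaded, a quick coverage summary often catches problems early. The sketch below assumes `result.data` is a pandas DataFrame with a date index, as the comments in the example above indicate. To stay runnable without macroforecast it uses a small synthetic frame as a stand-in, and `coverage_report` is a hypothetical helper.

```python
import pandas as pd


def coverage_report(panel: pd.DataFrame) -> pd.DataFrame:
    """Per-column first/last valid date and share of missing values."""
    cols = list(panel.columns)
    return pd.DataFrame(
        {
            "first_valid": pd.Series({c: panel[c].first_valid_index() for c in cols}),
            "last_valid": pd.Series({c: panel[c].last_valid_index() for c in cols}),
            "missing_share": panel.isna().mean(),
        }
    )


# Stand-in for `result.data`: a monthly panel with one late-starting series.
idx = pd.date_range("1990-01-01", periods=4, freq="MS")
nan = float("nan")
panel = pd.DataFrame(
    {"my_target": [1.23, 1.31, 1.29, 1.35], "x1": [nan, nan, 0.46, 0.48]},
    index=idx,
)
report = coverage_report(panel)
print(report)
```

With a real load you would call `coverage_report(result.data)` instead of building the frame by hand.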

## Merging with FRED-MD or FRED-QD

If you want the McCracken-Ng curated panel (about 126 monthly or 245
quarterly series, depending on the vintage) **plus** a few custom series,
use `official_plus_custom`:

```yaml
1_data:
  fixed_axes:
    custom_source_policy: official_plus_custom
    dataset: fred_md
    frequency: monthly
  leaf_config:
    target: CPIAUCSL
    target_horizons: [1, 3, 6]
    custom_source_path: data/my_extra_series.csv
    custom_merge_rule: left_join    # inner_join / left_join / outer_join
    sample_start_date: "1990-01"
    sample_end_date: "2019-12"
```

`custom_merge_rule` is required. Choose:

| Rule | Keeps dates from |
|---|---|
| `inner_join` | Rows present in **both** FRED and your file |
| `left_join` | All FRED dates; your series gets `NaN` where missing |
| `outer_join` | All dates in either file |

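The custom file must use the same date-column format as described in
[File format contract](#file-format-contract). A custom column whose name
collides with a FRED mnemonic is suffixed by the runtime; it does not
overwrite the FRED series. If you intend your series to stand in for a FRED
series, give it a distinct name before merging so you can refer to it
unambiguously.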
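
The date coverage each rule produces matches the pandas join of the same name. The toy example below is an analogy only, not the runtime's merge code: the two frames and their values are invented, and `DataFrame.join` stands in for whatever the runtime does internally.

```python
import pandas as pd

# Toy stand-ins: an "official" panel and a custom series covering a shifted span.
official = pd.DataFrame(
    {"CPIAUCSL": [100.0, 100.4, 100.9]},
    index=pd.to_datetime(["1990-01-01", "1990-02-01", "1990-03-01"]),
)
custom = pd.DataFrame(
    {"my_extra": [7.1, 7.3]},
    index=pd.to_datetime(["1990-03-01", "1990-04-01"]),
)

row_counts = {}
for rule, how in [("inner_join", "inner"), ("left_join", "left"), ("outer_join", "outer")]:
    merged = official.join(custom, how=how)
    row_counts[rule] = len(merged)
    missing = int(merged["my_extra"].isna().sum())
    print(f"{rule}: {len(merged)} rows, {missing} missing custom values")
```

Trying the three rules on a small slice of your own file like this shows how many dates survive before you commit to one in the recipe.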

## Common pitfalls

| Symptom | Cause | Fix |
|---|---|---|
| `RawParseError: must have a parseable date index` | Date column is not the first column, or the date format is not parseable. | Move the date column first; use ISO format `YYYY-MM-DD`. |
| Target column is silently missing from the panel | Column name in `target:` does not match the CSV header (case-sensitive). | Check column names with `pd.read_csv("file.csv").columns`. |
| All-NaN columns dropped silently | A series has no numeric values after type coercion. | Inspect the raw file for text entries or hidden characters. |
| `official_transform_policy` has no effect | `custom_panel_only` disables FRED T-code application. | Apply your own transforms in `2_preprocessing` via `transform_policy: tcode` and a custom T-code map, or use `no_transform` and handle it upstream. |
| `custom_source_path` not found at runtime | A relative path is resolved against the current working directory of the process that calls `mf.run()`, not against the YAML file's location. | Use an absolute path or change your working directory to the project root before calling `mf.run()`. |
| `official_plus_custom` fails with date mismatch | Your extra file covers a different date range than the FRED vintage (partial or no overlap). | Use `outer_join` to keep all dates, or set `sample_start_date` / `sample_end_date` to the window both sources cover. |
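
For the `custom_source_path` pitfall, one way to get a stable absolute path is to anchor the data path on the recipe file's directory. This is a standard-library sketch: `resolve_against_recipe` is a hypothetical helper and the file names are placeholders.

```python
from pathlib import Path


def resolve_against_recipe(recipe_path: str, data_path: str) -> Path:
    """Turn a data path written relative to the recipe file into an absolute path."""
    candidate = Path(data_path)
    if candidate.is_absolute():
        return candidate
    # Anchor on the recipe's directory instead of the current working directory.
    return (Path(recipe_path).resolve().parent / candidate).resolve()


abs_path = resolve_against_recipe("studies/my_study.yaml", "data/my_monthly_panel.csv")
print(abs_path)  # candidate value for custom_source_path
```

Checking `abs_path.exists()` before calling `mf.run()` turns a runtime failure into an immediate, local one.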

For FRED-MD / FRED-QD column definitions and T-code reference, see
[FRED Datasets in Recipes](../recipe_api/fred_datasets.md).

## See also

- [FRED Datasets in Recipes](../recipe_api/fred_datasets.md) -- FRED-MD, FRED-QD, FRED-SD reference status
- [Custom function quickstart](../for_recipe_authors/custom_function_quickstart.md) -- bring your own model, preprocessor, or target transformer
- [Quickstart](quickstart.md) -- minimal recipe walkthrough
- [First study](first_study.md) -- full study with diagnostics, tests, and output