# `failure_policy` [Back to L0](../index.md) | [Browse all axes](../../browse_by_axis.md) | [Browse all options](../../browse_by_option.md) > Axis ``failure_policy`` on sub-layer ``l0_a`` (layer ``l0``). ## Sub-layer **l0_a** ## Axis metadata - Default: `'fail_fast'` - Sweepable: False - Status: operational ## Operational status summary - Operational: 2 option(s) - Future: 0 option(s) ## Options ### `fail_fast` -- operational Stop the entire study on the first cell that errors. When the cell-loop catches an exception in any sweep cell, ``fail_fast`` raises immediately and the manifest is **not** written. The remaining cells are skipped. This is the default because the typical authoring failure mode is a schema or data error that affects every cell -- catching it after the first cell saves wall-clock and surfaces the problem with a single traceback rather than a wall of identical errors. For sweeps where cells *can* fail independently (e.g., one model family throws on a particular target while others succeed), use ``continue_on_failure`` instead so partial results survive. **When to use** Default for every authoring iteration. Pick this while the recipe is still being tuned; the first failure tells you exactly what to fix without waiting for a full sweep to finish. **When NOT to use** Long-running production sweeps where a transient failure on one cell (e.g., a memory hiccup on one bootstrap iteration) should not abort the whole study. **References** * macroforecast design Part 1, L0 §A: 'fail_fast vs continue_on_failure is the canonical execution-policy choice for any cell-loop study.' **Related options**: [`continue_on_failure`](#continue-on-failure) **Examples** *Author-time recipe (default)* ```yaml 0_meta: fixed_axes: failure_policy: fail_fast ``` _Last reviewed 2026-05-04 by macroforecast author._ ### `continue_on_failure` -- operational Record failed cells in the manifest and keep the sweep running. Per-cell exceptions are caught by the cell loop, the cell's ``CellExecutionResult.error`` and ``traceback`` fields are populated, and the loop moves on to the next cell. The manifest's ``cells_summary`` distinguishes succeeded from failed cells; the failed-cell entries carry the captured traceback for post-hoc diagnosis. Replication still runs end-to-end on a manifest with failed cells: ``replicate()`` re-executes every cell and verifies the failure occurs in the same place with the same exception class. **When to use** Production horse-race sweeps where partial coverage is more useful than no coverage. Common examples: a 50-cell model-family sweep where one optional family (xgboost without the extra) fails to import, or a long bootstrap where a single iteration trips a numerical edge case. **When NOT to use** Authoring iteration -- failures are usually configuration problems that affect every cell, and ``fail_fast`` shortens the feedback loop. **References** * macroforecast design Part 1, L0 §A: 'continue_on_failure preserves partial coverage; the manifest carries enough context to diagnose each failed cell after the run.' **Related options**: [`fail_fast`](#fail-fast) **Examples** *Production sweep over many model families* ```yaml 0_meta: fixed_axes: failure_policy: continue_on_failure ``` _Last reviewed 2026-05-04 by macroforecast author._