macroforecast.forecast_analysis#

Back to reference

macroforecast.forecast_analysis inspects outputs from macroforecast.forecasting.run(...). It does not refit models and does not change forecasts. It reads two sources:

Source

Used for

ForecastResult.forecasts

Forecast-vs-actual rows, residuals, rolling loss, model-selection metadata, and combination rows.

Saved model sidecar JSON from stored_model["metadata_path"]

Coefficients, intercepts, hyperparameters, and fit diagnostics recorded by macroforecast.models.

macroforecast.forecast_diagnostic remains available as a compatibility alias.

This module corresponds to the old generator/model diagnostic role, but the new API is callable-first. No YAML, recipe graph, or runtime materialization is involved.

Public Flow#

import macroforecast as mf

result = mf.forecasting.run(
    processed_panel,
    ["ridge", "random_forest"],
    window=window_spec,
    features=feature_spec,
    model_selection=mf.model_selection.grid({"alpha": [0.01, 0.1]}),
    combination=["mean", "inverse_mspe"],
)

analysis = mf.forecast_analysis.diagnose_forecasts(
    result,
    rolling_window=12,
    include_residual_acf=True,
    include_residual_qq=True,
)

diagnose_forecasts#

macroforecast.forecast_analysis.diagnose_forecasts(
    forecasts,
    *,
    include_fitted: bool = True,
    include_residuals: bool = True,
    include_residual_acf: bool = False,
    include_residual_qq: bool = False,
    include_rolling_loss: bool = True,
    rolling_window: int = 12,
    rolling_metric: str = "rmse",
    include_forecast_scale: bool = False,
    levels=None,
    scale_view: str = "both_overlay",
    back_transform=None,
    include_training_loss: bool = False,
    include_rolling_training_loss: bool = False,
    training_loss_metric: str = "rmse",
    include_first_vs_last: bool = False,
    include_dfm_idiosyncratic_acf: bool = False,
    include_dfm_factor_stability: bool = False,
    include_coefficients: bool = True,
    include_parameter_stability: bool = True,
    include_tuning: bool = True,
    include_tuning_objective: bool = True,
    include_hyperparameters: bool = True,
    include_tuning_scores: bool = True,
    include_ensemble_weights: bool = True,
    include_ensemble_concentration: bool = True,
    include_member_contribution: bool = False,
    include_stage_updates: bool = True,
    include_combined: bool = True,
) -> ForecastDiagnosticReport

Input#

Name

Type

Default

Choices

forecasts

ForecastResult or pandas.DataFrame

required

Runner result or forecast table.

include_fitted

bool

True

Include row-level fitted-vs-actual table.

include_residuals

bool

True

Include grouped residual summary.

include_residual_acf

bool

False

Include residual autocorrelation by model/horizon.

include_residual_qq

bool

False

Include normal QQ reference points for residuals.

include_rolling_loss

bool

True

Include rolling OOS loss table.

rolling_window

positive int

12

Rolling window length in forecast rows within each group.

rolling_metric

str

"rmse"

"mse", "rmse", "mae", or "bias".

include_forecast_scale

bool

False

Include transformed and/or back-transformed forecast rows.

levels

Series, DataFrame, or None

None

Original-level target series used by forecast_scale_view for change/growth back transforms.

scale_view

str

"both_overlay"

"transformed_only", "back_transformed_only", or "both_overlay".

back_transform

callable or None

None

Optional custom row-level back-transform function.

include_training_loss

bool

False

Read saved model sidecar fit metrics.

include_rolling_training_loss

bool

False

Add rolling traces of saved fit metrics.

training_loss_metric

str

"rmse"

Metric name from model sidecar diagnostics, usually "rmse", "mse", "mae", "mean", "std", or "n".

include_first_vs_last

bool

False

Include first-versus-last forecast comparison by model/horizon.

include_dfm_idiosyncratic_acf

bool

False

Include ACF of DFM residual diagnostics when sidecars contain DFM residuals.

include_dfm_factor_stability

bool

False

Include filtered-factor stability summaries when sidecars contain DFM factors.

include_coefficients

bool

True

Read saved model sidecar JSON and return coefficient paths when available.

include_parameter_stability

bool

True

Summarize coefficient stability over origins/windows.

include_tuning

bool

True

Include per-forecast model-selection metadata.

include_tuning_objective

bool

True

Extract the selected objective and best score from tuning metadata.

include_hyperparameters

bool

True

Return selected hyperparameter values over time.

include_tuning_scores

bool

True

Summarize tuning best-score distributions.

include_ensemble_weights

bool

True

Reconstruct weights for identifiable combination methods.

include_ensemble_concentration

bool

True

Summarize ensemble-weight concentration by combined forecast row.

include_member_contribution

bool

False

Attach member prediction, weight, and contribution rows where weights are identifiable.

include_stage_updates

bool

True

Include preprocessing/feature stage update trace from result metadata.

include_combined

bool

True

Include combined forecast rows in fitted/residual/loss tables.

Output#

Returns ForecastDiagnosticReport.

Field

Type

Meaning

overview

dict

Forecast count, model/horizon counts, date range, missing prediction/actual counts, stored-model count, model-selection count, uncertainty count.

fitted

DataFrame or None

Forecast rows plus residual, absolute error, squared error, and percent error.

residuals

DataFrame or None

Grouped residual diagnostics by model and horizon.

residual_acf

DataFrame or None

Residual autocorrelation table by model, horizon, and lag.

residual_qq

DataFrame or None

Residual quantiles paired with standard-normal theoretical quantiles.

rolling_loss

DataFrame or None

Rolling OOS loss by model and horizon.

forecast_scale

DataFrame or None

Transformed/back-transformed forecast rows.

training_loss

DataFrame or None

Sidecar fit metrics by origin/model/horizon.

rolling_training_loss

DataFrame or None

Rolling sidecar fit metrics.

first_vs_last

DataFrame or None

First and last forecast rows in each group plus changes.

coefficients

DataFrame or None

Coefficient path from saved model sidecars when available.

parameter_stability

DataFrame or None

Coefficient stability summary across origins/windows.

tuning

DataFrame or None

Model-selection event trace from forecast table metadata.

tuning_objective

DataFrame or None

Selected objective, best score, retune flag, and trial counts by forecast row.

hyperparameters

DataFrame or None

Long-form selected hyperparameter values over forecast rows.

tuning_scores

DataFrame or None

Tuning-score distribution summary by model/horizon/method.

ensemble_weights

DataFrame or None

Reconstructed combination weights for supported methods.

ensemble_concentration

DataFrame or None

Herfindahl/effective-number summary of ensemble weights.

member_contribution

DataFrame or None

Member prediction and weighted contribution rows.

dfm_idiosyncratic_acf

DataFrame or None

DFM idiosyncratic residual ACF from saved fit diagnostics.

dfm_factor_stability

DataFrame or None

Filtered-factor mean, variance, drift, and autocorrelation summaries.

stage_updates

DataFrame or None

Runner stage update trace.

metadata

dict

Input metadata plus compact forecast_analysis stage.

Returned tables carry attrs["macroforecast_metadata"] == report.metadata.

Metadata#

diagnose_forecasts(...) attaches one compact stage:

analysis.metadata["forecast_analysis"]

The stage records:

Key

Meaning

overview

Compact counts: forecasts, models, combined rows, stored-model rows, and model-selection rows.

options

Residual, rolling-loss, coefficient, tuning, ensemble, and stage-update choices.

tables

Number of rows generated by each analysis table.

Helper Functions#

forecast_overview#

macroforecast.forecast_analysis.forecast_overview(forecasts) -> dict

Returns counts and coverage for one forecast table:

Key

Meaning

n_forecasts, n_models, models, horizons

Forecast-table shape.

start, end

Forecast date range.

missing_prediction_count, missing_actual_count

Missingness in forecast/actual columns.

combined_count, base_model_count

Combination versus base model rows.

stored_model_count

Rows with saved model metadata.

selection_count, retuned_count

Rows with model-selection metadata and rows that retuned.

variance_prediction_count, quantile_prediction_count

Uncertainty output coverage.

fitted_vs_actual#

macroforecast.forecast_analysis.fitted_vs_actual(
    forecasts,
    *,
    include_combined: bool = True,
    drop_missing_actual: bool = True,
) -> pandas.DataFrame

Returns row-level diagnostics:

Column

Meaning

prediction, actual

Forecast and realized target from the runner.

residual

actual - prediction.

abs_error

Absolute residual.

squared_error

Squared residual.

percent_error

residual / abs(actual); zero actuals become missing.

residual_report#

macroforecast.forecast_analysis.residual_report(
    forecasts,
    *,
    group_by: Sequence[str] = ("model", "horizon"),
    include_combined: bool = True,
) -> pandas.DataFrame

Default grouping is model by horizon. Output columns include n, bias, mae, mse, rmse, residual_sd, residual_autocorr1, mean_actual, mean_prediction, first_date, and last_date. The lag-1 autocorrelation is computed after sorting each group by origin_pos and date; input row order does not affect the result.

residual_autocorrelation#

macroforecast.forecast_analysis.residual_autocorrelation(
    forecasts,
    *,
    max_lag: int = 12,
    group_by: Sequence[str] = ("model", "horizon"),
    include_combined: bool = True,
) -> pandas.DataFrame

Returns one row per model/horizon/lag with residual ACF values. This supports forecast-residual correlogram checks without rerunning the model.

residual_qq#

macroforecast.forecast_analysis.residual_qq(
    forecasts,
    *,
    n_quantiles: int = 21,
    group_by: Sequence[str] = ("model", "horizon"),
    include_combined: bool = True,
) -> pandas.DataFrame

Returns empirical residual quantiles and matching standard-normal theoretical quantiles. Use it for QQ plots or tail-shape checks. It does not run a normality test.

rolling_loss#

macroforecast.forecast_analysis.rolling_loss(
    forecasts,
    *,
    metric: str = "rmse",
    window: int = 12,
    min_periods: int | None = None,
    group_by: Sequence[str] = ("model", "horizon"),
    include_combined: bool = True,
) -> pandas.DataFrame

Computes rolling OOS loss inside each group. rmse rolls the squared error and takes the square root after averaging.

forecast_scale_view#

macroforecast.forecast_analysis.forecast_scale_view(
    forecasts,
    *,
    levels=None,
    target=None,
    transform=None,
    view="both_overlay",
    back_transform=None,
    include_combined=True,
) -> pandas.DataFrame

Returns forecast rows on the transformed scale, the original level scale, or both. The runner records target_transform when available. Built-in back-transform support covers level, one-step change, growth, and log_growth. For path-average targets or custom transforms, pass back_transform, a callable that returns either a mapping with prediction and actual or a two-value sequence.

Output columns include date, origin, horizon, model, scale, target_transform, prediction, actual, residual, and back_transform_available.

select_forecast_origins#

macroforecast.forecast_analysis.select_forecast_origins(
    forecasts,
    *,
    view="all_origins",
    every_n=12,
    include_last=True,
    include_combined=True,
) -> pandas.DataFrame

Filters the forecast table to one of three origin views: "all_origins", "last_origin_only", or "every_n_origins". The last origin is retained by default in "every_n_origins" so the final test window is visible even when it does not land exactly on the spacing.

first_vs_last_forecast#

macroforecast.forecast_analysis.first_vs_last_forecast(
    forecasts,
    *,
    group_by=("model", "horizon"),
    include_combined=True,
) -> pandas.DataFrame

Compares the first and last forecast row inside each group. Output includes first/last dates, origins, predictions, actuals, residuals, and changes. This is the callable equivalent of the old first-window versus last-window view.

training_loss_trace#

macroforecast.forecast_analysis.training_loss_trace(
    forecasts,
    *,
    load_pickle=False,
) -> pandas.DataFrame

Reads saved model sidecar JSON and returns fit metrics recorded by macroforecast.models, usually n, mean, std, mae, mse, and rmse. It does not unpickle models by default. Path-average forecasts can save one fit per step; those rows use fit_step.

rolling_training_loss#

macroforecast.forecast_analysis.rolling_training_loss(
    forecasts_or_trace,
    *,
    metric="rmse",
    window=12,
    min_periods=None,
    group_by=("model", "horizon"),
    load_pickle=False,
) -> pandas.DataFrame

Computes a rolling average of sidecar training metrics. It accepts either a runner ForecastResult/forecast table or the output of training_loss_trace(...).

dfm_idiosyncratic_acf#

macroforecast.forecast_analysis.dfm_idiosyncratic_acf(
    source,
    *,
    max_lag=12,
    load_pickle=False,
) -> pandas.DataFrame

Reads DFM residual diagnostics from a ModelFit, ForecastResult, or forecast table with saved model sidecars. Output columns include model context, residual name, lag, ACF, and observation count.

dfm_factor_stability#

macroforecast.forecast_analysis.dfm_factor_stability(
    source,
    *,
    load_pickle=False,
) -> pandas.DataFrame

Reads filtered DFM factor diagnostics from a ModelFit, ForecastResult, or forecast table with saved model sidecars. Output includes factor name, count, mean, standard deviation, variance, first/last values, drift, and lag-1 autocorrelation.

coefficient_trace#

macroforecast.forecast_analysis.coefficient_trace(
    forecasts,
    *,
    include_intercept: bool = True,
    load_pickle: bool = False,
    models: Iterable[str] | None = None,
) -> pandas.DataFrame

Reads stored_model["metadata_path"] sidecar JSON for each forecast row and extracts fit.fit.diagnostics.coefficients. It does not unpickle model objects by default. Set load_pickle=True only for trusted local artifacts when a sidecar is missing.

Returned rows include date, origin, origin_pos, horizon, model, fit_step, feature, coefficient, component, and stored artifact paths.

parameter_stability#

macroforecast.forecast_analysis.parameter_stability(
    forecasts,
    *,
    include_intercept: bool = True,
    load_pickle: bool = False,
    group_by: Sequence[str] = ("model", "horizon", "feature"),
    models: Iterable[str] | None = None,
) -> pandas.DataFrame

Summarizes coefficient traces across forecast origins. Output includes count, mean, standard deviation, min/max, first/last coefficient, and sign-change count. It is available only when model sidecars contain coefficient metadata.

tuning_trace#

macroforecast.forecast_analysis.tuning_trace(forecasts) -> pandas.DataFrame

Returns one row per forecast row with model-selection metadata. It records method, metric, validation window, retune flag, best score, best params, trial counts, and policy. The current runner stores selection-event summaries, not full per-trial histories.

tuning_objective_trace#

macroforecast.forecast_analysis.tuning_objective_trace(forecasts) -> pandas.DataFrame

Extracts the objective-facing part of tuning_trace: model, horizon, origin, method, metric, validation window, retune flag, best score, trial counts, and policy. Use this when the question is whether model selection itself behaved consistently.

Output columns are date, origin, origin_pos, horizon, model, model_spec, method, metric, window, best_score, retuned, n_trials, n_successful, n_failed, and policy.

hyperparameter_path#

macroforecast.forecast_analysis.hyperparameter_path(forecasts) -> pandas.DataFrame

Returns selected hyperparameters in long form: one row per forecast row and parameter. This is the callable table behind hyperparameter-over-time plots.

tuning_score_distribution#

macroforecast.forecast_analysis.tuning_score_distribution(
    forecasts,
    *,
    group_by: Sequence[str] = ("model", "horizon", "method"),
) -> pandas.DataFrame

Summarizes selected tuning scores by group. Output includes count, mean, standard deviation, min, max, and median best score.

ensemble_weights_over_time#

macroforecast.forecast_analysis.ensemble_weights_over_time(
    forecasts,
    *,
    unsupported: str = "skip",
) -> pandas.DataFrame

Reconstructs weights when the combination method has identifiable weights.

Method

Weight support

mean

Equal weights.

inverse_mspe, dmspe

Recomputed from historical squared forecast errors using the same discount/min-weight parameters.

best_n

Equal weights over historically best models.

median, trimmed_mean, winsorized_mean

No unique model weights; skipped by default.

unsupported controls non-identifiable methods: "skip", "nan", or "raise".

ensemble_weight_concentration#

macroforecast.forecast_analysis.ensemble_weight_concentration(forecasts) -> pandas.DataFrame

Summarizes reconstructed weights for each combined forecast row. Output includes member count, min/max weight, Herfindahl index, effective number of members, and entropy. These are concentration diagnostics for identifiable weights; median, trimmed-mean, and custom combinations can be non-identifiable.

ensemble_member_contribution#

macroforecast.forecast_analysis.ensemble_member_contribution(forecasts) -> pandas.DataFrame

Returns long-form member contribution rows when weights are identifiable: member model, member prediction, weight, contribution, and combined prediction.

stage_update_trace#

macroforecast.forecast_analysis.stage_update_trace(forecasts) -> pandas.DataFrame

Returns preprocessing and feature-engineering stage update records stored in ForecastResult.metadata["stages"]. This is empty for direct FeatureSet inputs because the runner does not refit preprocessing/features in that path.

custom_forecast_diagnostic#

macroforecast.forecast_analysis.custom_forecast_diagnostic(
    forecasts,
    func,
    *,
    name=None,
    metadata=None,
    **params,
) -> pandas.DataFrame

Runs one user diagnostic on a runner ForecastResult or forecast table. This is for inspection only; it does not refit models, recompute model selection, or change forecast rows.

Callable signature:

func(forecasts, *, metadata=None, **params)

The forecasts argument passed to the callable is a copy of the forecast DataFrame. Accepted callable outputs are DataFrame, Series, mapping, or a sequence convertible to a DataFrame.

The returned table carries:

Attr

Meaning

macroforecast_metadata_schema.kind

Always custom_forecast_diagnostic.

macroforecast_metadata_schema.method

name or callable name.

macroforecast_metadata

Input metadata plus a custom_forecast_diagnostic stage.

Example:

def tail_errors(forecasts, *, metadata=None, q=0.95):
    err = (forecasts["actual"] - forecasts["prediction"]).abs()
    return {"q": q, "abs_error": float(err.quantile(q))}

diag = mf.forecast_analysis.custom_forecast_diagnostic(
    result,
    tail_errors,
    name="tail_errors",
    q=0.9,
)

Boundary#

Question

Use

Run windowed forecasts and combinations

mf.forecasting

Score and rank forecasts

mf.metrics

Run forecast-comparison tests

mf.tests

Inspect forecast rows, residuals, tuning, coefficients, weights, and stage updates

mf.forecast_analysis