macroforecast.tests#
macroforecast.tests owns forecast-comparison tests and residual diagnostics.
It does not compute general scoring tables, fit models, or choose windows.
Use the namespace form:
import macroforecast as mf
mf.tests.dm_test(loss_a, loss_b, horizon=1)
Top-level shortcuts such as mf.dm_test(...) are intentionally not exported.
TestResult#
Most pairwise forecast-comparison tests return TestResult.
macroforecast.tests.TestResult(
statistic,
p_value,
decision,
alternative,
correction_policy=None,
n_obs=None,
metadata={},
)
| Field | Meaning |
|---|---|
| statistic | Test statistic, or |
| p_value | P-value, or |
| decision | |
| alternative | |
| correction_policy | HAC or small-sample correction label. |
| n_obs | Number of aligned observations used. |
| metadata | Test-specific details. |
Methods:
| Method | Output |
|---|---|
| | JSON-ready dictionary with |
| | JSON text and optional file write. |
| | Compact string summary. |
Custom Tests#
custom_test#
macroforecast.tests.custom_test(
name,
func,
*args,
alternative="two_sided",
alpha=0.05,
correction_policy=None,
metadata=None,
**params,
) -> TestResult
Runs a user-supplied forecast test and coerces the result to TestResult.
The callable receives *args and **params. It may return:
| Return type | Meaning |
|---|---|
| | Used directly, with custom metadata merged. |
| mapping | Must contain |
| | Decision is |
| | Same as above plus sample size. |
import pandas as pd

def sign_test_stat(loss_a, loss_b):
    # Share of aligned periods in which loss_a is below loss_b.
    diff = pd.Series(loss_a).sub(pd.Series(loss_b)).dropna()
    return {
        "statistic": float((diff < 0).mean()),
        "p_value": 0.04,  # hard-coded for illustration only
        "n_obs": len(diff),
    }

result = mf.tests.custom_test(
    "sign_loss_test",
    sign_test_stat,
    loss_a,
    loss_b,
)
custom_test() records the callable name, parameters, alpha, and
custom=True in result.metadata.
Equal Predictive Accuracy#
dm_test#
macroforecast.tests.dm_test(
loss_a,
loss_b,
*,
horizon=1,
correction="hln",
kernel="acf",
input_type="loss",
power=2.0,
alternative="two_sided",
alpha=0.05,
)
Input: two aligned loss series by default. Set input_type="error" to match
forecast::dm.test(e1, e2, h, power, varestimator) from the R forecast
package: the function then computes abs(e1)^power - abs(e2)^power internally.
Output: TestResult for the Diebold-Mariano equal predictive accuracy test.
correction="hln" applies the Harvey-Leybourne-Newbold small-sample
correction. P-values use a Student-t reference distribution with df=n-1,
matching forecast/R/DM2.R::dm.test.
kernel="acf" matches the R varestimator="acf" autocovariance estimator.
kernel="bartlett" or "newey_west" uses the Bartlett-weighted estimator,
matching the R varestimator="bartlett" option.
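The statistic described above can be sketched in pure Python. This is an illustration, not the package's implementation: the helper name dm_hln_statistic is hypothetical, and the formula is the textbook Harvey-Leybourne-Newbold form, assumed here to correspond to correction="hln" with kernel="acf" on precomputed losses.

```python
import math


def dm_hln_statistic(loss_a, loss_b, horizon=1):
    """Hypothetical helper: DM statistic with the HLN small-sample factor."""
    d = [a - b for a, b in zip(loss_a, loss_b)]  # loss differential
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of d at lag k (divisor n).
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Unweighted autocovariances up to lag horizon - 1 ("acf"-style estimator).
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    dm = d_bar / math.sqrt(lrv / n)
    # HLN factor; a p-value would then use Student-t with n - 1 degrees of freedom.
    hln = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    return dm * hln, n - 1


stat, df = dm_hln_statistic([1, 2, 3, 4, 5], [1, 1, 2, 2, 3], horizon=1)
print(round(stat, 4), df)
```

In this sketch a positive statistic means loss_a is larger on average than loss_b. The unweighted estimator can turn negative for some samples at longer horizons, which the sketch does not guard against.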
R/source alignment:
| Setting | Alignment |
|---|---|
| | Same statistic and Student-t p-value as |
| | Same statistic and Student-t p-value as |
| | Uses the same DM statistic after accepting precomputed losses. This is convenient for custom losses, but it is not a direct call-equivalent to R |
| | Omits the Harvey-Leybourne-Newbold small-sample factor used by |
| | Macroforecast extension. These HAC estimators are not options in R |
Returned metadata includes statistic_type="t",
null_hypothesis="equal predictive accuracy", p_value_status,
p_value_reference, source_reference, r_reference, r_alignment, and
r_argument_mapping.
gw_test#
macroforecast.tests.gw_test(
loss_a,
loss_b,
*,
horizon=1,
correction="hln",
kernel="acf",
input_type="loss",
power=2.0,
alternative="two_sided",
)
Input: two aligned loss series. Output: TestResult using the package’s
Giacomini-White-compatible loss-differential surface. This callable reuses the
DM/HLN loss-differential statistic on aligned inputs, which preserves the
legacy callable surface.
Source boundary: gw_test() does not claim exact R-package alignment.
Conditional predictive ability with time-varying fluctuation paths is exposed
separately through conditional_predictive_ability_test(...).
dmp_test#
macroforecast.tests.dmp_test(
loss_differences,
*,
kernel="newey_west",
alpha=0.05,
)
Input: one loss-difference series or a sequence of loss-difference series.
Output: TestResult for a stacked Diebold-Mariano-Pesaran-style joint test.
The test stacks finite loss-difference values, computes a HAC standard error
for the stacked mean, and reports a two-sided standard-normal p-value. No exact
R-package comparator is claimed in the checked R sources. Metadata records
statistic_type="z", null_hypothesis, p_value_status,
p_value_reference, source_reference, and r_alignment.
equal_predictive_tests#
macroforecast.tests.equal_predictive_tests(
loss_a,
loss_b,
*,
tests=("dm", "gw", "dmp"),
error_a=None,
error_b=None,
horizon=1,
correction="hln",
kernel="acf",
alpha=0.05,
) -> pandas.DataFrame
Runs multiple equal-predictive-ability tests and stacks one row per test.
Supported names are dm, gw, dmp, and hn. hn requires error_a and
error_b because Harvey-Newbold is an encompassing test on forecast errors.
Output: a pandas.DataFrame with one row per requested test. The table keeps
the full component metadata in the metadata column and also promotes the
paper-facing fields below to top-level columns.
| Column | Meaning |
|---|---|
| | Requested key and display name. |
| | Reference family ( |
| | P-value, availability flag, and reference distribution. |
| | Rejection flag, alternative direction, and null statement. |
| | Small-sample/HAC policy and aligned observation count. |
| | Provenance and source-comparison fields. |
| | Full |
Current source alignment by row:
| Test | R/source status |
|---|---|
| dm | Exact |
| gw | Legacy GW-compatible DM-style surface; no exact R comparator claimed. |
| dmp | Macroforecast stacked HAC screen; no exact R comparator claimed. |
| hn | Legacy encompassing covariance approximation; not |
For paper output, pass this table to
macroforecast.reporting.test_report_table(...). For an appendix/audit table
that spells out source and R alignment, use
macroforecast.reporting.test_provenance_table(...).
harvey_newbold_test#
macroforecast.tests.harvey_newbold_test(
error_a,
error_b,
*,
horizon=1,
kernel="newey_west",
small_sample=True,
alpha=0.05,
)
Input: two forecast-error series. Output: one-sided TestResult for the legacy
forecast-error covariance approximation.
Source note: this is not forecast::dm.test. The R forecast package function
implements Harvey-Leybourne-Newbold Diebold-Mariano equal-accuracy testing.
harvey_newbold_test() remains a callable encompassing-style covariance
approximation and records that distinction in result.metadata.
The callable forms d_t = e_a,t * (e_a,t - e_b,t), computes a HAC standard
error, optionally applies an HLN-style small-sample factor, and reports a
one-sided Student-t upper-tail p-value. Metadata records
statistic_type="t", p_value_status, p_value_reference,
source_reference, r_reference=None, and r_alignment.
Alias: hn_test.
Nested And Encompassing Tests#
clark_west_test#
macroforecast.tests.clark_west_test(
loss_small,
loss_large,
forecast_small,
forecast_large,
*,
horizon=1,
cw_adjustment=True,
kernel="newey_west",
alpha=0.05,
)
Input: small-model loss, large-model loss, and both forecast series. Output:
one-sided TestResult for the Clark-West nested forecast comparison.
Statistic:
q_t = e_r,t^2 - e_u,t^2 + (f_r,t - f_u,t)^2
z = mean(q_t) / sqrt(LRV(q_t) / n)
Here r is the restricted/small model and u is the unrestricted/large model.
The implementation follows the standard adjusted MSPE differential used by
Clark-West references such as GAUSS cwTest and HypothesisTests.jl
ClarkWestTest. Archived R examples can differ by sign convention, so this
page treats the formula above as the package contract.
Alias: cw_test.
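The Clark-West formula above can be sketched as follows. The helper name clark_west_z is hypothetical. The sketch assumes squared-error losses (so each loss equals the squared forecast error), uses a lag-0 variance in place of a HAC long-run variance (reasonable only for horizon=1), and reports a normal upper-tail p-value, which is the usual Clark-West convention but is an assumption about this package.

```python
import math
from statistics import NormalDist


def clark_west_z(loss_small, loss_large, forecast_small, forecast_large):
    """Hypothetical helper: adjusted MSPE differential z-statistic."""
    # q_t = e_r,t^2 - e_u,t^2 + (f_r,t - f_u,t)^2 with squared-error losses.
    q = [
        ls - ll + (fs - fl) ** 2
        for ls, ll, fs, fl in zip(loss_small, loss_large, forecast_small, forecast_large)
    ]
    n = len(q)
    q_bar = sum(q) / n
    lrv = sum((x - q_bar) ** 2 for x in q) / n  # assumption: no HAC lags
    z = q_bar / math.sqrt(lrv / n)
    p_upper = 1.0 - NormalDist().cdf(z)  # assumption: normal upper tail
    return z, p_upper


z, p = clark_west_z([1, 4, 1, 4], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, -1, -1])
print(round(z, 4), p < 0.05)
```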
enc_new_test#
macroforecast.tests.enc_new_test(
error_small,
error_large,
*,
critical_value=None,
alpha=0.05,
)
Input: restricted/small-model forecast errors and unrestricted/large-model
forecast errors. Output: one-sided TestResult.
Statistic:
c_t = e_r,t * (e_r,t - e_u,t)
ENC-NEW = n * mean(c_t) / mean(e_u,t^2)
Default p_value is None because Clark-McCracken nested forecast
encompassing tests have nonstandard distributions. Pass a design-appropriate
critical_value to get a boolean decision.
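A minimal sketch of the ENC-NEW formula and the critical-value decision follows. Both helper names are hypothetical, the rejection rule (statistic above the critical value) is an assumption consistent with a one-sided upper-tail test, and the critical value in the usage line is an arbitrary number for illustration, not a tabulated Clark-McCracken value.

```python
def enc_new_statistic(error_small, error_large):
    """Hypothetical helper: ENC-NEW = n * mean(c_t) / mean(e_u,t^2)."""
    n = len(error_small)
    c = [er * (er - eu) for er, eu in zip(error_small, error_large)]
    mean_c = sum(c) / n
    mse_large = sum(eu ** 2 for eu in error_large) / n
    return n * mean_c / mse_large


def decide(statistic, critical_value=None):
    # Without a critical value there is no decision, mirroring p_value=None.
    if critical_value is None:
        return None
    return statistic > critical_value  # assumption: one-sided upper-tail rule


enc_new = enc_new_statistic([1, -1, 2, -2], [0.5, -0.5, 1, -1])
print(enc_new, decide(enc_new), decide(enc_new, critical_value=2.0))
```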
enc_t_test#
macroforecast.tests.enc_t_test(
error_small,
error_large,
*,
horizon=1,
kernel="newey_west",
critical_value=None,
normal_approximation=False,
alpha=0.05,
)
Input: restricted/small-model forecast errors and unrestricted/large-model
forecast errors. Output: one-sided TestResult.
Statistic:
c_t = e_r,t * (e_r,t - e_u,t)
ENC-T = mean(c_t) / sqrt(LRV(c_t) / n)
Default p_value is None. Set normal_approximation=True only for
diagnostic screening, or pass critical_value for a design-specific decision.
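The ENC-T ratio can be sketched with a Bartlett-weighted long-run variance. The helper names are hypothetical, and the choice of horizon - 1 Bartlett lags is an assumption of this sketch rather than a documented package default.

```python
import math


def bartlett_lrv(x, lags):
    """Hypothetical helper: Bartlett (Newey-West) long-run variance of x."""
    n = len(x)
    x_bar = sum(x) / n

    def autocov(k):
        return sum((x[t] - x_bar) * (x[t - k] - x_bar) for t in range(k, n)) / n

    lrv = autocov(0)
    for k in range(1, lags + 1):
        lrv += 2.0 * (1.0 - k / (lags + 1)) * autocov(k)
    return lrv


def enc_t_statistic(error_small, error_large, horizon=1):
    # c_t = e_r,t * (e_r,t - e_u,t)
    c = [er * (er - eu) for er, eu in zip(error_small, error_large)]
    n = len(c)
    lrv = bartlett_lrv(c, lags=horizon - 1)  # assumption: horizon - 1 lags
    return (sum(c) / n) / math.sqrt(lrv / n)


enc_t = enc_t_statistic([1, -1, 2, -2], [0.5, -0.5, 1, -1], horizon=1)
print(round(enc_t, 4))
```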
nested_tests#
macroforecast.tests.nested_tests(
loss_small,
loss_large,
*,
forecast_small=None,
forecast_large=None,
error_small=None,
error_large=None,
tests=("clark_west", "enc_new", "enc_t"),
horizon=1,
kernel="newey_west",
enc_critical_value=None,
enc_normal_approximation=False,
alpha=0.05,
) -> pandas.DataFrame
Runs multiple nested-model tests and stacks one row per test. Clark-West
requires forecast_small and forecast_large; enc_new and enc_t require
error_small and error_large. This separation is intentional because
Clark-West is an adjusted MSPE differential while ENC-NEW and ENC-T are
forecast-error encompassing covariance statistics.
Directional Accuracy Tests#
directional_accuracy_test#
macroforecast.tests.directional_accuracy_test(
y_true,
y_pred,
*,
threshold=0.0,
method="pesaran_timmermann",
alpha=0.05,
)
Input: realized values and forecasts. Output: TestResult. Supported methods
are pesaran_timmermann, anatolyev_gerko, and henriksson_merton.
The pesaran_timmermann and anatolyev_gerko branches are aligned with
R tstests/R/dac.R::dac_test and rugarch/R/rugarch-tests.R::DACTest. The
p-value is a one-sided upper-tail normal p-value, 1 - Phi(statistic).
Forecasts that are constant after subtracting threshold are rejected because
the directional tests are undefined for a constant sign forecast.
Options:
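| Option | Default | Choices | Meaning |
|---|---|---|---|
| threshold | 0.0 | numeric | Values above this threshold are positive-direction observations. |
| method | "pesaran_timmermann" | pesaran_timmermann, anatolyev_gerko, henriksson_merton | Directional statistic to compute. |
| alpha | 0.05 | probability in | Rejection level. |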
Method notes:
| Method | Null | Statistic input |
|---|---|---|
| pesaran_timmermann | No sign predictability. | Exact R alignment with |
| anatolyev_gerko | No excess profitability. | Exact R alignment with |
| henriksson_merton | No market-timing skill. | Macroforecast extension. No exact comparator in |
R/source alignment:
| Branch | R comparator | Notes |
|---|---|---|
| pesaran_timmermann | | Uses |
| anatolyev_gerko | | Uses |
| henriksson_merton | None | Kept as a callable screening diagnostic, not claimed as an R-package-aligned DAC branch. |
Zero rule: R uses strict positivity, actual > 0 and forecast > 0.
macroforecast applies the same strict rule after subtracting threshold, so
values equal to threshold are treated as non-positive.
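The strict-positivity rule and the one-sided p-value can be illustrated with the textbook Pesaran-Timmermann statistic. The helper name is hypothetical, and applying threshold to both realized values and forecasts is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def pesaran_timmermann(y_true, y_pred, threshold=0.0):
    """Hypothetical helper: textbook PT statistic and upper-tail p-value."""
    n = len(y_true)
    # Strict positivity: values equal to threshold count as non-positive.
    up_y = [y - threshold > 0 for y in y_true]
    up_x = [x - threshold > 0 for x in y_pred]
    p_y, p_x = sum(up_y) / n, sum(up_x) / n
    if p_x in (0.0, 1.0):
        raise ValueError("constant-sign forecast: statistic is undefined")
    p_hat = sum(a == b for a, b in zip(up_y, up_x)) / n  # hit rate
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)  # hit rate under independence
    var_hat = p_star * (1 - p_star) / n
    var_star = (
        (2 * p_y - 1) ** 2 * p_x * (1 - p_x) / n
        + (2 * p_x - 1) ** 2 * p_y * (1 - p_y) / n
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    )
    stat = (p_hat - p_star) / math.sqrt(var_hat - var_star)
    return stat, 1.0 - NormalDist().cdf(stat)


pt_stat, pt_p = pesaran_timmermann(
    [1, -1, 1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, -1, 1, -1, 1],
)
print(round(pt_stat, 4), round(pt_p, 4))
```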
Aliases:
| Alias | Equivalent call |
|---|---|
| | |
| | |
| | |
Density And Interval Diagnostics#
density_interval_tests#
macroforecast.tests.density_interval_tests(
pit,
*,
alpha=0.05,
n_bins=10,
pit_lag=1,
)
Input: probability integral transform values in [0, 1]. Output: JSON-ready
dictionary with metadata_schema.kind="density_interval_tests" plus
Berkowitz, KS, Kupiec POF, Christoffersen independence, Engle-Manganelli DQ,
Du-Escanciano shortfall, PIT histogram, and PIT autocorrelation diagnostics.
Options:
| Option | Default | Meaning |
|---|---|---|
| alpha | 0.05 | Tail probability for VaR/shortfall-style hit tests. |
| n_bins | 10 | Number of PIT histogram bins. |
| pit_lag | 1 | Lag used for PIT autocorrelation, Berkowitz AR lag, and Du-Escanciano conditional shortfall lag. |
Output keys:
| Key | Meaning |
|---|---|
| | Berkowitz density LR test plus Jarque-Bera normality check after normal score transform. |
| | Kolmogorov-Smirnov test against uniform PIT. |
| | Unconditional hit-rate test at |
| | Markov independence test for hits. |
| | PIT hit-only DQ proxy. Use |
| | Du-Escanciano unconditional and conditional shortfall tests. |
| | One record per histogram bin. |
| | |
| | Composite provenance metadata. Component-level diagnostics also carry their own R/source metadata. |
R/source alignment:
| Diagnostic | Reference |
|---|---|
| Berkowitz | |
| Du-Escanciano shortfall | |
| Kupiec/Christoffersen | |
| PIT hit-only DQ proxy | No direct R comparator. It is a PIT-hit lag diagnostic inside this composite wrapper, not the full Engle-Manganelli VaR DQ test. |
Boundary handling: values outside [0, 1] raise an error. Boundary PIT values 0 and
1 are accepted as PIT values but clipped internally for the normal-score
Berkowitz transform to avoid infinite ARIMA inputs.
shortfall_de_test#
macroforecast.tests.shortfall_de_test(
pit,
*,
alpha=0.05,
lags=1,
boot=False,
n_boot=2000,
random_state=0,
) -> dict
Input: PIT values in [0, 1]. Output: JSON-ready dictionary with
metadata_schema.kind="shortfall_de_test".
The unconditional statistic is the sample mean of cumulative tail shortfall,
mean((alpha - pit) * 1{pit <= alpha} / alpha). The conditional statistic is
a portmanteau statistic on autocorrelations of that series centered by
alpha / 2. With boot=False, the unconditional p-value uses the
Du-Escanciano normal approximation and the conditional p-value uses
Chi-squared(lags). With boot=True, both p-values use simulated uniform PIT
draws with the same sample size.
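The unconditional statistic can be sketched directly from the formula above. The helper name is hypothetical. The standardization uses the moments of the shortfall series under uniform PITs, mean alpha / 2 and variance alpha * (1/3 - alpha/4); treating that z-score as the normal approximation used by the package is an assumption.

```python
import math


def shortfall_unconditional(pit, alpha=0.05):
    """Hypothetical helper: mean tail shortfall and its z-score."""
    # h_t = (alpha - pit_t) * 1{pit_t <= alpha} / alpha
    h = [(alpha - u) / alpha if u <= alpha else 0.0 for u in pit]
    n = len(h)
    mean_h = sum(h) / n
    # Moments of h_t when PIT values are uniform on [0, 1].
    null_mean = alpha / 2.0
    null_var = alpha * (1.0 / 3.0 - alpha / 4.0)
    z = math.sqrt(n) * (mean_h - null_mean) / math.sqrt(null_var)
    return mean_h, z


mean_h, z_de = shortfall_unconditional([0.01, 0.5, 0.9, 0.03], alpha=0.05)
print(round(mean_h, 4), round(z_de, 4))
```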
dynamic_quantile_test#
macroforecast.tests.dynamic_quantile_test(
y_true,
var,
*,
alpha=0.05,
lag=1,
lag_hit=1,
lag_var=1,
) -> TestResult
Input: realized values and one-step-ahead lower-tail VaR forecasts. Output:
TestResult for the Engle-Manganelli dynamic quantile test.
This is the full VaR DQ callable. It is separate from
density_interval_tests(...) because the exact DQ statistic needs realized
values and VaR forecasts, not PIT values alone.
R/source alignment: segMGarch/R/DQtest.R::DQtest. The hit series is
1 - alpha when y_true < var and -alpha otherwise. The regressor matrix
contains a constant, lag-aligned VaR forecasts, lag_hit lagged hit columns,
and lagged squared realized values. The statistic is
Hit' X (X'X)^(-1) X' Hit / (alpha * (1 - alpha)), with a chi-squared
reference distribution using the number of columns of X.
R argument mapping: segMGarch::DQtest names the VaR probability
VaR_level and converts it internally to the lower-tail probability
1 - VaR_level. macroforecast accepts the lower-tail probability directly
as alpha; therefore a 5% lower-tail VaR is alpha=0.05, corresponding to
VaR_level=0.95 in the R function.
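The hit-series definition and the argument mapping can be sketched as below. Both helper names are hypothetical; the sketch covers only the hit construction and the alpha to VaR_level conversion, not the regression or the chi-squared statistic.

```python
def dq_hits(y_true, var, alpha=0.05):
    """Hypothetical helper: demeaned VaR hit series."""
    # 1 - alpha on a violation (y below the VaR forecast), -alpha otherwise.
    return [(1.0 - alpha) if y < v else -alpha for y, v in zip(y_true, var)]


def r_var_level(alpha):
    # segMGarch::DQtest expects VaR_level = 1 - lower-tail probability.
    return 1.0 - alpha


hits = dq_hits([-3.0, 0.5, -1.0], [-2.0, -2.0, -2.0], alpha=0.05)
print(hits, r_var_level(0.05))
```

Under correct coverage the hit series has mean zero, which is why the violation and non-violation values are 1 - alpha and -alpha rather than 1 and 0.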
Source: https://rdrr.io/cran/segMGarch/src/R/DQtest.R
Options:
| Option | Default | Meaning |
|---|---|---|
| alpha | 0.05 | Lower-tail probability. A 5% VaR uses |
| lag | 1 | Lag used for squared realized values. |
| lag_hit | 1 | Number of lagged hit columns. |
| lag_var | 1 | Lag alignment for VaR forecasts. |
pit_histogram#
macroforecast.tests.pit_histogram(pit, *, n_bins=10) -> pandas.DataFrame
Returns one row per PIT histogram bin with observed count, expected count under uniformity, and deviation.
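A rough pure-Python sketch of the bin table follows. The helper name and the record keys (bin, observed, expected, deviation) are hypothetical, as is the choice to define deviation as observed minus expected and to place a PIT value of exactly 1 in the last bin.

```python
def pit_histogram_rows(pit, n_bins=10):
    """Hypothetical helper: one record per equal-width PIT bin."""
    counts = [0] * n_bins
    for u in pit:
        if not 0.0 <= u <= 1.0:
            raise ValueError("PIT values must lie in [0, 1]")
        # Clamp so that u == 1.0 falls into the last bin.
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    expected = len(pit) / n_bins  # expected count under uniformity
    return [
        {"bin": i, "observed": c, "expected": expected, "deviation": c - expected}
        for i, c in enumerate(counts)
    ]


rows = pit_histogram_rows([0.05, 0.15, 0.15, 0.95, 1.0], n_bins=10)
print(rows[1])
```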
pit_autocorrelation_test#
macroforecast.tests.pit_autocorrelation_test(
pit,
*,
lag=1,
alpha=0.05,
) -> TestResult
Runs a normal-approximation test for serial dependence in PIT values.
interval_coverage_test#
macroforecast.tests.interval_coverage_test(
y_true,
lower,
upper,
*,
alpha=0.05,
) -> dict
Runs Kupiec POF, Christoffersen independence, combined conditional coverage,
and Christoffersen-Pelletier duration diagnostics for forecast intervals.
alpha is the expected non-coverage rate, so a 90% interval uses
alpha=0.10.
Boundary cases follow the likelihood-ratio convention used by R
tstests::var_cp_test and rugarch::VaRTest: zero violations do not
automatically imply a passing Kupiec statistic; the restricted Bernoulli
likelihood is compared with the boundary unrestricted likelihood.
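The boundary convention can be illustrated with a small Kupiec POF sketch. The helper name is hypothetical, interval bounds are assumed to be inclusive, and the sketch covers only the unconditional coverage LR, not the independence or duration components.

```python
import math


def kupiec_pof(y_true, lower, upper, alpha=0.05):
    """Hypothetical helper: Kupiec LR statistic and chi-squared(1) p-value."""
    misses = [not (lo <= y <= up) for y, lo, up in zip(y_true, lower, upper)]
    n, x = len(misses), sum(misses)

    def loglik(p):
        # Bernoulli log-likelihood with the convention 0 * log(0) = 0.
        out = 0.0
        if x > 0:
            out += x * math.log(p)
        if n - x > 0:
            out += (n - x) * math.log(1.0 - p)
        return out

    # Restricted likelihood at alpha versus unrestricted likelihood at x / n.
    lr = max(-2.0 * (loglik(alpha) - loglik(x / n)), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-squared(1) upper tail
    return lr, p_value


# 100 observations, no misses, nominal non-coverage of 5%.
lr0, p0 = kupiec_pof([0.0] * 100, [-1.0] * 100, [1.0] * 100, alpha=0.05)
print(round(lr0, 4), p0 < 0.05)
```

With zero misses the unrestricted likelihood equals 1, so the statistic reduces to -2 * n * log(1 - alpha). It grows with the sample size and can therefore reject even though no violation occurred.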
The christoffersen_pelletier_duration output follows the duration-test
logic in tstests/R/var_cp.R::.duration_test: durations between interval
misses are modeled with a Weibull likelihood, and the no-memory exponential
restriction is tested by setting the Weibull shape parameter to 1. The
duration statistic is unavailable when there are fewer than two misses.
Coverage output:
| Key | Meaning |
|---|---|
| | Unconditional coverage LR. Carries |
| | First-order Markov independence LR plus transition counts |
| | Sum of Kupiec and independence LR statistics with chi-squared df 2. |
| christoffersen_pelletier_duration | Weibull duration LR for the exponential no-memory restriction. |
| | Package-level provenance. |
Duration likelihood note: the duration construction is the same in
tstests and rugarch. The implemented density/survival likelihood follows
rugarch/R/rugarch-tests.R::VaRDurTest, which is the internally consistent
Christoffersen-Pelletier Weibull likelihood form.
Conditional Predictive Ability#
conditional_predictive_ability_test#
macroforecast.tests.conditional_predictive_ability_test(
loss_a,
loss_b,
*,
method="giacomini_rossi",
window_ratio=0.5,
dmv_fullsample=True,
lag_truncate=0,
alpha=0.05,
)
Input: two aligned loss series. Output: JSON-ready dictionary with
metadata_schema.kind="conditional_predictive_ability", a fluctuation
statistic, critical value, decision, time path, window size, loss-difference
orientation, and source-alignment metadata.
Supported methods: giacomini_rossi, recursive_fluctuation.
The giacomini_rossi branch is aligned with
murphydiagram/R/procs.R::fluctuation_test, which implements Proposition 1 of
Giacomini and Rossi (2010). It computes rolling-window Diebold-Mariano-type
statistics for the loss difference loss_a - loss_b, uses Bartlett HAC
variance, and compares the supremum absolute statistic with the tabulated
critical values from Giacomini-Rossi Table 1. Positive path values mean
loss_a is larger than loss_b over that window. The final statistic is
two-sided because it takes the supremum of the absolute path.
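The rolling path and its supremum can be sketched as follows. The helper name is hypothetical, and the sketch assumes a full-sample variance with no HAC lags (roughly lag_truncate=0) and a window of int(window_ratio * n) observations. Window placement and variance details in the package and in murphydiagram may differ, and the tabulated critical values are not reproduced here.

```python
import math


def fluctuation_path(loss_a, loss_b, window_ratio=0.5):
    """Hypothetical helper: rolling DM-type path and sup-abs statistic."""
    d = [a - b for a, b in zip(loss_a, loss_b)]  # loss_a - loss_b
    n = len(d)
    m = int(window_ratio * n)  # rolling window size
    d_bar = sum(d) / n
    # Assumption: full-sample standard deviation, no autocovariance terms.
    sigma = math.sqrt(sum((x - d_bar) ** 2 for x in d) / n)
    path = [
        sum(d[end - m:end]) / (sigma * math.sqrt(m))
        for end in range(m, n + 1)
    ]
    return path, max(abs(v) for v in path)


path, sup_stat = fluctuation_path([2, 2, 1, 1], [1, 1, 2, 2], window_ratio=0.5)
print([round(v, 4) for v in path], round(sup_stat, 4))
```

In this toy input loss_a is larger in the first half and smaller in the second, so the path starts positive and ends negative while the supremum statistic ignores the sign.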
R alignment:
| R package / function | macroforecast branch | Alignment |
|---|---|---|
| murphydiagram/R/procs.R::fluctuation_test | giacomini_rossi | Same |
| None | recursive_fluctuation | Package extension over expanding-prefix loss windows. It reuses the same Bartlett HAC helper but does not claim to implement a named R-package test. |
Options:
| Option | Default | Choices | Meaning |
|---|---|---|---|
| method | "giacomini_rossi" | giacomini_rossi, recursive_fluctuation | |
| window_ratio | 0.5 | | Rolling window size as a fraction of the evaluation sample. |
| dmv_fullsample | True | boolean | If |
| lag_truncate | 0 | | Bartlett HAC truncation lag, matching the R package’s allowed range. |
| alpha | 0.05 | | Test size used to select the tabulated critical value. |
Output fields:
| Field | Meaning |
|---|---|
| | Supremum absolute value of the fluctuation path. |
| | Rolling or recursive fluctuation path before the supremum is taken. |
| | Tabulated Giacomini-Rossi comparison when available; |
| | |
| | Always |
| | Source and R-package comparison metadata. |
| | The user-supplied method, normalized method, and any alias caveat. |
method="rossi_sekhposyan" remains accepted as a legacy alias for
recursive_fluctuation, but Rossi-Sekhposyan forecast rationality is a
different test family and is not represented by this loss-comparison callable.
Multiple-Model Tests#
blocked_oob_reality_check#
macroforecast.tests.blocked_oob_reality_check(
loss_panel,
*,
benchmark,
loss="squared_error",
alpha=0.05,
n_boot=1000,
block_length=4,
bootstrap_method="fixed_block_bootstrap",
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
) -> pandas.DataFrame
Block-bootstrap one-sided benchmark-superiority screen against a named
benchmark model. This is the direct callable replacement for the legacy
blocked_oob_reality_check operation. It is intentionally documented as a
legacy screen, not as the exact White Reality Check.
Inputs:
| Form | Required columns |
|---|---|
| Long panel | |
| Wide matrix | One column per model, including the |
Long-panel input must have one row per target/horizon/origin/model key. If the loss table contains duplicate rows for that key, aggregate them explicitly before calling; the test helpers do not average duplicates silently.
Output: one row per candidate model and target/horizon group.
| Column | Meaning |
|---|---|
| | Group labels. Wide input uses |
| | Candidate model tested against the benchmark. |
| | Benchmark model name. |
| | |
| | Mean loss differential scaled by bootstrap standard error. |
| | Pairwise one-sided block-bootstrap p-value for no improvement over benchmark. |
| | |
| | Max-bootstrap p-value adjusted across all candidate models in the same target/horizon group. |
| | |
| | Complete-case origins used for the family-wise adjustment. |
| | Number of aligned origins. |
| | Bootstrap settings used. |
| | Provenance metadata. |
The returned table carries
attrs["macroforecast_metadata_schema"]["kind"] = "blocked_oob_reality_check".
R/source comparison:
| Function | Status |
|---|---|
| | No exact R-package comparator. It computes pairwise and family-wise max-centered block bootstrap p-values from precomputed benchmark/candidate loss differences. |
| | Strategy-specific data-snooping code. It rebuilds technical-trading parameter-grid performance on each bootstrapped price sample, so it is not the same API contract. |
| | Exact multiple-comparison callable family for White RC, Hansen SPA, and Romano-Wolf StepM using the optional |
superior_predictive_ability_test#
macroforecast.tests.superior_predictive_ability_test(
loss_panel,
*,
benchmark,
loss="squared_error",
alpha=0.05,
n_boot=1000,
block_length="auto",
bootstrap_method="stationary_bootstrap",
p_value_type="consistent",
studentize=True,
nested=False,
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
) -> dict
Input: long or wide loss panel with a named benchmark model. Output:
JSON-ready dictionary with one record per target/horizon group. The record
contains p_values for lower, consistent, and upper SPA p-value
variants, critical_values, selected p_value, superior_models, and
backend metadata.
Backend alignment: delegates to arch.bootstrap.SPA. The backend takes
benchmark losses and candidate losses, forms loss differentials internally as
benchmark_loss - candidate_loss, and reports lower, consistent, and
upper p-values from Hansen’s recentering choices. Positive
mean_loss_difference in the output means the candidate has lower average loss
than the benchmark.
R/source comparison: archived R ttrTests/R/dataSnoop.R::dataSnoop(test="SPA")
implements Hansen SPA for technical-trading rule parameter grids. It recomputes
strategy performance on each bootstrapped price sample, so it is not a direct
general loss-matrix API. macroforecast keeps the general forecast-evaluation
contract and records this as conceptual R alignment in each output record.
Options:
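| Option | Default | Choices | Meaning |
|---|---|---|---|
| bootstrap_method | "stationary_bootstrap" | | Bootstrap family. Fixed-block inputs are mapped to |
| p_value_type | "consistent" | lower, consistent, upper | Which SPA p-value variant to use for |
| studentize | True | boolean | Passed to |
| nested | False | boolean | Passed to |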
reality_check_test#
macroforecast.tests.reality_check_test(
loss_panel,
*,
benchmark,
loss="squared_error",
alpha=0.05,
n_boot=1000,
block_length="auto",
bootstrap_method="stationary_bootstrap",
p_value_type="consistent",
studentize=True,
nested=False,
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
) -> dict
Input and output follow superior_predictive_ability_test(...). Backend:
arch.bootstrap.RealityCheck. In the current arch backend this class is a
Reality Check alias over the same SPA machinery, with the same p-value fields.
Use this when the research design calls for the White Reality Check against a
benchmark model.
R/source comparison: archived R ttrTests/R/dataSnoop.R::dataSnoop(test="RC")
implements White’s Reality Check for technical-trading rule grids. As with SPA,
the R function is strategy-generator specific; macroforecast uses
precomputed benchmark and candidate forecast-loss series.
stepm_test#
macroforecast.tests.stepm_test(
loss_panel,
*,
benchmark,
loss="squared_error",
alpha=0.05,
n_boot=1000,
block_length="auto",
bootstrap_method="stationary_bootstrap",
studentize=True,
nested=False,
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
) -> dict
Input: long or wide loss panel with a named benchmark model. Output:
JSON-ready dictionary with superior_models for each target/horizon group.
Backend: arch.bootstrap.StepM.
R/source comparison: oosanalysis-R-library/R/stepm.R::stepm implements a
generic Romano-Wolf stepdown loop from supplied test statistics and bootstrap
test-statistic draws. macroforecast delegates to arch.bootstrap.StepM, which
constructs the benchmark-vs-candidate loss-difference statistics using the SPA
backend and then applies the stepdown procedure. The objective is aligned, but
the inputs are higher level in macroforecast: forecast-loss panel in,
superior model names out.
model_confidence_set#
macroforecast.tests.model_confidence_set(
loss_panel,
*,
loss="squared_error",
alpha=0.10,
n_boot=1000,
block_length="auto",
bootstrap_method="mcs_fixed_block",
statistic="max",
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
) -> dict
Exact Hansen-Lunde-Nason model confidence set callable aligned with the R
MCS package’s MCSprocedure. It constructs pairwise loss-difference
statistics, bootstraps those loss-difference means, removes one model per step,
tracks cumulative MCS p-values, and records included and rejected model sets by
target/horizon group.
Inputs:
| Form | Required columns |
|---|---|
| Long panel | |
| Wide matrix | Numeric model-loss columns. The target/horizon labels are set to |
Long-panel input must have one row per target/horizon/origin/model key. Duplicate loss rows are rejected instead of being averaged inside the pivot step.
Options:
| Option | Default | Choices | Meaning |
|---|---|---|---|
| | | | |
| | | | |
| block_length | "auto" | positive int or | Block length. |
Output: JSON-ready dictionary with
metadata_schema.kind="model_confidence_set".
| Key | Meaning |
|---|---|
| | Included model records by target, horizon, and alpha after the iterative procedure stops. |
| | Eliminated model records by target, horizon, and alpha. |
| | Final stopping-test p-value by target and horizon. |
| | One record per removal step, including active models, statistic, p-value, cumulative MCS p-value, removed model, rejected model if any, and mean losses. |
| | Block length used by target and horizon. |
R/source alignment:
| R source | Python contract |
|---|---|
| | Sequential elimination until one model remains; included/excluded sets are determined by |
| | Pairwise loss differences |
| | Default |
block_length="auto" follows the same rule conceptually as R k=NULL: choose
the maximum selected AR order across loss columns and enforce a minimum of 3.
For bit-level reproducibility across software stacks, pass an explicit integer
block_length.
iterative_model_confidence_set#
macroforecast.tests.iterative_model_confidence_set(
loss_panel,
*,
loss="squared_error",
alpha=0.10,
n_boot=1000,
block_length="auto",
bootstrap_method="mcs_fixed_block",
statistic="max",
random_state=0,
target="target",
horizon="horizon",
origin="origin",
model="model_id",
)
Descriptive alias for model_confidence_set(...). It calls the same exact MCS
engine and returns the same fields, with
metadata_schema.kind="iterative_model_confidence_set" so older code can trace
which callable produced the result.
Residual Diagnostics#
residual_diagnostics#
macroforecast.tests.residual_diagnostics(
residuals,
*,
tests=(
"ljung_box_q",
"arch_lm",
"jarque_bera_normality",
"durbin_watson",
),
lag=10,
alpha=0.05,
model_df=0,
exog=None,
demean_arch=False,
)
Input: residual series. Output: one-row-per-test pandas DataFrame with
test, statistic, p_value, decision, lag_used, df, n_obs,
source_reference, r_reference, r_alignment, and status. The result carries
attrs["macroforecast_metadata_schema"] = {"kind": "residual_diagnostics", "version": 1, ...}.
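Three of the default diagnostics reduce to short textbook formulas, sketched below in pure Python. The helper name and the returned keys mirror the test names in the signature but are otherwise hypothetical; p-values, model_df adjustments, and the ARCH-LM regression are left out, and the package's own computations may differ in detail.

```python
def residual_summary(residuals, lag=1):
    """Hypothetical helper: Ljung-Box Q, Jarque-Bera, Durbin-Watson statistics."""
    n = len(residuals)
    mean = sum(residuals) / n
    e = [r - mean for r in residuals]  # demeaned residuals

    # Ljung-Box: Q = n (n + 2) * sum_k r_k^2 / (n - k)
    ss = sum(x ** 2 for x in e)
    q = 0.0
    for k in range(1, lag + 1):
        r_k = sum(e[t] * e[t - k] for t in range(k, n)) / ss
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)

    # Jarque-Bera with population (divisor n) moments.
    m2 = ss / n
    m3 = sum(x ** 3 for x in e) / n
    m4 = sum(x ** 4 for x in e) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

    # Durbin-Watson on the raw residuals.
    dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / sum(
        r ** 2 for r in residuals
    )
    return {"ljung_box_q": q, "jarque_bera_normality": jb, "durbin_watson": dw}


summary = residual_summary([1.0, -1.0, 1.0, -1.0], lag=1)
print(summary)
```

A Durbin-Watson value well above 2, as in this alternating series, points to negative first-order autocorrelation.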
Supported tests:
| Name | Meaning |
|---|---|
| ljung_box_q | Ljung-Box serial-correlation diagnostic, aligned with |
| | Breusch-Godfrey Chisq LM diagnostic under the residual-series contract; default is equivalent to testing |
| arch_lm | Engle ARCH LM diagnostic, aligned with |
| jarque_bera_normality | Jarque-Bera normality diagnostic using the same population-moment formula as |
| durbin_watson | Durbin-Watson statistic aligned with the statistic in |
Options:
| Option | Default | Meaning |
|---|---|---|
| lag | 10 | Maximum lag for Ljung-Box, ARCH-LM, and Breusch-Godfrey. |
| alpha | 0.05 | Rejection level used for |
| model_df | 0 | Degrees of freedom consumed by the fitted model. Used in Ljung-Box p-values and ARCH-LM degrees-of-freedom adjustment. |
| exog | None | Optional design matrix for the Breusch-Godfrey auxiliary regression. If omitted, an intercept-only design is used. |
| demean_arch | False | Demean residuals before ARCH-LM, matching |
Source-alignment notes:
| Diagnostic | Source logic |
|---|---|
| Ljung-Box | |
| ARCH-LM | |
| Jarque-Bera | |
| Breusch-Godfrey | |
| Durbin-Watson | |
- jarque_bera_test – Jarque-Bera normality test (single series, chi2 df=2; tseries::jarque.bera.test convention).
- granger_causality – Granger causality test in a VAR (vars::causality; F or Wald).
- instantaneous_causality – instantaneous (contemporaneous) causality test in a VAR.
- giacomini_white_test – Giacomini-White (2006) conditional predictive ability Wald test (chi2, HAC), instrument [1, dL_{t-h}].
- var_serial_test – multivariate residual serial-correlation (Portmanteau/LM) test for a VAR (vars::serial.test).
- var_normality_test – multivariate normality (Doornik-Hansen/Lutkepohl JB) test for VAR residuals (vars::normality.test).
- var_arch_test – multivariate ARCH-LM test for VAR residuals (vars::arch.test, Lutkepohl).