macroforecast.model_selection#
macroforecast.model_selection chooses model hyperparameters. It does not
select variables or create features; feature selection belongs to
macroforecast.feature_engineering. It can resolve specs from both
macroforecast.models and macroforecast.model_ensemble.
Use:
- macroforecast.window to define train/validation splits.
- macroforecast.metrics to define the score.
- macroforecast.model_selection to evaluate parameter candidates and return the best parameter set.
```python
import macroforecast as mf  # assumed alias; X and y are your predictors and target

window = mf.window.last_block(validation_size=24)
search = mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]})
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.rmse,
)
```
Public Functions#
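| Task | Functions |
|---|---|
| Build a search spec | `fixed`, `grid`, `random_search`, `cv_path`, `bayesian_search`, `genetic_search`, `custom_search`, `search_spec` |
| Define stochastic distributions | `uniform`, `log_uniform`, `randint`, `choice` |
| Run model selection | `select_params` |
| Store results | `SearchResult` |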
SearchSpec#
```python
macroforecast.model_selection.SearchSpec(
    method,
    param_grid={},
    param_distributions={},
    n_iter=20,
    random_state=None,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    custom_func=None,
    custom_params={},
    metadata={},
)
```
Input:
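| Argument | Type | Default | Meaning |
|---|---|---|---|
| `method` | str | required | Search method name, such as `"grid"` or `"random"`. |
| `param_grid` | dict | `{}` | Explicit finite candidate values. |
| `param_distributions` | dict | `{}` | Sampling rules for stochastic searches. |
| `n_iter` | int | `20` | Candidate count for sampled searches such as `random_search` and `bayesian_search`. |
| `random_state` | int or `None` | `None` | Seed for stochastic searches. |
| `population_size` | int | `12` | Population size for genetic search. |
| `generations` | int | `4` | Number of generations for genetic search. |
| `mutation_rate` | float | `0.2` | Per-parameter mutation probability for genetic search. |
| `custom_func` | callable or `None` | `None` | User search callable used only for custom searches. |
| `custom_params` | dict | `{}` | User parameters passed to `custom_func`. |
| `metadata` | dict | `{}` | Search metadata, including model-owned search-space provenance. |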
Output:
SearchSpec is consumed by select_params(). It also supports
to_metadata(), to_dict(), and to_json(path=None).
Window and metric are intentionally absent from SearchSpec; they are supplied
to select_params().
SearchResult#
```python
macroforecast.model_selection.SearchResult(
    best_params,
    best_score,
    trials,
    metric,
    method,
    window,
    metadata={},
)
```
Output fields:
| Field | Type | Meaning |
|---|---|---|
| `best_params` | dict | Selected parameter values. |
| `best_score` | float | Score for the selected trial. |
| `trials` | pandas DataFrame | One row per evaluated candidate. |
| `metric` | str or callable | Metric used during model selection. |
| `method` | str | Search method used. |
| `window` | str | Canonical window method used. |
| `metadata` | dict | Model, fixed model params, window, search, and runtime metadata. |
SearchResult.to_frame() returns a copy of the trial table.
to_metadata(), to_dict(), and to_json() provide JSON-ready exports.
SearchError#
```python
macroforecast.model_selection.SearchError(message, *, trials=None)
```
Raised when every candidate fit fails. The exception carries attempted trial
rows on .trials.
```python
try:
    result = mf.model_selection.select_params(model, X, y, search, window=window)
except mf.model_selection.SearchError as err:
    # Every candidate failed; inspect the attempted trial rows.
    failed_trials = err.trials
```
fixed#
```python
macroforecast.model_selection.fixed(params=None, *, random_state=None)
```
Input:

| Argument | Type | Default | Meaning |
|---|---|---|---|
| `params` | dict or `None` | `None` | One parameter combination. |
| `random_state` | int or `None` | `None` | Stored for reproducibility metadata. |
Output: SearchSpec(method="fixed").
grid#
```python
macroforecast.model_selection.grid(param_grid)
```
Input:

| Argument | Type | Meaning |
|---|---|---|
| `param_grid` | dict | Parameter name to iterable values. Scalars are treated as one-value grids. |
Output: SearchSpec(method="grid").
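As a rough illustration of what a grid means, the sketch below expands a parameter grid into explicit candidate dictionaries using only the standard library. It is not the package's implementation: `expand_grid` is a hypothetical helper, `fit_intercept` is an illustrative parameter name, and treating strings as scalars is an assumption.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination in a grid as a list of dicts."""
    names = list(param_grid)
    value_lists = []
    for name in names:
        values = param_grid[name]
        # Assumption: strings and non-iterables are scalars (one-value grids).
        if isinstance(values, (str, bytes)) or not hasattr(values, "__iter__"):
            values = [values]
        value_lists.append(list(values))
    # Cartesian product over all per-parameter value lists.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# `fit_intercept` is a hypothetical parameter used only for illustration.
candidates = expand_grid({"alpha": [0.01, 0.1, 1.0], "fit_intercept": True})
```

Here the scalar `True` contributes a single value, so the grid yields three candidates, one per `alpha`.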
random_search#
```python
macroforecast.model_selection.random_search(param_distributions, *, n_iter=20, random_state=None)
```
Input:

| Argument | Type | Default | Meaning |
|---|---|---|---|
| `param_distributions` | dict | required | Distribution builders, lists, tuples, or scalar values. |
| `n_iter` | int | `20` | Number of random candidates. |
| `random_state` | int or `None` | `None` | Seed. |
Output: SearchSpec(method="random").
cv_path#
```python
macroforecast.model_selection.cv_path(param="alpha", values=None)
```
Input:

| Argument | Type | Default | Meaning |
|---|---|---|---|
| `param` | str | `"alpha"` | One parameter to sweep. |
| `values` | iterable or `None` | default alpha path | Ordered candidate values. |
Output: SearchSpec(method="cv_path").
bayesian_search#
```python
macroforecast.model_selection.bayesian_search(param_distributions, *, n_iter=20, random_state=None)
```
Creates a sampled-pool Bayesian optimization request. Runtime behavior:
- seeded initial random trials
- Gaussian-process surrogate
- expected improvement over a sampled candidate pool
- random fallback when the surrogate cannot be fit or the candidate pool is exhausted
Output: SearchSpec(method="bayesian").
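To make "expected improvement over a sampled candidate pool" concrete, here is a minimal sketch of the textbook expected-improvement formula for a loss metric (lower is better). It assumes the surrogate has already produced a predictive mean and standard deviation for each pool candidate; the function name, the pool values, and the exact acquisition rule are illustrative assumptions, not the package's code.

```python
import math


def expected_improvement(mean, std, best_score):
    """Expected improvement for a loss (lower is better) at one candidate."""
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best_score - mean, 0.0)
    z = (best_score - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best_score - mean) * cdf + std * pdf


# Hypothetical surrogate predictions (mean, std) for a sampled pool.
pool = {"a": (0.50, 0.05), "b": (0.45, 0.20), "c": (0.60, 0.01)}
best_so_far = 0.48
scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in pool.items()
}
# The next trial is the pool candidate with the largest expected improvement.
next_candidate = max(scores, key=scores.get)
```

Candidate `b` wins here because it combines a lower predicted loss with higher uncertainty, which is the exploration and exploitation trade-off that expected improvement encodes.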
genetic_search#
```python
macroforecast.model_selection.genetic_search(
    param_distributions,
    *,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    random_state=None,
)
```
Output: SearchSpec(method="genetic").
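Since mutation_rate is described as a per-parameter mutation probability, one plausible reading is that each parameter of a candidate is resampled independently with that probability. The sketch below shows that idea with the standard library only; `mutate`, the sampler callables, and the parameter names are hypothetical, and the package's actual genetic operators may differ.

```python
import random


def mutate(candidate, samplers, mutation_rate, rng):
    """Resample each parameter independently with probability mutation_rate."""
    child = dict(candidate)  # copy so the parent is left untouched
    for name, sampler in samplers.items():
        if rng.random() < mutation_rate:
            child[name] = sampler(rng)
    return child


rng = random.Random(0)
# Hypothetical per-parameter samplers standing in for distribution specs.
samplers = {
    "alpha": lambda r: r.uniform(0.01, 1.0),
    "l1_ratio": lambda r: r.choice([0.1, 0.5, 0.9]),
}
parent = {"alpha": 0.1, "l1_ratio": 0.5}
unchanged = mutate(parent, samplers, mutation_rate=0.0, rng=rng)
all_resampled = mutate(parent, samplers, mutation_rate=1.0, rng=rng)
```

With a rate of 0.0 the child equals the parent; with a rate of 1.0 every parameter is redrawn from its sampler.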
custom_search#
```python
macroforecast.model_selection.custom_search(
    name,
    func,
    *,
    param_grid=None,
    param_distributions=None,
    n_iter=20,
    random_state=None,
    metadata=None,
    **params,
) -> SearchSpec
```
Builds a user-supplied search request. This is for custom parameter-search
algorithms, not custom metrics. Custom metrics already belong in
select_params(..., metric=...).
The callable receives keyword arguments:
```python
func(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    search,
    rng,
    maximize,
    evaluate_candidate,
    **params,
)
```
| Argument | Meaning |
|---|---|
| `model` | Fit callable resolved from the model name, callable, or `ModelSpec`. |
| `X`, `y` | Aligned model-selection sample. |
| `splits` | List of train/validation position splits. The default contract is temporal unless non-temporal folds are explicitly allowed. |
| `metric` | Resolved metric callable. |
| `fixed_params` | Parameters applied to every candidate. |
| `search` | The prepared `SearchSpec`. |
| `rng` | NumPy random generator seeded by the spec. |
| `maximize` | Whether larger scores are better. |
| `evaluate_candidate` | Package helper for evaluating one parameter dictionary across all splits. |
| `**params` | User parameters supplied to `custom_search()`. |
The custom callable must return one of:
Return type |
Meaning |
|---|---|
|
Already evaluated trial records. |
|
Trial table with |
|
Existing search result; its trial table is reused. |
|
Any accepted records plus runtime metadata merged into |
The most common pattern is to use evaluate_candidate and return the resulting
trial rows:
```python
def ordered_search(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    evaluate_candidate,
    values,
    **_,
):
    # Evaluate each alpha in order and return the resulting trial rows.
    return [
        evaluate_candidate(
            model,
            X,
            y,
            splits,
            metric,
            fixed_params,
            {"alpha": value},
            trial,
        )
        for trial, value in enumerate(values)
    ]


search = mf.model_selection.custom_search(
    "ordered_alpha",
    ordered_search,
    values=(0.01, 0.1, 1.0),
)
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
)
```
SearchSpec.to_dict() and SearchResult.to_metadata() store the callable
name and user parameters. The callable source code is not serialized.
search_spec#
```python
macroforecast.model_selection.search_spec(
    model,
    *,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
)
```
Builds a SearchSpec from a model-owned search space.
Input:
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `model` | str, callable, or `ModelSpec` | required | Registered model, fit-time model ensemble, or model spec. |
| `preset` | str or `None` | `None` | Model search-space preset. |
| `method` | str or `None` | model default | Override search method. |
| stochastic options | int/float or `None` | `None` | Passed to stochastic search builders. |
Output: SearchSpec with model metadata. The same resolver is used for
macroforecast.models and macroforecast.model_ensemble, so
search_spec("bagging", preset="small") returns a fit-time ensemble search
space with metadata["model_family"] == "model_ensemble".
Distributions#
```python
macroforecast.model_selection.uniform(low, high)
macroforecast.model_selection.log_uniform(low, high)
macroforecast.model_selection.randint(low, high)
macroforecast.model_selection.choice(values)
```
Output: ParamDistribution.
Rules:
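| Function | Meaning |
|---|---|
| `uniform(low, high)` | Continuous uniform sample between `low` and `high`. |
| `log_uniform(low, high)` | Continuous log-uniform sample; bounds must be positive. |
| `randint(low, high)` | Inclusive integer sample from `low` to `high`. |
| `choice(values)` | Categorical sample from explicit values. |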
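The sampling rules above can be illustrated with standard-library stand-ins. This is a sketch of the semantics only (log-uniform as uniform in log space, inclusive integer bounds), not the package's ParamDistribution code; every `sample_*` helper and parameter name below is hypothetical.

```python
import math
import random


def sample_uniform(rng, low, high):
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    if low <= 0 or high <= 0:
        raise ValueError("log-uniform bounds must be positive")
    # Draw uniformly in log space, then map back to the original scale.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_randint(rng, low, high):
    # random.Random.randint includes both endpoints.
    return rng.randint(low, high)


def sample_choice(rng, values):
    return rng.choice(list(values))


rng = random.Random(123)
draws = {
    "alpha": sample_log_uniform(rng, 1e-4, 1.0),
    "l1_ratio": sample_uniform(rng, 0.0, 1.0),
    "max_depth": sample_randint(rng, 2, 6),
    "kernel": sample_choice(rng, ["rbf", "linear"]),
}
```

Log-uniform sampling is the usual choice for regularization strengths because it spreads draws evenly across orders of magnitude rather than concentrating them near the upper bound.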
select_params#
```python
macroforecast.model_selection.select_params(
    model,
    X,
    y=None,
    search=None,
    *,
    window=None,
    splits=None,
    metric="mse",
    maximize=False,
    fixed_params=None,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
    allow_non_temporal_splits=False,
)
```
Input:
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `model` | str, callable, or `ModelSpec` | required | Model or fit-time model ensemble to fit for each candidate. |
| `X` | pandas object | required | Predictors, panel, or target series depending on model input kind. |
| `y` | pandas Series or `None` | `None` | Supervised target when separate from `X`. |
| `search` | `SearchSpec` or `None` | `None` | Explicit search spec. If absent, model-owned search space is used. |
| `window` | window object or `None` | `None` | Window used to create validation splits. Do not pass with `splits`. |
| `splits` | sequence of (train, validation) pairs or `None` | `None` | Explicit integer-position validation splits, usually produced by an existing window plan. |
| `metric` | str or callable | `"mse"` | Metric name or callable, such as those in macroforecast.metrics. |
| `maximize` | bool | `False` | Whether larger metric values are better. |
| `fixed_params` | dict or `None` | `None` | Parameters passed to every candidate fit. |
| `preset` | str or `None` | `None` | Model preset when resolving a registered model. |
| `method` | str or `None` | model default | Search method when building a model-owned search spec. |
| stochastic options | int/float or `None` | `None` | Used when building a model-owned search spec. |
| `allow_non_temporal_splits` | bool | `False` | Allow explicit splits whose training positions do not all precede validation positions. Use only for replications that intentionally use random folds. |
Output: SearchResult.
Example:
```python
window = mf.window.poos(min_train_size=120, validation_size=24, horizon=1)
search = mf.model_selection.search_spec("lasso", preset="small", method="cv_path")
result = mf.model_selection.select_params(
    "lasso",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.mae,
)
```
For loss metrics such as mse, rmse, and mae, keep maximize=False.
For custom reward metrics, set maximize=True.
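The effect of the direction flag can be sketched in a few lines: the same trial scores lead to a different selection depending on whether the score is treated as a loss or a reward. `pick_best` and the `params`/`score` keys are hypothetical and only illustrate the idea; they are not the package's trial schema.

```python
def pick_best(trials, maximize=False):
    """Return the trial with the best score under the given direction."""
    chooser = max if maximize else min
    return chooser(trials, key=lambda trial: trial["score"])


# Hypothetical trial rows with illustrative scores.
trials = [
    {"params": {"alpha": 0.01}, "score": 1.30},
    {"params": {"alpha": 0.1}, "score": 1.12},
    {"params": {"alpha": 1.0}, "score": 1.25},
]
best_as_loss = pick_best(trials)                    # lowest score wins
best_as_reward = pick_best(trials, maximize=True)   # highest score wins
```

Setting the flag the wrong way for a loss metric would therefore select the worst candidate rather than the best one.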
When a forecasting runner already has a complete window plan, pass explicit
splits instead of another window:
```python
splits = [
    (range(0, 120), range(120, 132)),
    (range(0, 132), range(132, 144)),
]
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]}),
    splits=splits,
)
```
select_params() validates explicit splits before fitting:
- each split must contain non-empty train and validation integer positions
- positions must be inside X/y
- train and validation positions cannot overlap
- by default, train positions must precede validation positions
- boolean masks are allowed only when mask length equals the aligned sample
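The checks listed above can be approximated with a small standalone validator. This is a sketch under assumptions: `validate_split` is a hypothetical helper, it handles integer positions only (boolean masks are not covered), and the error messages are illustrative rather than the package's.

```python
def validate_split(train, valid, n_obs, allow_non_temporal=False):
    """Check one (train, validation) pair of integer positions."""
    train, valid = list(train), list(valid)
    if not train or not valid:
        raise ValueError("train and validation positions must be non-empty")
    if min(train + valid) < 0 or max(train + valid) >= n_obs:
        raise ValueError("positions must lie inside the sample")
    if set(train) & set(valid):
        raise ValueError("train and validation positions overlap")
    # Temporal contract: every train position comes before every validation one.
    if not allow_non_temporal and max(train) >= min(valid):
        raise ValueError("train positions must precede validation positions")
    return train, valid


# An expanding-window split passes; a shuffled fold needs the explicit opt-in.
validate_split(range(0, 120), range(120, 132), n_obs=144)
validate_split([0, 2, 4], [1, 3], n_obs=5, allow_non_temporal=True)
```

Running the checks before any fitting means a malformed split fails fast instead of surfacing as a confusing model error partway through the search.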
Non-temporal folds are opt-in:
```python
random_window = mf.window.random_kfold(n_splits=5, random_state=123)
result = mf.model_selection.select_params(
    "elastic_net",
    X,
    y,
    search=search,
    window=random_window,
)
```
mf.window.random_kfold(...) records that the fold assignment is intentionally
random and the selection metadata stores temporal_order=False. If you pass
explicit random folds yourself, set
allow_non_temporal_splits=True; otherwise select_params() raises. This
keeps ordinary macro-forecast validation time-aware while still allowing
paper replications whose appendix used random iid folds.
SearchResult.window is "explicit_splits" when splits is used. Metadata
stores split_source, n_splits, and a compact split_summary with counts and
position bounds for each split.
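A compact split summary of this kind might look like the sketch below, which records counts and position bounds instead of full index lists. The helper name, the keys inside each summary entry, and the `split_source` value are assumptions for illustration; the actual metadata layout may differ.

```python
def summarize_splits(splits):
    """Build a compact, JSON-ready summary of integer-position splits."""
    summary = []
    for train, valid in splits:
        train, valid = list(train), list(valid)
        summary.append({
            "n_train": len(train),
            "n_validation": len(valid),
            "train_bounds": [min(train), max(train)],
            "validation_bounds": [min(valid), max(valid)],
        })
    return summary


splits = [
    (range(0, 120), range(120, 132)),
    (range(0, 132), range(132, 144)),
]
metadata = {
    "split_source": "explicit_splits",  # placeholder value, assumed
    "n_splits": len(splits),
    "split_summary": summarize_splits(splits),
}
```

Storing bounds and counts keeps the exported metadata small even when each split covers hundreds of observations.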
When a ModelSpec already carries fixed parameters, select_params() keeps
those fixed during every candidate fit and stores them in
SearchResult.metadata["fixed_model_params"]. The selected best_params remain
the searched candidate parameters.