macroforecast.model_selection#

Back to reference

macroforecast.model_selection chooses model hyperparameters. It does not select variables or create features; feature selection belongs to macroforecast.feature_engineering. It can resolve specs from both macroforecast.models and macroforecast.model_ensemble.

Use:

macroforecast.window to define train/val splits.
macroforecast.metrics to define the score.
macroforecast.model_selection to evaluate parameter candidates and return the best parameter set.

window = mf.window.last_block(validation_size=24)
search = mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]})

result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.rmse,
)

Public Functions#

Task	Functions
Build a search spec	`fixed()`, `grid()`, `random_search()`, `cv_path()`, `bayesian_search()`, `genetic_search()`, `custom_search()`, `search_spec()`
Define stochastic distributions	`uniform()`, `log_uniform()`, `randint()`, `choice()`
Run model selection	`select_params()`
Store results	`SearchSpec`, `SearchResult`, `SearchError`, `ParamDistribution`

SearchSpec#

macroforecast.model_selection.SearchSpec(
    method,
    param_grid={},
    param_distributions={},
    n_iter=20,
    random_state=None,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    custom_func=None,
    custom_params={},
    metadata={},
)

Input:

Argument	Type	Default	Meaning
`method`	str	required	`fixed`, `grid`, `cv_path`, `random`, `bayesian`, `genetic`, or `custom`.
`param_grid`	dict	`{}`	Explicit finite candidates for `fixed`, `grid`, and `cv_path`.
`param_distributions`	dict	`{}`	Sampling rules for `random`, `bayesian`, and `genetic`.
`n_iter`	int	`20`	Candidate count for `random`; total sequential evaluations for `bayesian`.
`random_state`	int or `None`	`None`	Seed for stochastic searches.
`population_size`	int	`12`	Population size for `genetic`.
`generations`	int	`4`	Number of generations for `genetic`.
`mutation_rate`	float	`0.2`	Per-parameter mutation probability for `genetic`.
`custom_func`	callable or `None`	`None`	User search callable used only when `method="custom"`.
`custom_params`	dict	`{}`	User parameters passed to `custom_func`.
`metadata`	dict	`{}`	Search metadata, including model-owned search-space provenance.

Output:

SearchSpec is consumed by select_params(). It also supports to_metadata(), to_dict(), and to_json(path=None).

Window and metric are intentionally absent from SearchSpec; they are supplied to select_params().

SearchResult#

macroforecast.model_selection.SearchResult(
    best_params,
    best_score,
    trials,
    metric,
    method,
    window,
    metadata={},
)

Output fields:

Field	Type	Meaning
`best_params`	dict	Selected parameter values.
`best_score`	float	Score for the selected trial.
`trials`	pandas DataFrame	One row per evaluated candidate.
`metric`	str or callable	Metric used during model selection.
`method`	str	Search method used.
`window`	str	Canonical window method used.
`metadata`	dict	Model, fixed model params, window, search, and runtime metadata.

SearchResult.to_frame() returns a copy of the trial table. to_metadata(), to_dict(), and to_json() provide JSON-ready exports.

SearchError#

macroforecast.model_selection.SearchError(message, *, trials=None)

Raised when every candidate fit fails. The exception carries attempted trial rows on .trials.

try:
    result = mf.model_selection.select_params(model, X, y, search, window=window)
except mf.model_selection.SearchError as err:
    failed_trials = err.trials

fixed#

macroforecast.model_selection.fixed(params=None, *, random_state=None)

Input:

Argument	Type	Default	Meaning
`params`	dict or `None`	`None`	One parameter combination.
`random_state`	int or `None`	`None`	Stored for reproducibility metadata.

Output: SearchSpec(method="fixed").

grid#

macroforecast.model_selection.grid(param_grid)

Input:

Argument	Type	Meaning
`param_grid`	dict	Parameter name to iterable values. Scalars are treated as one-value grids.

Output: SearchSpec(method="grid").

random_search#

macroforecast.model_selection.random_search(param_distributions, *, n_iter=20, random_state=None)

Input:

Argument	Type	Default	Meaning
`param_distributions`	dict	required	Distribution builders, lists, tuples, or scalar values.
`n_iter`	int	`20`	Number of random candidates.
`random_state`	int or `None`	`None`	Seed.

Output: SearchSpec(method="random").

cv_path#

macroforecast.model_selection.cv_path(param="alpha", values=None)

Input:

Argument	Type	Default	Meaning
`param`	str	`"alpha"`	One parameter to sweep.
`values`	iterable or `None`	default alpha path	Ordered candidate values.

Output: SearchSpec(method="cv_path").

bayesian_search#

macroforecast.model_selection.bayesian_search(param_distributions, *, n_iter=20, random_state=None)

Creates a sampled-pool Bayesian optimization request. Runtime behavior:

seeded initial random trials
Gaussian-process surrogate
expected improvement over a sampled candidate pool
random fallback when the surrogate cannot be fit or the candidate pool is exhausted

Output: SearchSpec(method="bayesian").

genetic_search#

macroforecast.model_selection.genetic_search(
    param_distributions,
    *,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    random_state=None,
)

Output: SearchSpec(method="genetic").

custom_search#

macroforecast.model_selection.custom_search(
    name,
    func,
    *,
    param_grid=None,
    param_distributions=None,
    n_iter=20,
    random_state=None,
    metadata=None,
    **params,
) -> SearchSpec

Builds a user-supplied search request. This is for custom parameter-search algorithms, not custom metrics. Custom metrics already belong in select_params(..., metric=...).

The callable receives keyword arguments:

func(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    search,
    rng,
    maximize,
    evaluate_candidate,
    **params,
)

Argument	Meaning
`model`	Fit callable resolved from the model name, callable, or `ModelSpec`.
`X`, `y`	Aligned model-selection sample.
`splits`	List of train/validation position splits. The default contract is temporal unless non-temporal folds are explicitly allowed.
`metric`	Resolved metric callable.
`fixed_params`	Parameters applied to every candidate.
`search`	The prepared `SearchSpec`.
`rng`	NumPy random generator seeded by the spec.
`maximize`	Whether larger scores are better.
`evaluate_candidate`	Package helper for evaluating one parameter dictionary across all splits.
`**params`	User parameters supplied to `custom_search(...)`.

The custom callable must return one of:

Return type	Meaning
`list[SearchTrial]`	Already evaluated trial records.
`pandas.DataFrame`	Trial table with `trial`, candidate parameter columns, `score`, `n_splits`, `status`, and `error`.
`SearchResult`	Existing search result; its trial table is reused.
`(records, metadata)`	Any accepted records plus runtime metadata merged into `SearchResult.metadata`.

The most common pattern is to use evaluate_candidate and return the resulting trial rows:

def ordered_search(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    evaluate_candidate,
    values,
    **_,
):
    return [
        evaluate_candidate(
            model,
            X,
            y,
            splits,
            metric,
            fixed_params,
            {"alpha": value},
            trial,
        )
        for trial, value in enumerate(values)
    ]

search = mf.model_selection.custom_search(
    "ordered_alpha",
    ordered_search,
    values=(0.01, 0.1, 1.0),
)

result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
)

SearchSpec.to_dict() and SearchResult.to_metadata() store the callable name and user parameters. The callable source code is not serialized.

search_spec#

macroforecast.model_selection.search_spec(
    model,
    *,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
)

Builds a SearchSpec from a model-owned search space.

Input:

Argument	Type	Default	Meaning
`model`	str, callable, or `ModelSpec`	required	Registered model, fit-time model ensemble, or model spec.
`preset`	str or `None`	`None`	Model search-space preset.
`method`	str or `None`	model default	Override search method.
stochastic options	int/float or `None`	`None`	Passed to stochastic search builders.

Output: SearchSpec with model metadata. The same resolver is used for macroforecast.models and macroforecast.model_ensemble, so search_spec("bagging", preset="small") returns a fit-time ensemble search space with metadata["model_family"] == "model_ensemble".

Distributions#

macroforecast.model_selection.uniform(low, high)
macroforecast.model_selection.log_uniform(low, high)
macroforecast.model_selection.randint(low, high)
macroforecast.model_selection.choice(values)

Output: ParamDistribution.

Rules:

Function	Meaning
`uniform()`	Continuous uniform sample on `[low, high)`.
`log_uniform()`	Continuous log-uniform sample; bounds must be positive.
`randint()`	Inclusive integer sample from `low` to `high`.
`choice()`	Categorical sample from explicit values.

select_params#

macroforecast.model_selection.select_params(
    model,
    X,
    y=None,
    search=None,
    *,
    window=None,
    splits=None,
    metric="mse",
    maximize=False,
    fixed_params=None,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
    allow_non_temporal_splits=False,
)

Input:

Argument	Type	Default	Meaning
`model`	str, callable, or `ModelSpec`	required	Model or fit-time model ensemble to fit for each candidate.
`X`	pandas object	required	Predictors, panel, or target series depending on model input kind.
`y`	pandas Series or `None`	`None`	Supervised target when separate from `X`.
`search`	`SearchSpec` or `None`	`None`	Explicit search spec. If absent, model-owned search space is used.
`window`	`WindowSpec`, str, or `None`	`None`	Window used to create validation splits. Do not pass with `splits`.
`splits`	sequence of `(train_pos, validation_pos)` or `None`	`None`	Explicit integer-position validation splits, usually produced by `macroforecast.window`. Do not pass with `window`.
`metric`	str or callable	`"mse"`	Metric from `macroforecast.metrics` or custom callable.
`maximize`	bool	`False`	Whether larger metric values are better.
`fixed_params`	dict or `None`	`None`	Parameters passed to every candidate fit.
`preset`	str or `None`	`None`	Model preset when resolving a registered model.
`method`	str or `None`	model default	Search method when `search=None`.
stochastic options	int/float or `None`	`None`	Used when building a model-owned search spec.
`allow_non_temporal_splits`	bool	`False`	Allow explicit splits whose training positions do not all precede validation positions. Use only for replications that intentionally use random folds.

Output: SearchResult.

Example:

window = mf.window.poos(min_train_size=120, validation_size=24, horizon=1)
search = mf.model_selection.search_spec("lasso", preset="small", method="cv_path")

result = mf.model_selection.select_params(
    "lasso",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.mae,
)

For loss metrics such as mse, rmse, and mae, keep maximize=False. For custom reward metrics, set maximize=True.

When a forecasting runner already has a complete window plan, pass explicit splits instead of another window:

splits = [
    (range(0, 120), range(120, 132)),
    (range(0, 132), range(132, 144)),
]

result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]}),
    splits=splits,
)

select_params() validates explicit splits before fitting:

each split must contain non-empty train and validation integer positions
positions must be inside X/y
train and validation positions cannot overlap
by default, train positions must precede validation positions
boolean masks are allowed only when mask length equals the aligned sample

Non-temporal folds are opt-in:

random_window = mf.window.random_kfold(n_splits=5, random_state=123)

result = mf.model_selection.select_params(
    "elastic_net",
    X,
    y,
    search=search,
    window=random_window,
)

mf.window.random_kfold(...) records that the fold assignment is intentionally random and the selection metadata stores temporal_order=False. If you pass explicit random folds yourself, set allow_non_temporal_splits=True; otherwise select_params() raises. This keeps ordinary macro-forecast validation time-aware while still allowing paper replications whose appendix used random iid folds.

SearchResult.window is "explicit_splits" when splits is used. Metadata stores split_source, n_splits, and a compact split_summary with counts and position bounds for each split.

When a ModelSpec already carries fixed parameters, select_params() keeps those fixed during every candidate fit and stores them in SearchResult.metadata["fixed_model_params"]. The selected best_params remain the searched candidate parameters.