# macroforecast.model_selection

[Back to reference](index.md)

`macroforecast.model_selection` chooses model hyperparameters. It does not select variables or create features; feature selection belongs to `macroforecast.feature_engineering`. It can resolve specs from both `macroforecast.models` and `macroforecast.model_ensemble`.

Use:

- `macroforecast.window` to define train/validation splits.
- `macroforecast.metrics` to define the score.
- `macroforecast.model_selection` to evaluate parameter candidates and return the best parameter set.

```python
import macroforecast as mf

# X (predictors) and y (target) are pandas objects aligned on the same index.
window = mf.window.last_block(validation_size=24)
search = mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]})
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.rmse,
)
```

## Public Functions

| Task | Functions |
| --- | --- |
| Build a search spec | `fixed()`, `grid()`, `random_search()`, `cv_path()`, `bayesian_search()`, `genetic_search()`, `custom_search()`, `search_spec()` |
| Define stochastic distributions | `uniform()`, `log_uniform()`, `randint()`, `choice()` |
| Run model selection | `select_params()` |
| Core classes | `SearchSpec`, `SearchResult`, `SearchError`, `ParamDistribution` |

## SearchSpec

```python
macroforecast.model_selection.SearchSpec(
    method,
    param_grid={},
    param_distributions={},
    n_iter=20,
    random_state=None,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    custom_func=None,
    custom_params={},
    metadata={},
)
```

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `method` | str | required | `fixed`, `grid`, `cv_path`, `random`, `bayesian`, `genetic`, or `custom`. |
| `param_grid` | dict | `{}` | Explicit finite candidates for `fixed`, `grid`, and `cv_path`. |
| `param_distributions` | dict | `{}` | Sampling rules for `random`, `bayesian`, and `genetic`. |
| `n_iter` | int | `20` | Candidate count for `random`; total sequential evaluations for `bayesian`. |
| `random_state` | int or `None` | `None` | Seed for stochastic searches. |
| `population_size` | int | `12` | Population size for `genetic`. |
| `generations` | int | `4` | Number of generations for `genetic`. |
| `mutation_rate` | float | `0.2` | Per-parameter mutation probability for `genetic`. |
| `custom_func` | callable or `None` | `None` | User search callable, used only when `method="custom"`. |
| `custom_params` | dict | `{}` | User parameters passed to `custom_func`. |
| `metadata` | dict | `{}` | Search metadata, including model-owned search-space provenance. |

Output: `SearchSpec` is consumed by `select_params()`. It also supports `to_metadata()`, `to_dict()`, and `to_json(path=None)`.

Window and metric are intentionally absent from `SearchSpec`; they are supplied to `select_params()`.

## SearchResult

```python
macroforecast.model_selection.SearchResult(
    best_params,
    best_score,
    trials,
    metric,
    method,
    window,
    metadata={},
)
```

Output fields:

| Field | Type | Meaning |
| --- | --- | --- |
| `best_params` | dict | Selected parameter values. |
| `best_score` | float | Score for the selected trial. |
| `trials` | pandas DataFrame | One row per evaluated candidate. |
| `metric` | str or callable | Metric used during model selection. |
| `method` | str | Search method used. |
| `window` | str | Canonical window method used. |
| `metadata` | dict | Model, fixed model params, window, search, and runtime metadata. |

`SearchResult.to_frame()` returns a copy of the trial table. `to_metadata()`, `to_dict()`, and `to_json()` provide JSON-ready exports.

## SearchError

```python
macroforecast.model_selection.SearchError(message, *, trials=None)
```

Raised when every candidate fit fails. The exception carries the attempted trial rows on `.trials`.
```python
try:
    result = mf.model_selection.select_params(model, X, y, search, window=window)
except mf.model_selection.SearchError as err:
    failed_trials = err.trials
```

## fixed

```python
macroforecast.model_selection.fixed(params=None, *, random_state=None)
```

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `params` | dict or `None` | `None` | One parameter combination. |
| `random_state` | int or `None` | `None` | Stored for reproducibility metadata. |

Output: `SearchSpec(method="fixed")`.

## grid

```python
macroforecast.model_selection.grid(param_grid)
```

Input:

| Argument | Type | Meaning |
| --- | --- | --- |
| `param_grid` | dict | Parameter name to iterable values. Scalars are treated as one-value grids. |

Output: `SearchSpec(method="grid")`.

## random_search

```python
macroforecast.model_selection.random_search(param_distributions, *, n_iter=20, random_state=None)
```

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `param_distributions` | dict | required | Distribution builders, lists, tuples, or scalar values. |
| `n_iter` | int | `20` | Number of random candidates. |
| `random_state` | int or `None` | `None` | Seed. |

Output: `SearchSpec(method="random")`.

## cv_path

```python
macroforecast.model_selection.cv_path(param="alpha", values=None)
```

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `param` | str | `"alpha"` | One parameter to sweep. |
| `values` | iterable or `None` | default alpha path | Ordered candidate values. |

Output: `SearchSpec(method="cv_path")`.

## bayesian_search

```python
macroforecast.model_selection.bayesian_search(param_distributions, *, n_iter=20, random_state=None)
```

Creates a sampled-pool Bayesian optimization request.
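A Gaussian-process surrogate gives each candidate in the sampled pool a predicted loss (mean) and an uncertainty (standard deviation); expected improvement (EI) folds the two into one ranking. The NumPy sketch below shows the standard EI formula for a loss being minimized (the `maximize=False` case). It illustrates the technique only and is not macroforecast's internal code: the function name and the `xi` exploration margin are assumptions, and the surrogate fit itself is omitted.

```python
import math

import numpy as np


def expected_improvement(mu, sigma, best_score, xi=0.0):
    """Expected improvement for a minimized loss (illustrative sketch).

    mu, sigma: surrogate mean and standard deviation at each pool candidate.
    best_score: lowest loss observed so far.
    xi: hypothetical exploration margin; 0.0 gives plain EI.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    improvement = best_score - mu - xi
    # With zero uncertainty, EI reduces to the (non-negative) deterministic gain.
    ei = np.maximum(improvement, 0.0)
    positive = sigma > 0
    z = improvement[positive] / sigma[positive]
    # Standard normal CDF and PDF, written with the stdlib to avoid a SciPy dependency.
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z], dtype=float))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    ei[positive] = improvement[positive] * cdf + sigma[positive] * pdf
    return ei


# Usage: pick the pool candidate with the largest EI.
ei = expected_improvement([1.0, 0.8, 1.2], [0.1, 0.0, 0.5], best_score=0.9)
next_index = int(np.argmax(ei))  # → 1
```

A candidate can win either by a low predicted loss or by a large uncertainty; EI is what balances the two when ranking the pool.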
Runtime behavior:

- seeded initial random trials
- a Gaussian-process surrogate
- expected improvement over a sampled candidate pool
- random fallback when the surrogate cannot be fit or the candidate pool is exhausted

Output: `SearchSpec(method="bayesian")`.

## genetic_search

```python
macroforecast.model_selection.genetic_search(
    param_distributions,
    *,
    population_size=12,
    generations=4,
    mutation_rate=0.2,
    random_state=None,
)
```

Output: `SearchSpec(method="genetic")`.

## custom_search

```python
macroforecast.model_selection.custom_search(
    name,
    func,
    *,
    param_grid=None,
    param_distributions=None,
    n_iter=20,
    random_state=None,
    metadata=None,
    **params,
) -> SearchSpec
```

Builds a user-supplied search request. This is for custom parameter-search algorithms, not custom metrics; custom metrics are passed separately through `select_params(..., metric=...)`.

The callable receives keyword arguments:

```python
func(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    search,
    rng,
    maximize,
    evaluate_candidate,
    **params,
)
```

| Argument | Meaning |
| --- | --- |
| `model` | Fit callable resolved from the model name, callable, or `ModelSpec`. |
| `X`, `y` | Aligned model-selection sample. |
| `splits` | List of train/validation position splits. The default contract is temporal unless non-temporal folds are explicitly allowed. |
| `metric` | Resolved metric callable. |
| `fixed_params` | Parameters applied to every candidate. |
| `search` | The prepared `SearchSpec`. |
| `rng` | NumPy random generator seeded by the spec. |
| `maximize` | Whether larger scores are better. |
| `evaluate_candidate` | Package helper that evaluates one parameter dictionary across all splits. |
| `**params` | User parameters supplied to `custom_search(...)`. |

The custom callable must return one of:

| Return type | Meaning |
| --- | --- |
| `list[SearchTrial]` | Already evaluated trial records. |
| `pandas.DataFrame` | Trial table with `trial`, candidate parameter columns, `score`, `n_splits`, `status`, and `error`. |
| `SearchResult` | Existing search result; its trial table is reused. |
| `(records, metadata)` | Any accepted records plus runtime metadata merged into `SearchResult.metadata`. |

The most common pattern is to call `evaluate_candidate` and return the resulting trial rows:

```python
def ordered_search(
    *,
    model,
    X,
    y,
    splits,
    metric,
    fixed_params,
    evaluate_candidate,
    values,
    **_,  # absorbs search, rng, maximize, and any other keyword arguments
):
    # Evaluate each alpha in the given order; `trial` numbers the rows.
    return [
        evaluate_candidate(
            model, X, y, splits, metric, fixed_params, {"alpha": value}, trial,
        )
        for trial, value in enumerate(values)
    ]


search = mf.model_selection.custom_search(
    "ordered_alpha",
    ordered_search,
    values=(0.01, 0.1, 1.0),
)
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=search,
    window=window,
)
```

`SearchSpec.to_dict()` and `SearchResult.to_metadata()` store the callable name and user parameters. The callable's source code is not serialized.

## search_spec

```python
macroforecast.model_selection.search_spec(
    model,
    *,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
)
```

Builds a `SearchSpec` from a model-owned search space.

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `model` | str, callable, or `ModelSpec` | required | Registered model, fit-time model ensemble, or model spec. |
| `preset` | str or `None` | `None` | Model search-space preset. |
| `method` | str or `None` | model default | Override for the search method. |
| stochastic options | int/float or `None` | `None` | Passed to the stochastic search builders. |

Output: `SearchSpec` with model metadata.

The same resolver is used for `macroforecast.models` and `macroforecast.model_ensemble`, so `search_spec("bagging", preset="small")` returns a fit-time ensemble search space with `metadata["model_family"] == "model_ensemble"`.
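For a `custom_search` callable that returns a `pandas.DataFrame`, the table needs the columns listed in the `custom_search` return table: `trial`, one column per candidate parameter, `score`, `n_splits`, `status`, and `error`. The pandas sketch below assembles such a table from plain records. The helper `trials_to_frame`, the record layout, and the `"ok"`/`"failed"` status strings are assumptions for illustration, not values confirmed by macroforecast.

```python
import pandas as pd


def trials_to_frame(records):
    """Build a trial table in the column layout a custom search may return.

    records: hypothetical list of dicts with keys
    "params", "score", "n_splits", "status", and "error".
    """
    rows = []
    for trial, rec in enumerate(records):
        row = {"trial": trial}
        row.update(rec["params"])  # one column per candidate parameter
        row.update(
            score=rec["score"],
            n_splits=rec["n_splits"],
            status=rec["status"],
            error=rec["error"],
        )
        rows.append(row)
    return pd.DataFrame(rows)


# Usage with one successful and one failed (hypothetical) candidate.
records = [
    {"params": {"alpha": 0.1}, "score": 1.5, "n_splits": 2, "status": "ok", "error": None},
    {"params": {"alpha": 1.0}, "score": float("nan"), "n_splits": 0,
     "status": "failed", "error": "hypothetical fit error"},
]
table = trials_to_frame(records)
```

Keeping failed candidates as rows (with `error` filled in) preserves the one-row-per-candidate shape of `SearchResult.trials`.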
## Distributions

```python
macroforecast.model_selection.uniform(low, high)
macroforecast.model_selection.log_uniform(low, high)
macroforecast.model_selection.randint(low, high)
macroforecast.model_selection.choice(values)
```

Output: `ParamDistribution`.

Rules:

| Function | Meaning |
| --- | --- |
| `uniform()` | Continuous uniform sample on `[low, high)`. |
| `log_uniform()` | Continuous log-uniform sample; both bounds must be positive. |
| `randint()` | Integer sample from `low` to `high`, both inclusive. |
| `choice()` | Categorical sample from explicit values. |

## select_params

```python
macroforecast.model_selection.select_params(
    model,
    X,
    y=None,
    search=None,
    *,
    window=None,
    splits=None,
    metric="mse",
    maximize=False,
    fixed_params=None,
    preset=None,
    method=None,
    random_state=None,
    n_iter=None,
    population_size=None,
    generations=None,
    mutation_rate=None,
    allow_non_temporal_splits=False,
)
```

Input:

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| `model` | str, callable, or `ModelSpec` | required | Model or fit-time model ensemble to fit for each candidate. |
| `X` | pandas object | required | Predictors, panel, or target series, depending on the model's input kind. |
| `y` | pandas Series or `None` | `None` | Supervised target when separate from `X`. |
| `search` | `SearchSpec` or `None` | `None` | Explicit search spec. If absent, the model-owned search space is used. |
| `window` | `WindowSpec`, str, or `None` | `None` | Window used to create validation splits. Do not pass together with `splits`. |
| `splits` | sequence of `(train_pos, validation_pos)` or `None` | `None` | Explicit integer-position validation splits, usually produced by `macroforecast.window`. Do not pass together with `window`. |
| `metric` | str or callable | `"mse"` | Metric from `macroforecast.metrics` or a custom callable. |
| `maximize` | bool | `False` | Whether larger metric values are better. |
| `fixed_params` | dict or `None` | `None` | Parameters passed to every candidate fit. |
| `preset` | str or `None` | `None` | Model preset when resolving a registered model. |
| `method` | str or `None` | model default | Search method when `search=None`. |
| stochastic options | int/float or `None` | `None` | Used when building a model-owned search spec. |
| `allow_non_temporal_splits` | bool | `False` | Allow explicit splits whose training positions do not all precede the validation positions. Use only for replications that intentionally use random folds. |

Output: `SearchResult`.

Example:

```python
window = mf.window.poos(min_train_size=120, validation_size=24, horizon=1)
search = mf.model_selection.search_spec("lasso", preset="small", method="cv_path")
result = mf.model_selection.select_params(
    "lasso",
    X,
    y,
    search=search,
    window=window,
    metric=mf.metrics.mae,
)
```

For loss metrics such as `mse`, `rmse`, and `mae`, keep `maximize=False`. For custom reward metrics, where larger values are better, set `maximize=True`.

When a forecasting runner already has a complete window plan, pass explicit splits instead of another `window`:

```python
splits = [
    (range(0, 120), range(120, 132)),
    (range(0, 132), range(132, 144)),
]
result = mf.model_selection.select_params(
    "ridge",
    X,
    y,
    search=mf.model_selection.grid({"alpha": [0.01, 0.1, 1.0]}),
    splits=splits,
)
```

`select_params()` validates explicit splits before fitting:

- each split must contain non-empty train and validation integer positions
- positions must lie inside `X`/`y`
- train and validation positions cannot overlap
- by default, train positions must precede validation positions
- boolean masks are allowed only when the mask length equals the length of the aligned sample

Non-temporal folds are opt-in:

```python
random_window = mf.window.random_kfold(n_splits=5, random_state=123)
result = mf.model_selection.select_params(
    "elastic_net",
    X,
    y,
    search=search,
    window=random_window,
)
```

`mf.window.random_kfold(...)` records that the fold assignment is intentionally random, and the selection metadata stores `temporal_order=False`.
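The explicit-split rules listed above can be sketched in plain Python for a single split of integer positions. `check_split` is a hypothetical helper, not a macroforecast function; it leaves out boolean masks and treats a split as temporal when its largest train position is smaller than its smallest validation position.

```python
import numbers


def check_split(train_pos, val_pos, n_obs, allow_non_temporal=False):
    """Validate one (train, validation) pair of integer positions; return whether it is temporal."""
    train, val = list(train_pos), list(val_pos)
    if not train or not val:
        raise ValueError("train and validation positions must be non-empty")
    positions = train + val
    if not all(isinstance(p, numbers.Integral) and not isinstance(p, bool) for p in positions):
        raise ValueError("positions must be integers")
    if min(positions) < 0 or max(positions) >= n_obs:
        raise ValueError("positions must lie inside the aligned X/y sample")
    if set(train) & set(val):
        raise ValueError("train and validation positions overlap")
    # Temporal: every train position precedes every validation position.
    temporal = max(train) < min(val)
    if not temporal and not allow_non_temporal:
        raise ValueError("non-temporal split; opt in explicitly to allow it")
    return temporal


# Usage: an expanding-window split passes; a random-fold split needs the opt-in.
is_temporal = check_split(range(0, 120), range(120, 132), n_obs=144)
is_random_fold = not check_split([0, 1, 2, 6, 7], [3, 4, 5], n_obs=8, allow_non_temporal=True)
```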
If you pass explicit random folds yourself, set `allow_non_temporal_splits=True`; otherwise `select_params()` raises. This keeps ordinary macro-forecast validation time-aware while still allowing paper replications whose appendix used random i.i.d. folds.

`SearchResult.window` is `"explicit_splits"` when `splits` is used. Metadata stores `split_source`, `n_splits`, and a compact `split_summary` with counts and position bounds for each split.

When a `ModelSpec` already carries fixed parameters, `select_params()` keeps them fixed during every candidate fit and stores them in `SearchResult.metadata["fixed_model_params"]`. The selected `best_params` contain only the searched candidate parameters.
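The division between fixed and searched parameters can be pictured with a small grid loop: every fit receives the fixed parameters, while only the searched names are reported as the best candidate. This is a plain-Python sketch, not macroforecast's implementation. `grid_with_fixed` and `toy_score` are hypothetical, lower scores are treated as better (the `maximize=False` case), and the sketch rejects a name that is both fixed and searched because how `select_params()` resolves such a clash is not stated here.

```python
from itertools import product


def grid_with_fixed(score_fn, param_grid, fixed_params):
    """Grid search where fixed_params are applied to every candidate fit.

    Returns (best_params, best_score); best_params holds only the searched names.
    """
    clash = set(param_grid) & set(fixed_params)
    if clash:
        raise ValueError(f"parameters both fixed and searched: {sorted(clash)}")
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        candidate = dict(zip(names, values))
        score = score_fn(**fixed_params, **candidate)  # fixed params reach every fit
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


def toy_score(alpha, fit_intercept):
    # Hypothetical validation loss, smallest at alpha=0.1 with an intercept.
    return (alpha - 0.1) ** 2 + (0.0 if fit_intercept else 1.0)


best, score = grid_with_fixed(toy_score, {"alpha": [0.01, 0.1, 1.0]}, {"fit_intercept": True})
# best == {"alpha": 0.1}; "fit_intercept" is used in every fit but not reported.
```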