
# Introduction
Hyperparameter tuning in machine learning is something of a craft: it takes skill to balance experience, intuition, and plenty of experimentation. In practice, the process can feel arduous because sophisticated models have an enormous search space, hyperparameters interact in complex ways, and the performance gains from adjusting them are often subtle.
Below are seven Scikit-learn tricks to take your hyperparameter tuning skills to the next level.
# 1. Limiting the search space using domain knowledge
Leaving the search space unbounded means looking for a needle in an (enormous) haystack! Use your domain knowledge – or, if necessary, a domain expert – to first define well-chosen boundaries for the most relevant hyperparameters of your model. This helps reduce the complexity and cost of the search by ruling out unlikely settings.
A sample grid for two typical random forest hyperparameters might look like this:
param_grid = {"max_depth": [3, 5, 7], "min_samples_split": [2, 10]}
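As a minimal sketch of how such a bounded grid is used in practice (synthetic data stands in for a real dataset, and the bounds themselves are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data as a stand-in for a real dataset
X, y = make_classification(n_samples=200, random_state=0)

# Domain-informed bounds: shallow trees, modest split sizes
param_grid = {"max_depth": [3, 5, 7], "min_samples_split": [2, 10]}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)  # best combination found within the bounded grid
```

With only 6 candidate combinations instead of hundreds, the search finishes quickly while still covering the plausible region.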
# 2. Start with a general random search
When the budget is limited, try random search: an effective way to explore large search spaces by sampling hyperparameter values from specified distributions rather than enumerating a grid. For example, C, the hyperparameter controlling the regularization strength of SVM models, can be sampled from a log-uniform distribution:
from scipy.stats import loguniform

param_dist = {"C": loguniform(1e-3, 1e2)}
RandomizedSearchCV(SVC(), param_dist, n_iter=20)
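Expanded into a runnable sketch (assuming synthetic data in place of a real dataset):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Sample C from a log-uniform distribution spanning five orders of magnitude
param_dist = {"C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)  # best sampled value of C
```

Twenty samples from a log-uniform distribution cover the whole range of magnitudes far more evenly than twenty evenly spaced grid points would.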
# 3. Refine locally using grid search
Once random search has identified promising regions, it is often a good idea to follow up with a narrowly focused grid search that explores those regions for marginal gains. First exploration, then exploitation.
GridSearchCV(SVC(), {"C": [5, 10], "gamma": [0.01, 0.1]})
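In full, the exploitation step might look like this (synthetic data assumed; the narrow value ranges are illustrative, standing in for the region the earlier random search flagged):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Narrow grid around a promising region found during exploration
fine_grid = {"C": [5, 10], "gamma": [0.01, 0.1]}

refine = GridSearchCV(SVC(), fine_grid, cv=3)
refine.fit(X, y)
print(refine.best_params_)  # refined settings within the narrow region
```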
# 4. Encapsulation of preprocessing pipelines for hyperparameter tuning
Scikit-learn pipelines are a great way to simplify end-to-end machine learning workflows and prevent issues like data leakage. Both preprocessing hyperparameters and model hyperparameters can be tuned together by passing the pipeline to the search instance:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step names ("scaler", "clf") prefix the hyperparameter names below
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

param_grid = {
    "scaler__with_mean": [True, False],  # scaling hyperparameter
    "clf__C": [0.1, 1, 10],              # SVM model hyperparameter
    "clf__kernel": ["linear", "rbf"]     # another SVM hyperparameter
}
grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)
# 5. Trading speed for reliability with cross-validation
Cross-validation is the norm in Scikit-learn hyperparameter searches, but it is worth understanding the trade-off: replacing it with a single train–validation split is faster, yet produces more variable and less reliable score estimates. Increasing the number of folds – e.g. cv=5 – makes scores more stable and cross-model comparisons more trustworthy, at a higher computational cost. Find a value that strikes the right balance:
GridSearchCV(model, params, cv=5)
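To make the trade-off concrete, here is a sketch contrasting a single split with 5-fold cross-validation (synthetic data and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
params = {"C": [0.1, 1, 10]}

# Fast but noisier: a single train/validation split
single_split = GridSearchCV(
    SVC(), params, cv=ShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
)
single_split.fit(X, y)

# Slower but more stable: each candidate is scored on 5 folds and averaged
five_fold = GridSearchCV(SVC(), params, cv=5)
five_fold.fit(X, y)

print(single_split.best_score_, five_fold.best_score_)
```

The single-split score depends heavily on which rows land in the validation set; the 5-fold score averages that noise away at roughly five times the fitting cost.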
# 6. Optimizing for multiple metrics
When several performance considerations are in play, monitoring multiple metrics during tuning helps detect trade-offs that would go unnoticed with single-metric optimization. The refit parameter then determines which metric is used to select the final "best" model.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1]
}
scoring = {
    "accuracy": "accuracy",
    "f1": "f1"
}
gs = GridSearchCV(
    SVC(),
    param_grid,
    scoring=scoring,
    refit="f1",  # metric used to select the final model
    cv=5
)
gs.fit(X_train, y_train)
# 7. Interpret results wisely
Once tuning is complete and you have found the best-performing model, go one step further: use cv_results_ to understand parameter interactions and trends, or to visualize the results. This example builds a ranked report for a grid search object named gs after the search has finished:
import pandas as pd

results_df = pd.DataFrame(gs.cv_results_)

# Target columns for our report (gs above searched a bare SVC, so the
# column is "param_C"; with a pipeline it would be e.g. "param_clf__C")
columns_to_show = [
    'param_C',
    'mean_test_score',
    'std_test_score',
    'mean_fit_time',
    'rank_test_score'
]
print(results_df[columns_to_show].sort_values('rank_test_score'))
# Summary
Hyperparameter tuning is most effective when it is systematic and thoughtful. By combining informed search strategies, proper validation, and careful interpretation of results, you can achieve significant performance gains without wasting compute or overfitting. Treat tuning as an iterative learning process, not just an optimization checkbox.
Ivan Palomares Carrascosa is a thought leader, writer, speaker, and advisor in the fields of artificial intelligence, machine learning, deep learning, and LLMs. He trains and advises others on the use of artificial intelligence in the real world.
