
# Introduction
This article demystifies the concept of a parameter in machine learning models: what parameters are, how many a model has (spoiler alert: it depends!), and what can go wrong when parameters are set during training. Let's look at these basic building blocks.
# Demystifying parameters in machine learning models
Parameters are like the *internal* knobs and dials of a machine learning model: they define its behavior. Just as a barista's coffee machine may brew cups of varying quality depending on the quality of the ground coffee beans, the parameters of a machine learning model end up set differently depending on the nature, and largely the quality, of the training examples used to learn how to perform a task.
For example, returning to the case of predicting housing prices: if a training dataset of apartments with known prices contains noisy, irrelevant, or biased information, the training process may produce a model whose parameters (remember: *internal* settings) capture misleading patterns or relationships between inputs and outputs, resulting in incorrect price predictions. Meanwhile, if the dataset contains clean, representative, high-quality examples, the training process is likely to produce a model whose parameters are tuned to the real factors driving housing prices up or down, leading to accurate predictions.
You may have noticed that I used italics to emphasize the word *internal* several times. This was completely intentional, and it matters for distinguishing machine learning model parameters from hyperparameters. Unlike parameters, a hyperparameter is like a knob, dial, button, or switch that is *external* and adjusted manually (not learned from data), usually by a human, though sometimes through an automated search for the best configuration of hyperparameters for your model. You can learn more about hyperparameters in this Machine Learning Mastery article.
> Parameters are like the *internal* knobs and dials of a machine learning model: they define the model's "personality" or "behavior", namely which aspects of the data it pays attention to and to what extent.
Now that we better understand what machine learning model parameters are, a few questions arise:
- What do the parameters look like?
- How many parameters does a machine learning model have?
Parameters are usually numerical values: in some types of models they range from 0 to 1, while in others they can take any real value. This is why, in machine learning jargon, the terms parameter and weight are often used to refer to the same concept, especially in neural-network-based models. The higher a weight, the more that "knob" inside the model influences the result or prediction. In simpler machine learning models, such as linear regression, parameters are associated with features of the input data.
For example, suppose we want to predict the price of an apartment in Seville based on four attributes: square footage, proximity to the city center, number of bedrooms, and age of the building in years. A linear regression model trained for this prediction task would have four parameters, one associated with each input feature, plus one additional parameter called the bias term (or intercept). The bias is not tied to any input feature, but many machine learning models need it to have extra "freedom" to learn effectively from a variety of data. Each parameter or weight value indicates how strongly its associated input feature influences the model's predictions. If the highest weight is the one for proximity to the city center, it means that apartment prices in Seville are heavily influenced by the distance from the city center.
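The four-feature apartment example can be sketched in a few lines of NumPy. The dataset below is made up for illustration; the point is that fitting the model yields exactly five learned values: one weight per feature plus the bias term.

```python
import numpy as np

# Hypothetical toy dataset: 6 apartments, each described by 4 features
# (square footage, km to city center, bedrooms, building age in years).
X = np.array([
    [50, 5.0, 1, 30],
    [70, 2.0, 2, 10],
    [90, 1.0, 3, 5],
    [60, 4.0, 2, 25],
    [110, 0.5, 4, 2],
    [80, 3.0, 3, 15],
], dtype=float)
y = np.array([150, 230, 320, 190, 400, 260], dtype=float)  # invented prices

# Prepend a column of ones so the bias term (intercept) is learned
# alongside the four feature weights.
X_with_bias = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares solves for theta_0 (bias) plus theta_1..theta_4.
theta, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)

print(len(theta))  # 5 parameters: 1 bias + 4 feature weights
```

Comparing the magnitudes of `theta[1:]` (after scaling the features to a common range) is one simple way to see which feature the model leans on most.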
More generally, in mathematical terms, the parameters of a basic model such as a multiple linear regression model are denoted by \(\theta_i\) in the following equation:
\[
\hat{y} = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n
\]
Of course, only the simplest types of machine learning models have such a small number of parameters. As data becomes more complex, larger and more sophisticated models are typically needed, such as support vector machines, random forest ensembles, or neural networks, which introduce additional layers of structural complexity to learn difficult relationships and patterns. As a result, larger models have far more parameters, associated not only with the input data but also with the intricate, abstract interrelationships among inputs that are built up inside the model. For example, a deep neural network can have hundreds to millions of parameters, and some of the largest machine learning models today, the transformer architectures behind large language models (LLMs), typically contain billions of learnable parameters!
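To see how parameter counts grow with model structure, consider a fully connected (dense) neural network layer: it has one weight per input-output connection plus one bias per output unit. The layer sizes below are arbitrary, chosen just to show how quickly the numbers add up compared with the five parameters of our linear model.

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """A dense layer has n_in * n_out weights plus one bias per output unit."""
    return n_in * n_out + n_out

# A small hypothetical network: 4 inputs -> 16 hidden -> 8 hidden -> 1 output.
layers = [(4, 16), (16, 8), (8, 1)]
total = sum(dense_layer_params(n_in, n_out) for n_in, n_out in layers)

print(total)  # 80 + 136 + 9 = 225 parameters
```

Scaling the same arithmetic up to the layer widths and depths used in transformer-based LLMs is exactly how their parameter counts reach the billions.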
# Learning parameters and solving potential problems
Once the process of training a machine learning model begins, the parameters are typically initialized to random values. The model makes predictions on training examples with known outcomes, e.g. apartments with known prices, measures the error it makes, and adjusts its parameters accordingly to gradually reduce that error. This is how machine learning models learn, example by example: parameters are gradually and iteratively updated during training, making them increasingly well-suited to the set of training examples the model is exposed to.
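This learning loop can be sketched with plain gradient descent on a one-feature model. The data is synthetic, generated from a known line (an assumption made purely so we can watch the two parameters converge toward the values that produced the data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus a little noise, so we know
# what values the parameters *should* converge to.
x = rng.uniform(0, 1, 100)
y = 3 * x + 2 + rng.normal(0, 0.05, 100)

# Parameters start as random values...
w, b = rng.normal(), rng.normal()
lr = 0.5  # learning rate

for _ in range(2000):
    y_hat = w * x + b              # predict on examples with known answers
    error = y_hat - y              # measure the error made
    w -= lr * (error * x).mean()   # nudge each parameter a little...
    b -= lr * error.mean()         # ...in the direction that reduces error

print(round(w, 2), round(b, 2))  # close to the true values 3 and 2
```

Each pass nudges `w` and `b` slightly; no single update fixes the model, but thousands of small corrections leave the parameters fitted to the data.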
Unfortunately, in practice, difficulties can arise while training a machine learning model, that is, while gradually determining the values of its parameters. Common problems include overfitting and its counterpart, underfitting, both of which leave some of the learned parameters in poor shape and result in a model that makes poor predictions. These problems may also be partly due to human choices, such as selecting a model that is too complex or too simple for the available training data, i.e. one with too many or too few parameters. A model with too many parameters can become slow and expensive to train and operate, and harder to monitor if it degrades over time. Meanwhile, a model with too few parameters does not have enough flexibility to learn useful patterns from the data.
# Summary
This article has explained a vital element of machine learning models in a simple and friendly way: parameters. They are like the DNA of your model, and understanding what they are, how they are learned, and how they relate to model behavior and performance is crucial to understanding machine learning.
Ivan Palomares Carrascosa is a thought leader, writer, speaker, and advisor in the fields of Artificial Intelligence, Machine Learning, Deep Learning, and LLMs. He trains and advises others on the use of artificial intelligence in the real world.
