In most cases, the role of a machine learning tool is to process data and deliver results. How that processing behaves, however, is fixed in advance, during training.
A machine learning model is usually trained to process data using one or more layers of algorithms. This training requires a large amount of relevant data, and accuracy is directly related to the volume of that data. Under-performance in machine learning can be explained by two major phenomena, underfitting and overfitting, which are direct results of undertraining and overtraining. During training a model picks up the patterns in the data, but also its noise. The depth of training therefore determines how well the tool can suppress noise and unwanted irregularities without losing the hidden information.
What are underfitting and overfitting?
Underfitting is a condition where a machine learning tool is undertrained and fails to recognise the patterns and hidden information in real-world data. The root of the problem is the training approach itself: the model is too simple, or trained too little, to capture the underlying structure, so its performance is poor on the practice data and on new data alike.
Overfitting is a situation where a machine learning tool is overly sensitive to noise and unwanted features in the training data. A model trained on all these irregularities cannot be optimized properly: faced with the variety and volume of real-world data, it fails to perform accordingly, because it keeps treating noise as if it were genuine signal.
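Neither condition has to be taken on faith; it can be checked directly. The sketch below is a minimal illustration, assuming scikit-learn and a made-up synthetic dataset: similar low scores on the training and test sets point to underfitting, while a high training score paired with a much lower test score points to overfitting.

```python
# Minimal sketch: spotting underfitting vs. overfitting by comparing
# training and test scores. Dataset and polynomial degrees are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)  # noisy non-linear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # degree 1 tends to underfit, degree 15 tends to overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train score={model.score(X_train, y_train):.2f}  "
          f"test score={model.score(X_test, y_test):.2f}")
```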
The mitigation approach: regularization in machine learning
Regularization is a process similar to noise reduction. It is concerned with reducing the magnitudes of the feature coefficients while keeping the number of features the same. A machine learning tool that is too far along to go back to training can still be regularized into a more efficient rendition of itself. The shrinkage is applied in a generalized manner, so the relative magnitudes of the coefficients are preserved and accuracy is maintained. Mathematically, regularization shrinks the feature coefficients toward zero while keeping the number of features unchanged.
How does it work?
Regularization works by adding a penalty to a complex model. In terms of regression,
y = β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
Here y is the value that the tool is to predict, x1, x2, x3, …, xn are the features it predicts from, and β1, β2, …, βn are the weights attached to those features. The term b is the intercept, playing the same role as c in the familiar y = mx + c.
In the case of regularization, the linear regression model tries to reduce the values of the β coefficients (the intercept b is usually left unpenalized) so that the overall cost function is minimized.
The starting point is the ordinary loss function of linear regression, called the residual sum of squares; the regularization penalty is added on top of it.
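Written out in the notation of the regression equation above (with the sum running over the training examples, and λ standing for the regularization strength, a symbol introduced here for illustration):

RSS = Σ (yi − ŷi)², where ŷi = β1xi1 + β2xi2 + ⋯ + βnxin + b

Regularized cost = RSS + λ × (penalty on the weights β1, …, βn)

The two methods described below, ridge and lasso regression, differ only in how that penalty is computed.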
Methods of regularization in machine learning
Ridge regression
- Ridge regression is a form of linear regression in which a small amount of bias is introduced so that the model becomes more reliable and better suited to long-term use.
- Ridge regression is also called L2 regularization, as it is a regularization technique concerned with reducing the complexity of the model. In this regression, a penalty term, known as the ridge regression penalty, is added to the cost function; it is proportional to the sum of the squares of the feature weights.
- Unlike ordinary polynomial or linear regression, ridge regression can be deployed when there is high collinearity between the independent variables, and it can also solve problems with more parameters than samples.
- It is used mostly to mitigate overfitting, as the sketch below illustrates.
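A minimal sketch of ridge regression, assuming scikit-learn and a small synthetic dataset with two nearly collinear features (all names and values here are illustrative). The alpha parameter plays the role of the regularization strength λ.

```python
# Minimal sketch: ridge (L2) regression vs. ordinary least squares on collinear data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=50)  # feature 1 is nearly a copy of feature 0
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha is the regularization strength

print("OLS coefficients:  ", np.round(ols.coef_, 2))    # unstable on the collinear pair
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk toward zero and more stable
```

Increasing alpha shrinks the coefficients further; setting it to zero recovers ordinary least squares.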
Lasso regression
- The least absolute shrinkage and selection operator, or lasso regression, is also used to reduce the complexity of a model.
- Where the ridge penalty generally contains the squares of the weights, the lasso penalty contains their absolute values.
- Because the penalty uses absolute values, lasso regression can shrink a coefficient all the way to zero, unlike ridge regression, which can only shrink coefficients to somewhere near zero.
- Lasso regression therefore lets us select among features and reduce overfitting selectively, as shown in the sketch below.
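A minimal sketch of lasso regression under the same assumptions (scikit-learn, synthetic illustrative data); only two of the eight features actually drive the target, and the absolute-value penalty pushes the remaining coefficients to exactly zero.

```python
# Minimal sketch: lasso (L1) regression performing feature selection.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)  # only features 0 and 3 matter

lasso = Lasso(alpha=0.1).fit(X, y)  # alpha controls how aggressively weights are shrunk
print(np.round(lasso.coef_, 2))     # coefficients of the irrelevant features come out as exactly 0.0
```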
Conclusion
Regularization in machine learning is deployed mainly to reduce model complexity and to bring accuracy into the deployment process. It is not always possible to train a machine learning tool on the most appropriate data sets: the practice and target data can differ considerably, so mitigation at the stage of execution is needed more often than not. Overfitting is generally the most common problem in these cases, and with the right mitigation or regularization approach it can be fixed or avoided altogether. Some error may still remain, but to a negligible extent.