In classical linear regression, we estimate coefficients using the ordinary least squares (OLS) method, which minimizes the squared error between predictions and observations. The LASSO technique adds a penalty term to this OLS objective: the sum of the absolute values of the coefficients, scaled by a tuning parameter lambda. If lambda is zero, we recover the plain OLS fit with all the variables in the model formula.
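In symbols, for n observations and p predictors, the LASSO coefficient estimates solve the penalized least-squares problem (standard notation, following Tibshirani 1996):

$$
\hat{\beta} \;=\; \underset{\beta}{\arg\min}\;\left\{\,\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} \;+\; \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert \right\}
$$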
Increasing lambda forces coefficients toward zero and results in some of them being set exactly to zero (effectively removing the corresponding variables from the model). In this way, the LASSO model identifies which variables influence the outcome, based on the data provided.
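As a minimal illustration of this behavior, the sketch below fits scikit-learn's Lasso (whose alpha parameter plays the role of lambda) to synthetic data in which only two of ten variables matter; as the penalty grows, more coefficients are set exactly to zero. The data and settings are ours, purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 10 candidate predictors, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

# Larger penalties zero out more coefficients.
for alpha in [0.001, 0.01, 0.1, 1.0]:
    fit = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    print(f"lambda={alpha:<5}: {np.sum(fit.coef_ != 0)} nonzero coefficients out of 10")
```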
Cross-Validation
Cross-validation is commonly used in a range of machine learning techniques to assess and set hyperparameters without using separate hold-out or test data. This is done to reduce the risk of overfitting to the data. Cross-validation involves the following steps (a code sketch follows the list):
- Breaking the training data into several folds (e.g. five).
- Fitting a model on all but one fold and validating it on the remaining fold.
- Repeating the process until each fold has served as the validation fold once.
- Taking an average of the error metric calculated over all folds.
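A minimal sketch of these steps, assuming five folds, squared error as the metric, and scikit-learn's KFold and Lasso; the data and lambda grid are illustrative rather than prescriptive:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

def cv_error(X, y, lam, n_folds=5):
    """Average validation MSE over the folds for a given penalty `lam`."""
    fold_errors = []
    for train_ix, val_ix in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X):
        # Fit on all but one fold, validate on the held-out fold.
        fit = Lasso(alpha=lam, max_iter=10_000).fit(X[train_ix], y[train_ix])
        fold_errors.append(mean_squared_error(y[val_ix], fit.predict(X[val_ix])))
    return float(np.mean(fold_errors))

# Same synthetic data as the earlier sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

best = min([0.001, 0.01, 0.1, 1.0], key=lambda lam: cv_error(X, y, lam))
print("lambda with the lowest cross-validated error:", best)
```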
In the case of cross-validated LASSO regression, cross-validation is used to set the lambda parameter. The selected lambda reflects the strength and consistency of the relationships in the data, so the resulting models are less prone to overfitting.
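In practice, a library routine typically performs this search; for example, scikit-learn's LassoCV cross-validates a grid of lambda values and keeps the one with the lowest average validation error (synthetic data again, for illustration):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

# LassoCV runs the fold-by-fold search internally over a grid of penalties.
cv_fit = LassoCV(cv=5, random_state=0).fit(X, y)
print("lambda selected by cross-validation:", cv_fit.alpha_)
```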
Three Benefits of Cross-Validated LASSO Regression
Variable Selection
By incorporating the penalization term, LASSO jointly optimizes two critical aspects:
- Goodness of fit: Like generalized linear models (GLMs), LASSO aims for a good fit to the data.
- Coefficient structure: It seeks a desirable structure for the estimated coefficients.
The penalty encourages coefficients to shrink and, in some cases, to become exactly zero. This trade-off ensures that the model retains the most important variables while the rest are removed. The resulting model is simpler, easier to understand, and less prone to overfitting.
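The sketch below makes the trade-off visible on synthetic data of our own devising: as lambda grows, the in-sample fit degrades slightly while the coefficient structure becomes sparser.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# 15 candidate predictors, 3 of which actually drive the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 15))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=1000)

# Goodness of fit (MSE) vs coefficient structure (sparsity) as lambda grows.
for lam in [0.01, 0.05, 0.2, 0.8]:
    fit = Lasso(alpha=lam, max_iter=10_000).fit(X, y)
    mse = mean_squared_error(y, fit.predict(X))
    print(f"lambda={lam:<4}  in-sample MSE={mse:.3f}  nonzero coefficients={np.sum(fit.coef_ != 0)}")
```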
Even when the data contains relatively few variables, incorporating variable interactions into the model can produce hundreds or even thousands of unique interacted terms. LASSO handles such large, collinear sets of candidate predictors effectively, retaining only the terms the data supports.
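One common way to generate and then prune such interaction terms is sketched here, with scikit-learn's PolynomialFeatures feeding a cross-validated LASSO; the data is illustrative, and 20 base variables expand to 210 candidate terms:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))                 # 20 base variables
y = X[:, 0] * X[:, 1] + rng.normal(size=1000)   # outcome driven by a single interaction

model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),  # 20 -> 210 terms
    StandardScaler(),                            # put all terms on a comparable scale
    LassoCV(cv=5, random_state=0),               # prune the expanded, collinear set
)
model.fit(X, y)
coef = model[-1].coef_
print(f"{np.sum(coef != 0)} of {coef.size} candidate terms kept")
```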
One can purposefully simplify a LASSO regression by using a higher lambda than the one suggested by cross-validation, further reducing the number of variables in the model. This may be desirable in some cases, for example when a more parsimonious model is easier to implement and communicate.
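A sketch of this deliberate over-penalization, refitting at a multiple of the cross-validated lambda (the factor of five is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

cv_fit = LassoCV(cv=5, random_state=0).fit(X, y)
# Refit with a stronger penalty than cross-validation suggests.
simpler = Lasso(alpha=5 * cv_fit.alpha_, max_iter=10_000).fit(X, y)
print(np.sum(cv_fit.coef_ != 0), "variables at the CV lambda vs",
      np.sum(simpler.coef_ != 0), "at five times that lambda")
```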
Weighing Existing Assumptions
By using a baseline such as an existing mortality table, LASSO models can incorporate existing assumptions. Using a value of lambda set through cross-validation results in a model that balances the existing assumptions against the data: it fits closer to the experience where there is enough data to support it, and closer to the existing assumptions where data is limited.
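One simple way to achieve this under a squared-error loss is to treat the baseline predictions as an offset and fit the LASSO to the deviation from it, so that zeroed coefficients leave the existing assumption untouched. This is a sketch of the idea rather than a prescribed method; for GLM families, glmnet-style offsets on the linear predictor play the same role.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 8))
baseline = 0.8 * X[:, 0]                      # stand-in for an existing assumption's predictions
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=1000)

# Model only the deviation from the baseline; shrinkage pulls predictions
# toward the existing assumption where the data is weak.
adj = LassoCV(cv=5, random_state=0).fit(X, y - baseline)
predictions = baseline + adj.predict(X)       # baseline plus a penalized correction
print("variables adjusting the baseline:", int(np.sum(adj.coef_ != 0)))
```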
Interpretability
In addition to having fewer coefficients, LASSO regression provides model output in a format similar to that of GLMs. This consistency is advantageous for actuaries who are already familiar with GLMs. By providing an interpretable formula for generating predictions, LASSO allows practitioners to understand the key drivers behind the model’s predictions, avoiding the complexity of “black box models”.
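For instance, the fitted model can be written out as an explicit, GLM-style formula; the variable names below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LassoCV

names = ["age", "smoker", "duration", "face_amount"]   # hypothetical variable names
rng = np.random.default_rng(4)
X = rng.normal(size=(500, len(names)))
y = 0.04 * X[:, 0] + 0.9 * X[:, 1] + rng.normal(scale=0.5, size=500)

fit = LassoCV(cv=5, random_state=0).fit(X, y)
# Print the model as an interpretable linear formula over the retained variables.
terms = " + ".join(f"{coef:.3f}*{name}" for name, coef in zip(names, fit.coef_) if coef != 0)
print(f"prediction = {fit.intercept_:.3f} + {terms}")
```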
In Summary
LASSO regression combines the advantages of more advanced machine learning approaches with the simplicity of GLMs. It allows one to efficiently select variables from a large pool of candidates while balancing existing assumptions against the data. The penalty term ensures that coefficients are retained only for variables with sufficient supporting data. Essentially, LASSO regression encourages simple models in which only the most important variables contribute significantly to the model output. These models have coefficients that actuaries can easily interpret and use when setting assumptions.
This article was initially published by the Canadian Institute of Actuaries https://www.cia-ica.ca/news/simplicity-interpretability-and-effective-variable-selection-with-lasso-regression/.
References
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the LASSO.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88. https://www.jstor.org/stable/2346178.