Lasso Regression Model with R Code

Posted June 17, 2024 at 11:23 am
Sang-Heon Lee
SHLee AI Financial Model
Tibshirani (1996) introduces the so-called LASSO (Least Absolute Shrinkage and Selection Operator) model for the selection and shrinkage of parameters. This model is very useful when analyzing big data. In this post, we learn how to set up the Lasso model and estimate it using the glmnet R package.


The Ridge model is similar to the Lasso in terms of shrinkage but has no selection function: it pushes the coefficients of unimportant variables close to zero, but not exactly to zero.

These regression models are called regularized or penalized regression models. In particular, the Lasso is powerful enough to work with big datasets in which the number of variables exceeds 100, 1,000, 10,000, and so on. The traditional linear regression model cannot deal with this sort of big data: once the number of variables exceeds the number of observations, its least-squares solution is no longer unique.
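
To see why ordinary least squares struggles in this setting, the following minimal sketch fits lm() on hypothetical toy data with more variables than observations. The sizes (N = 10, p = 20) and the seed are assumptions chosen only for illustration. lm() reports NA for every coefficient it cannot identify.

```r
# Hypothetical toy data: more variables (p = 20) than observations (N = 10)
set.seed(1)
N <- 10
p <- 20
X <- matrix(rnorm(N * p), ncol = p)
y <- rnorm(N)

# OLS without intercept; at most N coefficients can be identified here
fit <- lm(y ~ X - 1)

# number of coefficients lm() could not estimate (reported as NA)
n_unidentified <- sum(is.na(coef(fit)))
n_unidentified
```

With N observations, at most N coefficients can be pinned down, so the remaining p - N come back as NA.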

Although the linear regression estimator is unbiased, regularized or penalized regressions such as Lasso and Ridge accept some bias in exchange for lower variance (the bias-variance trade-off). This means the minimization problem for the latter has two components: the mean squared error and a penalty on the parameters. The l1-penalty of Lasso makes both variable selection and shrinkage possible, whereas the l2-penalty of Ridge makes only shrinkage possible.

Model : Lasso

For observation index i=1,2,…,N and variable index j=1,2,…,p, given standardized predictors xij and a demeaned (centered) response variable yi, the Lasso model finds the βj which minimize the following objective function.
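
Written out in its standard form, the objective is the RSS plus an l1 penalty. This form is an assumption here: the scaling of the RSS term differs across references. The Ridge counterpart is shown alongside for comparison; it differs only in the penalty.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}
    \left\{
      \sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}

\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta}
    \left\{
      \sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \beta_j^{2}
    \right\}
```

Note that, according to its documentation, glmnet minimizes a rescaled version for the Gaussian family, (1/(2N))·RSS + λ·[(1−α)/2·‖β‖₂² + α·‖β‖₁], so a given λ in glmnet is not numerically comparable to the λ in the form above.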

Here, Y is demeaned for the sake of exposition, but this is not a must. However, the X variables should be standardized to mean zero and unit variance, because differences in the scale of variables cause the penalty to fall unequally on each variable.
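
The effect of scale can be sketched with a single hypothetical predictor recorded in two units (meters and kilometers; the variable names and numbers are assumptions for illustration). The fit is the same, but the coefficient, and hence its contribution |β| to the penalty, differs by a factor of 1000 until the predictor is standardized.

```r
set.seed(1)
N    <- 200
x_m  <- rnorm(N, mean = 0, sd = 1000)  # hypothetical predictor in meters
x_km <- x_m / 1000                     # the same predictor in kilometers
y    <- 0.002 * x_m + rnorm(N)

# Unstandardized: same information, very different coefficient sizes
b_m  <- unname(coef(lm(y ~ x_m  - 1)))
b_km <- unname(coef(lm(y ~ x_km - 1)))
c(meters = b_m, kilometers = b_km, ratio = b_km / b_m)

# Standardized: both units lead to the same coefficient
z_m  <- as.numeric(scale(x_m))
z_km <- as.numeric(scale(x_km))
b_m_std  <- unname(coef(lm(y ~ z_m  - 1)))
b_km_std <- unname(coef(lm(y ~ z_km - 1)))
c(meters = b_m_std, kilometers = b_km_std)
```

Without standardization, a penalty on |β| would punish the kilometer version far more heavily than the meter version, even though the two carry identical information.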

In the objective function above, the first part is the RSS (Residual Sum of Squares) and the second is the penalty term. The penalty term is scaled by the hyperparameter λ, which the user supplies exogenously, through manual search or cross-validation.

When a variable is included in the Lasso but reduces the RSS by a negligible amount (e.g. by 0.000000001), the shrinkage penalty dominates. As a result, the coefficient of this variable is set to zero (Lasso) or close to zero (Ridge).

Unlike the Ridge (convex and differentiable), the Lasso (convex but non-differentiable at zero) does not have a closed-form solution in most problems, so we use the cyclic coordinate descent algorithm. The only exception is the case where all X variables are orthonormal, but this case is highly unlikely.
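
The following is a minimal sketch of cyclic coordinate descent with soft-thresholding, not glmnet's actual implementation. It assumes the rescaled objective (1/(2N))·RSS + λ·Σ|βj| with no intercept. The function names soft_threshold and lasso_cd, the toy data and the seed are all hypothetical. On an orthonormal design (scaled so that X'X/N is the identity), the result can be checked against the closed-form soft-thresholding solution.

```r
# Soft-thresholding operator: shrinks z toward zero by gamma, exactly zero if |z| <= gamma
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Cyclic coordinate descent for (1/(2N)) * RSS + lambda * sum(|beta|), no intercept
lasso_cd <- function(X, y, lambda, n_iter = 200, tol = 1e-10) {
  N      <- nrow(X)
  p      <- ncol(X)
  beta   <- rep(0, p)
  col_ss <- colSums(X^2) / N      # (1/N) * x_j' x_j for each column
  r      <- as.numeric(y)         # current residual y - X beta (beta = 0 at start)
  for (it in seq_len(n_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      # (1/N) * x_j' (partial residual excluding variable j)
      rho   <- sum(X[, j] * r) / N + col_ss[j] * beta[j]
      b_new <- soft_threshold(rho, lambda) / col_ss[j]
      r     <- r - X[, j] * (b_new - beta[j])   # keep the residual in sync
      beta[j] <- b_new
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

# Hypothetical orthonormal design: t(Xo) %*% Xo / N is the identity matrix
set.seed(1)
N  <- 200
p  <- 5
Q  <- qr.Q(qr(matrix(rnorm(N * p), ncol = p)))
Xo <- Q * sqrt(N)
beta_true <- c(1, -0.8, 0, 0, 0)
y  <- as.numeric(Xo %*% beta_true + rnorm(N, sd = 0.5))

lambda      <- 0.2
beta_cd     <- lasso_cd(Xo, y, lambda)
beta_closed <- soft_threshold(as.numeric(crossprod(Xo, y)) / N, lambda)
round(cbind(beta_true, beta_cd, beta_closed), 4)
```

In the orthonormal case each coefficient is simply the soft-thresholded least-squares estimate, which is why coefficients whose estimate falls below λ in absolute value come out exactly zero rather than merely small.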

R code

Let’s estimate the parameters of Lasso and Ridge using the glmnet R package, which provides fast calculation and useful functions.

For example, we make some artificial time series data. Let X be 20 randomly drawn time series (variables) and let Y be generated from predetermined coefficients and randomly drawn error terms. Some coefficients are set to zero to make the differences between the standard linear, Lasso, and Ridge regressions clear. The R code is as follows.

#=========================================================================#
# Financial Econometrics & Derivatives, ML/DL using R, Python, Tensorflow 
# by Sang-Heon Lee
#
# https://shleeai.blogspot.com
#-------------------------------------------------------------------------#
# Lasso, Ridge
#=========================================================================#
 
library(glmnet)
 
    graphics.off()  # close all open graphics devices
    rm(list = ls()) # remove all objects from the workspace
    
    set.seed(123)   # arbitrary seed (not in the original post) for reproducibility
    
    N = 500 # number of observations
    p = 20  # number of variables
    
#--------------------------------------------
# X variable
#--------------------------------------------
    X = matrix(rnorm(N*p), ncol=p)
 
# before standardization
    colMeans(X)    # mean
    apply(X,2,sd)  # standard deviation
 
# scale : mean = 0, std=1
    X = scale(X)
 
# after standardization
    colMeans(X)    # mean
    apply(X,2,sd)  # standard deviation
 
#--------------------------------------------
# Y variable
#--------------------------------------------
    beta = c( 0.15, -0.33,  0.25, -0.25, 0.05,rep(0, p/2-5), 
             -0.25,  0.12, -0.125, rep(0, p/2-3))
 
    # Y variable, standardized Y
    y = X%*%beta + rnorm(N, sd=0.5)
    y = scale(y)
 
#--------------------------------------------
# Model
#--------------------------------------------
    lambda <- 0.01
    
    # standard linear regression without intercept(-1)
    li.eq <- lm(y ~ X-1) 
    
    # lasso (alpha=1 selects the l1 penalty)
    la.eq <- glmnet(X, y, lambda=lambda, 
                    family="gaussian", 
                    intercept = FALSE, alpha=1) 
    # Ridge (alpha=0 selects the l2 penalty)
    ri.eq <- glmnet(X, y, lambda=lambda, 
                    family="gaussian", 
                    intercept = FALSE, alpha=0) 
 
#--------------------------------------------
# Results (lambda=0.01)
#--------------------------------------------
    df.comp <- data.frame(
        beta    = beta,
        Linear  = li.eq$coefficients,
        Lasso   = la.eq$beta[,1],
        Ridge   = ri.eq$beta[,1]
    )
    df.comp
    
#--------------------------------------------
# Results (lambda=0.1)
#--------------------------------------------
    lambda <- 0.1
    
    # lasso
    la.eq <- glmnet(X, y, lambda=lambda,
                    family="gaussian",
                    intercept = FALSE, alpha=1) 
    # Ridge
    ri.eq <- glmnet(X, y, lambda=lambda,
                    family="gaussian",
                    intercept = FALSE, alpha=0) 
    
    df.comp <- data.frame(
        beta    = beta,
        Linear  = li.eq$coefficients,
        Lasso   = la.eq$beta[,1],
        Ridge   = ri.eq$beta[,1]
    )
    df.comp
    
#------------------------------------------------
# Shrinkage of coefficients 
# (range of lambda values; glmnet picks the grid 
#  itself when no lambda is supplied)
#------------------------------------------------
    
    # lasso
    la.eq <- glmnet(X, y, family="gaussian", 
                    intercept = FALSE, alpha=1) 
    # Ridge
    ri.eq <- glmnet(X, y, family="gaussian", 
                    intercept = FALSE, alpha=0) 
    # plot both shrinkage paths in one window, stacked vertically
    # (dev.new() is the portable counterpart of x11())
    dev.new(); par(mfrow=c(2,1)) 
    matplot(log(la.eq$lambda), t(as.matrix(la.eq$beta)),
            type="l", main="Lasso", lwd=2)
    matplot(log(ri.eq$lambda), t(as.matrix(ri.eq$beta)),
            type="l", main="Ridge", lwd=2)
    
#------------------------------------------------    
# Run cross-validation & select lambda
#------------------------------------------------
    mod_cv <- cv.glmnet(x=X, y=y, family='gaussian',
                        intercept = FALSE, alpha=1)
    
    # plot(log(mod_cv$lambda), mod_cv$cvm)
    # cvm : The mean cross-validated error 
    #     - a vector of length length(lambda)
    
    # lambda.min : the λ at which 
    # the minimal MSE is achieved.
    
    # lambda.1se : the largest λ at which 
    # the MSE is within one standard error 
    # of the minimal MSE.
    
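    dev.new(); plot(mod_cv)  # CV curve: MSE against log(lambda)
    # coefficients at lambda.min (1st column) and lambda.1se (2nd column)
    coef(mod_cv, s = c(mod_cv$lambda.min,
                       mod_cv$lambda.1se))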
    print(paste(mod_cv$lambda.min,
                log(mod_cv$lambda.min)))
    print(paste(mod_cv$lambda.1se,
                log(mod_cv$lambda.1se)))

Estimation Results

The following figure shows the true coefficients (β) with which we generate the data and the estimated coefficients of the three regression models.

The models produce similar estimates despite the noise in the data-generating process. In particular, the Lasso identifies the insignificant or unimportant variables by setting their coefficients to zero. The variable selection and shrinkage effects grow stronger as λ increases. The following figures show how the estimated coefficients change with the penalty parameter (log(λ)), which is the shrinkage path.

Model Selection

The most important task in Lasso boils down to selecting the optimal λ. This is determined through cross-validation. The cv.glmnet() function in glmnet provides cross-validation results over a suitable range of λ. Using this output, we can draw a graph of log(λ) against the MSE (mean squared error).

From the above figure, the first candidate is the λ at which the minimal MSE is achieved, but this model is likely to retain many variables. The second is the largest λ at which the MSE is within one standard error of the minimal MSE. This is a somewhat heuristic or empirical approach, but it has the merit of reducing the number of variables. It is typical to choose the second, the one-standard-error (1se) λ. Visual inspection is nevertheless a very important tool for finding the pattern of the shrinkage process.
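
The two candidates can be sketched by hand on a hypothetical cross-validation curve. The numbers below are made up for illustration and are not results from this post. The names cvm and cvsd mirror the fields of a cv.glmnet object, but here they are plain vectors.

```r
# Hypothetical cross-validation output (illustrative numbers only)
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05, 0.01)  # lambda grid, decreasing
cvm    <- c(0.90, 0.60, 0.45, 0.40, 0.41, 0.43)  # mean cross-validated MSE
cvsd   <- c(0.06, 0.05, 0.05, 0.06, 0.05, 0.05)  # standard error of cvm

# Candidate 1: lambda with the minimal mean cross-validated MSE
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# Candidate 2: largest lambda whose MSE is within one standard error of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(lambda.min = lambda_min, lambda.1se = lambda_1se)
```

Because the 1se λ is never smaller than the minimizing λ, it applies at least as much shrinkage and therefore tends to keep fewer variables.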

The following result reports the estimated coefficients under the MSE-minimizing λ and the 1se λ, respectively.

Forecast

After estimating the parameters of the Lasso regression, we use the model for prediction. Forecasting uses only the estimated coefficients, not the penalty term. In terms of the forecasting method, the Lasso, Ridge, and linear regression models are therefore the same, because the penalty term is used only for estimation.
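
As a minimal sketch of this point, the snippet below fits a Lasso on hypothetical data (sizes, coefficients, seed and λ are assumptions for illustration) and checks that predict() from glmnet gives the same forecasts as multiplying the new X values by the estimated coefficients. No intercept is fitted, as in the code above.

```r
library(glmnet)

set.seed(1)
N <- 200
p <- 5
X <- scale(matrix(rnorm(N * p), ncol = p))
beta_true <- c(0.5, -0.3, 0, 0, 0.2)       # hypothetical coefficients
y <- as.numeric(X %*% beta_true + rnorm(N, sd = 0.5))

# Lasso at a single, arbitrarily chosen lambda
fit <- glmnet(X, y, lambda = 0.05, family = "gaussian",
              intercept = FALSE, alpha = 1)

# Hypothetical new observations to forecast
X_new <- matrix(rnorm(3 * p), ncol = p)

# Forecast via predict() and via the estimated coefficients alone
pred_glmnet <- as.numeric(predict(fit, newx = X_new, s = 0.05))
pred_manual <- as.numeric(X_new %*% fit$beta[, 1])

cbind(pred_glmnet, pred_manual)
```

The penalty parameter enters only through which coefficient vector is used; once the coefficients are fixed, the forecast is an ordinary linear combination.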

Building on this post, the sign-restricted Lasso model will be discussed. It is important to set constraints on the sign of a coefficient when economic theory or an empirical stylized fact calls for a specific sign.

Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society: Series B 58(1), 267–288.

Originally posted on SH Fintech Modeling.
