Step-by-Step Guide to Implementing LASSO Regression in Python

Ujang Riswanto
11 min read · Nov 18, 2024


Photo by Zach Graves on Unsplash

When it comes to predictive modeling, regression techniques are like the bread and butter of data science. They’re simple, effective, and a great place to start when you want to understand the relationships in your data. But sometimes, the classic linear regression doesn’t quite cut it — especially when you’ve got a dataset with tons of features, many of which might be irrelevant. This is where LASSO Regression steps in to save the day.

So, what’s the deal with LASSO? It stands for Least Absolute Shrinkage and Selection Operator (sounds fancy, right?), and it’s basically a technique that not only helps you build a predictive model but also simplifies it by selecting only the most important features. Think of it as Marie Kondo-ing your dataset — it keeps the features that “spark joy” and tosses the rest.

Why should you care about LASSO? Well, if you’re working with high-dimensional data (a fancy way of saying “too many features”), LASSO helps prevent overfitting and makes your model easier to interpret. Whether you’re predicting house prices, diagnosing diseases, or forecasting stock prices, this method ensures you’re not drowning in unnecessary variables.

In this guide, we’ll walk you through the process of implementing LASSO Regression in Python, step by step. By the end, you’ll know exactly how to build a lean, mean, predictive machine — and have some fun doing it!

Prerequisites

Photo by Chris Ried on Unsplash

Before we dive into the nitty-gritty of LASSO Regression, let’s make sure you’ve got everything you need to follow along smoothly. Don’t worry, the list isn’t long, and you probably already know most of this stuff!

What You Should Know

First things first, it’ll help if you have a basic understanding of:

  • Linear regression: You don’t need to be a stats wizard, but knowing how regression works will definitely make this easier to follow.
  • Python basics: If you can write a simple script and know how to import libraries, you’re good to go.
  • Python data libraries: Familiarity with libraries like NumPy, pandas, and scikit-learn will make this a breeze.

What You’ll Need Installed

Here’s the tech checklist:

  • Python (obviously)
  • These Python libraries:
      • scikit-learn: The go-to library for all things machine learning.
      • pandas: For handling datasets like a pro.
      • Matplotlib: Because what’s a project without some cool plots?
      • NumPy: For all the number-crunching behind the scenes.

If you don’t have these installed yet, a quick pip install command in your terminal will sort you out:

pip install scikit-learn pandas matplotlib numpy

Once you’ve checked these off, you’re all set to jump into the fun part — working with data and building your LASSO Regression model!

Preparing the Dataset

Photo by Stephen Dawson on Unsplash

Alright, time to roll up our sleeves and get our hands dirty with some data! Before we can jump into LASSO Regression, we need to get a dataset ready, give it a little TLC, and split it into training and testing sets. Don’t worry; we’ll keep this simple and fun!

Step 1: Import the Libraries

First things first, let’s bring in the tools we need. Open up your Python IDE or notebook and start with this:

import numpy as np  
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

We’ll bring in more libraries as we go, but this gets us started.

Step 2: Load or Create a Dataset

Now we need some data to work with. You can use your own dataset if you have one, but for this guide, let’s keep things simple. The California Housing dataset is a solid choice: it’s built right into scikit-learn, so you don’t even need to download anything! (Heads up: older tutorials use the Boston Housing dataset via load_boston, but it was removed from scikit-learn in version 1.2, so that code no longer runs.)

from sklearn.datasets import fetch_california_housing

# Load the dataset (load_boston was removed in scikit-learn 1.2)
housing = fetch_california_housing()
data = pd.DataFrame(housing.data, columns=housing.feature_names)
data['PRICE'] = housing.target

Alternatively, if you’re feeling adventurous, you can create a synthetic dataset using scikit-learn’s make_regression function:

from sklearn.datasets import make_regression  

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
data = pd.DataFrame(X, columns=[f'Feature_{i}' for i in range(1, 11)])
data['Target'] = y

Step 3: Explore the Data

Before diving in, let’s peek at the dataset. A quick glance can help spot missing values or weird outliers.

print(data.head())  
print(data.info())
print(data.describe())

To get a feel for the relationships in the data, let’s make a quick scatterplot (this uses the synthetic dataset’s column names; swap in your own if you loaded something else):

plt.scatter(data['Feature_1'], data['Target'])  
plt.xlabel('Feature 1')
plt.ylabel('Target')
plt.title('Feature 1 vs Target')
plt.show()

Step 4: Split the Data

Now that we’ve got our dataset looking good, it’s time to split it into training and testing sets (we’ll stick with the synthetic dataset and its 'Target' column from here on). This helps us evaluate how well our model generalizes to new data.

X = data.drop(columns=['Target'])  
y = data['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

And that’s it! Our dataset is locked, loaded, and ready for some LASSO magic. Next stop: building the model!

Implementing LASSO Regression

Photo by charlesdeluvio on Unsplash

Now that our data is prepped and ready to go, it’s time to dive into the heart of the matter: implementing LASSO Regression. Don’t worry — it’s way easier than it sounds, thanks to Python’s awesome libraries. Let’s break it down step by step.

Step 1: Import the LASSO Model

First, we need to bring in the Lasso class from scikit-learn. This is our star player for the day.

from sklearn.linear_model import Lasso

Boom, done. Let’s move on.

Step 2: Set Up the Model

The LASSO model comes with a very important parameter called alpha. Think of alpha as the “tightness” knob for your regression model—it controls how much penalty we apply to coefficients. Smaller alpha values mean less regularization, while larger ones will squeeze out more of the unnecessary features.
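Under the hood, scikit-learn’s Lasso minimizes the following objective (this comes straight from its documentation), which makes alpha’s role concrete:

(1 / (2 * n_samples)) * ||y - Xw||² + alpha * ||w||₁

The second term is the L1 penalty: the bigger alpha gets, the more attractive it becomes to set coefficients to exactly zero.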

Here’s how you set it up:

lasso = Lasso(alpha=0.1)  # You can adjust alpha later to see its effect

For now, we’re starting with alpha=0.1, a good middle-of-the-road value.

Step 3: Train the Model

Next, we fit the model to our training data. This is where the magic happens:

lasso.fit(X_train, y_train)

After this step, your LASSO model has learned the relationships between the features and the target variable.

If you’re curious about which features made the cut, you can check the coefficients:

print("LASSO Coefficients:", lasso.coef_)

Zero coefficients? Those features got booted out.
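If you’d like a quick count, here’s a small sketch (assuming the synthetic dataset from earlier, so coef_ has ten entries; how many hit zero depends on your data and alpha):

# Count how many coefficients LASSO shrank to exactly zero
n_zero = np.sum(lasso.coef_ == 0)
print(f"Features eliminated: {n_zero} of {len(lasso.coef_)}")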

Step 4: Make Predictions

With the model trained, it’s time to test it out on the test set. Let’s see how well it performs:

y_pred = lasso.predict(X_test)

You’ve just made your first predictions with LASSO Regression! 🎉

And that’s it for the basics of implementing LASSO. Next up, we’ll evaluate the model’s performance and see how it stacks up. Spoiler alert: It’s going to look pretty good!

Evaluating the Model

Photo by Campaign Creators on Unsplash

Alright, so we’ve got our LASSO model trained and ready to roll. But how do we know if it’s any good? That’s where evaluation comes in. In this section, we’ll break down how to measure your model’s performance and see how it stacks up against plain old linear regression.

Step 1: Choose Your Metrics

There are plenty of ways to judge a model, but for regression, these two are the MVPs:

  • Mean Squared Error (MSE): Measures how far off your predictions are from the actual values, on average. Smaller is better!
  • R-squared (R²): Tells you how much of the variation in the target variable your model explains. Closer to 1 = awesome. (Both are spelled out as formulas right after this list.)
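
In formula terms (n is the number of test samples, yᵢ an actual value, ŷᵢ a prediction, and ȳ the mean of the actuals):

MSE = (1/n) * Σ (yᵢ - ŷᵢ)²
R² = 1 - Σ (yᵢ - ŷᵢ)² / Σ (yᵢ - ȳ)²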

Step 2: Calculate the Metrics

Let’s throw our test set predictions into some metrics and see how the LASSO model did:

from sklearn.metrics import mean_squared_error, r2_score  

# Calculate MSE and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")

If the numbers look decent, great! If not, don’t worry — there’s always room for tuning (we’ll get to that later).

Step 3: Compare to Linear Regression

Curious to see if LASSO is really pulling its weight? Let’s compare it to a standard linear regression model:

from sklearn.linear_model import LinearRegression  

# Train a simple linear regression model
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred_lin = lin_reg.predict(X_test)

# Evaluate the linear model
mse_lin = mean_squared_error(y_test, y_pred_lin)
r2_lin = r2_score(y_test, y_pred_lin)

print(f"Linear Regression - MSE: {mse_lin:.2f}, R²: {r2_lin:.2f}")

One caveat when comparing: on the same test set, MSE and R² always move together (R² is just 1 minus the MSE divided by the variance of the actual values), so a model can’t have a higher MSE and a better R² at once. What you will often see is LASSO matching or beating plain linear regression when the dataset contains irrelevant features, while zeroing out the noise, which makes the model far more interpretable.

Step 4: Visualize Feature Importance

LASSO doesn’t just predict — it also tells you which features matter the most. Let’s visualize it:

import matplotlib.pyplot as plt  

# Plot feature importance
plt.bar(X_train.columns, lasso.coef_)
plt.xlabel('Features')
plt.ylabel('Coefficient Value')
plt.title('LASSO Feature Importance')
plt.xticks(rotation=45)
plt.show()

Any feature with a coefficient close to zero? Yeah, it’s safe to say LASSO doesn’t think it’s worth keeping around.
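To turn that plot into an explicit list, here’s a small sketch (it assumes X_train is still a pandas DataFrame, as in our setup):

# Split features into the ones LASSO kept and the ones it dropped
kept = X_train.columns[lasso.coef_ != 0]
dropped = X_train.columns[lasso.coef_ == 0]
print("Kept:", list(kept))
print("Dropped:", list(dropped))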

With the model evaluated and compared, you’ve got a solid grasp of how LASSO performs. If you’re happy with the results, great! If not, hang tight — we’ll dive into fine-tuning in the next section.

Tuning Hyperparameters with Cross-Validation

Photo by Scott Graham on Unsplash

Alright, so your LASSO model is up and running, but what if you want to squeeze out every last drop of performance? That’s where hyperparameter tuning comes in. Specifically, we’ll fine-tune the all-important alpha parameter to find the sweet spot that balances model simplicity and accuracy. And the best way to do this? Cross-validation!

Step 1: Use LassoCV for Automatic Tuning

Manually testing different alpha values can be a pain, so let’s make life easier by using scikit-learn’s LassoCV. This handy tool automatically tests multiple alpha values using cross-validation and picks the best one for you.

Here’s how to do it:

from sklearn.linear_model import LassoCV  

# Set up LassoCV with a range of alpha values
lasso_cv = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5) # Testing 50 alpha values
lasso_cv.fit(X_train, y_train)

# Best alpha value
print(f"Best alpha: {lasso_cv.alpha_:.4f}")

This will test 50 different alpha values between 10⁻⁴ and 10¹ (a good range to start with) and choose the one that works best.

Step 2: Retrain the Model with the Best Alpha

Once you’ve found the optimal alpha, it’s time to retrain your LASSO model. But guess what? LassoCV already does this for you! Its coefficients are automatically updated to match the best alpha.

print("LASSO Coefficients with Best Alpha:", lasso_cv.coef_)

You can now use lasso_cv to make predictions, just like before:

y_pred_cv = lasso_cv.predict(X_test)

Step 3: Visualize Alpha vs. Model Performance

Curious about how alpha affects your model? Let’s plot the relationship between alpha values and mean squared error:

plt.plot(lasso_cv.alphas_, lasso_cv.mse_path_.mean(axis=1), marker='o')  
plt.xscale('log') # Log scale for alpha
plt.xlabel('Alpha')
plt.ylabel('Mean Squared Error')
plt.title('Alpha vs MSE')
plt.show()

This plot shows you exactly why the chosen alpha is the best—it minimizes the error while keeping the model lean.

Step 4: Test the New Model

Finally, let’s see how your fine-tuned LASSO model stacks up against the earlier version:

from sklearn.metrics import mean_squared_error, r2_score  

mse_cv = mean_squared_error(y_test, y_pred_cv)
r2_cv = r2_score(y_test, y_pred_cv)

print(f"Fine-Tuned LASSO - MSE: {mse_cv:.2f}, R²: {r2_cv:.2f}")

You should notice a nice little improvement in performance. If not, don’t worry — tuning is all about trial and error!

With cross-validation in your toolkit, you’re no longer just building models — you’re optimizing them like a pro. 🎯 Next up: some tips and tricks to make sure you’re always getting the most out of LASSO Regression.

Practical Tips and Common Pitfalls

Photo by Firmbee.com on Unsplash

So, you’ve built and fine-tuned your LASSO Regression model. Awesome! But before you go off predicting the future, let’s talk about some practical tips to keep your model sharp — and some common traps to avoid.

Tip 1: Choose the Right Alpha

The alpha parameter is like Goldilocks: too small, and your model acts like regular linear regression (keeping every feature); too large, and it throws out everything useful. Use cross-validation (LassoCV) to find the “just right” value, and always double-check how it affects your model’s performance.

Tip 2: Standardize Your Data

LASSO regression is sensitive to the scale of your features. The penalty acts on coefficient sizes, so a feature measured in small units needs a large coefficient to have the same effect, and it gets hit harder by the penalty than a feature measured in big units. Fix this by standardizing your features so they all compete on equal footing:

from sklearn.preprocessing import StandardScaler  

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

This ensures every feature gets a fair shot.
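It’s also easy to forget to refit the model on the scaled data, or to accidentally scale using test-set statistics. One tidy option, sketched here rather than prescribed, is to bundle scaling and LASSO into a scikit-learn Pipeline:

from sklearn.pipeline import make_pipeline

# The scaler is fit on the training data only, then applied consistently
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X_train, y_train)
y_pred_scaled = model.predict(X_test)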

Tip 3: Don’t Overlook Feature Selection

One of the best parts of LASSO is that it automatically selects features for you. But don’t take its word as gospel! Check which features are dropped and ensure they actually make sense in the context of your problem.

Tip 4: Know When Not to Use LASSO

LASSO is amazing for datasets where only a few features matter. But if all your features are important (or if they’re highly correlated), LASSO might struggle. In those cases, consider one of the options below (a quick Elastic Net sketch follows the list):

  • Ridge Regression: Better for handling multicollinearity (correlated features).
  • Elastic Net: Combines LASSO and Ridge, balancing feature selection with stability.
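
If you want to give Elastic Net a spin, scikit-learn’s ElasticNet follows the same fit/predict pattern as Lasso. A minimal sketch (l1_ratio balances the two penalties; 0.5 here is just an arbitrary starting point, not a recommendation):

from sklearn.linear_model import ElasticNet

# l1_ratio=1.0 is pure LASSO, l1_ratio=0.0 is pure Ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)
print("Elastic Net coefficients:", enet.coef_)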

Tip 5: Be Wary of Overfitting

Yes, LASSO helps reduce overfitting, but it’s not a magic bullet. If your dataset is small or noisy, your model might still overfit. Regularization is powerful, but so is having clean, well-prepped data.

Tip 6: Use Visuals to Interpret Results

A model is only as good as how well you understand it. Use coefficient plots to visualize which features are important and how much they contribute. If a key feature has been dropped, dig deeper — it might be a clue that something’s off with your data.

Tip 7: Experiment with Different Data Splits

Your results can vary depending on how you split your data into training and test sets. Try a few different splits or use cross-validation to ensure your model performs consistently across the board.
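Here’s a quick sketch of that idea using cross_val_score (five folds, scoring with R²; the numbers you get will depend on your data and alpha):

from sklearn.model_selection import cross_val_score

# Evaluate the same model across five different train/validation splits
scores = cross_val_score(Lasso(alpha=0.1), X, y, cv=5, scoring='r2')
print("R² per fold:", scores)
print(f"Mean R²: {scores.mean():.2f} (+/- {scores.std():.2f})")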

With these tips in mind, you’re well-equipped to make the most out of LASSO Regression. It’s a fantastic tool, but like any tool, it shines brightest when you know how to use it effectively. Now go forth and build models that are as sharp as they are simple! 🚀

Conclusion

And there you have it! You’ve just gone through a complete step-by-step guide to implementing LASSO Regression in Python. 🎉 By now, you’ve learned:

  • What makes LASSO Regression special (hello, feature selection!).
  • How to prep your data so your model has the best chance of success.
  • The magic of setting up, training, and fine-tuning a LASSO model.
  • How to evaluate performance and avoid common pitfalls.

LASSO Regression is more than just a fancy acronym — it’s a powerful tool that helps simplify complex datasets while keeping the predictive power intact. It’s perfect for when you’re juggling a lot of features but want a model that’s both lean and effective.

But don’t stop here! There’s so much more you can explore:

  • Test LASSO on your own datasets to see how it handles different challenges.
  • Experiment with alternatives like Ridge or Elastic Net to understand when they shine.
  • Dive deeper into scikit-learn’s documentation for advanced tweaks.

The more you play around, the better you’ll get at picking the right tool for the job. Machine learning is all about experimenting, learning, and iterating — so go ahead and build something awesome.

Good luck, and happy coding! 🚀
