How LASSO Helps You Identify and Discard Irrelevant Features in Machine Learning

LASSO is like a laser-focused editor, cutting through the clutter and leaving you with a dataset that’s lean, clean, and ready to rock.

Ujang Riswanto
14 min read · Nov 19, 2024

Let’s face it — dealing with messy data is one of the biggest headaches in machine learning. Whether you’re building a model to predict stock prices or classify cat photos, irrelevant features can sneak in and cause all kinds of trouble. They make your models unnecessarily complex, slow, and — let’s be honest — just plain bad at their job.

That’s where LASSO (short for Least Absolute Shrinkage and Selection Operator) comes in, ready to save the day. Think of it as a neat little algorithm with a sharp pair of scissors, trimming out the features you don’t actually need. LASSO doesn’t just help your model run smoother — it makes it smarter by focusing only on the most important predictors.

In this article, we’ll dive into what makes LASSO such a powerful tool for feature selection. By the end, you’ll see how it can cut through the noise in your data and leave you with a lean, mean predictive machine. Ready? Let’s go!

The Problem with Irrelevant Features

Imagine you’re trying to predict how well someone will do on a math test. You’ve got a bunch of data: hours spent studying, previous test scores, favorite pizza topping, and their shoe size. While “hours spent studying” and “previous test scores” probably make sense, “favorite pizza topping” and “shoe size”? Yeah, not so much. These irrelevant features add noise to your data, making it harder for your model to figure out what really matters.

Here’s the thing — machine learning models aren’t great at automatically knowing what’s important. They’ll try to use everything you give them, which can lead to some big problems:

  • Overfitting: The model gets too good at memorizing the irrelevant stuff and struggles to generalize to new data.
  • Longer training times: Extra features mean more work for the model.
  • Confusing outputs: Ever tried explaining why your model thinks pizza toppings matter? Yeah, good luck with that.

This gets even worse with high-dimensional data, like in genomics or text processing, where you might have thousands (or millions!) of features. Without a way to separate the useful from the useless, your model is doomed to drown in a sea of irrelevant data.

Bottom line? If you want your model to perform well and make sense, you’ve got to deal with those irrelevant features. That’s where LASSO steps in to help. Stay tuned!

What is LASSO?

Alright, so let’s talk about LASSO — your new best friend for simplifying machine learning models. The name might sound fancy (Least Absolute Shrinkage and Selection Operator), but the idea is pretty straightforward. LASSO is a type of linear regression that not only tries to fit your data but also decides which features actually matter and which ones are just taking up space.

Here’s the magic: LASSO adds a penalty to the usual linear regression formula. This penalty, based on the absolute values of the feature coefficients, forces the model to shrink some of those coefficients all the way down to zero. And guess what? When a feature’s coefficient is zero, it means the model is straight-up ignoring it. Goodbye, irrelevant features!

The Formula (Don’t Worry, It’s Not Scary)

In regular linear regression, the goal is to minimize the Residual Sum of Squares (RSS) — basically, the sum of the squared differences between the actual and predicted values. LASSO adds a penalty on top of that:

Cost Function = RSS + λ · Σ|βᵢ|

Here’s what’s happening:

  • λ: a tuning parameter that controls how aggressive the feature selection is.
  • Σ|βᵢ|: the penalty term (the sum of the absolute values of the coefficients) that LASSO uses to shrink them.

When λ is big, the model goes hard on shrinking coefficients, and more features get kicked to the curb. When λ is small, LASSO chills out and keeps more features around.
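
To make that concrete, here’s a tiny NumPy sketch with completely made-up numbers. It simply evaluates the cost function above for one fixed set of coefficients at a few λ values, so you can see how the same coefficients get more “expensive” as λ grows — that rising price is what pressures LASSO to shrink weak coefficients toward zero.

import numpy as np

# Made-up data: 5 samples, 3 features (purely for illustration)
X = np.array([[1.0, 0.5, 2.0],
              [2.0, 1.5, 0.3],
              [3.0, 0.2, 1.1],
              [4.0, 2.2, 0.7],
              [5.0, 1.0, 1.9]])
y = np.array([2.1, 4.2, 5.9, 8.1, 10.2])

def lasso_cost(beta, lam):
    """Cost = RSS + lam * sum(|beta_i|)."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)
    penalty = lam * np.sum(np.abs(beta))
    return rss + penalty

beta = np.array([2.0, 0.1, 0.0])  # an example coefficient vector

# The identical coefficients cost more as lambda grows
for lam in [0.0, 1.0, 10.0]:
    print(f"lambda = {lam:5.1f}  ->  cost = {lasso_cost(beta, lam):.2f}")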

How is LASSO Different from Ridge Regression?

You might’ve heard of Ridge regression, which also uses a penalty, but there’s a key difference:

  • Ridge regression: Shrinks coefficients but doesn’t actually set any to zero. So, all features stick around.
  • LASSO: Shrinks AND eliminates features by setting some coefficients to exactly zero.

Think of Ridge as the model that hoards all your features, while LASSO is the minimalist that only keeps what’s necessary.
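
If you want to see that difference on screen, here’s a quick sketch on synthetic scikit-learn data (not a benchmark, just an illustration): Ridge leaves every coefficient non-zero, while LASSO typically zeroes out the features that carry no signal.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)
X = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks everything a little; LASSO pushes the useless coefficients to exactly zero
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("LASSO coefficients:", np.round(lasso.coef_, 2))
print("Features LASSO dropped:", int(np.sum(lasso.coef_ == 0)))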

In short, LASSO is like having a built-in feature selection process while doing regression. It’s simple, effective, and saves you a ton of time when dealing with messy datasets. Ready to see how it works its magic? Let’s move on!

How LASSO Identifies and Discards Irrelevant Features

Alright, let’s dive into the cool part — how LASSO actually figures out which features to keep and which ones to ditch. It’s kind of like a reality TV show where the most useful features get a rose, and the irrelevant ones are sent home.

How It Works

Here’s the deal: LASSO works by tweaking the coefficients of your model. Every feature in your dataset starts out with a coefficient that tells the model how important it is. LASSO’s job is to shrink those coefficients down. And if a feature isn’t pulling its weight? Its coefficient gets reduced all the way to zero. At that point, the model says, “You’re outta here!”

This is thanks to LASSO’s L1 regularization penalty. Unlike the L2 penalty (looking at you, Ridge regression), the L1 penalty forces some coefficients to be exactly zero, which is what makes LASSO so good at feature selection.

Tuning the Aggressiveness

The key to LASSO’s magic is the λ parameter. Think of λ as a slider that controls how strict LASSO is about eliminating features:

  • High λ: LASSO goes full Marie Kondo, cutting features left and right. Only the most important ones survive.
  • Low λ: LASSO takes it easy and keeps more features around.

The trick is finding the right balance so your model doesn’t lose useful features or get overwhelmed by irrelevant ones. Usually, this is done using cross-validation (a fancy way of testing different λ values to find the best one).

Let’s Visualize It

Imagine you’re plotting feature coefficients on a graph. As λ increases, the coefficients for less important features shrink closer and closer to zero until — poof — they’re gone. What’s left are the features that actually matter.
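
If you’d like to draw that picture yourself, here’s a rough sketch using scikit-learn’s lasso_path and matplotlib on synthetic data (swap in your own scaled features and target):

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data -- replace with your own scaled X and y
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Coefficients computed along a whole grid of lambda (alpha) values
alphas, coefs, _ = lasso_path(X, y)

# Each line is one feature's coefficient; most collapse to zero as lambda grows
for feature_coefs in coefs:
    plt.plot(alphas, feature_coefs)
plt.xscale("log")
plt.xlabel("lambda (alpha)")
plt.ylabel("coefficient value")
plt.title("LASSO coefficient paths")
plt.show()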

Why This Matters

By tossing out the junk, LASSO helps your model focus on the good stuff. This:

  • Makes the model simpler and faster.
  • Reduces overfitting by ignoring irrelevant noise.
  • Leaves you with a clearer understanding of which features drive your predictions.

Practical Applications of LASSO

So, when should you actually use LASSO? Turns out, it’s a rockstar in all sorts of situations, especially when you’re dealing with large, complex datasets. Let’s look at where LASSO shines the brightest.

1. Tackling High-Dimensional Data

If your dataset has more features than you can count (or even more features than samples), LASSO is your best bet. Think of fields like:

  • Genomics: Thousands of genes, but only a handful might actually matter for predicting a disease.
  • Finance: Hundreds of economic indicators, but only a few drive the market.
  • Natural Language Processing (NLP): Millions of words or phrases, but only some are useful for a specific task.

LASSO thrives in these situations, cutting through the noise and finding the features that actually have predictive power.

2. Handling Correlated Predictors

Ever had two or more features that are so similar they might as well be twins? Like “square footage” and “number of rooms” in a real estate dataset? LASSO helps by picking one of them and ignoring the rest, simplifying your model while keeping the important info.
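
Here’s a small made-up example of that behavior. The feature names (“sqft”, “rooms”, “age”) and the numbers are invented for illustration; the point is that LASSO usually hands the weight to one of the near-duplicates and shrinks the other toward (or to) zero.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300

sqft = rng.normal(1500, 300, n)
rooms = sqft / 500 + rng.normal(0, 0.1, n)   # almost a rescaled copy of sqft
age = rng.normal(30, 10, n)                  # unrelated feature

X = StandardScaler().fit_transform(np.column_stack([sqft, rooms, age]))
price = 0.2 * sqft + rng.normal(0, 10, n)    # price (in $1000s) driven by size only

lasso = Lasso(alpha=1.0).fit(X, price)
print(dict(zip(["sqft", "rooms", "age"], np.round(lasso.coef_, 1))))
# Typically one of the twins keeps the big coefficient while the other lands at (or near) zero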

3. Improving Model Interpretability

Models with too many features can feel like a black box. LASSO simplifies things by narrowing your feature set, making it easier to explain how your model works. This is especially helpful in areas like healthcare or legal systems, where understanding the “why” behind a prediction is critical.

4. Real-World Use Cases

Here’s where you’ll often find LASSO in action:

  • Predicting Housing Prices: Trim down irrelevant features like “color of the front door” while keeping big hitters like location and square footage.
  • Customer Churn Prediction: Focus on key predictors like subscription length and engagement rate instead of fluff like “favorite browser.”
  • Gene Expression Analysis: Zero in on the handful of genes linked to a disease while ignoring the rest of the genome.

Why LASSO Wins

By eliminating irrelevant features, LASSO does more than just save time and computational power — it makes your models better at their jobs. And when it comes to delivering reliable, interpretable results, that’s the name of the game.

Up next, let’s talk about the flipside — some of the limitations you should watch out for when using LASSO. Spoiler alert: It’s not perfect, but it’s pretty close!

Limitations of LASSO

Alright, let’s keep it real — LASSO is awesome, but it’s not a magic wand. While it does a great job at trimming the fat from your dataset, it has a few quirks and limitations you should know about before diving in headfirst.

1. Struggles with Highly Correlated Predictors

LASSO doesn’t always play well with features that are highly correlated. Imagine you have two predictors that are super similar, like “age in years” and “age in months.” LASSO might randomly pick one and drop the other, even if both are important. If you care about keeping groups of related features, you might need to look at alternatives like Elastic Net, which balances LASSO’s feature selection with Ridge regression’s grouping tendency.

2. Not Great for Ultra-Large Datasets

LASSO can be computationally intense when you’re dealing with massive datasets — think millions of samples or features. While it’s still effective, the time and resources needed to fit the model can be a headache. If speed is critical, you might need to consider approximations or other techniques.

3. Sensitive to Scaling

LASSO loves when all your features are on the same playing field. If one feature has values in the thousands and another is in fractions, LASSO will unfairly favor the big numbers. The solution? Standardize or normalize your data before running LASSO.

4. Might Over-Simplify Your Model

In its quest to simplify, LASSO can sometimes overdo it and remove features that are actually useful in the right context. This is especially true if you set the λ parameter too high. Finding the sweet spot for λ is crucial to avoid throwing out the baby with the bathwater.

5. Limited Interpretability in Complex Settings

While LASSO simplifies models, the zeroing-out process can make it harder to explain why certain features were dropped, especially in cases where features are correlated or the dataset is noisy. This could be a downside in fields like healthcare or finance, where transparency is key.

So, Should You Still Use LASSO?

Absolutely! Just keep its limitations in mind:

  • If your features are highly correlated, maybe pair LASSO with Elastic Net.
  • For ultra-large datasets, consider ways to preprocess or sample your data.
  • Always scale your features before running LASSO to keep things fair.

LASSO is still one of the best tools out there for feature selection. It’s like a scalpel — precise and effective, but only if you use it the right way. Next up, we’ll look at how to get LASSO working in your projects with some practical tips and examples!

Steps to Implement LASSO in Your Workflow

So, you’re sold on LASSO and ready to use it to clean up your dataset. Awesome! Here’s a simple, step-by-step guide to get you started.

Step 1: Preprocess Your Data

Before you let LASSO work its magic, you need to tidy up your data:

  • Handle missing values: Fill them in or drop incomplete rows — LASSO doesn’t like gaps.
  • Standardize your features: LASSO is sensitive to scale, so make sure everything is normalized (e.g., using StandardScaler in Python). Otherwise, features with larger scales will dominate the selection process.
  • Remove irrelevant features manually (optional): If you already know some features are useless, like “favorite emoji” in a dataset about health, get rid of them upfront. (A minimal preprocessing sketch follows this list.)
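
Here’s a minimal sketch of that preprocessing step, assuming a pandas DataFrame with a couple of hypothetical columns (swap in your own data):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature table -- replace with your own DataFrame
df = pd.DataFrame({
    "hours_studied": [2.0, 5.5, None, 8.0],
    "previous_score": [60, 75, 80, None],
    "favorite_pizza_topping": ["cheese", "pepperoni", "cheese", "veggie"],  # known noise
})

# Drop features you already know are useless
df = df.drop(columns=["favorite_pizza_topping"])

# Fill the gaps, then put every feature on the same scale so the L1 penalty treats them fairly
X = SimpleImputer(strategy="median").fit_transform(df)
X = StandardScaler().fit_transform(X)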

Step 2: Choose the Right λ

The λ parameter (also called alpha in some libraries) controls how aggressive LASSO is about shrinking coefficients. Finding the perfect λ is key:

  • Use cross-validation to test different values of λ.
  • Most libraries have built-in tools to help with this. For example, in Scikit-learn, you can use LassoCV to automatically pick the best λ (see the quick sketch below).
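
As a quick sketch of what that looks like (synthetic data standing in for yours), LassoCV searches a grid of alpha values with cross-validation and reports the winner:

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data -- use your own scaled features and target
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)
X = StandardScaler().fit_transform(X)

# 5-fold cross-validation over an automatic grid of alphas
lasso_cv = LassoCV(cv=5, random_state=42).fit(X, y)
print("Best alpha (lambda):", lasso_cv.alpha_)
print("Nonzero coefficients:", int((lasso_cv.coef_ != 0).sum()))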

Step 3: Train the Model

Once your data is ready and λ is set, it’s time to train your LASSO model. Here’s a quick Python example using Scikit-learn:

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and preprocess data
X, y = your_data() # Replace with your dataset
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train LASSO model
lasso = Lasso(alpha=0.1) # Set your lambda (alpha) value
lasso.fit(X_train, y_train)

# Check results: a coefficient of exactly 0 means the feature was dropped
print("Coefficients:", lasso.coef_)
print("Intercept:", lasso.intercept_)
print("Model score (R^2 on test set):", lasso.score(X_test, y_test))

Step 4: Interpret the Results

Once the model is trained:

  • Look at the coefficients (lasso.coef_). Any feature with a coefficient of 0 has been dropped.
  • Analyze which features were kept and how they impact the target variable (the snippet below shows one way to pull out the kept feature names).
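
Here’s one simple way to do that. The DataFrame and column names below are made up purely so the names carry through; with your own data, keep the columns of your original DataFrame handy and match them against lasso.coef_.

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Made-up example frame -- in practice, use your own DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["hours_studied", "previous_score", "shoe_size", "pizza_topping_enc"])
y = 3 * df["hours_studied"] + 2 * df["previous_score"] + rng.normal(0, 0.5, 100)

X = StandardScaler().fit_transform(df)
lasso = Lasso(alpha=0.5).fit(X, y)

# Match coefficients back to column names; zeros are the dropped features
coef = pd.Series(lasso.coef_, index=df.columns)
print("Kept:", list(coef[coef != 0].index))
print("Dropped:", list(coef[coef == 0].index))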

Step 5: Fine-Tune and Iterate

Feature selection isn’t always a one-and-done process. If your model isn’t performing well:

  • Try adjusting λ to see if a less or more aggressive selection works better.
  • Check if preprocessing steps like scaling or feature encoding need tweaking.
  • Consider combining LASSO with other techniques, like Elastic Net, if correlated features are an issue.

Tools and Libraries

Here are a few popular options for implementing LASSO:

  • Python (Scikit-learn): Lasso and LassoCV are super easy to use.
  • R (glmnet package): Great for both LASSO and Elastic Net.
  • Other tools: MATLAB, SAS, or even cloud-based platforms like Google AutoML.

Pro Tip

Once you’ve nailed down your feature selection with LASSO, you can use the cleaned-up feature set with any other model — linear regression, random forests, neural networks — you name it. LASSO doesn’t just give you a better model; it sets you up for success across the board.
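
As a sketch of that hand-off, here’s one way to wire it up with scikit-learn’s SelectFromModel: LASSO (via LassoCV) acts purely as the feature filter, and a random forest trains on just the surviving columns. The data here is synthetic and the numbers are illustrative only.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data -- replace with your own scaled features and target
X, y = make_regression(n_samples=300, n_features=30, n_informative=6,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

# LASSO as a feature filter: keep only columns with nonzero coefficients
selector = SelectFromModel(LassoCV(cv=5)).fit(X, y)
X_reduced = selector.transform(X)

# Any downstream model can use the reduced feature set
forest = RandomForestRegressor(random_state=0).fit(X_reduced, y)
print("Features kept:", X_reduced.shape[1], "out of", X.shape[1])
print("Random forest R^2 (training):", round(forest.score(X_reduced, y), 3))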

And that’s it! You’re now ready to use LASSO like a pro. In the next section, we’ll compare LASSO to other methods and see when it’s the best tool for the job. Let’s keep going!

Comparing LASSO with Alternative Methods

LASSO isn’t the only player in the feature selection game. Depending on your dataset and goals, other methods might work better — or at least give you a solid backup plan. Let’s break it down and see how LASSO stacks up against the competition.

1. Ridge Regression

What it does: Ridge regression is like LASSO’s less aggressive sibling. Instead of using an L1 penalty to shrink coefficients to zero, it uses an L2 penalty, which shrinks them toward zero but never quite eliminates them.

Pros:

  • Great for preventing overfitting without outright dropping features.
  • Handles correlated predictors better by spreading the importance across them.

Cons:

  • Doesn’t eliminate irrelevant features. If you’ve got a messy dataset, Ridge will keep the clutter.

When to use it: When your focus is on regularization and you don’t need to prune features.

2. Elastic Net

What it does: Elastic Net combines the best of both worlds — LASSO’s feature elimination with Ridge’s ability to handle correlated predictors. It uses a mix of L1 and L2 penalties.

Pros:

  • Balances feature selection and regularization.
  • Performs better than LASSO when features are highly correlated.

Cons:

  • Requires tuning two parameters instead of one (λ for overall regularization and a ratio for the mix of L1/L2 penalties).

When to use it: When you have correlated features and want to avoid dropping important ones.
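
For reference, here’s a minimal Elastic Net sketch using ElasticNetCV, which cross-validates both knobs at once (the alpha grid is automatic, and you pass candidate l1_ratio values). Synthetic data stands in for yours.

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=1)
X = StandardScaler().fit_transform(X)

# l1_ratio = 1.0 is pure LASSO; smaller values blend in more Ridge-style shrinkage
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5, random_state=1).fit(X, y)
print("Best alpha:", enet.alpha_)
print("Best l1_ratio:", enet.l1_ratio_)
print("Nonzero coefficients:", int((enet.coef_ != 0).sum()))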

3. Tree-Based Methods (e.g., Random Forest, XGBoost)

What they do: Tree-based models don’t use penalties or coefficients. Instead, they determine feature importance based on how much each feature splits the data during training.

Pros:

  • Can handle non-linear relationships and interactions between features.
  • Feature importance scores are easy to interpret.

Cons:

  • Won’t explicitly eliminate features; you’ll still need to manually drop low-importance ones.
  • Computationally expensive for very large datasets.

When to use them: When you want a non-linear model or need to account for feature interactions.
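
For comparison, here’s a tiny sketch of what “feature selection” looks like with a tree-based model: you get a ranked list of importance scores rather than hard zeros, so pruning the weak features is a manual follow-up step. Synthetic data, illustrative only.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=10.0, random_state=7)

forest = RandomForestRegressor(n_estimators=200, random_state=7).fit(X, y)

# Importance scores rank features but never hit exactly zero -- dropping the weak ones is up to you
ranked = sorted(enumerate(forest.feature_importances_), key=lambda t: -t[1])
for i, score in ranked:
    print(f"feature_{i}: {score:.3f}")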

4. Principal Component Analysis (PCA)

What it does: PCA isn’t exactly a feature selection method — it’s more of a feature transformation technique. It reduces your dataset to a smaller set of uncorrelated “principal components.”

Pros:

  • Great for dimensionality reduction.
  • Works well with extremely high-dimensional data.

Cons:

  • You lose interpretability since the new features are combinations of the original ones.
  • Doesn’t tell you which original features were important.

When to use it: When interpretability isn’t a priority, and you just need fewer dimensions.

5. Manual Feature Selection

What it is: The old-school approach — use your knowledge of the data to decide which features to keep or drop.

Pros:

  • Can be very effective if you know your data well.
  • No need for fancy algorithms.

Cons:

  • Prone to human bias and error.
  • Impractical for large datasets with hundreds of features.

When to use it: When you’re working with small datasets and have domain expertise.

The Verdict

So, where does LASSO fit in?

  • Use LASSO when you need automatic feature selection and want to simplify your model.
  • Switch to Elastic Net if you’re worried about correlated predictors.
  • Try tree-based methods for non-linear relationships or when feature interactions matter.
  • Lean on Ridge regression for regularization without feature elimination.
  • And when all else fails, PCA or manual selection can step in to help.

The bottom line: LASSO isn’t perfect for every situation, but when it comes to trimming irrelevant features while keeping your model simple, it’s hard to beat. Now that you know when and how to use it, you’re ready to level up your machine learning game!

Conclusion

And there you have it — LASSO in all its glory! From trimming down irrelevant features to simplifying your models, it’s an incredible tool that can make your machine learning workflow cleaner, faster, and more effective. Whether you’re working with a cluttered dataset full of unnecessary noise or diving into high-dimensional data, LASSO helps you cut through the mess like a pro.

But like any tool, it’s not a one-size-fits-all solution. It has its quirks, like struggling with correlated predictors or being sensitive to feature scaling. Knowing when to use it — and when to opt for alternatives like Elastic Net or tree-based methods — is the key to getting the most out of your models.

At the end of the day, LASSO is like a minimalist’s dream. It helps you focus on what really matters, leaving behind a sleek, efficient dataset that can power smarter predictions. So, next time you’re overwhelmed by features, remember: LASSO’s got your back. Now go build something awesome! 🎉
