ML: Supervised Learning - Regression Models

Introduction to Regression Models

Regression is one of the most fundamental techniques in supervised learning, where the goal is to predict a continuous target variable based on input features. Unlike classification, where outputs are discrete labels, regression outputs numerical values. It’s used in countless real-world applications such as predicting house prices, stock market trends, and customer spending behavior.

Key Assumptions of Regression Models

  • Linearity: The relationship between independent and dependent variables should be linear.
  • Independence: Observations should be independent of each other.
  • Homoscedasticity: The variance of residuals should be constant across all levels of the independent variable.
  • Normality: Residuals should be normally distributed (a quick residual check is sketched below).
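
A quick, informal way to check the homoscedasticity and normality assumptions is to inspect a fitted model's residuals. The sketch below is illustrative: model, X, and y are placeholder names for an already-fitted Scikit-learn regressor and its data (with a 1-D target), not objects defined above.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Residuals = observed - predicted
fitted = model.predict(X)
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. fitted values: an even, patternless scatter suggests homoscedasticity
ax1.scatter(fitted, residuals, alpha=0.5)
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")

# Normal Q-Q plot: points near the line suggest approximately normal residuals
stats.probplot(residuals, plot=ax2)

plt.tight_layout()
plt.show()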

Simple and Multiple Linear Regression

Linear regression is the most straightforward regression technique. With a single input feature it is called simple linear regression; with several features it is multiple linear regression. Both assume a linear relationship between the inputs (X) and the output (Y), which can be expressed as:

Y = b0 + b1X1 + b2X2 + … + bnXn + ε

Where b0 is the intercept, b1...bn are coefficients, and ε is the error term.

Implementing Linear Regression with Scikit-learn

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
print(f"R² Score: {r2_score(y_test, y_pred):.2f}")
print(f"RMSE: {mean_squared_error(y_test, y_pred) ** 0.5:.2f}")
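
The fitted parameters map directly onto the regression equation above: model.intercept_ is b0, and model.coef_ holds b1 through bn, one coefficient per feature.

# Inspect the fitted parameters
print(f"Intercept (b0): {model.intercept_:.2f}")
print("Coefficients (b1..bn):", [round(c, 2) for c in model.coef_])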

Polynomial Regression

Linear regression often fails to capture non-linear relationships. Polynomial regression addresses this by extending linear regression with polynomial features: the model remains linear in its coefficients, but the transformed features allow it to fit curved trends.

Implementing Polynomial Regression in Python

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
X = np.sort(2 * np.random.rand(100, 1), axis=0)
y = 2 + X + X**2 + 0.5 * np.random.randn(100, 1)

# Train polynomial regression model
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

# Predictions on a smooth grid for plotting
# (named X_plot so the diabetes X_test used later is not overwritten)
X_plot = np.linspace(0, 2, 100).reshape(-1, 1)
y_plot = poly_model.predict(X_plot)

# Plot
plt.scatter(X, y, label="Data")
plt.plot(X_plot, y_plot, color='red', label="Polynomial Regression")
plt.legend()
plt.show()
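
To make the motivation concrete, the degree-2 model can be compared against a straight-line fit on the same synthetic data. This is a rough in-sample sanity check rather than a proper evaluation:

from sklearn.metrics import mean_squared_error

# Straight-line fit vs. the degree-2 pipeline on the training data
lin_fit = LinearRegression().fit(X, y)
rmse_linear = mean_squared_error(y, lin_fit.predict(X)) ** 0.5
rmse_poly = mean_squared_error(y, poly_model.predict(X)) ** 0.5
print(f"Linear RMSE: {rmse_linear:.3f}, Polynomial RMSE: {rmse_poly:.3f}")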

Ridge and Lasso Regression for Regularization

Regularization techniques help prevent overfitting by adding penalty terms to the regression model.

  • Ridge Regression (L2 Regularization): Adds the squared magnitude of the coefficients as a penalty term.
  • Lasso Regression (L1 Regularization): Adds the absolute values of the coefficients as a penalty term, which can shrink some coefficients exactly to zero (implicit feature selection). Both objectives are written out below.
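
In scikit-learn's parameterization, with regularization strength \( \alpha \ge 0 \), the two objectives are:

Ridge: \( \min_{w} \; \|y - Xw\|_2^2 + \alpha \|w\|_2^2 \)

Lasso: \( \min_{w} \; \frac{1}{2n} \|y - Xw\|_2^2 + \alpha \|w\|_1 \)

Larger \( \alpha \) shrinks coefficients more aggressively; \( \alpha = 0 \) recovers ordinary least squares.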

Implementing Ridge and Lasso Regression

from sklearn.linear_model import Ridge, Lasso

# Reuses the diabetes train/test split from the linear regression example.
# alpha sets the regularization strength: larger values shrink coefficients more.
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

print(f"Ridge R²: {ridge.score(X_test, y_test):.2f}")
print(f"Lasso R²: {lasso.score(X_test, y_test):.2f}")
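
Because the L1 penalty can drive coefficients exactly to zero, it is worth checking how many features the Lasso model actually kept:

# Count the surviving (non-zero) Lasso coefficients
kept = (lasso.coef_ != 0).sum()
print(f"Lasso kept {kept} of {len(lasso.coef_)} features")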

Model Evaluation Metrics

Evaluating regression models is crucial to understanding their performance. Commonly used metrics include:

Metric | Formula | Interpretation
------ | ------- | --------------
Mean Squared Error (MSE) | \( \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2 \) | Penalizes large errors heavily
Root Mean Squared Error (RMSE) | \( \sqrt{\text{MSE}} \) | Easier to interpret, since it has the same unit as Y
R² Score | \( 1 - \frac{SS_{res}}{SS_{tot}} \) | Fraction of the variance in Y explained by the model
Adjusted R² | \( 1 - (1 - R^2) \frac{n - 1}{n - p - 1} \) | Adjusts R² for the number of predictors p; more reliable for multiple regression
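
All of these can be computed from the test-set predictions of the linear regression example above (reusing y_test, y_pred, and X_test from the diabetes split). Scikit-learn has no built-in adjusted R², so it is computed by hand here:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

# Adjusted R² from n samples and p predictors
n, p = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, R²: {r2:.2f}, Adjusted R²: {adj_r2:.2f}")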

Hands-On Exercises

Exercise 1: Implementing Linear Regression

Objective: Train and evaluate a linear regression model using Scikit-learn.

Steps:

  1. Load a regression dataset (e.g., the diabetes dataset).
  2. Split the data into training and test sets.
  3. Fit a LinearRegression model on the training set.
  4. Evaluate the model on the test set using RMSE and R².

Exercise 2: Implementing Polynomial Regression

Objective: Train and compare polynomial regression models.

Steps:

  1. Generate a synthetic dataset with non-linear relationships.
  2. Transform features using polynomial features.
  3. Train polynomial regression models of different degrees.
  4. Compare performance using RMSE and R² (a solution sketch follows these steps).
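
One way Exercise 2 might be tackled is sketched below; the sine-based dataset and the particular degrees compared are illustrative choices, not prescribed by the exercise:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: synthetic data with a non-linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(200, 1))
y = np.sin(2 * X).ravel() + 0.1 * rng.standard_normal(200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Steps 2-4: transform, train, and compare models of different degrees
for degree in (1, 2, 3, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"degree={degree}: RMSE={rmse:.3f}, R²={r2_score(y_te, pred):.3f}")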

Summary

  • Explored different types of regression models.
  • Implemented regularization techniques to prevent overfitting.
  • Evaluated model performance using key metrics.
