Understanding Linear Regression in Machine Learning Using Python
Linear Regression is a simple machine learning model for regression problems, i.e., problems where the target variable is a real (continuous) value.
Example: suppose we have a dataset with the area of each house (in square feet) and its price, and our task is to build a machine learning model that predicts the price given the area.
Linear Regression is a linear model, i.e., a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).
- When there is a single input variable (x), the method is referred to as simple linear regression.
- When there are multiple input variables, the method is known as multiple linear regression.
x is the independent variable (input).
y is the dependent variable (output).
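Written out, the two cases look like this. The symbols are chosen here for illustration: β₀ is the intercept, β₁ … β_p are the coefficients, and ŷ is the predicted value. Ordinary least squares, the criterion scikit-learn's LinearRegression uses, picks the coefficients that minimize the sum of squared differences between the observed and predicted values (shown for the simple case).

```latex
\hat{y} = \beta_0 + \beta_1 x \qquad \text{(simple linear regression)}

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \qquad \text{(multiple linear regression)}

\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
```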
Simple Linear Regression in Python with the Scikit-Learn Library
Simple linear regression is a statistical method used to model the relationship between a single independent variable (predictor) and a dependent variable (response) by fitting a linear equation to the observed data. In Python, you can perform simple linear regression using the scikit-learn library. Here’s a step-by-step guide:
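Before reaching for scikit-learn, it can help to see what "fitting a linear equation" means numerically. The sketch below uses the closed-form least-squares solution for one predictor (slope = covariance of x and y divided by variance of x) with plain NumPy. The function name `fit_simple_linear_regression` and the house figures are hypothetical, made up so the numbers lie exactly on a line.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Hypothetical helper: return (slope, intercept) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = sum of centered cross-products / sum of centered squares of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical house data: area in square feet, price in dollars (invented for illustration)
area = [1000, 1500, 2000, 2500]
price = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_simple_linear_regression(area, price)
print(slope, intercept)  # → 150.0 50000.0
```

Read the result as "each extra square foot adds about $150 to the predicted price". Because this toy data is perfectly linear, the fit recovers the line exactly; real data would leave residuals.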
- Import Necessary Libraries:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
- Load and Prepare Data: Load your dataset and organize it into the independent variable (feature) and the dependent variable (target).
# Example data
data = pd.read_csv('your_dataset.csv')
# Separate the feature (independent variable) and the target (dependent variable)
X = data['Feature'] # Independent variable (feature)
y = data['Target'] # Dependent variable (target)
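If you do not have a CSV file at hand, a synthetic DataFrame with the same column names lets you run every step that follows. This is a stand-in, not real data: the relationship Target = 3 · Feature + 2 plus Gaussian noise, the 100-row size, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'your_dataset.csv': 100 synthetic rows where
# Target = 3 * Feature + 2 plus normally distributed noise
rng = np.random.default_rng(seed=0)
feature = rng.uniform(0, 10, size=100)
target = 3 * feature + 2 + rng.normal(0, 1, size=100)
data = pd.DataFrame({"Feature": feature, "Target": target})

X = data["Feature"]  # 1D pandas Series (independent variable)
y = data["Target"]   # 1D pandas Series (dependent variable)
print(data.shape)    # → (100, 2)
```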
- Split Data: Split your dataset into a training set and a test set to evaluate the model’s performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
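To see what `test_size=0.2` and `random_state=42` do, the sketch below splits a hypothetical 100-row dataset and checks two things: 20% of the rows go to the test set, and shuffling never separates a feature value from its own target.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row dataset with an exact relationship, for illustration only
data = pd.DataFrame({"Feature": np.arange(100, dtype=float)})
data["Target"] = 3 * data["Feature"] + 2
X, y = data["Feature"], data["Target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# test_size=0.2 reserves 20% of the rows for testing
print(len(X_train), len(X_test))  # → 80 20
# Rows are shuffled, but each feature value stays paired with its own target
print((y_train == 3 * X_train + 2).all())  # → True
```

Fixing `random_state` makes the shuffle reproducible, so reruns produce the same split and comparable metrics.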
- Create and Fit the Model: Create a LinearRegression model and fit it to your training data.
# Create a linear regression model
model = LinearRegression()
# Fit the model to the training data
model.fit(X_train.values.reshape(-1, 1), y_train)
Note that values.reshape(-1, 1) is used to reshape the feature data because scikit-learn expects the input to be a 2D array of shape (n_samples, n_features), while X_train is a one-dimensional pandas Series.
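The shapes involved make the reshape easier to follow. In the small sketch below (three hypothetical rows), `-1` tells NumPy to infer the number of rows. Selecting the column with double brackets returns a one-column DataFrame, which is already 2D and is a common alternative to reshaping.

```python
import pandas as pd

# Hypothetical three-row dataset
data = pd.DataFrame({"Feature": [1.0, 2.0, 3.0], "Target": [5.0, 7.0, 9.0]})

series = data["Feature"]                    # single brackets: 1D Series
print(series.values.shape)                  # → (3,)
column = series.values.reshape(-1, 1)       # -1 lets NumPy infer the row count
print(column.shape)                         # → (3, 1)

frame = data[["Feature"]]                   # double brackets: one-column DataFrame
print(frame.shape)                          # → (3, 1), already 2D, no reshape needed
```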
- Predictions: Once the model is trained, you can use it to make predictions on the test data.
y_pred = model.predict(X_test.values.reshape(-1, 1))
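For a fitted simple linear regression, `predict` just evaluates the learned line. The sketch below checks this on synthetic data (the noisy line y = 3x + 2 and the three test points are assumptions for illustration) by comparing `predict` with `intercept_ + coef_[0] * x` computed by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic training data around the line y = 3x + 2
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=50)
y_train = 3 * x_train + 2 + rng.normal(0, 1, size=50)
x_test = np.array([1.0, 4.0, 8.0])

model = LinearRegression()
model.fit(x_train.reshape(-1, 1), y_train)

# predict() returns intercept + coefficient * x for each test point
y_pred = model.predict(x_test.reshape(-1, 1))
manual = model.intercept_ + model.coef_[0] * x_test
print(np.allclose(y_pred, manual))  # → True
```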
- Evaluate the Model: You can evaluate the model’s performance using various metrics, such as Mean Squared Error (MSE), R-squared (R^2), or others, depending on your specific goals.
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)
r_squared = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r_squared}")
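Both metrics are short formulas. MSE is the average squared residual, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes them by hand on four hypothetical true/predicted pairs and compares them with the scikit-learn functions used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_test - y_pred) ** 2)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_test - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)     # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, r2_manual)                                       # → 0.125 0.975
print(np.isclose(mse_manual, mean_squared_error(y_test, y_pred)))  # → True
print(np.isclose(r2_manual, r2_score(y_test, y_pred)))             # → True
```

An R² of 1 means a perfect fit, while 0 means the model does no better than always predicting the mean of y_test. R² can even go negative on test data for a poor model.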
- Interpret the Coefficients: In simple linear regression, the model has one coefficient (the slope) and an intercept. The coefficient is the change in the predicted target for a one-unit increase in the feature, and the intercept is the predicted target when the feature is zero.
coefficient = model.coef_[0]
intercept = model.intercept_
print("Coefficient:", coefficient)
print("Intercept:", intercept)
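A quick sanity check for this interpretation is to fit data whose true line you already know. In the sketch below, the noise-free points are generated from the hypothetical rule Target = 3 · Feature + 2, so the model should report a coefficient of 3 and an intercept of 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from Target = 3 * Feature + 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x + 2

model = LinearRegression().fit(x.reshape(-1, 1), y)
coefficient = model.coef_[0]   # change in predicted y per one-unit increase in x
intercept = model.intercept_   # predicted y when x = 0
print(float(round(coefficient, 6)), float(round(intercept, 6)))  # → 3.0 2.0
```

With noisy real data the estimates will only approximate the underlying relationship, which is why the evaluation metrics above matter.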
This example demonstrates how to perform simple linear regression using scikit-learn in Python. You can extend this approach to handle more complex datasets and explore various aspects of regression analysis, such as feature selection, regularization, and model diagnostics, for your specific use case.
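As one example of that extension, the same `LinearRegression` class handles multiple linear regression when you pass a 2D feature matrix (no reshape needed). `Ridge` adds an L2 penalty, controlled by `alpha`, that shrinks coefficients toward zero. The two-feature dataset, its generating rule (Target = 2 · Feature1 − Feature2 + 5), and `alpha=10.0` are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical noise-free dataset with two features:
# Target = 2 * Feature1 - 1 * Feature2 + 5
rng = np.random.default_rng(0)
data = pd.DataFrame({"Feature1": rng.uniform(0, 10, 50),
                     "Feature2": rng.uniform(0, 10, 50)})
data["Target"] = 2 * data["Feature1"] - data["Feature2"] + 5

X = data[["Feature1", "Feature2"]]  # 2D: (n_samples, n_features)
y = data["Target"]

# Ordinary least squares with two inputs
ols = LinearRegression().fit(X, y)
print(ols.coef_, ols.intercept_)  # close to [2, -1] and 5

# Ridge regression: same linear model plus an L2 penalty of strength alpha
ridge = Ridge(alpha=10.0).fit(X, y)
print(ridge.coef_)  # shrunk toward zero relative to the OLS coefficients
```

Larger `alpha` values shrink the coefficients more. That trades a little bias for lower variance, which tends to help when features are numerous or correlated.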