Linear Regression on Wave Dataset

This repository contains a Python implementation of a Simple Linear Regression model. Using the synthetic wave dataset from the mglearn library, the project demonstrates how to model the relationship between a single feature and a continuous target variable.

Overview

The goal of this script is to illustrate the fundamental steps of supervised learning:

Data Synthesis: Creating a non-linear "wave" pattern.
Visualization: Understanding data distribution before modeling.
Data Splitting: Partitioning data into training and testing sets to evaluate generalization.
Modeling: Fitting a Linear Regression line to the data.
Evaluation: Using the R^2 (coefficient of determination) metric to assess performance.

Prerequisites & Installation

To run this code, you will need Python 3.x and the following libraries:

pip install scikit-learn mglearn matplotlib

Project Structure

1. Data Generation & Visualization

The script utilizes mglearn.datasets.make_wave, which generates a synthetic 1D dataset. This is ideal for visualizing how a linear model attempts to capture trends in data that might have slight curvature.
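To make the shape of such data concrete, the following is a minimal standard-library sketch of a wave-like generator. It assumes the target is a linear trend plus a sine ripple plus Gaussian noise. The function name `make_wave_like` and its `seed` parameter are hypothetical, and the values it produces will not match the output of `mglearn.datasets.make_wave`.

```python
import math
import random


def make_wave_like(n_samples=40, seed=42):
    """Return (X, y): X is a list of one-element rows, y a list of floats.

    Illustrative stand-in only; not mglearn's implementation.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = rng.uniform(-3.0, 3.0)      # single input feature
        noise = rng.gauss(0.0, 1.0)     # Gaussian noise term
        # Linear trend plus a sine ripple, perturbed by noise and halved.
        target = (math.sin(4.0 * x) + x + noise) / 2.0
        X.append([x])
        y.append(target)
    return X, y


X, y = make_wave_like(n_samples=40)
print(len(X), len(X[0]), len(y))  # 40 1 40
```

Because the sine ripple averages out over the input range, a straight line can capture the overall trend but not the local wiggles, which is why the R^2 score stays well below 1.0 on data of this kind.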

2. The Machine Learning Pipeline

Split: The data is divided into 80% training data and 20% testing data. Setting random_state=42 makes the split reproducible.
Training: The LinearRegression() model finds the optimal parameters (slope and intercept) by minimizing the Mean Squared Error (MSE).
Mathematical Representation: The model follows the simple linear equation:

y = wx + b

Where w is the Coefficient (slope) and b is the Intercept.
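The split and fit steps above can be sketched without scikit-learn to make the arithmetic explicit. This is an illustration under stated assumptions, not the script's code: `split_80_20` and `fit_line` are hypothetical helpers, the shuffle will not reproduce the exact partition that `train_test_split` makes, and the closed-form least-squares solution shown here applies only to the single-feature case.

```python
import random


def split_80_20(xs, ys, seed=42):
    """Shuffle indices with a fixed seed and hold out one fifth for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = len(xs) // 5
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_line(xs, ys):
    """Least-squares slope w and intercept b for y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Noise-free toy data on the line y = 2x + 1, so the fit is easy to check.
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 for x in xs]

x_train, x_test, y_train, y_test = split_80_20(xs, ys)
print(len(x_train), len(x_test))  # 32 8

w, b = fit_line(x_train, y_train)  # w is close to 2.0, b is close to 1.0
```

Minimizing the mean squared error and minimizing the sum of squared residuals give the same w and b, since the two differ only by the constant factor 1/n.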

3. Performance Metrics

The model is evaluated using the R^2 Score:

Training Score: Indicates how well the model fits the data it was trained on.
Testing Score: Indicates how well the model predicts unseen data.
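For reference, the standard definition of the coefficient of determination is given below. The symbols n (number of samples in the set being scored), y_i (true targets), \hat{y}_i (predictions), and \bar{y} (mean of the true targets in that set) are notation introduced here and do not appear in the script.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \hat{y}_i = w x_i + b
```

The training score evaluates this expression on the 32 training samples and the testing score on the 8 held-out samples, using the same fitted w and b in both cases.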

Expected Output

When you run the script, you can expect the following results:

Console Output

Training data shape: (32, 1), (32,)
Testing data shape: (8, 1), (8,)
Slope (Coefficient): 0.459...
Intercept: -0.017...
Training R^2 Score: 0.67
Testing R^2 Score: 0.66

Visualization

A Matplotlib window will display a scatter plot of the wave dataset, showing the relationship between the input feature and the target values.

Technical Documentation

Component	Description
Library	`sklearn.linear_model.LinearRegression`
Dataset	`mglearn.datasets.make_wave` (n=40)
Test Size	20%
Model Parameters	`coef_` (Weight), `intercept_` (Bias)
Metric	R^2 (Coefficient of Determination)

Key Functions Used:

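train_test_split(): Holds out a test set that the model never sees during training. This does not prevent overfitting by itself, but it makes overfitting detectable, because the test score estimates performance on unseen data.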
model.fit(): The "learning" phase where the model calculates w and b.
model.score(): Returns the R^2 score. A score of 1.0 is a perfect fit, 0.0 indicates the model performs no better than always predicting the mean of the target, and negative values indicate it performs worse than that.
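The reference points for the R^2 score can be checked with a small hand-rolled computation. The helper `r2` below is a hypothetical standard-library stand-in written for illustration. It is not scikit-learn's implementation, and it assumes the true targets are not all identical (otherwise the denominator is zero).

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

print(r2(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0  (perfect predictions)
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (always predicting the mean)
print(r2(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```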

Contributing

Feel free to fork this repository, experiment with the n_samples parameter, or try applying a PolynomialFeatures transformation to see if you can improve the R^2 score! Please follow me before you do, and don't forget to mention me so that I can see your changes and learn from your code.
