This repository contains a comprehensive implementation of the K-Nearest Neighbors (KNN) classification algorithm using the Iris dataset. The objective of this project is to understand how KNN works for classification problems, experiment with different 'k' values, and visualize decision boundaries.
1. Understand and implement KNN for classification tasks.
2. Explore the impact of feature normalization and different 'k' values.
3. Evaluate model performance using accuracy, confusion matrix, and classification report.
4. Visualize decision boundaries using two features.
1. Python 🐍
2. Scikit-learn
3. Pandas
4. Matplotlib
5. Seaborn
6. NumPy
- Name: Iris Dataset
- Source: UCI Machine Learning Repository
- Features: Sepal length, Sepal width, Petal length, Petal width
- Target: Species (Iris-setosa, Iris-versicolor, Iris-virginica)
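For a quick look at the data described above, the following is a minimal sketch. It assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for the project's own data file, so there is no `Id` column here and the species appear as integer codes in a `target` column rather than as `Iris-setosa` style strings.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the project's file.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

print(df.shape)                      # rows, columns
print(list(iris.target_names))       # species names behind the integer codes
print(df["target"].value_counts())   # samples per class
```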
- Load and preprocess the dataset
  (i) Drop the `Id` column (ii) Encode species labels (iii) Normalize feature values using `StandardScaler`
- Split the dataset: 80% training, 20% testing
- Train the model: use `KNeighborsClassifier` from `scikit-learn` with default `k=5`
- Model evaluation
  (i) Accuracy (ii) Confusion matrix (iii) Classification report
- Hyperparameter tuning
  (i) Experiment with `k` values from 1 to 20 (ii) Plot accuracy vs. `k`
- Decision boundary visualization
  (i) Use first 2 features for 2D visualization (ii) Highlight class regions
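The preprocessing, split, training, and evaluation steps above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's exact code: it uses `load_iris` instead of the project's data file (so there is no `Id` column to drop), the `random_state` and `stratify` settings are choices made for this sketch, and the scaler is fitted on the training split only to avoid leaking test statistics, which may differ from the order used in the project.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumption: load_iris stands in for the project's data file.
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]  # string labels such as "setosa"

# Encode species labels as integers.
encoder = LabelEncoder()
y = encoder.fit_transform(species)

# 80% training, 20% testing (random_state and stratify are sketch choices).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN with k=5 and predict on the held-out split.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print(cm)
print(classification_report(y_test, y_pred, target_names=encoder.classes_))
```

Because KNN relies on distances between samples, features with larger numeric ranges would otherwise dominate the neighbor search, which is why the scaling step comes before training.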
1. Accuracy vs. k plot
2. Decision boundary plot (using two features)
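One possible way to produce the two plots listed above is sketched below. It is a minimal example under assumptions: `load_iris` replaces the project's data file, the split settings and the grid resolution are arbitrary, the decision boundary uses `k=5` on the first two (standardized) features, and `knn_plots.png` is a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: load_iris stands in for the project's data file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Test accuracy for each k from 1 to 20.
k_values = list(range(1, 21))
accuracies = [
    KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train).score(X_test_s, y_test)
    for k in k_values
]

# Fit a separate model on the first two features for a 2D decision boundary.
X2 = X_train_s[:, :2]
knn2 = KNeighborsClassifier(n_neighbors=5).fit(X2, y_train)

# Predict the class at every point of a grid covering the feature plane.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200),
)
Z = knn2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, (ax_k, ax_db) = plt.subplots(1, 2, figsize=(11, 4))
ax_k.plot(k_values, accuracies, marker="o")
ax_k.set_xlabel("k")
ax_k.set_ylabel("Test accuracy")
ax_k.set_title("Accuracy vs. k")

ax_db.contourf(xx, yy, Z, alpha=0.3)  # shaded class regions
ax_db.scatter(X2[:, 0], X2[:, 1], c=y_train, edgecolor="k")
ax_db.set_xlabel("Sepal length (standardized)")
ax_db.set_ylabel("Sepal width (standardized)")
ax_db.set_title("KNN decision boundary (k=5)")

fig.tight_layout()
fig.savefig("knn_plots.png")  # hypothetical output file name
```

With only 30 test samples, the accuracy curve moves in coarse steps, so small differences between neighboring `k` values should be read with caution.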