❤️ Heart Disease Classification

A machine learning pipeline for predicting heart disease risk from clinical features using statistical modeling techniques.

This project focuses on data distribution analysis, preprocessing, and classification performance evaluation, combining exploratory analysis with supervised learning.


🧠 Project Overview

The objective of this project is to analyze cardiovascular health indicators and build a predictive model that identifies the presence of heart disease.

The workflow includes:

✔️ Numerical feature distribution analysis
✔️ Data preprocessing & feature preparation
✔️ Logistic Regression modeling
✔️ Performance evaluation using a confusion matrix
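The preprocessing step could look roughly like the sketch below, which scales numeric features and one-hot encodes categorical ones with scikit-learn's `ColumnTransformer`. This is an illustration under assumptions, not the notebook's code: the column names (`age`, `chol`, `thalach`, `oldpeak`, `cp`, `sex`) are hypothetical, and the small DataFrame holds made-up values rather than real patient records.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; adjust them to the columns in your dataset.
NUMERIC_COLS = ["age", "chol", "thalach", "oldpeak"]
CATEGORICAL_COLS = ["cp", "sex"]


def build_preprocessor() -> ColumnTransformer:
    """Standardize numeric features and one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC_COLS),
            # Categories unseen during fit are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
        ]
    )


# Tiny illustrative frame with made-up values (not real patient data).
df = pd.DataFrame({
    "age": [54, 61, 45, 70],
    "chol": [239, 310, 198, 265],
    "thalach": [160, 130, 175, 120],
    "oldpeak": [0.0, 2.3, 0.4, 3.1],
    "cp": [0, 2, 1, 3],
    "sex": [1, 0, 1, 1],
})

X = build_preprocessor().fit_transform(df)
print(X.shape)  # 4 scaled numeric columns + 4 "cp" columns + 2 "sex" columns
```

Fitting the preprocessor only on training data (for example, inside a `Pipeline`) keeps test-set statistics from leaking into the scaler.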


📂 Repository Structure

heart_disease_classification/
│
├── heart_disase_classification.ipynb
│
│
└── README.md

📊 Key Visual Insights

📈 Feature Distributions

Understanding the distribution of medical features is critical before training predictive models.

Observed Patterns:

  • Age and Max Heart Rate follow near-normal distributions.
  • Cholesterol shows wider variance and potential outliers.
  • Oldpeak is heavily right-skewed, so it may benefit from scaling or a transformation before modeling.
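Patterns like these can be checked numerically as well as visually. The sketch below reports each column's skewness and counts outliers using the common 1.5 × IQR rule, then tries `log1p` as one possible transform for a right-skewed, non-negative feature. The helper `summarize_distributions`, the column names, and the toy values are all hypothetical and chosen for illustration; they are not the project's data or results.

```python
import numpy as np
import pandas as pd


def summarize_distributions(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Per-column skewness and count of outliers beyond 1.5 * IQR from the quartiles."""
    rows = {}
    for col in cols:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        outliers = ((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum()
        rows[col] = {"skew": s.skew(), "iqr_outliers": int(outliers)}
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up illustrative values, not taken from the project's dataset.
df = pd.DataFrame({
    "age": [40, 44, 48, 52, 56, 60, 64, 68, 72, 76],
    "chol": [180, 195, 205, 210, 220, 230, 240, 250, 260, 564],
    "oldpeak": [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.8, 1.4, 2.6, 6.2],
})

summary = summarize_distributions(df, ["age", "chol", "oldpeak"])
print(summary)

# One common option for a right-skewed, non-negative feature: log(1 + x).
oldpeak_log = np.log1p(df["oldpeak"])
print(df["oldpeak"].skew(), oldpeak_log.skew())
```

A positive skew with a long right tail (as in the toy `oldpeak` column) is what a histogram of a right-skewed feature would show; the log transform shrinks that tail without reordering the values.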

🤖 Model Evaluation — Logistic Regression

The confusion matrix below shows the performance of the baseline classification model.
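A baseline of this kind can be sketched with scikit-learn as follows. The notebook's own data loading is not reproduced here: `make_classification` generates a synthetic stand-in for the clinical features, and the `StandardScaler` step, split ratio, and `random_state` values are assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical feature matrix and binary target.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# For binary labels {0, 1}, rows are actual classes and columns are predictions:
# [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, model.predict(X_test))
tn, fp, fn, tp = cm.ravel()
print(cm)
```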

Interpretation:

  • The model correctly identifies a large share of the positive heart disease cases.
  • Some false positives and false negatives remain, suggesting room for improvement with advanced models.
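The trade-off between false positives and false negatives is easier to discuss with rates derived from the four confusion-matrix counts. The helper `rates_from_confusion` below is a hypothetical name, and the counts passed to it are made up for illustration; they are not this project's results.

```python
def rates_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Derive common screening rates from binary confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # recall: share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives correctly cleared
        "precision": tp / (tp + fp),    # share of predicted positives that are real
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }


# Hypothetical counts for illustration only (not the project's results).
rates = rates_from_confusion(tn=30, fp=5, fn=4, tp=36)
print(rates)
```

In a screening setting, false negatives (missed disease) are usually the costlier error, so sensitivity is often tracked alongside accuracy.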

🛠️ Tech Stack

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • Jupyter Notebook

🚀 How to Run

git clone https://github.com/your-username/heart_disease_classification.git
cd heart_disease_classification

Install dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn notebook

Launch the notebook:

jupyter notebook heart_disase_classification.ipynb

📈 Future Improvements

  • Feature scaling experiments
  • Hyperparameter tuning
  • Tree-based models (Random Forest / XGBoost)
  • ROC-AUC & Precision-Recall analysis
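Several of these improvements can be combined in one experiment. The sketch below tunes the logistic regression's regularization strength `C` with `GridSearchCV` (scored by ROC-AUC), fits a Random Forest for comparison, and reports ROC-AUC and average precision (a summary of the precision-recall curve) on a held-out split. It uses synthetic data from `make_classification`, and the grid values and `n_estimators=200` are arbitrary choices for illustration; XGBoost would need a separate package and is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with the real features and target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune the inverse regularization strength C with 5-fold cross-validation.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(
    pipe, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, scoring="roc_auc", cv=5
)
search.fit(X_train, y_train)

# Tree-based model for comparison; trees do not need feature scaling.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

results = {}
for name, est in [("logreg (tuned)", search.best_estimator_), ("random forest", rf)]:
    scores = est.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "roc_auc": roc_auc_score(y_test, scores),
        "pr_auc": average_precision_score(y_test, scores),
    }

print(search.best_params_)
print(results)
```

Both metrics are computed from predicted probabilities rather than hard labels, so they evaluate the ranking across all thresholds instead of the single threshold behind a confusion matrix.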

👩‍💻 Author

Arzu Selda Avcı
Computer Engineering, Final Year
Data Science & AI Enthusiast