This project is part of CSCA 5622: Introduction to Machine Learning: Supervised Learning, a course offered by CU Boulder, where I earned an A- grade and 3 quarter credits. The course covered:
- Theoretical and practical foundations of supervised learning using Python and Jupyter Notebook.
- Machine learning models such as Linear Regression, Logistic Regression, Decision Trees, KNN, Ensembles, and Support Vector Machines (SVM).
- Proficient use of modern machine learning tools and Python libraries.
- Understanding of methods to address linearly inseparable data.
- Comparison of the strengths and weaknesses of various supervised learning models.
- Insights into ensemble methods and kernel techniques.
Objective: Develop a predictive churn analysis model for banking institutions to enhance customer loyalty by analyzing customer behavior and identifying churn risks.
Significance: Customer churn is a critical concern for financial institutions, leading to revenue loss and high acquisition costs for new customers. Retaining customers is more cost-effective than acquiring new ones, making predictive churn analysis a vital business tool.
The project aims to address customer retention challenges in banking by leveraging supervised learning techniques. Using real-world customer data, the analysis identifies key churn predictors and builds models to classify customers based on churn risk.
- Data Preparation:
  - Data cleaning and exploratory data analysis (EDA) to identify trends and outliers.
  - Feature engineering to enhance model performance.
- Model Development:
  - Multiple machine learning models were implemented, including:
    - Logistic Regression
    - Decision Trees
    - K-Nearest Neighbors (KNN)
    - Ensemble methods like Random Forest and Gradient Boosting
    - Support Vector Machines (SVM)
  - Hyperparameter tuning for model optimization.
- Evaluation Metrics:
  - Accuracy, Precision, Recall, and F1-Score were used to assess model performance.
  - Confusion matrix analysis provided insights into classification errors.
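
The evaluation metrics listed above can all be derived from the four confusion-matrix counts. The following is a minimal sketch in plain Python, assuming binary labels where 1 marks a churned customer. The function names and the toy labels are illustrative assumptions, not code or data from the project notebooks.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = churn)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1, guarding against division by zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 churners among 10 customers.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print("TP, FP, FN, TN:", counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.8 even though one of the three churners is missed, which is why precision and recall are worth reporting alongside accuracy when churners are the minority class.
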
The project demonstrated the potential of machine learning to address customer churn effectively. The optimized models can assist banking institutions in predicting churn risks and implementing proactive retention strategies, ultimately reducing revenue loss and improving customer satisfaction.
- Logistic Regression: Models the probability of customer churn.
- Decision Trees: Provide intuitive classification rules and insight into feature importance.
- K-Nearest Neighbors: Classifies each customer by the labels of its nearest neighbors under a distance metric.
- Ensemble Methods: Random Forest and Gradient Boosting combine many trees for more robust predictions.
- Support Vector Machines: Handle non-linear decision boundaries using kernel methods.
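
One way to compare these model families side by side is with cross-validated F1 scores. The sketch below assumes scikit-learn is installed and uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the churn data. The hyperparameter values are arbitrary placeholders, not the tuned settings from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption): roughly 20% positives.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Distance- and margin-based models are wrapped with a scaler so that
# features on large scales do not dominate; tree models do not need it.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=50, random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean F1 over 3 stratified folds for each model.
scores = {}
for name, model in models.items():
    f1_per_fold = cross_val_score(model, X, y, cv=3, scoring="f1")
    scores[name] = f1_per_fold.mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} F1 = {score:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the real churn dataset. The point is only the shape of the comparison: the same folds and the same metric for every model.
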
I would like to thank Dr. Geena Kim, the instructor for CSCA 5622, for her guidance throughout the course. This project benefited from the theoretical and practical insights gained from the course material and assignments.
- Extend the analysis to include additional features like customer demographics and transaction histories.
- Explore deep learning approaches for churn prediction.
- Integrate the model into a real-time banking application for predictive analytics.
- Notebooks: Jupyter notebooks containing data analysis and model implementation.
- Reports: Detailed EDA and model performance evaluations.
- Datasets: The original dataset is available at https://www.kaggle.com/competitions/playground-series-s4e1/data.