Credit Classification

Supervised learning project to predict whether a loan will default using tabular customer and loan features. The aim is to build a clear, reproducible baseline credit-risk model that can be extended to more advanced approaches.

1. Overview

This repository contains a small end-to-end workflow for loan default prediction:

Exploratory data analysis of the loan dataset
Data cleaning and feature engineering
Training and evaluation of supervised learning models
Interpretation of model performance and limitations

The work was originally developed as part of a university assignment, but the structure is organised to resemble a real data science project rather than a single notebook.

2. Data

The project assumes a single CSV file with one row per loan and a binary target variable indicating default vs non-default, together with customer and loan attributes.

The dataset is stored locally and not committed to the repository. A typical local layout is:

data/raw/dataset.csv – original dataset
data/processed/ – any cleaned or engineered versions

You can adjust the paths inside the notebook or scripts if your filenames differ.

3. Methodology

The modelling workflow follows a standard supervised learning pipeline for credit risk:

Preprocessing and feature engineering
- Handling missing values
- Encoding categorical variables
- Scaling or normalising numerical features where appropriate
Model training
- Baseline models such as logistic regression
- Optionally, comparison with tree-based methods (for example random forests or gradient boosting)
Evaluation
- Train/validation split or cross-validation
- Metrics including ROC-AUC, accuracy, precision, recall and confusion matrices
- Qualitative discussion of where the model performs well or poorly

Most of the experimentation currently lives in notebooks/Loan prediction.ipynb. As the project evolves, more logic can be refactored into reusable modules under src/.

4. Repository structure

.
├── notebooks/          # Jupyter notebooks for exploration and modelling
│   └── Loan prediction.ipynb
├── src/                # Python modules (data prep, training, evaluation)
├── requirements.txt    # Python dependencies
├── .gitignore          # Ignore rules (data, caches, IDE files, etc.)
└── README.md           # Project documentation

The data/, report/ and docs/ folders are expected to exist locally but are not tracked by git, so they do not appear in the GitHub view.

5. Getting started

Clone the repository

git clone [email protected]:abailey81/Credit-Classification.git
cd Credit-Classification

(Optional) Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate    # macOS / Linux
# .venv\Scripts\activate     # Windows

Install dependencies
```
pip install -r requirements.txt
```
Add the dataset

Place your CSV in data/raw/dataset.csv (or update the notebook path accordingly).
Run the analysis
```
jupyter notebook notebooks/Loan\ prediction.ipynb
```
From there you can reproduce the analysis, adjust features, or try alternative models.

6. Reproducibility and next steps

Dependencies are listed in requirements.txt.
Data files are kept out of version control to avoid exposing sensitive information.
Random seeds can be fixed in the notebook to make results more stable between runs.

Planned improvements include refactoring more code into src/, adding configuration files for experiments, logging model outputs, and extending evaluation to include calibration, scorecards and monitoring.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Credit Classification

1. Overview

2. Data

3. Methodology

4. Repository structure

5. Getting started

6. Reproducibility and next steps

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
notebooks		notebooks
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

abailey81/Credit-Classification

Folders and files

Latest commit

History

Repository files navigation

Credit Classification

1. Overview

2. Data

3. Methodology

4. Repository structure

5. Getting started

6. Reproducibility and next steps

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages