CIS-3200-Final-Project

This repository contains instructions and requirements for lab activities in the Data Processing and Analytics course.

Install Pandas

Pandas is a popular Python library for data processing and analysis. Depending on how Python is installed on your computer, run one of the following commands from the command-line interface:

Install pandas via pip:

pip install pandas

Use pip3 if your Python 3 interpreter is invoked as python3:

pip3 install pandas

On Windows, you may need to use the Python launcher (py):

py -m pip install pandas

Install data visualization utilities

Matplotlib and seaborn are two of the most popular modules for data visualization in Python. Install these packages using pip:

pip install matplotlib
pip install seaborn

Adapt the pip command to the Python installation on your computer, as described earlier.

Install scikit-learn

Scikit-learn is a module that provides tools for predictive modeling. Install scikit-learn using pip:

pip install scikit-learn
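To confirm that the packages above are visible to the interpreter you plan to use, a short check along these lines may help. It is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the course template. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the packages installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the import names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing even though pip succeeded, pip and the interpreter probably belong to different Python installations; running pip through the interpreter itself (for example `py -m pip install pandas` on Windows) usually resolves this.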

The Data Mining Project Template

We will create our data mining models using the Data Mining Project Template. The template comprises six sections:

1. Business understanding

This section briefly explains the project from a business perspective, casting business objectives into a data mining problem definition. In your course project, you will complement this section with a slideshow presentation.

2. Setup

The purpose of this section is to improve the organization and efficiency of your Python code.
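As an illustration of what a setup section might hold, the sketch below gathers imports, constants, and a small loading helper in one place. The names `DATA_PATH`, `RANDOM_STATE`, `TEST_SIZE`, and `load_data` are hypothetical, and pointing `DATA_PATH` at the CSV file included in this repository is an assumption; the template may organize this differently.

```python
from pathlib import Path

import pandas as pd

# Hypothetical project-wide constants, defined once and reused everywhere.
DATA_PATH = Path("5_Wine_Dataset.csv")  # assumed location of the dataset
RANDOM_STATE = 42                       # fixed seed for reproducible splits
TEST_SIZE = 0.2                         # fraction of rows held out for testing


def load_data(path=DATA_PATH):
    """Read the CSV file at `path` into a DataFrame."""
    return pd.read_csv(path)
```

Keeping these values at the top of the notebook means a change to the file path or the random seed is made in one place only.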

3. Data understanding

Explore the data through visualizations, check the ranges and distributions of numeric values using histograms, and examine correlations among the attribute variables. In supervised learning, also examine the correlations between each attribute and the target variable.
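The checks described in this section could be sketched with pandas as follows. The data frame is made up for illustration and is not taken from the course dataset; the column names, including the target `quality`, are assumptions.

```python
import pandas as pd

# Made-up data for illustration only; not taken from the course dataset.
df = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
        "acidity": [0.70, 0.66, 0.58, 0.45, 0.40, 0.32],
        "quality": [4, 5, 5, 6, 7, 7],  # hypothetical target variable
    }
)

# Ranges and central tendency of the numeric columns.
summary = df.describe()
print(summary.loc[["min", "max", "mean"]])

# Pairwise Pearson correlations among all columns.
corr = df.corr()

# Correlation of each attribute with the target, strongest first.
target_corr = corr["quality"].drop("quality").sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(target_corr)
```

For the histograms themselves, `df.hist()` draws one histogram per numeric column using Matplotlib.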

4. Data processing

Perform data cleaning and transformation tasks as necessary.
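Which cleaning and transformation steps are needed depends on the dataset. The sketch below shows three common ones on made-up data: removing duplicate rows, filling missing values with the column median, and standardizing the columns. The choice of these steps is an assumption, not a requirement of the template.

```python
import pandas as pd

# Made-up raw data with one missing value and one duplicated row.
raw = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, None, 11.2, 11.2],
        "acidity": [0.70, 0.66, 0.58, 0.45, 0.45],
    }
)

# Cleaning: drop exact duplicate rows, then fill gaps with the column median.
clean = raw.drop_duplicates().reset_index(drop=True)
clean = clean.fillna(clean.median(numeric_only=True))

# Transformation: rescale each column to zero mean and unit variance.
scaled = (clean - clean.mean()) / clean.std(ddof=0)
print(scaled.round(3))
```

In a real project the median, mean, and standard deviation should be computed on the training set only and then applied to the test set, so that no information leaks from the test data.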

5. Data modeling

Train different models and tune the hyperparameters of the most promising ones.
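One possible way to compare models and then tune one of them with scikit-learn is sketched below. It uses the wine dataset bundled with scikit-learn (`load_wine`), which is not necessarily the dataset used in the course; the two candidate models, the parameter grid, and the decision to tune the decision tree are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled example data; hold out a test set that is not touched during modeling.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train different models and compare them with 5-fold cross-validation.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
print(cv_scores)

# Tune one candidate with a small, illustrative grid search.
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```

Cross-validation on the training set keeps the test set out of model selection, so it remains available for the final evaluation.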

6. Evaluation

Measure the performance of your final model on the test set to estimate the generalization error.
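The final measurement could look like the sketch below, assuming a classification task scored by accuracy. It again uses scikit-learn's bundled wine dataset rather than the course data, and the choice of logistic regression as the "final model" is purely illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled example data, split once into training and test sets.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the (illustrative) final model on the training set only.
final_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
final_model.fit(X_train, y_train)

# Score it once on the held-out test set.
y_pred = final_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_error = 1.0 - test_accuracy  # estimate of the generalization error
print(f"Test accuracy: {test_accuracy:.3f}, test error: {test_error:.3f}")
```

The test set should be used only once, after all tuning is finished; otherwise the error estimate becomes optimistic.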

Accessing the template

The following notebook contains a template for building a data mining project in Python:

Data mining project template

Click here to download this repository and access the template on your local computer.

