Joycelam082/CIS-3200-Final-Project

CIS-3200-Final-Project

This repository contains instructions and requirements for lab activities in the Data Processing and Analytics course.

Install Pandas

Pandas is a popular Python library for data processing and analysis. Depending on how Python is installed on your computer, run one of the following commands from the command line:

Install pandas via pip:

pip install pandas

Use pip3 if the plain pip command on your system does not point to Python 3:

pip3 install pandas

On Windows, you may need to use the Python launcher (py):

py -m pip install pandas

Install data visualization utilities

Matplotlib and seaborn are two widely used Python libraries for data visualization. Install both packages using pip:

pip install matplotlib
pip install seaborn

Adapt the pip command to the Python installation on your computer (pip3 or py -m pip), as described above.

Install scikit-learn

Scikit-learn is a library that provides tools for predictive modeling. Install scikit-learn using pip:

pip install scikit-learn
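To confirm that the installations above succeeded, a short check like the following may help. It is a minimal sketch that uses only the standard library; the helper name `missing_packages` and the `REQUIRED` list are made up for this example. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages installed above.
# scikit-learn is imported as "sklearn", not "scikit-learn".
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the names that Python cannot locate in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If a package is reported as not installed, rerun the matching pip command with the same Python interpreter you use to run this check.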

The Data Mining Project Template

We will create our data mining models using the Data Mining Project Template. The template comprises six sections:

1. Business understanding

This section briefly explains the project from a business perspective, casting business objectives into a data mining problem definition. In your course project, you will complement this section with a slideshow presentation.

2. Setup

The purpose of this section is to improve the organization and efficiency of your Python code.
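As an illustration, a setup cell might gather project-wide settings in one place so that later sections can reuse them. The sketch below is an assumption about what such a cell could look like, not the content of the template; the names `RANDOM_SEED`, `DATA_DIR`, `TEST_SIZE`, and `data_path` are hypothetical.

```python
import random
from pathlib import Path

# Hypothetical project settings; adjust the names and values to your project.
RANDOM_SEED = 42          # reused wherever randomness is involved
DATA_DIR = Path("data")   # assumed folder holding the input files
TEST_SIZE = 0.2           # fraction of rows reserved for the test set

# Seed Python's random number generator so that runs are repeatable.
random.seed(RANDOM_SEED)


def data_path(filename):
    """Build the path of a file inside the data folder."""
    return DATA_DIR / filename


print(data_path("example.csv"))
```

Defining these values once avoids repeating literals throughout the notebook and makes experiments easier to reproduce.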

3. Data understanding

Explore the data through visualizations: check the ranges and distributions of numeric values using histograms, and examine correlations among the attributes. In supervised learning, also examine the correlations between the target variable and the attributes.
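The checks described in this section can be sketched with pandas as follows. The dataset is made up for illustration, and the column names (`age`, `income`, `purchased`) are assumptions. Plotting calls are left out, so the binned counts stand in for the numbers a histogram would display.

```python
import pandas as pd

# Small made-up dataset, for illustration only.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "income": [28, 52, 71, 80, 40, 95],   # in thousands
    "purchased": [0, 0, 1, 1, 0, 1],      # target variable
})

# Ranges and distribution summaries of the numeric columns.
summary = df.describe()

# Counts per equal-width bin: the values a histogram of "age" would plot.
age_bins = pd.cut(df["age"], bins=3).value_counts().sort_index()

# Pairwise Pearson correlations among all columns.
corr = df.corr()

# Correlation of each attribute with the target, strongest first.
target_corr = corr["purchased"].drop("purchased").sort_values(ascending=False)

print(summary.loc[["min", "max"]])
print(age_bins)
print(target_corr)
```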

4. Data processing

Perform data cleaning and transformation tasks as necessary.
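A few common cleaning and transformation steps are sketched below with pandas. Which steps a project needs depends on its data; the dataset, the column names, and the choice of median imputation are assumptions made for this example.

```python
import pandas as pd

# Made-up raw data with a duplicate row and some missing values.
raw = pd.DataFrame({
    "age": [22, 35, None, 51, 35],
    "city": ["LA", "SF", "LA", None, "SF"],
    "income": [28.0, 52.0, 71.0, 80.0, 52.0],
})

# Cleaning: remove exact duplicate rows, then fill missing values.
clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("unknown")

# Transformation: one-hot encode the categorical column.
processed = pd.get_dummies(clean, columns=["city"])

print(processed)
```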

5. Data modeling

Train several candidate models, then tune the parameters of the most promising ones.
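One possible way to carry out this step with scikit-learn is to compare candidate models by cross-validation and then search a small parameter grid for the preferred one. The sketch below uses synthetic data; the two candidate models and the `max_depth` grid are assumptions chosen for illustration, not choices prescribed by the template.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train different models and compare their mean 5-fold cross-validation accuracy.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Tune one candidate by trying each value in a small parameter grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=5,
)
search.fit(X_train, y_train)

print(cv_scores)
print(search.best_params_)
```

The test split is set aside here and left untouched, so that it remains available for the final evaluation.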

6. Evaluation

Measure the performance of your final model on the test set to estimate the generalization error.
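As a minimal sketch of this step, the example below fits a model on the training split and scores it once on the held-out test split. The synthetic data, the logistic regression model, and the use of accuracy as the metric are assumptions; a real project would use its own final model and a metric suited to the business objective.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in for the final model chosen in the modeling section.
final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score the model on data it has not seen during training.
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))

# For classification, the estimated generalization error is 1 - accuracy.
generalization_error = 1.0 - test_accuracy

print(f"Test accuracy: {test_accuracy:.3f}")
print(f"Estimated generalization error: {generalization_error:.3f}")
```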

Accessing the template

The following notebook contains a template for building a data mining project in Python:

Click here to download this repository and access the template on your local computer.

About

CIS 3200 Project, Fall 2021 at California State University, Los Angeles
