com-480-data-visualization/Glucose_Explorers

Project of Data Visualization (COM-480)

Student's name | SCIPER
Sergio Boffi | 414595
Nicolas Andreas Berlin | 355535
Valentine Casalta | 316420

Milestone 1 | Milestone 2 | Milestone 3

Milestone 1 (20th March, 5pm)

10% of the final grade

This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please fill in the following sections about your project.

(max. 2000 characters per section)

Dataset

Find a dataset (or multiple) that you will explore. Assess the quality of the data it contains and how much preprocessing / data-cleaning it will require before tackling visualization. We recommend using a standard dataset as this course is not about scraping nor data processing.

Hint: some good pointers for finding quality publicly available datasets (Google dataset search, Kaggle, OpenSwissData, SNAP and FiveThirtyEight).

Dataset 1:
Title: Diabetes Health Indicators Dataset
URL: https://www.kaggle.com/datasets/mohankrishnathalla/diabetes-health-indicators-dataset

Problematic

Frame the general topic of your visualization and the main axis that you want to develop.

  • What am I trying to show with my visualization?
  • Think of an overview for the project, your motivation, and the target audience.

General topic

Diabetes risk and patient health profiles.

Main axis / research question

This project explores how health, lifestyle, and demographic factors are associated with diabetes prevalence. The main question is: which factors and combinations of factors are most associated with diabetes in the population?

Project overview

Using a diabetes health indicators dataset, the visualization aims to reveal patterns linking diabetes to health conditions, lifestyle habits, and demographic characteristics. Instead of examining variables individually, the project highlights how multiple indicators combine to form profiles associated with higher diabetes prevalence.
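The idea of profiles built from several indicators can be sketched as a grouped prevalence computation. The snippet below is a minimal illustration, not the project's actual pipeline: the records are hypothetical toy data, and the field names (`hypertension`, `family_history`, `diabetes`) are assumptions rather than the dataset's real column names.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"hypertension": 1, "family_history": 1, "diabetes": 1},
    {"hypertension": 1, "family_history": 1, "diabetes": 1},
    {"hypertension": 1, "family_history": 0, "diabetes": 0},
    {"hypertension": 0, "family_history": 1, "diabetes": 1},
    {"hypertension": 0, "family_history": 0, "diabetes": 0},
    {"hypertension": 0, "family_history": 0, "diabetes": 0},
]


def prevalence_by_profile(rows, factors, target):
    """Map each combination of factor values to (group size, share with target == 1)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        profile = tuple(row[f] for f in factors)
        totals[profile] += 1
        positives[profile] += row[target]
    return {p: (totals[p], positives[p] / totals[p]) for p in totals}


profiles = prevalence_by_profile(records, ["hypertension", "family_history"], "diabetes")

# List profiles from highest to lowest share of diabetic records.
for profile, (n, share) in sorted(profiles.items(), key=lambda kv: -kv[1][1]):
    print(profile, n, round(share, 2))
```

Grouping on a tuple of factor values keeps the function independent of how many indicators make up a profile, so the same sketch would work for two factors or five.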

Motivation

Diabetes is influenced by many interacting factors. Visualization helps reveal these relationships and makes complex patterns in the data easier to understand.

Target audience

Students, educators, and non-specialist readers interested in understanding diabetes-related health patterns through clear visualizations.

Exploratory Data Analysis

Pre-processing of the data set you chose

  • Show some basic statistics and get insights about the data

Exploratory Data Analysis – Key Insights

The dataset comprises 31 variables for 100,000 patients with no missing values, indicating high data quality and minimal preprocessing requirements. Variables were grouped into demographics, lifestyle, medical history, clinical measurements, and diabetes-related descriptors. The identifier variable was excluded from analysis.
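A missing-value check of the kind described above could look like the following sketch. It uses only the Python standard library and a hypothetical three-row extract: the column names (`patient_id`, `age`, `bmi`, `diagnosed_diabetes`) are assumptions, and the empty cell is planted on purpose to show what the check detects (the real dataset is reported to have none).

```python
import csv
import io

# Hypothetical extract; column names are assumptions, not the dataset's schema.
# The empty bmi cell in row 2 is deliberate, to show what the check reports.
SAMPLE_CSV = """patient_id,age,bmi,diagnosed_diabetes
1,54,27.1,1
2,38,,0
3,61,31.4,1
"""


def count_missing(rows, columns):
    """Return {column: number of empty or absent cells}."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1
    return missing


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

# Exclude the identifier column from the analysis.
columns = [c for c in reader.fieldnames if c != "patient_id"]
missing = count_missing(rows, columns)

print(len(rows), "rows,", len(columns), "analysis columns")
print(missing)
```

On the full file, the same function would be applied to the rows read from the downloaded CSV instead of `SAMPLE_CSV`.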

Initial visualizations show that demographic variables such as age, gender, and ethnicity are distributed similarly across diabetic and non-diabetic groups, suggesting that these factors alone do not strongly separate the two groups. Lifestyle variables (smoking status, alcohol consumption, physical activity, sleep, and diet score) also show overlapping distributions between groups, although slight trends are visible: diabetic individuals tend to have lower physical activity, poorer diet scores, and slightly higher screen time.

[Figures: Demographic variables, Lifestyle variables]

In contrast, medical history variables (family history of diabetes, hypertension, and cardiovascular conditions) display clearer differences, with higher proportions of positive history among diabetic patients. This indicates their relevance as risk factors.

[Figure: Medical variables]

The clearest separation is observed in clinical and biomarker variables. In particular, the distributions of glucose-related measures (fasting glucose, postprandial glucose, HbA1c, and insulin levels) are visibly shifted for diabetic individuals. These variables show reduced overlap between groups, which suggests strong predictive power.
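One way to put a number on "overlap between groups" is a histogram overlap coefficient: bin both samples on a shared range and sum the smaller of the two proportions in each bin. The sketch below is one possible measure, not the one used in the analysis; the sample values are hypothetical and chosen only to illustrate the extremes.

```python
def overlap_coefficient(a, b, bins=10):
    """Histogram overlap of two samples: 1.0 for identical histograms, 0.0 for disjoint ones."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:
        return 1.0  # every value is identical, so the samples coincide
    width = (hi - lo) / bins

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    ha, hb = histogram(a), histogram(b)
    return sum(min(x, y) for x, y in zip(ha, hb))


# Hypothetical HbA1c-like values (%), illustrative only.
non_diabetic = [5.0, 5.2, 5.4, 5.5, 5.6]
diabetic = [6.8, 7.1, 7.5, 8.0, 8.3]

separated = overlap_coefficient(non_diabetic, diabetic)
identical = overlap_coefficient(non_diabetic, non_diabetic)
print(separated, identical)
```

A value near 0 corresponds to the well-separated biomarker distributions, while values near 1 correspond to the heavily overlapping demographic and lifestyle distributions. The result depends on the number of bins, so it is best used for comparing variables under the same binning rather than as an absolute score.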

Overall, the analysis suggests that while demographic and lifestyle factors contribute moderately to diabetes risk, clinical biomarkers and medical history provide the strongest signals for distinguishing diabetic patients.

Related work

  • What have others already done with the data?
  • Why is your approach original?
  • What source of inspiration do you take? Visualizations that you found on other websites or magazines (might be unrelated to your data).
  • In case you are using a dataset that you have already explored in another context (ML or ADA course, semester project...), you are required to share the report of that work to outline the differences with the submission for this class.

Most diabetes websites show prevalence maps, time trends, pie charts of diabetes types or complications, and basic lists of risk factors. This information is often difficult to interpret, poorly organized, and of limited use to a general reader. Unlike these resources, our project explains how and why specific factors (BMI, glucose, age, activity, etc.) drive risk. We prioritize clear, well-designed, interactive visualizations of relationships and interactions, and we avoid dense tables and misleading charts. The goal is to help users better understand the roots of this disease and, possibly, find a path toward prevention or better management through informed lifestyle choices.

Our approach fills a key gap. The International Diabetes Federation (IDF) Diabetes Atlas mainly provides global prevalence maps, regional trends, projections, and basic statistics on risk factors (such as obesity or age), and the World Health Organization (WHO) offers high-level overviews of diabetes with some visualizations. However, almost no global tools let people clearly and personally explore how different risk factors interact with each other. We make these connections easy to understand and visually intuitive, so users can immediately see how the factors add up and influence their risk. This helps them better grasp the causes of the disease and can motivate meaningful changes in their everyday habits.

Milestone 2 (17th April, 5pm)

10% of the final grade

URL to the website: https://com-480-data-visualization.github.io/Glucose_Explorers/

Milestone 3 (29th May, 5pm)

80% of the final grade

Late policy

  • < 24h: 80% of the grade for the milestone
  • < 48h: 70% of the grade for the milestone
