Datasets: Overview of Data Gathered
Dataset | Download | Source Link | Type/Information | Description
---|---|---|---|---
Data1 | csv | link | Numeric/Combined Data | Obesity Percentage by Country, 1975-2016
Data2 | csv | link | Numeric/Text/Missing Values | Nutrition, Physical Activity, and Obesity: Behavioral Risk Factor Surveillance System
Data3_1 - 3_5 | csv | link | COVID/Food/Protein/Geo-location | Food Supply by Country: Find Better Dietary Options for a Stronger Body
Data4 | csv | link | Numeric/Text/Health | Acute Liver Failure Data
Data5 | csv | link | Numeric/Text/Combined | Cardiac Disease Data
Data6_1 - 6_2 | API code and csv | API documentation | Corpus/Text/Combined | Hottest submissions from the Keto and Intermittent Fasting subreddits on a specific collection date
Data7 | API code (no csv available) | API documentation | Corpus/Text/Combined | Scraping News with R
Data 1: Obesity Percentage By Country From 1975-2016
Data 1 is Kaggle data originating from the WHO. It gives the adult obesity rate (as a percentage) for each country from 1975 to 2016, separately for males, for females, and for both sexes combined. It can be used to analyze q1, q2, and q3. Data 1 is cleaned.
Data 2: Nutrition Physical Activity and Obesity Behavioral Risk Factor Surveillance System
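Data 2 is Kaggle data providing the Nutrition, Physical Activity, and Obesity Behavioral Risk Factor Surveillance System records. Data 2 is not cleaned.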
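Because Data 2 is uncleaned and contains missing values, a first pass might count missing cells per column and then drop rows lacking the measurement of interest. The sketch below assumes the measured value sits in a column called `Data_Value` and that missing cells appear as blanks or markers like `NA` or `~`; both are assumptions to verify against the file.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Hypothetical: the column holding the measured percentage in Data 2.
VALUE_COL = "Data_Value"
# Assumed missing-value markers; extend after inspecting the raw file.
MISSING = {"", "NA", "N/A", "~"}


def missing_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count missing cells per column (columns with no missing cells are omitted)."""
    counts: Dict[str, int] = {}
    for row in csv.DictReader(lines):
        for col, val in row.items():
            if (val or "").strip() in MISSING:
                counts[col] = counts.get(col, 0) + 1
    return counts


def clean_rows(lines: Iterable[str],
               required: Tuple[str, ...] = (VALUE_COL,)) -> Tuple[List[Dict[str, str]], int]:
    """Keep rows whose required columns are present; return (kept rows, dropped count)."""
    kept: List[Dict[str, str]] = []
    dropped = 0
    for row in csv.DictReader(lines):
        if any((row.get(col) or "").strip() in MISSING for col in required):
            dropped += 1
            continue
        kept.append(row)
    return kept, dropped
```

Dropping rows is only one option; for some questions imputing or keeping the gaps explicit may be more appropriate.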
Data 3: Food Supply by Country, Find Better Dietary Option for Stronger Body
Data 3 includes fat quantity, energy intake (kcal), food supply quantity (kg), and protein for different food categories, each calculated as a percentage of the total intake amount. At the end of the dataset are COVID-19 figures such as cases and deaths. The dataset was originally built to find optimal dietary options and healthy eating styles that could help alleviate the COVID-19 crisis, looking at non-pharmaceutical interventions to help fight severe disease. Although it is not tied directly to obesity, it may offer insights into which types of food help build a stronger immune system and a healthier body.
- Different Food Supply in Kcal
- Different Food Fat Quantity
- Food Supply Quantity in KG
- Amount of Protein in Different Food Supply
- Food Supply Dictionary/Explanation
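One way to look for the food/health links described above is to correlate a food-category share with a COVID-19 figure across countries. This is a minimal sketch using Pearson's r computed by hand; the column names in the usage (`Animal Products`, `Deaths`) are hypothetical, and a correlation across countries says nothing about causation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a cell, returning None for blanks or non-numeric markers."""
    if value is None or value == "":
        return None
    try:
        return float(value)
    except ValueError:
        return None


def paired(rows: Sequence[Dict[str, str]], x_col: str, y_col: str) -> Tuple[List[float], List[float]]:
    """Collect (x, y) pairs, skipping any row missing a value in either column."""
    xs: List[float] = []
    ys: List[float] = []
    for row in rows:
        x, y = _to_float(row.get(x_col)), _to_float(row.get(y_col))
        if x is not None and y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equal-length samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / (sx * sy)
```

Usage: `pearson(*paired(rows, "Animal Products", "Deaths"))`, where `rows` could come from `csv.DictReader` over one of the Data3 files.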
Data 4 & 5: Acute Liver Failure Data and Cardio Disease Data
Data 4 is Kaggle data describing patients with acute liver failure. Data 4 and 5 can be used to understand the weight and health statistics of patients with major diseases. Data 4 is cleaned; data 5 is not.
- Acute Liver Failure Patient Statistics
- Cardiology Disease Patient Statistics
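A simple weight statistic that could be compared across patient groups is body mass index, BMI = weight (kg) / height (m)². The sketch below averages BMI per group while discarding rows with missing or implausible values, which matters for the uncleaned Data 5. The column names (`cardio`, `weight` in kg, `height` in cm) and the 10 to 80 plausibility window are assumptions, not confirmed properties of the files.

```python
from typing import Dict, Iterable, List


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def mean_bmi_by_group(rows: Iterable[Dict[str, str]], group_col: str = "cardio",
                      weight_col: str = "weight", height_col: str = "height",
                      low: float = 10.0, high: float = 80.0) -> Dict[str, float]:
    """Average BMI per group value, skipping missing or implausible entries."""
    values: Dict[str, List[float]] = {}
    for row in rows:
        group = row.get(group_col)
        try:
            value = bmi(float(row[weight_col]), float(row[height_col]))
        except (KeyError, ValueError, ZeroDivisionError):
            continue  # missing column, non-numeric cell, or zero height
        if group is None or not low <= value <= high:
            continue  # likely a data-entry error in uncleaned data
        values.setdefault(group, []).append(value)
    return {g: sum(v) / len(v) for g, v in values.items()}
```

The plausibility window is a blunt filter; looking at the dropped rows before trusting the averages is worthwhile.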
Data 6: Scraping Reddit with PRAW (API) in Python
The data here was collected with PRAW, a Python wrapper for the Reddit API; see the PRAW documentation for a detailed description of how to sign up for Reddit API access. The Reddit collection focuses on healthy diets and their impact on weight loss, currently intermittent fasting and the ketogenic diet. I want to understand the general sentiment and basic evaluation people have of these weight-loss approaches, and to get an elementary understanding of whether they work. The top 1000 posts are collected from each subreddit. Further analysis requires removing uninformative entries using the following criteria: (1) posts with fewer than 20 comments; (2) posts containing none of the defined keywords, such as "weightloss" or "from" (the keyword list requires further investigation).
- Keto Subreddit Hot Posts Data
- Intermittent Fasting Subreddit Hot Posts
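The collection and filtering steps above might look roughly like this. `praw.Reddit(...)`, `subreddit(...).hot(limit=...)`, and the submission attributes used are part of PRAW, but the credentials are placeholders, the subreddit names `keto` and `intermittentfasting` are assumptions, and the keyword tuple only repeats the two examples from the text. PRAW is imported inside `fetch_hot` so the filter can be used on already-saved posts without PRAW installed.

```python
import re
from typing import Dict, Iterable, List, Sequence

# Assumed subreddit names for the two collections.
SUBREDDITS = ("keto", "intermittentfasting")
# Example keywords from the text; the final list still needs investigation.
KEYWORDS = ("weightloss", "from")


def fetch_hot(client_id: str, client_secret: str, user_agent: str,
              subreddit: str, limit: int = 1000) -> List[Dict]:
    """Pull hot submissions with PRAW (requires `pip install praw` and API credentials)."""
    import praw  # lazy import: only needed when actually calling the API

    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    return [
        {"id": s.id, "title": s.title, "selftext": s.selftext, "score": s.score,
         "num_comments": s.num_comments, "created_utc": s.created_utc}
        for s in reddit.subreddit(subreddit).hot(limit=limit)
    ]


def keep_post(post: Dict, min_comments: int = 20,
              keywords: Sequence[str] = KEYWORDS) -> bool:
    """True if the post has at least `min_comments` comments and a whole-word keyword match."""
    if post.get("num_comments", 0) < min_comments:
        return False
    text = f"{post.get('title', '')} {post.get('selftext', '')}".lower()
    return any(re.search(rf"\b{re.escape(kw.lower())}\b", text) for kw in keywords)


def filter_posts(posts: Iterable[Dict], **kwargs) -> List[Dict]:
    """Apply keep_post to every post; kwargs are passed through (e.g. min_comments)."""
    return [p for p in posts if keep_post(p, **kwargs)]
```

Whole-word matching keeps a keyword like "from" from matching inside longer words, though such a common word will still match most posts, which is one reason the keyword list needs more thought.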
Data 7: Scraping News with R
This API is used to search for obesity-related news articles that might contain information to help answer some of the questions. The data is not cleaned and is very basic; it requires further web scraping.
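The further web scraping mentioned above could start by pulling paragraph text out of each article page and keeping only passages that mention obesity. This Python sketch is a stand-in for the R workflow, not a port of it: it uses the standard-library `html.parser`, assumes article text lives in `<p>` tags, and leaves fetching the HTML (e.g. with `urllib.request`) out.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element, ignoring <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.paragraphs: List[str] = []
        self._buf: List[str] = []
        self._in_p = False
        self._skip_depth = 0

    def _flush(self) -> None:
        # Collapse whitespace and store the finished paragraph if it has text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            if self._in_p:  # an unclosed <p> ends when the next one starts
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def close(self) -> None:
        super().close()
        if self._in_p:  # keep a trailing paragraph that was never closed
            self._flush()


def relevant_paragraphs(html: str, keyword: str = "obesity") -> List[str]:
    """Return paragraphs that mention `keyword` (case-insensitive)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if keyword.lower() in p.lower()]
```

Real news pages vary widely in markup, so site-specific selectors (or a dedicated extraction library) will likely be needed beyond this baseline.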