Obesity Rates in America
Research conducted by: Blanchard and Ng (2014)
Case Study Prepared by: Sarah Blanchard and Wesley Ng
Overview
This case study examines the relationship between obesity rates and behavior. Multiple statistical tests were carried out on state-level data concerning obesity rates. Data were collected from 30 states across the country: Massachusetts, Connecticut, Hawaii, Rhode Island, Idaho, Kansas, Georgia, California, Florida, New York, Pennsylvania, Alaska, Montana, North Dakota, Wyoming, Ohio, Illinois, Indiana, Utah, Vermont, Maine, Tennessee, Maryland, Nevada, New Hampshire, North Carolina, Virginia, Washington, Texas, and Michigan.
The data collected covered the death rates per state, the obesity and undernourishment rates of each state, the drinking and smoking rates of both students and adults, and the amount of television people watch. The graphs and comparisons that follow are broad comparisons of these variables, used to draw general conclusions about how they relate in the real world.
Question To Answer
Is there a correlation between the typical activities of a particular state and the rates of obesity of that state?
Exercises
1. What are the independent variables? What are the dependent variables?
The independent variable is OBESITY RATE. The dependent variable is the DEATH RATE.
2. What are the mean, median, and standard deviation of the obesity rate among the 30 states? Also, describe the shape of the distribution.
Mean = 28%
Median = 26%
Standard deviation = 2 percentage points
Because the mean (28%) is greater than the median (26%), the distribution is skewed to the right. So it is not normally distributed.
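The summary statistics in question 2 can be reproduced with a short Python sketch. It assumes the 30 state obesity rates are available as a plain list of percentages; the function names `describe` and `skew_direction` are hypothetical, and comparing the mean with the median is only a rough rule of thumb for skew, not a formal test.

```python
import statistics

def describe(rates):
    """Return (mean, median, sample standard deviation) of a list of rates.

    Assumption: rates is a list of percentages, one value per state.
    """
    mean = statistics.mean(rates)
    median = statistics.median(rates)
    sd = statistics.stdev(rates)  # sample SD (divides by n - 1), like R's sd()
    return mean, median, sd

def skew_direction(rates):
    """Rule of thumb: mean > median suggests right skew, mean < median left skew."""
    mean, median, _ = describe(rates)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "roughly symmetric"
```

Calling `describe` on the 30 state values returns the mean, median, and standard deviation in that order; a long right tail pulls the mean above the median, which is why the comparison hints at skew.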
3. Create a scatter plot of the rates of student obesity versus students who are not obese.
4. What is the correlation between the percentage of students who said they were underweight or overweight and the ACTUAL obesity rate?
Given this data set, we can check whether the percentage of students who said they were underweight or overweight is related to the actual obesity rate. The correlation is found to be 0.0275. This is a very weak correlation because it is very close to 0 (a strong correlation would be close to 1 or -1). The association is technically positive, but it is so weak that the self-reported percentages show essentially no relationship with the actual obesity rate. This does not mean that people who think they are overweight are actually NOT overweight; it only means that, across states, the self-reported percentages do not predict the actual obesity rates.
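The correlation coefficient discussed here (and again in question 6) is Pearson's r, the same quantity R's `cor(x, y)` returns by default. A minimal Python sketch follows; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Values near 0, such as 0.0275, mean the points show no linear trend; values near 1 or -1 mean a strong linear trend.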
5. What is the equation for the linear regression line between deaths per state and the obesity rate?
EQUATION: y-hat = 98.12x - 1669.32, where x is the obesity rate (%) and y-hat is the predicted number of deaths per state.
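A least-squares line like the one in question 5 can be computed directly from the means of the two variables. This sketch assumes x is the obesity rate and y is deaths per state; `least_squares_line` is a hypothetical name, and in R the same coefficients would come from `lm(y ~ x)`.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y-hat = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    # The intercept forces the line through the point (mean of x, mean of y)
    intercept = my - slope * mx
    return slope, intercept
```

Because the line always passes through (mean of x, mean of y), plugging the mean obesity rate into a reported equation should return roughly the mean number of deaths, which is a quick sanity check.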
6. Create a line graph of the data. Does the linear regression line represent a strong correlation between the tested variables?
The linear regression line does not show a strong relationship between the tested variables, because the correlation is only 0.15, which is close to 0. The scatterplot is also nonlinear, so a linear model does not fit this data well.
7. Create a box-plot of the percentage of people who watch TV 3 hours a day.
8. Based on this graph, what is the median? What is the 3rd Quartile? What is the 1st Quartile? What is the Interquartile Range? Is it a normal distribution?
According to this graph, the median is 30%. The 1st Quartile is 25%, and the 3rd Quartile is roughly 35%. To find the Interquartile Range (IQR), subtract the 1st Quartile from the 3rd Quartile: 35% - 25% = 10%. In addition, because the median sits midway between the two quartiles, the percentage of adults who watch TV at least 3 hours a day appears roughly symmetric, consistent with an approximately normal distribution.
9. Now use these values to calculate the Lower and Upper Outlier Thresholds.
To find the Lower Outlier Threshold, we use the equation Q1 - 1.5*IQR. So, 25 - (1.5*10) = 10%.
To find the Upper Outlier Threshold, we use the equation Q3 + 1.5*IQR. So, 35 + (1.5*10) = 50%.
10. Are there any outliers based on this graph?
None, because every value lies between the Lower and Upper Outlier Thresholds; no value falls below the lower threshold or above the upper one.
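The quartile, IQR, and outlier-threshold steps from questions 8 through 10 can be sketched in Python as follows. The helper names are hypothetical. `statistics.quantiles` with `method="inclusive"` is only one of several quartile conventions, so its quartiles may differ slightly from the ones RStudio draws on a box plot.

```python
import statistics

def fences_from_quartiles(q1, q3):
    """Lower and upper outlier thresholds from the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return (outliers, (q1, q3)) using the 1.5*IQR rule."""
    # "inclusive" is one quartile convention; box-plot software may use another.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = fences_from_quartiles(q1, q3)
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (q1, q3)
```

With the quartiles read from the TV box plot, `fences_from_quartiles(25, 35)` gives thresholds of 10 and 50; a value counts as an outlier only if it falls outside that interval.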
11. Create a box-plot that compares 2 VARIABLES: obesity rate and percentage of students who are obese.
12. Now it's time to compare! What is the median of the obesity rate data? What is the median percentage of students who said they were obese? How do these two compare?
According to this data, the median obesity rate per state is roughly 26%, while the median percentage of obese students per state is 12.5%. This suggests that students tend to have a lower obesity rate than the overall population: the median student rate is roughly half the median state rate. This may indicate that adults are more likely to be obese than students are, since adults account for much of the overall obesity rate. The student obesity rates also appear less normally distributed than the state obesity rates.
13. What are the quartile values for each set of data? What is the Interquartile Range of the two data sets?
According to these graphs, the First and Third Quartiles for the obesity rate data are 24% and 28%, respectively. Meanwhile, the First and Third Quartiles for the percentage of students who are obese are 11% and 13%, respectively. The IQRs are 4% for the obesity rate data and 2% for the percentage of students who are obese.
14. Create a scatterplot of obesity rate vs. percentage of smokers.
15. What is the correlation and relationship between the two variables?
There is little to no correlation. There is also no clear positive or negative association, because the points do not follow a linear pattern and the correlation is very weak.
16. Create a power-law regression model of obesity rate vs. percentage of adults watching TV. Then draw in the least-squares regression line.
17. What is the correlation between the x and y variables?
Using the cor(lx,ly) function in RStudio, we found the correlation to be 0.96, the closest to 1 among the three models we tried: linear, exponential, and power. A 0.96 correlation is very close to 1! So there is a strong correlation.
18. What is the association?
It is positive!
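19. What is the equation for the least-squares regression line?
y-hat = 10^(2.62*log(x) + 0.022). Remember, this is a power-law equation: the line log(y-hat) = 2.62*log(x) + 0.022 is fit to the logged data, so x appears inside a logarithm.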
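A power-law model is fit by running ordinary least squares on the logarithms of both variables and then back-transforming. This sketch assumes base-10 logs (suggested by the 10^ in the equation) and that `lx` and `ly` in the `cor(lx,ly)` call held the logged x and y values; the function names `fit_power_law` and `predict_power` are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y-hat = 10**a * x**b by least squares on (log10 x, log10 y).

    Returns (b, a), the slope and intercept of log10(y-hat) = b*log10(x) + a.
    Assumption: every x and y value is positive, so the logs are defined.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx
    a = my - b * mx
    return b, a

def predict_power(b, a, x):
    """Back-transform the log-scale line to get y-hat for a raw x value."""
    return 10 ** (b * math.log10(x) + a)
```

Because the fit happens on the log scale, cor(lx, ly) measures how well this straight line fits the logged data, not the raw points.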
For the next few questions, we will assess the relationship between deaths per state and the obesity rate. We want to see if there is a distinct correlation between the two. Does one affect the other? Are they related? Let's find out!
20. Time for a histogram! Let's see the distribution of the deaths per state. Make a histogram of the deaths per state.
21. Describe the shape of the distribution.
The distribution is skewed to the right.
22. Describe the center and spread of the distribution: what are the mean, median, and standard deviation?
mean=908.2 deaths per state
median=572.5 deaths per state
sd=919.7 deaths per state
23. Describe the spread of the distribution.
Range= Highest value - Lowest value= 106.39 deaths per state
24. Now, we are going to compare the two variables: obesity rate and deaths per state. Construct a scatterplot of obesity rate vs. deaths per state.
25. What is the correlation?
There is little to no correlation, because the scatterplot does not show a linear pattern. So, based on this data, obesity rate and deaths per state do not appear to be related.
Now, we compare another behavior with the obesity rate.
26. Create a side-by-side box-plot to compare the two variables: obesity rate and percentage of adults drinking more than 1 soda per day.
27. Are there any outliers?
Yes. For the adults who drink more than 1 soda per day, there is one outlier above the Upper Outlier Threshold: California, where 80% of adults drink more than 1 soda per day.
28. What are the medians of the two data sets?
According to the graphs, the medians of the two data sets are roughly the same. The median obesity rate is roughly 27%, while the median percentage of people who drink more than 1 soda per day is slightly lower, at around 24%.
29. Compare and contrast the data's ranges.
According to this graph, there is more variability among the percentages of adults who drink more than 1 soda per day. There is less variability in the obesity rates, because the obesity rates span a smaller range than the soda-drinking percentages.
30. What are Q1 and Q3 of the two data sets?
According to the box-plots, Q1 and Q3 of the obesity rate are 25% and 29%, respectively. For the soda drinkers, Q1 is 21% and Q3 is 30%.
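Answer to the question: Obesity rate is correlated with some behaviors but not others. In this data, it has a strong positive correlation (0.96, using a power model) with the percentage of adults who watch TV, but little to no correlation with the percentage of smokers or with students' self-reported weight. In addition, some behavior data, such as the percentage of adults drinking more than 1 soda per day, fall in roughly the same range as the obesity rates.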
Main sources
cdc.gov
"Health and Obesity." Centers for Disease Control and Prevention. Centers for Disease Control and Prevention, n.d. Web. 05 May 2014.