US CENSUS PROJECT
PART A - DATA COLLECTION
1. Go to American FactFinder from the US Census Bureau. The link can be found here: http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml
Note: The blue headings show where to find the data for each town/city once you are on the FactFinder website.
PUT ALL OF YOUR DATA INTO A GOOGLE DRIVE SPREADSHEET
2. Find 20 towns/cities in Massachusetts and collect data on:
LOOK UNDER Population -> 2010 Census
a. Total Population
LOOK UNDER "HOUSING" -> "General Housing Characteristics"
b. % Owner Occupied Housing Units (add % with mortgage and % who own free and clear)
LOOK UNDER "HOUSING" -> "Selected Housing Characteristics"
c. Median Housing Value
d. Median Gross Rent (in dollars)
LOOK UNDER "EDUCATION" -> "EDUCATIONAL ATTAINMENT"
e. Percentage Graduate or Professional Degree (25 years and over)
f. Percentage Bachelor's Degree or Higher - (25 years and over)
LOOK UNDER "INCOME" -> "SELECTED ECONOMIC CHARACTERISTICS"
g. Mean travel time to work
h. Mean family income
i. Median family income
j. Percentage of families and people whose income in the past 12 months is below the poverty level (all families); also under "INCOME" -> "SELECTED ECONOMIC CHARACTERISTICS"
Next, go to the boston.com MCAS information and collect the following data using this link:
http://www.boston.com/news/special/education/mcas/scores12/10th_top_districts.htm#English_Math
OR USE THIS LINK IF THE ONE ABOVE DOESN'T WORK: http://profiles.doe.mass.edu/state_report/mcas.aspx
k. %Advanced for 10th Grade District Math Score
l. %Advanced for 10th Grade District English Score
PART B - UPLOAD DATA - INSTRUCTIONS TO UPLOAD DATA FROM GOOGLE DOCS INTO RSTUDIO:
1. Once your data is all set, make sure there are no commas, % signs, or $ signs in your data. Also make sure your text (variable names and town names) contains NO spaces or symbols (% is bad). Finally, make sure you are not missing any data! Make a best-guess estimate for any data you are missing.
2. In Google Docs, click File, then Download as, then choose the comma-separated values (.csv) option.
3. Once you save the file, go to RStudio. Click "Files" (the Files tab is not in the top-right pane; it is in the pane where you get plots and R help documentation), then "Upload", and then upload your .csv file.
4. Once the data has been uploaded to RStudio, in the command line type:
d=read.csv("yourfilenamehere.csv") ### make sure to copy your file name exactly with caps and spaces
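As a rough check that the upload steps above worked, a sketch like the following may help. The file contents, the column names (Town, Population, MedianRent), and the values are made-up placeholders for illustration, not real census figures; your own file name and columns will differ.

```r
# Write a tiny placeholder CSV so this sketch runs on its own.
# (In the project you would skip this and use the file you uploaded.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Town,Population,MedianRent",
             "TownA,1000,900",
             "TownB,2500,1100",
             "TownC,4000,1300"), csv_path)

d <- read.csv(csv_path)

str(d)                                        # shows each column and its type
missing_total <- sum(is.na(d))                # count of missing cells; aim for 0
numeric_cols  <- sapply(d[, -1], is.numeric)  # skip the first (town name) column
numeric_cols                                  # FALSE usually means a stray %, $ or comma
```

If a data column shows up as text instead of numbers, the spreadsheet probably still contains a symbol that needs to be removed before downloading again.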
PART C - DATA ANALYSIS
Once your data is loaded, make your variables available by name in RStudio using the following code:
attach(d); names(d) ## names(d) lists the names of all your variables
I. For each of the following variable combinations, create a scatterplot with fitted least squares regression line. Write in the equation of the least-squares regression line on each scatterplot.
a) Median Rent , Percentage Graduate or Professional Degree
b) Mean family income, MCAS math score
c) MCAS Math, MCAS English
d) What is the correlation between MCAS Math and MCAS English?
e) What is the residual for your first observed value in your model of MCAS Math and MCAS English?
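One possible way to approach the scatterplot, regression line, correlation, and residual in section I is sketched below. The column names MathAdv and EnglishAdv and all values are made-up placeholders, and treating the first variable listed as the explanatory (x) variable is an assumption. The sketch uses d$ rather than relying on attach(d).

```r
# Placeholder data only; replace with the data frame you read from your CSV.
d <- data.frame(MathAdv    = c(40, 55, 60, 72, 80),
                EnglishAdv = c(30, 42, 45, 58, 61))

# Least-squares fit: response ~ explanatory
fit <- lm(EnglishAdv ~ MathAdv, data = d)

plot(d$MathAdv, d$EnglishAdv,
     xlab = "% Advanced, 10th Grade Math",
     ylab = "% Advanced, 10th Grade English",
     main = "MCAS English vs. MCAS Math")
abline(fit)                        # draw the fitted line on the scatterplot

# Text form of the equation, handy for writing on the graph
eq <- sprintf("y = %.2f + %.2f x", coef(fit)[1], coef(fit)[2])

r <- cor(d$MathAdv, d$EnglishAdv)  # correlation between the two variables
first_residual <- resid(fit)[1]    # observed minus predicted for row 1
```

The same pattern applies to the other variable pairs: change the two column names in lm(), plot(), and cor().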
II. Create a properly labeled boxplot of the following variables:
a. Population
b. Percentage Bachelor's Degree or Higher
c. Does the population data appear to be symmetric, skewed right, or skewed left based on the boxplot? Are there any outliers?
d. Based on the boxplot, what is the median population? What is the 3rd quartile for population? What is the 1st quartile for population?
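For the boxplot questions in section II, a sketch along these lines may be useful. The Population values are made-up placeholders, not census counts, and the column name is hypothetical.

```r
# Placeholder data only; use your own population column instead.
d <- data.frame(Population = c(1200, 3400, 5600, 8900, 15000, 22000, 95000))

boxplot(d$Population,
        horizontal = TRUE,
        main = "Population of Selected Massachusetts Towns",
        xlab = "Total population (2010 Census)")

five     <- fivenum(d$Population)   # min, lower hinge, median, upper hinge, max
quarts   <- quantile(d$Population, c(0.25, 0.50, 0.75))
outliers <- boxplot.stats(d$Population)$out  # points beyond 1.5 * IQR from the box
```

Note that boxplot() draws the box at the hinges from fivenum(), which can differ slightly from the quartiles reported by quantile() for some sample sizes. A long whisker or outliers on the high side suggest right skew.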
III. Create a properly labeled histogram of the following variables:
a. Median family income
b. Median housing value
c. Is the data for Median housing value symmetric, skewed right, or skewed left?
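A labeled histogram for section III could be produced roughly as follows. The column name MedianHousingValue and the values are made-up placeholders, not real housing data.

```r
# Placeholder data only; use your own median housing value column instead.
d <- data.frame(MedianHousingValue = c(210000, 235000, 250000, 265000, 280000,
                                       310000, 350000, 420000, 610000, 890000))

hist(d$MedianHousingValue,
     main = "Median Housing Value of Selected Massachusetts Towns",
     xlab = "Median housing value (dollars)",
     ylab = "Number of towns")

# A quick numeric hint about shape: a mean well above the median
# is typical of right-skewed data (a long tail of high values).
value_mean   <- mean(d$MedianHousingValue)
value_median <- median(d$MedianHousingValue)
```

The comparison of mean and median is only a hint; the answer to the skewness question should still come from the shape of the histogram itself.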
Instructions for passing in project:
A) ALL projects passed in on THURSDAY must be handed in during class as a paper printout (do not include any R code). Please do NOT email me projects. ONLY 1-3 students per project submission.
B) Print out and pass in 7 graphs. Make sure to include your NAME & SECTION and the least-squares equations on graphs a, b, and c from section I (handwritten is OK).
C) Make sure your name and section number are on the paper. Points will be deducted for not having a section number.