College Data Project
1) The project is to be completed individually. Due date: TBD
2) Obtain data from collegedata.com
3) Collect data on AT LEAST 10 colleges, at least 1 of which must be a public university/college. They should be colleges that you are thinking of applying to. If you are considering fewer than 10, fill out the list with any other colleges of your choosing.
4) Collect data on the following variables and put it into a spreadsheet (preferably GOOGLE DOCS):
GPA - UW
GPA - W
SAT - M
SAT - CR
SAT - W
# OF UND STUDENTS
ADMISSIONS RATE
COST OF ATTENDANCE
Average Indebtedness of 2012 Graduates
Full time faculty teaching undergraduates
% International Students
% First-Year Students Returning
% Students Graduating Within 4 Years
5) Once you have collected the data, enter it into RStudio using the upload instructions below. If a value is missing, enter a "best estimate" of what you think it might be.
INSTRUCTIONS TO UPLOAD DATA FROM GOOGLE DOCS INTO RSTUDIO:
1. Once your data is all set, make sure there are no commas, % signs, or $ signs in your data. Also make sure your text (variable names and college names) contains NO spaces or symbols (for example, no %). Finally, make sure you are not missing any data! Enter a best-guess estimate for any value you are missing.
2. In Google Docs, click "File", then "Download as", then choose the comma-separated values (.csv) option.
3. Once you have saved the file, go to RStudio. Click "Files" (the Files tab is not at the top right; it is in the window where you get plots and R help documentation), then "Upload", and upload your .csv file.
4. Once the data has been uploaded to RStudio, in the command line type:
d = read.csv("yourfilenamehere.csv")  # copy your file name exactly, including capitalization and spaces
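As a rough sketch of steps 1 and 4, the snippet below writes a tiny made-up CSV to a temporary file, reads it back with read.csv, and checks whether the numeric columns came in as numbers. The column names (College, CostOfAttendance, AdmissionsRate) and all values are placeholders, not real college data. The clean_number helper is hypothetical and is only needed if stray $ or % signs slipped into the file.

```r
# Made-up example file; in the project you would upload your own .csv instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "College,CostOfAttendance,AdmissionsRate",
  "CollegeA,$52000,35%",
  "CollegeB,$24500,68%",
  "CollegeC,$61250,12%"
), csv_path)

d <- read.csv(csv_path, stringsAsFactors = FALSE)
str(d)  # shows each column's type; columns containing $ or % are read as text

# Hypothetical helper: strip $, %, and commas, then convert to numeric.
clean_number <- function(x) as.numeric(gsub("[$,%]", "", x))

d$CostOfAttendance <- clean_number(d$CostOfAttendance)
d$AdmissionsRate   <- clean_number(d$AdmissionsRate)
str(d)  # both columns are now numeric
```

Running str(d) right after read.csv is a quick way to catch formatting problems before plotting: a column that shows up as chr instead of num or int still has symbols in it.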
PROJECT INSTRUCTIONS:
a) Construct and label a boxplot of "Cost of Attendance". Make sure to label the main title and x-axis. (See below for an example)
b) Construct and label a histogram of "Cost of Attendance". Make sure to label the main title and x-axis.
c) Using the plot(x,y) function, where x is your x variable name and y is your y variable name, construct a scatterplot of "Cost of Attendance vs Average Indebtedness of 2012 Graduates". Treat Cost of Attendance as your x variable. Make sure to label the main title, x-axis, and y-axis.
d) Using the plot(x,y) function again, construct a scatterplot of "SAT-M vs SAT-W". Treat SAT-W as your x variable. Make sure to label the main title, x-axis, and y-axis.
e) Calculate the 5 number summary and mean for the admissions rate using the function summary(variable.name).
f) Construct any graph of your choosing. Make sure to label it properly so I understand what the data is showing me.
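The histogram, scatterplot, and summary parts above might look something like the following in R. This is only a sketch: the data frame is built by hand with made-up numbers, and the column names (CostOfAttendance, AvgDebt, AdmissionsRate) are assumptions, so substitute the names from your own .csv file. Columns are accessed as d$ColumnName.

```r
# Made-up placeholder data; in the project, d comes from read.csv().
d <- data.frame(
  College          = c("CollegeA", "CollegeB", "CollegeC", "CollegeD", "CollegeE"),
  CostOfAttendance = c(52000, 24500, 61250, 38000, 45500),
  AvgDebt          = c(27000, 21000, 30500, 24000, 26500),
  AdmissionsRate   = c(35, 68, 12, 54, 41)
)

# Histogram with a main title and an x-axis label.
hist(d$CostOfAttendance,
     main = "Cost of Attendance",
     xlab = "Cost of Attendance (dollars)")

# Scatterplot: the first argument is x, the second is y.
plot(d$CostOfAttendance, d$AvgDebt,
     main = "Cost of Attendance vs Average Indebtedness",
     xlab = "Cost of Attendance (dollars)",
     ylab = "Average Indebtedness (dollars)")

# Minimum, quartiles, median, mean, and maximum in one call.
rate_summary <- summary(d$AdmissionsRate)
print(rate_summary)
```

The same main, xlab, and ylab arguments work across hist, plot, and boxplot, so one labeling pattern covers every graph in the project.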
ADDING TITLES & CREATING HORIZONTAL BOXPLOT
boxplot(variable.name, main="Main Title Here", xlab="X LABEL HERE", ylab="Y LABEL HERE", horizontal=TRUE)
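In practice, variable.name is usually a column of the data frame you loaded, written as d$ColumnName. A small sketch with made-up numbers, assuming a column called CostOfAttendance (a hypothetical name; use whatever your .csv header says):

```r
# Made-up placeholder values standing in for your uploaded data.
d <- data.frame(CostOfAttendance = c(52000, 24500, 61250, 38000, 45500))

# horizontal = TRUE lays the box along the x-axis, so the data label goes in xlab.
bp <- boxplot(d$CostOfAttendance,
              main = "Cost of Attendance",
              xlab = "Cost of Attendance (dollars)",
              horizontal = TRUE)

bp$stats  # the five values the box and whiskers are drawn from
```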
Instructions for passing in project:
A) ALL projects passed in on Tuesday must be handed in during class as a paper printout (do not include any R code). Please do NOT email me projects.
B) Make sure your name and section number are on the paper. Points will be deducted for a missing section number.