P2: Data Analysis
Phase overview
By now you should have collected most, if not all, of the relevant data on your topic, including raw data and the derivative data created for P1. So now comes the fun part: data visualization and analysis. Up to now, when we have discussed R in class and on homework assignments H3 and H4, you have been told what graphs to create and asked leading questions about how to interpret them. No longer. Now it is up to you to decide which summary statistics and visualizations are most appropriate.
Task 1: Data exploration
Using R, create graphs and summary statistics that help you explore your data, particularly the variables of interest. The goal of the data exploration is to get a feel for the distribution of values for each variable, possibly split up by relevant categorical variables. Save the graphs as PNG files, and save the relevant text output, with the graphs embedded, in a file called dataSummary.pdf.
Did you find anything unexpected? Include a brief description of the more interesting findings, along with an explanation for why they are interesting.
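As a starting point, the exploration in Task 1 might look something like the sketch below. It is only an illustration: it uses R's built-in iris data set as a stand-in for your own data, and the file names and the choice of variables are assumptions made for the example, not requirements.

```r
# Stand-in data: R's built-in iris data set (replace with your own data frame).
dat <- iris

# Text summary of every column: quartiles and mean for numeric columns,
# counts per level for categorical (factor) columns.
print(summary(dat))

# Distribution of one variable of interest, saved as a PNG file.
# The file name is hypothetical; tempdir() is used so the example runs anywhere.
hist_file <- file.path(tempdir(), "sepal_length_hist.png")
png(hist_file, width = 800, height = 600)
hist(dat$Sepal.Length,
     main = "Distribution of sepal length",
     xlab = "Sepal length (cm)")
dev.off()

# The same variable split up by a categorical variable.
box_file <- file.path(tempdir(), "sepal_length_by_species.png")
png(box_file, width = 800, height = 600)
boxplot(Sepal.Length ~ Species, data = dat,
        main = "Sepal length by species",
        xlab = "Species", ylab = "Sepal length (cm)")
dev.off()
```

For your own project, replace the tempdir() paths with file names in your project directory so the PNG files can be embedded in dataSummary.pdf.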
Task 2: Answering your hypotheses
Having finished an initial exploration, now turn your attention to the hypotheses you proposed to explore in P0. Create appropriate graphs and summary statistics that will shed light on your hypotheses.
Which of your hypotheses and questions can be answered from the data?
For those that can be answered, include graphs and written explanations of your conclusions. Think carefully about what graphs best convey your point, which variables to include, and which axes to plot each variable on. Do you need to construct a response variable from the derivative data (e.g., by using tapply to aggregate values by categorical variable)?
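To illustrate the tapply suggestion, here is a minimal sketch of building a response variable by aggregating a numeric column over a categorical one. It again uses the built-in iris data set as a stand-in; the column names and the choice of mean as the aggregate are assumptions for the example.

```r
# Stand-in data: replace with your own derivative data.
dat <- iris

# tapply(values, groups, function) applies the function to the values
# within each level of the grouping variable.
mean_by_species <- tapply(dat$Sepal.Length, dat$Species, mean)

# The result is a named vector with one entry per category.
print(mean_by_species)
```

The aggregated vector can then be plotted directly, for example with barplot(mean_by_species), or compared across categories in your written explanation.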
For those questions that cannot be answered, explain why. Is the data inconclusive? Do you need to collect additional data? Do you need to create different derivative data files?
For each question that cannot be answered, you have two choices. First, you can provide an explanation for what would be required to answer the question and why it is too difficult, time-consuming or expensive to do now. Second, you can take the steps necessary to answer the question (e.g., collect more data, make new derivative data, etc.). The importance of the question should influence whether you should expend the additional effort.
Task 3: Writing up your results
In Task 3, you should begin to write up your results from Tasks 1 and 2 in a file called p2writeup.pdf.
Use your judgement to decide which measurements and graphs from Tasks 1 and 2 should be included in the writeup. A good rule of thumb is to include only things for which you can write an interesting summary and observation. The goal of the writeup is to serve as a first draft of the data description and analysis sections of your project report.
The writeup file should also include the explanations for the questions that you could not answer, as explained in Task 2.
Here is a sample outline for the write-up that you may want to follow:
- Summary statistics and graphs to introduce the data set.
- Explain questions you hope to answer with the data.
- Question 1
  - explain the question, and how the data helps to answer it (or not)
  - explain relevant graphs
- Question 2
  - explain the question, and how the data helps to answer it (or not)
  - explain relevant graphs
- Questions that cannot be answered
  - explain the question
  - explain why it is difficult to answer
  - explain what it would take to answer
What to turn in
Include the following:
- all R code used to examine your data
- dump of all summary statistics and graphs in dataSummary.pdf
- any additional code that has been written since P1, such as code to create new derivative data to answer questions
- intermediate graphs and statistics used in Task 2 but not included in p2writeup.pdf because the results were deemed not interesting or did not answer the question appropriately. Save these (and any text descriptions) in a file called questionRejects.pdf
- a P2Documentation.pdf file that briefly explains what the other files are