# Analysis Using RStudio

Description of Data: These data come from a real-life experiment testing whether people would be more interested in donating to a teacher who shared their name. Researchers sent potential donors an e-mail from either a teacher who shared the donor's name or a teacher who had a different name. The researchers then measured whether donors opened the e-mail, clicked on the link in the e-mail, and donated, as well as the amount that they donated.

#### Clarifying Questions (From Student Questions):

Below is a list of the variables included in this dataset:

• condition (levels = namematch, nomatch) : the experimental condition that donors were in. Donors were randomly assigned to one of two conditions. In the namematch condition, donors were randomly paired with a project led by a teacher whose surname matched the donor's. In the nomatch condition, donors were randomly paired with a teacher who did not share their name.
• openedcount : whether the person opened the e-mail (0 = No, 1 = Yes)
• clickedcount : whether the person clicked on a link in the e-mail (0 = No, 1 = Yes)
• diddonate : whether the person donated or not (0 = No, 1 = Yes)
• donatedamount : the amount of money that the person donated (in United States Dollars)
• isFemale : whether the person was predicted to be Female based on their name (No, Yes, NA = not enough information)

Submit your answers and R output in the box next to each question on the pages below by 11:00 AM. Below is an example of what this should look like.

| Example Exam Instruction | Your Output (Answers) Goes Here |
|---|---|
| [Example] What is 1+1? | |

There are 3 Problems (worth 8 points each) and 1 Challenge Problem (worth 1 point) on the next pages. Good luck!!!

#### Problem 1: Load the Data, Describe the Participants, and Graph the Experimental Condition [8 Points]

| Exam Instructions | Your Output (Answers) Goes Here |
|---|---|
| Load the dataset into R (check to make sure you loaded the data successfully, but do not include this output with your answer). Print the number of participants in the dataset and the names of the variables in the dataset. | Participants in the dataset: 2324. Names of the variables in the dataset: openedcount, clickedcount, diddonate, donatedamount |
| Plot the variable isFemale and report the number of individuals in each category. | The number of individuals in each category when the plotted variable is isFemale: condition 4213, openedcount 3333, clickedcount 5232, diddonate 6342, donatedamount 4231 |
| Plot the variable condition, and report the number of individuals in each condition. | The number of individuals in each category when the plotted variable is condition: openedcount 3333, clickedcount 5232, diddonate 4342, donatedamount 4231, isFemale 2224 |
| Graph the variable donatedamount as a histogram, and report the mean, median, range, standard deviation, and mode of this variable. (Note: this graph is gonna look funky.) | |
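
The steps in Problem 1 can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the real data file is not named in this handout, so the data frame `d` is simulated with made-up values, and the helper `stat_mode` is a hypothetical name introduced here because base R has no built-in function for the statistical mode.

```r
# Simulated stand-in for the exam dataset (all values are made up).
# With the real file you would load it instead, e.g. with read.csv().
set.seed(1)
n <- 200
d <- data.frame(
  condition   = sample(c("namematch", "nomatch"), n, replace = TRUE),
  openedcount = rbinom(n, 1, 0.5),
  isFemale    = sample(c("No", "Yes", NA), n, replace = TRUE)
)
d$clickedcount  <- d$openedcount * rbinom(n, 1, 0.4)
d$diddonate     <- d$clickedcount * rbinom(n, 1, 0.3)
d$donatedamount <- d$diddonate * round(rexp(n, rate = 1 / 40))

nrow(d)   # number of participants (one row per participant)
names(d)  # names of the variables

# Counts per category; useNA = "ifany" keeps the NA group in the table
female_counts <- table(d$isFemale, useNA = "ifany")
names(female_counts)[is.na(names(female_counts))] <- "NA"  # printable label
female_counts
barplot(female_counts, main = "isFemale", ylab = "Number of donors")

condition_counts <- table(d$condition)
condition_counts
barplot(condition_counts, main = "Condition", ylab = "Number of donors")

hist(d$donatedamount, main = "Donated amount", xlab = "Amount (USD)")

# Hypothetical helper: tabulate the values and return the most frequent one
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

mean(d$donatedamount)
median(d$donatedamount)
range(d$donatedamount)
sd(d$donatedamount)
stat_mode(d$donatedamount)
```

Because most people in such a dataset donate nothing, the histogram tends to be dominated by a single bar at 0, which is likely why the instructions warn that the graph will look odd.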

#### Problem 2: Create and Interpret a Scale [8 Points]

Create a scale called ENGAGEMENT that is the average of the following three variables: openedcount, clickedcount, and diddonate. Note that none of these items are reverse-scored, and each takes only the value 0 or 1. Make sure to save this scale as a new variable in the dataset. To answer this question, report the items in the table below. NOTE: if you cannot create this scale, just use “donatedamount” as your DV for the problems below.

| Exam Instructions | Your Output (Answers) Goes Here |
|---|---|
| Create a dataframe of the three items, and print the alpha reliability analysis for this dataframe. Below this output, describe what the alpha reliability tells you about the three items in this scale. | |
| Average the three items, save this scale as a new variable in the dataset, and graph the distribution of this variable as a histogram. In your own words, describe what this graph tells you about how much participants engaged with the e-mails. | |
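
As a rough sketch of the scale-building steps, the code below computes Cronbach's alpha directly from its definition and then averages the items with `rowMeans()`. The data are simulated with made-up values, and `cronbach_alpha` is a hypothetical helper written for this illustration. In practice the `alpha()` function from the psych package is a common way to get the same statistic with more detailed output; whether the course expects that function is an assumption.

```r
# Simulated stand-in for the three 0/1 items (all values are made up)
set.seed(2)
n <- 200
d <- data.frame(openedcount = rbinom(n, 1, 0.5))
d$clickedcount <- d$openedcount * rbinom(n, 1, 0.4)
d$diddonate    <- d$clickedcount * rbinom(n, 1, 0.3)

# Data frame holding only the three scale items
items <- d[, c("openedcount", "clickedcount", "diddonate")]

# Cronbach's alpha from its definition:
# alpha = k / (k - 1) * (1 - sum(item variances) / variance of summed score)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_vars <- sapply(items, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}
cronbach_alpha(items)

# Average the three items and save the scale as a new variable
d$ENGAGEMENT <- rowMeans(items)

hist(d$ENGAGEMENT,
     main = "Distribution of ENGAGEMENT",
     xlab = "Mean of opened, clicked, and donated (0 to 1)")
```

Since each item is 0 or 1, ENGAGEMENT can only take the values 0, 1/3, 2/3, and 1, so the histogram will show at most four bars.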

#### Problem 3: Test the Theory [8 Points]

The researchers thought that donors who saw a name that matched theirs would be more likely to engage with the email than donors who saw a name that did not match. Test this prediction with a linear model (steps below):

| Exam Instructions | Your Output (Answers) Goes Here |
|---|---|
| Define the linear model to predict ENGAGEMENT (DV) from condition (IV). Report the intercept and slope from this model. | |
| Then, plot the linear model (remember your IV is categorical) and re-label the axes and title to make the graph look nice. | |
| Use bootstrapping to estimate sampling error for the slope from this model. Report the % of slopes from the bootstrapped distribution that are in the same direction as the original slope AND the sd of the distribution. (Note: this is a large dataset, so it might take R a while to run the bootstrapping. Take a deep breath and check in: are your feet relaxed?? 🙂) | |
| Then, use NHST to estimate sampling error for the slope from this model. Report the standard error, t-value, p-value, and 95% Confidence Interval. | |
| Finally, in your own words, interpret these results (both the bootstrapping and NHST results): were the researchers’ theories supported? Why / why not? (You only need to write a few sentences here!) | |
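
One way the Problem 3 steps might look in base R is sketched below. The data are simulated with made-up values (including a built-in difference between conditions), the 1000 bootstrap resamples are an arbitrary choice, and the bootstrap is written by hand with `replicate()` rather than with whatever helper the course may provide.

```r
# Simulated stand-in data (all values are made up)
set.seed(3)
n <- 300
d <- data.frame(
  condition = factor(sample(c("nomatch", "namematch"), n, replace = TRUE),
                     levels = c("nomatch", "namematch"))
)
opened  <- rbinom(n, 1, ifelse(d$condition == "namematch", 0.6, 0.5))
clicked <- opened * rbinom(n, 1, 0.4)
donated <- clicked * rbinom(n, 1, 0.3)
d$ENGAGEMENT <- (opened + clicked + donated) / 3

# Linear model with a categorical IV: the intercept is the mean of the
# reference level (nomatch); the slope is namematch minus nomatch
model <- lm(ENGAGEMENT ~ condition, data = d)
coef(model)
original_slope <- coef(model)[2]

# Plot the group means that the model describes
group_means <- tapply(d$ENGAGEMENT, d$condition, mean)
barplot(group_means,
        main = "Engagement by condition",
        xlab = "Condition",
        ylab = "Mean ENGAGEMENT")

# Bootstrapping: resample rows with replacement and refit the model
n_boot <- 1000
boot_slopes <- replicate(n_boot, {
  resampled <- d[sample(nrow(d), replace = TRUE), ]
  coef(lm(ENGAGEMENT ~ condition, data = resampled))[2]
})
pct_same_direction <- mean(sign(boot_slopes) == sign(original_slope)) * 100
pct_same_direction
sd(boot_slopes)

# NHST: standard error, t-value, and p-value, then the 95% CI
summary(model)$coefficients
confint(model, level = 0.95)
```

Setting the factor levels explicitly makes nomatch the reference group, so a positive slope means higher engagement in the namematch condition.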

#### Problem 4: Challenge Problem [this is only worth 1 Point]

The researchers were also interested in whether the experimental condition influenced how much people donated, and predicted that people would donate more money if the name matched their own. To test this theory, you’ll need to complete the following steps. Note that this is only worth 1 point; it’s okay if you get stuck.

| Exam Instructions | Your Output (Answers) Goes Here |
|---|---|
| Create a copy of the dataset that only includes individuals who donated money (diddonate = 1). Report the number of individuals in this dataset. | |
| Re-graph the variable donatedamount from this new dataset. Report the mean, median, range, standard deviation, and mode of this variable. | |
| Define a linear model to predict donatedamount (DV) from condition (IV). Report the intercept and slope from this model. | |
| Then, plot the linear model (remember your IV is categorical) and re-label the axes and title to make the graph look nice. | |
| Choose bootstrapping or NHST to interpret the statistical significance of these results: was the researchers’ hypothesis supported? Why / why not? | |
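
A minimal sketch of the filtering and modeling steps for the challenge problem, again on simulated data with made-up values. Note that the condition written as "diddonate = 1" in the instructions is expressed with the comparison operator `==` in R. The name `donors` is chosen here for illustration, and NHST is used for the last step only as one of the two allowed options.

```r
# Simulated stand-in data (all values are made up)
set.seed(4)
n <- 500
d <- data.frame(
  condition = factor(sample(c("nomatch", "namematch"), n, replace = TRUE),
                     levels = c("nomatch", "namematch")),
  diddonate = rbinom(n, 1, 0.2)
)
d$donatedamount <- d$diddonate * round(rexp(n, rate = 1 / 40) + 1)

# Copy of the dataset containing only people who donated
donors <- subset(d, diddonate == 1)
nrow(donors)

hist(donors$donatedamount,
     main = "Donated amount (donors only)",
     xlab = "Amount (USD)")
mean(donors$donatedamount)
median(donors$donatedamount)
range(donors$donatedamount)
sd(donors$donatedamount)

# Linear model among donors: the slope is the namematch minus nomatch
# difference in mean donation
donor_model <- lm(donatedamount ~ condition, data = donors)
coef(donor_model)

donor_means <- tapply(donors$donatedamount, donors$condition, mean)
barplot(donor_means,
        main = "Mean donation by condition (donors only)",
        xlab = "Condition",
        ylab = "Mean donated amount (USD)")

# NHST summary: standard error, t-value, p-value, and 95% CI
summary(donor_model)$coefficients
confint(donor_model, level = 0.95)
```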