Introduction to Data Analysis
Evaluation 2
True or False (2 points each, 14 total)
1. T F Exogenous variables occur before endogenous variables in temporal order.
2. T F The dependent variable can be thought of as the “effect” and the independent variable can be thought of as the “cause” in analyses.
3. T F Residuals are the unexplained variability when predicting a dependent variable, as indicated by the distance between a data point and the line of best fit.
4. T F A goodness-of-fit statistic shows how well your regression model fits the available data.
5. T F An interaction is the same as a moderator.
6. T F A mediator is the same as an indirect effect.
7. T F A regression discontinuity design uses a cutoff score as an “as if randomized” mechanism to control for selection bias.
Multiple Choice (2 points each, 8 total)
1. Randomized controlled trials control for:
a. Measured confounds only
b. Unmeasured confounds only
c. Both measured and unmeasured confounds
d. Neither measured nor unmeasured confounds
2. Logistic regression should be used when the dependent variable is of this type:
a. Continuous
b. Index
c. Dichotomous
d. Ordinal
3. R² means the following:
a. The statistical significance of the regression coefficient
b. To what extent the model’s predictors are collinear
c. The percentage of variance in the dependent variable that is explained by the regression model’s independent variables.
d. The percentage of variance in the model’s predictors that is explained by the dependent variable
4. Which of the following is NOT an assumption of multiple regression?
a. The dependent variable is a linear function of the independent variables
b. The observations are drawn independently from the population
c. Each of the independent variables is continuous
d. The errors are normally distributed
Short Answer (points vary)
1. (6 points) What are simultaneous, sequential, and stepwise regression? What are the strengths and weaknesses of these three types of regression analysis? Why is order of entry so important in sequential regression? Why should you generally avoid stepwise regression?
2. (2 points) What is an interaction? Show two different graphs of an example interaction that indicates that the relation between family economic resources and student behavior depends on gender. Label each
Here are hypothetical data on suspension rates for students from 12 different schools, along with each school’s percentage of students receiving free or reduced-price lunch, number of fights per week, and whether the school uses restorative justice practices. Use Excel or SPSS to answer the following questions (if you save the data in Excel, you can then open the file in SPSS using “Open” under “Data”). Please show the output in your answers for full credit.
School number | Suspension rate for students (%) | Number of fights per week in the school | Students receiving free or reduced-price lunch (%) | Uses restorative justice practices (1 = yes, 0 = no)
1 | 3 | 1 | 20 | 1
2 | 4 | 3 | 25 | 1
3 | 9 | 7 | 30 | 1
4 | 15 | 12 | 35 | 0
5 | 10 | 5 | 40 | 0
6 | 2 | 1 | 10 | 1
7 | 1 | 1 | 5 | 1
8 | 22 | 9 | 80 | 0
9 | 11 | 7 | 45 | 0
10 | 4 | 3 | 25 | 0
11 | 9 | 14 | 65 | 0
12 | 7 | 9 | 50 | 1
3. (5 points) Calculate the correlation between a school’s suspension rate and its percentage of students receiving free or reduced-price lunch. Report this correlation. Report the correlation between fights per week and the school’s suspension rate. Insert a scatterplot showing one of these relations, with the x- and y-axes labeled and a chart title. Interpret one of the correlations.
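For anyone who wants to cross-check the Excel or SPSS output, the Pearson correlation can also be computed directly from its definition. The following is a minimal Python sketch, not part of the required workflow; the variable names and the helper `pearson_r` are illustrative choices, and the lists are copied from the data table above.

```python
from math import sqrt

# Data copied from the table of 12 hypothetical schools.
suspension = [3, 4, 9, 15, 10, 2, 1, 22, 11, 4, 9, 7]    # suspension rate (%)
fights = [1, 3, 7, 12, 5, 1, 1, 9, 7, 3, 14, 9]          # fights per week
lunch = [20, 25, 30, 35, 40, 10, 5, 80, 45, 25, 65, 50]  # free/reduced-price lunch (%)


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_lunch = pearson_r(lunch, suspension)
r_fights = pearson_r(fights, suspension)
print(f"r(lunch, suspension)  = {r_lunch:.3f}")
print(f"r(fights, suspension) = {r_fights:.3f}")
```

The printed values should match the correlation matrix from Excel or SPSS up to rounding; the scatterplot and the interpretation still need to be produced separately.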
4. (5 points) Conduct a simple bivariate regression in which you predict each school’s suspension rate from its percentage of students receiving free or reduced-price lunch. Is the percentage of students receiving free or reduced-price lunch associated with the school’s suspension rate? Report the R-squared, the regression coefficient for percentage of free or reduced-price lunch, and the p-value. Is any school an outlier, based on the rule of a standardized residual that is 2.5 or more standard deviations from the mean? Repeat these calculations, this time predicting the school’s suspension rate from the number of fights per week. Interpret the results of both simple bivariate regressions.
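As an optional cross-check on the Excel or SPSS output, the slope, intercept, R-squared, and standardized residuals of a bivariate regression can be computed by hand. The sketch below is illustrative only: the function name `simple_regression` is hypothetical, and it assumes that a standardized residual is the raw residual divided by the standard error of the estimate, sqrt(SSE / (n - 2)). Software packages may define it slightly differently. It does not compute the p-value, which should still come from Excel or SPSS.

```python
from math import sqrt

# Data copied from the table of 12 hypothetical schools.
suspension = [3, 4, 9, 15, 10, 2, 1, 22, 11, 4, 9, 7]    # suspension rate (%)
lunch = [20, 25, 30, 35, 40, 10, 5, 80, 45, 25, 65, 50]  # free/reduced-price lunch (%)


def simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)      # unexplained variability
    sst = sum((b - mean_y) ** 2 for b in y)   # total variability
    r_squared = 1 - sse / sst
    # Assumption: standardize by the standard error of the estimate.
    std_error = sqrt(sse / (n - 2))
    standardized = [e / std_error for e in residuals]
    return slope, intercept, r_squared, standardized


slope, intercept, r_squared, standardized = simple_regression(lunch, suspension)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R-squared = {r_squared:.3f}")

# Flag schools whose standardized residual is 2.5 or more in absolute value.
outliers = [i + 1 for i, z in enumerate(standardized) if abs(z) >= 2.5]
print("Outlier school numbers:", outliers)
```

Replacing `lunch` with the fights-per-week column repeats the analysis for the second predictor.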
5. (5 points) Conduct a multiple regression in which you predict the school’s suspension rate, using both the percentage of students receiving free or reduced-price lunch and use of restorative justice practices as covariates. Report the R² and describe which of these covariates is a statistically significant predictor. Does a school’s use of restorative justice practices increase or decrease the likelihood of suspension across these 12 schools? What can we conclude about the possible effects of using restorative justice practices to reduce the risk of suspension, controlling for free or reduced-price lunch?
6. (5 points) Conduct a multiple regression in which you predict a school’s suspension rate using fighting in the school, the percentage of students receiving free or reduced-price lunch, and whether the school uses restorative justice practices as explanatory variables. Report the R-squared, and report the regression coefficients and their p-values for the school’s rates of fighting, poverty, and use of restorative practices. What can you conclude about the use of restorative practices as an explanatory factor of variability in school suspension rates, while also controlling for both poverty and fighting? Based on these analyses using hypothetical data, should schools be using restorative practices as a policy to reduce suspension rates? If so, why, based on your analyses? If not, why not?
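The multiple regressions in questions 5 and 6 can likewise be cross-checked outside Excel or SPSS. The sketch below is a minimal illustration that solves the least-squares normal equations with plain Python; the helper names `solve` and `multiple_regression` are hypothetical. It reports coefficients and R-squared only, so the p-values must still come from the Excel or SPSS output.

```python
# Data copied from the table of 12 hypothetical schools.
suspension = [3, 4, 9, 15, 10, 2, 1, 22, 11, 4, 9, 7]    # suspension rate (%)
fights = [1, 3, 7, 12, 5, 1, 1, 9, 7, 3, 14, 9]          # fights per week
lunch = [20, 25, 30, 35, 40, 10, 5, 80, 45, 25, 65, 50]  # free/reduced-price lunch (%)
restorative = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1]       # 1 = yes, 0 = no


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(predictors, y):
    """Return ([intercept, b1, b2, ...], R-squared) from ordinary least squares."""
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then each predictor.
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - sse / sst


coefs, r_squared = multiple_regression([fights, lunch, restorative], suspension)
for name, b in zip(["intercept", "fights", "lunch", "restorative"], coefs):
    print(f"{name:12s} b = {b:.4f}")
print(f"R-squared = {r_squared:.3f}")
```

Passing `[lunch, restorative]` as the predictor list instead gives the two-covariate model from question 5.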