Spring 2024

Assignment 2: SOC386L/SDS385

Briefly answer the following questions. Please include selected segments of computer code you used
in your write-up if needed. You may consult with other students, but the work you hand in should
reflect your own thinking.

Background. The data set (RiesbyWide.dta) posted on Canvas comes from a study by Riesby
et al. (1977) in which hospitalized depressed patients were treated for 5 weeks with imipramine.
Clinical rating of depressive symptoms was carried out once weekly by Hamilton’s Rating Scale
(HRS). The data are in wide form as follows: id, HamD0-HamD5, EndoG, where id is the subject
identifier, HamD0-HamD5 are depression measures (HamD0 is the baseline or initial measurement),
and EndoG identifies whether a subject's depression, as classified on the WHO Depression Scale, was
“endogenous” or “non-endogenous,” with “endogenous” roughly pertaining to clinical
depression in current practice.

10 points each.

1. Convert these data from the individual-level format to the “person-period” format (i.e., from
wide to long). Carry out a basic exploratory analysis by graphing the raw trajectories for
each subject (or randomly selected subjects) along with the linear fit through each individual’s
measurements in side-by-side plots (e.g., spaghetti plots). Comment on whether a linear growth
curve model would seem appropriate given the observed and fitted patterns.

2. Fit an unconditional means model to these data. Assess whether this model is an improvement
over a simple regression model, and explain why or why not.

3. Fit an unconditional growth model to these data using a full model specification for the variance
components (i.e., a model that estimates the variance of the random slope and intercept as
well as their covariance). Use a likelihood ratio test to determine if this model offers an
improvement over the unconditional means model.

4. Using a likelihood ratio test, test whether a more parsimonious unconditional growth model fits
these data as well as the model used in Question 3. Use the preferred model as the “working”
model. Interpret the fixed effects and variance components from your working unconditional
growth model.

5. Under the assumptions of normality of the MLEs (i.e., asymptotic normality) we can determine
some aspects of the distribution of random slopes in the depressed population. Using the slope
fixed effect and the estimated variance of the random slope as a gauge of variability of the
individual slopes around the fixed effect, about what proportion of the random slopes in the
depressed population would you expect to be greater than 0?

6. Using the properties of MLEs we can determine some aspects of the distribution of random
intercepts (i.e., initial levels) in the depressed population. Using the intercept fixed effect and
the estimated variance of the random intercept as a gauge of variation in initial depression levels,
about what proportion of the random intercepts in the depressed population would be expected
to be less than 20?

7. Obtain estimates of the random slope by predicting the random effect $U_{1i}$ for each subject and
adding these to the fixed effect $\hat{\beta}_1$. Obtain estimates of the random intercept by predicting the
random effect $U_{0i}$ for each subject and adding these to the fixed effect $\hat{\beta}_0$. Provide a histogram
of the random slopes and intercepts.

8. Using your preferred model from Question 4, assess whether there is evidence that the
“endogenous depression” classification is able to statistically differentiate between depression patients’
initial levels of depression and also whether there is evidence that this classification moderates
the change in depression scores over time.
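
As a rough illustration of two computations the questions above call for, the sketch below reshapes wide records into person-period form and fits a per-subject line (Question 1), then computes a normal-theory proportion from a fixed effect and a random-effect variance (Questions 5 and 6). It assumes the individual slopes and intercepts are normally distributed around their fixed effects. The records, the long-format column names `week` and `hamd`, the helper names, and all numeric inputs are invented for illustration; none of them are estimates from the Riesby data.

```python
from statistics import NormalDist

# Toy wide-format records; every value here is invented for illustration.
wide_rows = [
    {"id": 1, "HamD0": 20, "HamD1": 18, "HamD2": 16,
     "HamD3": 14, "HamD4": 12, "HamD5": 10, "EndoG": 0},
    {"id": 2, "HamD0": 30, "HamD1": 27, "HamD2": None,
     "HamD3": 25, "HamD4": 20, "HamD5": 18, "EndoG": 1},
]


def wide_to_long(rows, stub="HamD", weeks=range(6)):
    """Return one record per subject-week, skipping missing measurements."""
    long_rows = []
    for row in rows:
        for week in weeks:
            score = row.get(f"{stub}{week}")
            if score is None:
                continue
            long_rows.append(
                {"id": row["id"], "week": week, "hamd": score, "EndoG": row["EndoG"]}
            )
    return long_rows


def ols_line(xs, ys):
    """Least-squares (intercept, slope) through one subject's measurements."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def share_beyond(mean, variance, cutoff, above=True):
    """Share of a Normal(mean, variance) population above (or below) a cutoff."""
    below = NormalDist(mu=mean, sigma=variance ** 0.5).cdf(cutoff)
    return 1.0 - below if above else below


long_rows = wide_to_long(wide_rows)
subject1 = [r for r in long_rows if r["id"] == 1]
intercept1, slope1 = ols_line(
    [r["week"] for r in subject1], [r["hamd"] for r in subject1]
)

# Hypothetical fixed effects and variances, NOT estimates from the Riesby data.
share_positive_slopes = share_beyond(mean=-2.0, variance=1.0, cutoff=0.0)
share_low_intercepts = share_beyond(mean=24.0, variance=16.0, cutoff=20.0, above=False)
```

In Stata the reshaping step would more typically be done with `reshape long`; the plain-Python version is used here only so the logic is visible. Note that `share_beyond` takes the variance, not the standard deviation, so the variance component can be passed in directly.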

Extra Credit Problems. (2 points) Consider the OLS estimates of the individual-level slopes
obtained from the marital adjustment data (see handouts 1-2). Use the information in the handout
to obtain estimates of the slopes and their standard errors. We can follow the steps spelled out in
the Bryk and Raudenbush article (available on Canvas) to improve upon these estimates as follows:

• Denote the slope for the ith subject by $b_i$. The total variance in the individual slope is a
function of the sampling variance and the parameter variance.

• The sampling variance is simply the squared standard error of the slope from the OLS model,
denoted by $se(b_i)^2 = v_i$.

• The parameter variance is the variance in the slopes, estimated as
$\tau = \sum_i (b_i - \bar{b})^2 / (n - 1)$.

• Define the weight function $W_i = \tau / (\tau + v_i)$.

• An improved estimate can be obtained as a weighted average of the OLS slope and the average
slope. That is, $b_i^{*} = b_i W_i + (1 - W_i)\,\bar{b}$.
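
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the slopes and standard errors have already been read off the handouts. The function name `shrink_slopes` and the numbers in the example call are hypothetical and are not taken from the marital adjustment data.

```python
def shrink_slopes(slopes, std_errors):
    """Weighted average of each OLS slope and the mean slope.

    Follows the bullets above: v_i = se(b_i)^2, tau = sample variance of the
    slopes, W_i = tau / (tau + v_i), b_i* = W_i * b_i + (1 - W_i) * b_bar.
    """
    n = len(slopes)
    b_bar = sum(slopes) / n
    tau = sum((b - b_bar) ** 2 for b in slopes) / (n - 1)
    improved = []
    for b, se in zip(slopes, std_errors):
        v = se ** 2                  # sampling variance of this slope
        w = tau / (tau + v)          # weight on the subject's own OLS slope
        improved.append(w * b + (1.0 - w) * b_bar)
    return improved


# Invented inputs, chosen so the arithmetic is easy to follow by hand:
# b_bar = 2, tau = 1, every v_i = 1, so every W_i = 0.5.
example = shrink_slopes([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

With these inputs each slope moves halfway toward the mean. A slope with a larger standard error gets a smaller weight and is pulled further toward the average slope.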

1. Use the individual slope estimates and their std. errors from handouts 1-2 to obtain the
improved estimates and plot them against the original OLS estimates. Explain in what sense
these estimates are “improved.”

2. A simplified growth model can be fit using “centered” data, where the centered dependent
variable is $y^c_{ij} = y_{ij} - \bar{y}_i$ and $t^c_{ij} = t_{ij} - \bar{t}_i$. Consider the following model

$y^c_{ij} = \beta_i t^c_{ij} + \varepsilon$

and

$\beta_i = \beta + U_i$

Discuss the differences between this model and the standard linear growth model. Fit this
model as a linear mixed model.

3. Compare the slope estimate (fixed effect) to the one from the centered model. Explain why
they are similar or different.

4. Obtain the empirical Bayes estimates of the level-2 residuals, the $\hat{U}_i$. Compute the EB
estimates of the slope as $\hat{\beta} + \hat{U}_i$. Compare these to the ones computed in the first
extra-credit problem.
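
For the second extra-credit problem, one way to see how the centered model relates to the standard linear growth model is to average the standard model within each subject and subtract. The sketch below assumes the standard model is written with a subject-specific intercept $\beta_{0i}$, a subject-specific slope $\beta_{1i}$, and a residual $\varepsilon_{ij}$. These symbols are an assumed notation, not taken from the handouts.

```latex
\begin{align*}
y_{ij} &= \beta_{0i} + \beta_{1i}\, t_{ij} + \varepsilon_{ij}
  && \text{(standard linear growth model, assumed notation)} \\
\bar{y}_i &= \beta_{0i} + \beta_{1i}\, \bar{t}_i + \bar{\varepsilon}_i
  && \text{(average over occasions $j$ within subject $i$)} \\
y_{ij} - \bar{y}_i &= \beta_{1i}\,(t_{ij} - \bar{t}_i) + (\varepsilon_{ij} - \bar{\varepsilon}_i)
  && \text{(subtract the second line from the first)} \\
y^c_{ij} &= \beta_{1i}\, t^c_{ij} + (\varepsilon_{ij} - \bar{\varepsilon}_i)
\end{align*}
```

Under this notation the subject-specific intercept cancels, so the centered model carries only a slope, and its residual is the within-subject centered residual rather than the original $\varepsilon_{ij}$.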
