Learning to Learn
This course has been designed to teach you the specifics of each test: its importance, why it is used, and how to interpret the results. Along the way, we hope that you have learned something about how you learn statistics: what steps to take (i.e., do you like to read first, work with Excel first, or listen to a lecture first), what pieces of information are important (when to use the test, how to interpret it), and what assumptions are necessary for the test you are using (normality, type of variable).
For this dialogue, imagine that you have completed this course and are working. You have been given a data set and told to use a test that you did not learn in this course. Given this scenario, respond to the following:
- What process would you use to learn about the test? Describe your process.
- Use the process you described to learn one of the tests covered in Chapter 12 in the Fox textbook and Chapters 16 and 17 in the Salkind textbook. Provide a brief description of the test, when it is used, and how you would interpret the values.
Overview & Outcomes
In the previous module, we learned how to determine the degree of association between two variables. Though this gives us more precise information than hypothesis tests, it still does not allow us to judge how much one variable will change when another, related variable changes. More importantly, correlations only measure associations between two variables; they do not allow us to simultaneously examine the effect of more than one independent variable on one dependent variable. To do so, we rely on regression. This module will cover the basics of regression, how to use it, and how to interpret its results.
This course has focused on providing you with the foundation to understand statistics and has introduced you to some of the most common tests you will encounter. However, no one course can cover all the topics in statistics. In this module, we will briefly discuss some special topics, when they are used, and how to find more information about them. The process you have used to learn about statistics in this course will be helpful to learn about other, more specialized topics in statistics.
Regression and Special Topics
Simple Regression
When we use data to test our theories about how certain phenomena work, we use hypothesis tests to determine if there are differences between two groups; ANOVAs to test if there are differences between more than two groups; and correlations to specify how closely and in which direction two variables are related. Regression allows us to specify the nature of the relationship between two variables and calculate how a variable changes based on changes in another.
When running a regression and interpreting the results, you are interested in a few specific terms:
· The intercept: This value tells you the base level of the dependent variable before you add the effect of any independent variables.
· The slope: This value tells you how much the dependent variable changes with a unit change in the independent variable.
· The R-square: This value tells you how much of the variance of the dependent variable your regression model has explained.
The PowerPoint presentation and readings explain these terms and their usage in greater detail.
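To make these three terms concrete, here is a minimal sketch of a simple regression in Python using SciPy's `linregress` function; the hours-studied and exam-score values below are invented purely for illustration.

```python
from scipy.stats import linregress

hours = [1, 2, 3, 4, 5, 6, 7, 8]           # independent variable
scores = [55, 58, 64, 67, 71, 74, 80, 85]  # dependent variable

result = linregress(hours, scores)

print(f"intercept = {result.intercept:.2f}")  # base level of the DV before any IV effect
print(f"slope     = {result.slope:.2f}")      # change in the DV per unit change in the IV
print(f"R-square  = {result.rvalue**2:.3f}")  # share of the DV's variance explained
```

Reading the output follows the list above: the intercept is the predicted score at zero hours, the slope is the predicted gain per additional hour, and squaring the correlation coefficient gives the R-square.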
It is very rare that one independent variable explains any phenomenon in its entirety. For instance, when we are examining recidivism of offenders, we can think of many factors that affect whether they will re-offend: age, gender, education, income, type of offense, and marital status. In these situations, we are interested in asking the question: If we control for everything else, what is the effect of age (or any other independent variable) on the rate of recidivism? Put differently, if you take offenders who are identical in every way except age (that is, they have the same gender, education, income, offense, and marital status), how does an increase or decrease in age change the rate of recidivism?
Multiple regression allows you to answer such questions. The terms and values used in multiple regression are identical to those in simple regression; the only difference is that you have more than one independent variable and, therefore, more than one slope.
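As a rough sketch of the "one slope per independent variable" idea, the example below fits a multiple regression with NumPy's least-squares solver; the age, education, and recidivism-score numbers are made up only for illustration.

```python
import numpy as np

age  = np.array([22, 25, 30, 35, 40, 45, 50, 55], dtype=float)  # IV 1
educ = np.array([10, 12, 12, 14, 16, 16, 18, 18], dtype=float)  # IV 2
y    = np.array([8, 7, 6, 5, 5, 4, 3, 2], dtype=float)          # DV, e.g. a recidivism score

# Design matrix: a column of 1s for the intercept, plus one column per IV.
X = np.column_stack([np.ones_like(age), age, educ])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope_age, slope_educ = coef
print(f"intercept    = {intercept:.3f}")
print(f"slope (age)  = {slope_age:.3f}")   # effect of age, holding education constant
print(f"slope (educ) = {slope_educ:.3f}")  # effect of education, holding age constant
```

Each slope is interpreted "controlling for" the other independent variables, which is exactly the question posed in the recidivism example above.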
It is important to note that when you run simple regressions, you are making some assumptions about the data. That is, the results of the regression are only valid if all of the following are true:
1. Both variables are measured at the interval level.
2. There is a straight-line relationship between the independent and dependent variables.
3. The sample has been randomly selected.
4. All variables are normally distributed.
If any of these assumptions does not hold, you have to rely on special tests that have been designed specifically to accommodate that type of data.
There are various types of regression models that accommodate departures from the assumptions needed for simple regression. Logistic regression, for example, can be used when Assumption 1 is violated and your dependent variable is categorical. Other regression models help you run regressions when the data is not normally distributed, or when the sample is a clustered or stratified sample.
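To illustrate what logistic regression does with a categorical (0/1) outcome, here is a toy version fitted by gradient descent in plain NumPy; the prior-offense counts and re-offense indicators are invented, and in practice you would use a statistics library rather than hand-coding the fit.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)  # e.g., number of prior offenses
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)  # re-offended? (0 = no, 1 = yes)

b0, b1 = 0.0, 0.0   # intercept and slope, on the log-odds scale
lr = 0.1            # learning rate for gradient descent
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # predicted probability of y = 1
    b0 -= lr * np.mean(p - y)                  # gradient of the log-loss w.r.t. b0
    b1 -= lr * np.mean((p - y) * x)            # gradient of the log-loss w.r.t. b1

print(f"intercept (log-odds) = {b0:.2f}")
print(f"slope (log-odds)     = {b1:.2f}")  # change in log-odds of re-offending per unit of x
```

The slope is read on the log-odds scale: a positive slope means the probability of the outcome rises as the independent variable increases.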
Thus far, we have worked with parametric tests: tests that make assumptions about the data and its distribution and, hence, about the parameters that define that distribution (i.e., shape, mean, and variance). Sometimes our data does not meet these assumptions, and we have to rely on non-parametric tests, such as the chi-square test and the Spearman rank-order correlation; other specialized techniques you may encounter include factor analysis and path analysis. For the purposes of this course, you are not required to know all of these tests, but you should leave the course confident that, if you needed to learn the specifics and uses of any of them, you would be able to do so.
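As a brief sketch of two such non-parametric tests, the example below runs a Spearman rank-order correlation and a chi-square test of independence with SciPy; the judges' rankings and the 2x2 table of counts are invented for illustration.

```python
from scipy.stats import spearmanr, chi2_contingency

# Spearman rank-order correlation: association between two sets of ranks.
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]
rho, p_value = spearmanr(judge_a, judge_b)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")

# Chi-square test of independence on a 2x2 table of observed counts.
table = [[30, 10],
         [15, 25]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```

Neither test assumes the data is normally distributed: Spearman works on ranks, and chi-square works on counts, which is why they remain valid when the parametric assumptions above fail.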