With any set of data, we use summary statistics and graphs to get an overall picture. You will summarize and visualize your data set.
Instructions
(5pts) Please list your data set name and variables (both categorical and quantitative) at the top of all your writing project submissions.
Produce the following tables and graphs and gather them in your document/presentation. Then write an interpretation that compares the shapes, centers, and
spreads of your quantitative distributions.
1. Tables (10pts)
a) Get a contingency table for two of your categorical variables.
Build one contingency table for the two variables. Depending on the values of your variables, you may display the table in whichever orientation suits it
best.
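The assignment expects this table to come from StatCrunch. To check the counts by hand, a contingency table can be sketched in a few lines of Python. This is only an illustration: the rows, the category names, and the contingency_table helper are hypothetical and not part of the assignment.

```python
from collections import Counter

# Hypothetical rows: (school type, region) for six schools. Not real data.
rows = [
    ("public", "west"),
    ("public", "east"),
    ("private", "east"),
    ("private", "east"),
    ("public", "west"),
    ("private", "west"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    return table, row_labels, col_labels

table, row_labels, col_labels = contingency_table(rows)

# Print the table with row totals, column totals, and the grand total.
print("type".ljust(10) + "".join(c.rjust(8) for c in col_labels) + "Total".rjust(8))
for r in row_labels:
    cells = "".join(str(table[r][c]).rjust(8) for c in col_labels)
    print(r.ljust(10) + cells + str(sum(table[r].values())).rjust(8))
col_totals = [sum(table[r][c] for r in row_labels) for c in col_labels]
print("Total".ljust(10) + "".join(str(t).rjust(8) for t in col_totals) + str(len(rows)).rjust(8))
```

Swapping which variable forms the rows and which forms the columns gives the other orientation; the counts themselves do not change.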
b) Get summary statistics (n, median, mean, Q1, Q3, IQR, min, max) for your two quantitative variables. You should have either two tables or one table that
covers both quantitative variables.
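As a rough cross-check of the StatCrunch output, the same summary statistics can be computed with Python's standard statistics module. The values below are hypothetical, and the summarize helper is an illustrative name. Software packages use different quartile conventions, so Q1, Q3, and IQR from this sketch may differ slightly from what StatCrunch reports.

```python
import statistics

def summarize(values):
    """Return n, mean, median, Q1, Q3, IQR, min, and max for a list of numbers."""
    # "inclusive" treats the data as the whole population of interest;
    # this is one of several quartile conventions (an assumption here).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "min": min(values),
        "max": max(values),
    }

# Hypothetical tuition values in thousands of dollars. Not real data.
tuition = [8, 10, 12, 14, 16, 18, 40]
stats = summarize(tuition)
for name, value in stats.items():
    print(f"{name}: {value}")
```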
2. Graphs (15 pts)
Choose a categorical variable to group the data by. Build histograms and box plots, per group, for each quantitative variable.
Take one of your chosen quantitative variables and build a set of histograms (one per group) in the same graph in StatCrunch. Then build a set of boxplots for
the same distributions in one graph, just as you did for the histograms.
Now take the second chosen quantitative variable and repeat the process to build one graph with a set of histograms and another graph with a set of boxplots
(one per group).
Organize the document so that there is a section for each quantitative variable, with the summary statistics table first, followed by the histograms and boxplots.
3. Interpretation (20 pts)
For each quantitative variable, one at a time, compare its distributions across the groups. Use histograms to identify the shape of each distribution. Use
boxplots to compare the centers and spreads of the distributions. Clearly state which measures you are using for the comparison, taking into account the shapes
of the distributions being compared. Be sure to explain your reasoning, state your findings, and cite evidence from the graphs and/or summary statistics to
support your findings.
Shape: Identify the shape of each of your histograms.
Center: Based on identified shapes, use the appropriate measure to compare center.
Spread: Based on identified shapes, use the appropriate measure to compare spread.
Also note whether you have any outliers. If so, do you include them in your analysis? Why or why not?
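One common convention for flagging outliers, and the one many boxplot tools use, is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 IQRs below Q1 or above Q3. The sketch below assumes that rule applies to your course; the data and the iqr_outliers helper are hypothetical, and quartile conventions may differ slightly from StatCrunch.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)

# Hypothetical tuition values in thousands of dollars. Not real data.
tuition = [8, 10, 12, 14, 16, 18, 40]
outliers, fences = iqr_outliers(tuition)
print("fences:", fences)
print("outliers:", outliers)

# Compare center with and without the flagged values to see how much
# each measure moves; the median shifts far less than the mean.
kept = [v for v in tuition if v not in outliers]
print("mean:", statistics.mean(tuition), "->", statistics.mean(kept))
print("median:", statistics.median(tuition), "->", statistics.median(kept))
```

A flagged value is not automatically an error. Whether to keep it is the judgment call the question above asks you to explain.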
Sample interpretation statements for tuition, grouped by type (public or private) – do this for each of your quantitative variables for your dataset:
The distribution of public school tuition is approximately symmetric because both tails are about the same length, while the distribution of private school
tuition is right skewed because the tail is longer on the right side.
Measures of center are mean and median. Because one of the distributions compared is skewed, the median will be used to compare the centers of the
distributions of tuition at private and public institutions. The median tuition at a public institution is $12,345, which is less than the median tuition at a
private institution, $56,789. Typically, tuition is higher at a private school than at a public school.
Measures of spread are standard deviation and IQR. Because one of the distributions compared is skewed, the IQR will be used to compare the spreads of the
distributions of tuition at private and public institutions. The IQR of tuition at public institutions is $12,345, which is lower than the IQR of tuition at
private institutions, $23,456. Tuition varies more at private institutions than at public institutions.
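The numbers quoted in statements like these come from per-group summary statistics. A minimal sketch of that grouped comparison follows, using hypothetical tuition values (not the figures in the sample statements) and an illustrative median_and_iqr helper. Quartile conventions vary between packages, so StatCrunch may report slightly different IQRs.

```python
import statistics

# Hypothetical tuition values in dollars, grouped by school type. Not real data.
tuition = {
    "public": [9000, 10000, 11000, 12000, 13000, 14000, 15000],
    "private": [30000, 32000, 35000, 38000, 45000, 60000, 90000],
}

def median_and_iqr(values):
    """Return the resistant measures of center and spread: (median, IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1

# One (median, IQR) pair per group, ready to compare side by side.
results = {group: median_and_iqr(values) for group, values in tuition.items()}
for group, (med, iqr) in results.items():
    print(f"{group}: median = {med}, IQR = {iqr}")
```

The private group here is right skewed, which is why the median and IQR are compared rather than the mean and standard deviation.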