Problem Set 3
Syllabus on Problem Sets:
There will be 3 problem sets. Some assignments will require basic knowledge of spreadsheet
software, such as Excel. Assignments can be completed in Excel, Google Sheets, or R. Google
Sheets and R are free programs. Excel is accessible through Office 365, which is "free" to all
OSU students. Do not wait until the last day to start the problem sets. All problem sets are
submitted on Carmen and analyzed by the Turnitin app.
The problem sets should look like professional reports. Therefore all problem sets must meet
the following criteria:
1. Everything is typed.
2. All tables and figures are named and numbered.
3. The necessary information to answer the question is concisely displayed in tables or figures
when appropriate.
4. Tables and figures should be understandable without reading the text.
Use the data from 2019 MLB free agent signings found here: Data
The data set includes all players who signed a free agent contract that began in the 2019
season. The first column is the player's name. If "(minor)" appears after the name, it was a minor
league contract. POS. is position. FROM indicates the team for which the player played in
2018 and TO is the team that signed them. Years is the number of years of the contract.
DOLLARS is the total contract value. AVG. SALARY is DOLLARS/YRS. The 2018 stats are
hits, runs batted in, home runs, batting average, on-base percentage plus slugging percentage,
innings pitched, earned run average, walks plus hits per inning pitched, saves, and wins above replacement.
American League pitchers rarely hit, so many are missing batting statistics in columns K through
O. Players who do not pitch are missing pitching statistics in columns P through T. Players
who did not play Major League Baseball in 2018 do not have stats.
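As a rough illustration of the column layout described above, the following sketch parses a few made-up rows, recomputes AVG. SALARY as DOLLARS/YRS, flags minor league contracts, and treats blank statistics as missing. It is written in Python purely for illustration; the syllabus tools are Excel, Google Sheets, and R, and the same logic carries over to any of them. The rows are invented, and the header spellings (PLAYER, YRS, WAR, ERA) and the use of empty cells for missing stats are assumptions about the real file.

```python
import csv
import io

# Hypothetical rows for illustration only: names and numbers are invented,
# and the header spellings are assumptions about the real file.
SAMPLE = """PLAYER,POS.,YRS,DOLLARS,WAR,ERA
Player A,SP,2,20000000,3.1,3.50
Player B (minor),RP,1,600000,,
Player C,1B,3,27000000,2.4,
"""

def to_float(cell):
    """Return a float, or None when the cell is blank (a missing stat)."""
    cell = cell.strip()
    return float(cell) if cell else None

def load(text):
    """Parse CSV text into a list of per-player dictionaries."""
    players = []
    for row in csv.DictReader(io.StringIO(text)):
        years = to_float(row["YRS"])
        dollars = to_float(row["DOLLARS"])
        players.append({
            "name": row["PLAYER"].replace("(minor)", "").strip(),
            "minor": "(minor)" in row["PLAYER"],
            # Pitchers are the players listed as SP or RP.
            "pitcher": row["POS."] in ("SP", "RP"),
            # AVG. SALARY is defined as DOLLARS / YRS.
            "avg_salary": dollars / years if years and dollars is not None else None,
            "war": to_float(row["WAR"]),
            "era": to_float(row["ERA"]),
        })
    return players

players = load(SAMPLE)
```

Keeping missing statistics as `None` rather than zero matters here: a blank ERA means the player did not pitch, not that the player allowed no runs.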
We want to use the data to see what determines wages in MLB.
1. (20) Let's start by getting an idea of the distribution of salaries in MLB. Create a histogram of average salaries. Create a similar histogram using the natural log of average
salary. Which would you prefer to use as our dependent variable, and why?
2. (20) Now run a simple univariate model. Estimate Salary = β0 + β1 WAR + e, both with
salary and with logged salary. Create a regression table that includes coefficients and standard
errors. Interpret β̂0 and β̂1 in each regression. (This means stating the exact meaning of
the number, not just that it has a positive/negative effect.)
3. (20) Add WAR², Age, and Age² to the regression. Which factors have increasing or
decreasing returns? What theoretical reason(s) could there be for the increasing or decreasing returns?
4. (20) Since hitting stats don’t mean much for pitchers (listed as RP or SP) and pitching
stats don’t mean much for other players, separate the data into two data sets: pitchers and hitters.
Run regressions on each group with their respective statistics (not including WAR). Create
regression results tables. Which statistics have a statistically significant effect on the
outcome? Is this what you expected? Why or why not?
5. (20) According to your estimates in question 4, which player was the best deal? Which
was the worst? Do you think that is right? Explain.
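The mechanics behind the univariate regression in question 2 and the comparison of actual and predicted pay in question 5 can be sketched with the textbook least squares formulas. The code below is a minimal sketch in Python (not one of the tools named in the syllabus, but the steps map directly onto R or a spreadsheet). The function name `ols` and every number in it are invented for illustration and are not taken from the assignment data.

```python
import math

def ols(x, y):
    """Fit y = b0 + b1*x by least squares; return coefficients, SEs, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1, "resid": resid}

# Invented values for illustration only (WAR, average salary in dollars).
war = [0.5, 1.0, 2.0, 3.0, 4.5]
salary = [1.0e6, 2.5e6, 6.0e6, 9.0e6, 20.0e6]

# Same model twice: once in levels, once with the natural log of salary.
level = ols(war, salary)
logged = ols(war, [math.log(s) for s in salary])

# Index of the most negative residual: the player paid least relative to
# what the fitted line predicts for that WAR.
most_underpaid = min(range(len(war)), key=lambda i: logged["resid"][i])
```

The coefficients and standard errors returned by `ols` are the entries a regression table would report. A residual is actual minus predicted salary, so a large negative residual marks a player who signed for less than the model predicts, and a large positive one marks the opposite.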