The task involves the use of Excel spreadsheet data analysis capabilities to carry out a sabermetrics analysis comparing offensive performance in the 2018 and 2019 seasons in Major League Baseball and to answer other questions about the 2019 season’s offensive performance as detailed in the Information to Complete Task section on pages 2–5. You will write a business-style report in Microsoft Word or similar which documents your findings using a maximum of 1500 words. It is crucial that you read all the information in the Information to Complete Task section on pages 2–5 before you start writing the report as it gives details and hints for what you need to do to fulfil the assignment brief. In your report, you need the following sections:

 

  • Introduction o Briefly outline the situation. You need not provide much background — you can assume that the readers are familiar with baseball — but you need to show that you have understood the situation and what is required of you
  • Body
    • Briefly summarise the purpose of the analysis and the analytical methods you have used – you should not give technical details of how to use a spreadsheet to do the analysis
    • Answer and discuss the various questions, supporting the points you make with copies of relevant sections of data from your spreadsheet and charts, using whichever is most appropriate for the question you are answering; discuss the policy implications for baseball of your answers as appropriate
    • Carry out some research and outline how artificial intelligence (AI) is being used in baseball both off and on the pitch.
  • Conclusion o Summarise the conclusions you have drawn
    • Summarise the policy implications for baseball where appropriate, and the information about the use of AI in baseball
  • References in Harvard format o I do not expect you to use a large number of in-text references but there should be some; you must provide a properly-formatted reference list/bibliography at the end of the report

Marking Criteria

The submitted and assessed part of this coursework is a business-style report, rather than an academic essay so the marking criteria are different from those usually required for an academic essay.  Marking criteria are shown in the Marking Rubric on the final two pages of this document.

Submission

The submission deadline is 12 noon GMT on 25 Feb 2021.  This is what you need to submit:

  1. Your report. It is this report on which your mark will be based.
  2. Your spreadsheet file. This is submitted to prove that you have carried out the spreadsheet work yourself. It will not receive a mark. It will only be reviewed if it is necessary to rule out plagiarism.

 

You must submit the assignment electronically via a Turnitin link on the Moodle page for MN1505 – there will be two tabs: Tab 1 for the report file and Tab 2 for the spreadsheet file. The timely submission of assignments is your responsibility, and excuses — such as a problem on your PC or other device, or losing a file — will not be accepted.   You will receive an electronic submission receipt ID via email.  It is your responsibility to keep this safe as proof of submission.  You are also strongly recommended to keep a copy of all submitted assignments.   

Information to Complete Task

Almost every year the MLB Players’ Association, Major League Baseball and Commissioner Rob Manfred announce a series of rule changes, and 2019 was no exception.  The focus for the last couple of seasons has been on the pace of play. “Baseball games take longer than ever to produce fewer and fewer runs and balls in play” (Verducci, 2015). It was hoped that changes made for the 2019 season might further improve pace of play without affecting offensive performance during the season.

Sabermetrics is the “the search for objective knowledge about baseball”, especially baseball statistics which measure in-game activity. Sabermetricians collect and summarise the relevant data from this in-game activity to answer specific questions. The term is derived from the acronym SABR, which stands for the Society for American Baseball Research, founded in 1971 (Birnbaum, n.d.).

Offensive performance data is now available for the 2019 season and for the prior season based on data taken from Lahman’s Baseball Database (Lahman, n.d.). In this assignment, as noted above, you will use spreadsheet data analytics capabilities to carry out a sabermetrics analysis to compare offensive performance in the past two baseball seasons and to answer other questions about the 2019 season’s offensive performance.

1.        The Baseball Analysis Data

A spreadsheet file named Baseball.xlsx has been provided on Moodle to assist you. Download this file and save it. This file contains data for batters and their performance during the 2018 and 2019 seasons. Figure 1 shows the first few records of player data in the 2019 tab. It shows both biographical information and data for each player’s offensive performance in the 2019 season.

Figure 1: 2019 Tab

 

 

Columns in the 2019 table are as follows:

  • PlayerID — a unique code is assigned to each player
  • Last Name — the player’s last name
  • First Initial — the initial letter of the player’s first name
  • Team — the player’s team
  • League — teams are in two leagues: American (AL) and National (NL)
  • Position — the player’s position in the field (P – pitcher, C – catcher, 1B – first baseman, 2B – second baseman, 3B – third baseman, SS – shortstop, OF – outfielder) birthYear — the year in which the player was born
  • PA — the number of plate appearances for a player this season; in other words, the number of times a player came to bat
  • AB — the number of at-bats the player had this season. This number is less than plate appearances because walks and some other appearances do not count as an at-bat
  • R — the number of runs the player scored in the season
  • H — the number of hits the player had in the season. A hit occurs when a player hits the ball and reaches base as a direct result. Hits are either singles, doubles, triples, or home runs
  • 1B — the number of singles (one-base hits) the batter had this season
  • 2B — the number of doubles (two-base hits) the batter had this season
  • 3B — the number of triples (three-base hits) the batter had this season
  • HR — the number of home runs the batter had this season
  • BB — the number of times the batter reached first base via a walk
  • SO — the number of times the batter struck out

 

The file has another tab called 2018 which contains the same types of biographical and offensive performance data as the 2019 table. Thus, two years of offensive performance data are available. Some players are in the starting line-up almost every day, and they accumulate hundreds of plate appearances in a season. Other players only appear in games when the starters need a rest. Player data is included in a tab only if the player had at least 100 plate appearances in each of these two seasons.

 

2.        Data Analytics Using a Spreadsheet

a) Calculating Performance Measures

In this task, you will perform calculations to compute and display a range of performance measures for each player in both the 2018 and 2019 tabs. Your output should look like that in Figure 2 which shows only the first few records and columns for the 2019 tab.

Figure 2: Performance Measures for 2019

 

 

You should create new columns in each tab, one for each of the following calculations:

  • Age — this is calculated from the player’s birth year and the season year of either 2018 or 2019 as appropriate;
  • TotalBases — this value is the number of bases accumulated by the player’s hits. A single (1B) counts as one base, a double (2B) counts as two bases, a triple (3B) counts as three bases, and a home run (HR) counts as four bases.
  • BattingAverage — this value is the ratio of the player’s (H) hits to at-bats (AB).
  • Slugging%— this value is the ratio of the player’s total bases to at-bats (AB). If two players have the same number of hits (H) and at-bats, they will have the same batting average. However, a player who has more doubles, triples, and home runs will have a higher slugging percentage than a singles hitter.
  • OnBase% — this value is the player’s hits (H) plus walks (BB) divided by at-bats (AB) plus walks (BB); it shows how frequently the player reached base.
  • OPS — this value is the sum of the player’s slugging percentage and on-base percentage. OPS is a combined measure of a player’s ability to get on base and hit for power.

Note that all calculations except Age and Total Bases should be formatted to three decimal places.

b) Using Data Tables and Pivot Tables to Gather Data

Now use the data tables and pivot tables to gather data needed to answer league officials’ questions. There is information on pages 4 and 5 on how to use a spreadsheet to answer the following questions:

 

  1. League officials want to know if offensive performance decreased this season compared with last season. You can answer this question using data table analysis.
  2. Who were the league leaders this year and last year in batting average, OPS, and on-base percentage? You can answer this question using data table analysis. This should be an overall calculation, not carried out separately for American (AL) and National (NL) leagues.
  3. Traditional baseball fans consider batting average the best measure of offensive performance. However, many modern baseball analysts say that OPS is a broader and therefore better measure. Which measure is actually better? To shed light on the question, use data table analysis to create two all-star teams. One team contains the players with the highest OPS at each position. The other team contains the players with the highest batting average at each position. If the same players make both teams, you might reasonably conclude that the two measures are equally powerful. But,

if the two teams have mostly different players, the two measures must be telling different stories

  1. Baseball fans and officials sometimes say that a player’s best offensive season occurs when the player is 27 years old. Was this statement truer for 2019 or 22018 players? You can answer this question by examining the average OPS for both years using pivot tables.
  2. What were team batting averages this year versus last year? Which teams improved their batting average this year? You can answer these questions using pivot table analysis.

 

Data Table Analyses

Use data tables to analyse three sets of data: (1) year-to-year offensive performance, (2) league leaders in 2018 and 2019 in batting average, OPS, and on-base percentage, and (3) the all-star team based on OPS versus the all-star team based on batting average. Your analysis will address the following questions:

  1. Did offensive performance increase in 2019?
    1. Add a Totals rows at the bottom of each tab’s data table and then determine the average number of plate appearances (PA), TotalBases, BattingAverage, OPS and strikeouts (SO) for each year.
    2. Open a new worksheet and name it Offensive Comparison.
    3. Copy the Totals rows for each year to the new worksheet. Delete columns which you do not need; label and format the numbers appropriately:
  Average Plate Appearances Average Total Bases Average Batting Average Average OPS Average Strike Outs
2018          
2019          
  1. Enter a note below the data which describes the change in year-to-year offensive performance. You should note averages for plate appearances (PA), TotalBases, BattingAverage and OPS are positive indicators of offensive performance, while strikeouts (SO) is a negative indicator of offensive performance.
  1. Which players were the league leaders in 2018 and 2019 in BattingAverage, OPS, and OnBase%?
    1. Sort the relevant data table columns by turn in each tab from largest to smallest. Copy the data into a new worksheet called   League Leaders. Label and format values appropriately:
  2018   2019  
  Name Value Name Value
Batting Average        
On Base %        
OPS        
  1. Did the OPS and batting average leaders form similar all-star teams in 2019?
    1. Clicking the arrow in a column heading displays a drop-down menu with a Search option at the bottom. You can use search values to create subsets of the data. In this case, you can use the Position column to show data for players at a particular position. Then you can sort the OPS column to find the best OPS performer at that position during the year. Repeat the procedure in the batting average column for the same position. Use these steps at each remaining position to develop OPS and Batting Average all-star teams for this year.
    2. Summarise data in a new worksheet named       All-Star Team with appropriate formatting:
Name Team Position MAX

OPS

  Name Team Position MAX

Batting Avg

  1. Enter a note below the all-star team data which states your conclusion. Do the two performance measures essentially lead to the same teams or not?

 

Pivot Table Analyses

Use pivot tables to analyse two sets of data: (1) Player performances at the age of 27 versus those in other years, and (2) team batting averages 2018 and 2019.

 

                 

  1. Is offensive performance best when the player is 27 years old?
    1. Assume that OPS is the best performance measure to use. Using 2019’s data, create a pivot table on a separate sheet which shows the average OPS by player age. (For example, show the average OPS for all players who are 25 years old, 26, 27, and so on.) The table should also display a count of players at each age.
      1. Age should be the Rows field.
      2. OPS and PlayerID should be the Value (Data) fields. iii. Change the summary types using the Value field settings to make sure you have

Average for OPS and Count for PlayerID iv. You should only show the top eight values by Average of OPS: after the pivot table has been created, the Age column should have a drop-down arrow; use it to set the relevant Value Filter

  1. In the pivot table, you can change the name of the Row Labels column to Age if necessary.
  2. Sort the pivot table by the values in the Average of OPS column: in the pivot table, right-click the top value cell in the Average of OPS column and then sort from largest to smallest. The player age with the best average OPS will appear at the top row of the pivot table values.
  1. Apply the same procedure for 2018’s values and then copy the table data and paste it into the sheet which contains 2019’s pivot table data so you can compare the averages for two years.
  2. Use a calculation to compare the averages for the two years.
  3. Below the tables, insert a note which states your conclusion about the 27th year rule—does it apply or not?
  1. What were team batting averages this year versus last year? Which teams improved their batting average this year?
    1. Using separate sheets for each year, create pivot tables which show the average BattingAverage of each team.
    2. Copy 2018’s pivot table to the sheet which contains 2019’s pivot table for comparative purposes.
    3. Using formulas, compute the difference between the averages for each team.
    4. In the next adjacent column, use an IF() function to show the word ‘Improved’ or ‘not improved’ as appropriate to show which teams had a better or worse batting average in 2019 than 2018.
    5. Use the Countif() function to compute a) the number of teams which had an improved batting average, and b) the number of teams which had a worse batting average; display the values beneath of the Improved? column.

 

References

Birnbaum, P. (n.d.) A Guide to Sabermetric Research.  https://sabr.org/sabermetrics (accessed 26/6/2020).

Lahman, S. (n.d.) Seanlahman.Com: Baseball, Data, and Storytelling.  http://www.seanlahman.com/ (accessed 26/6/2020).

Verducci, T. (2015) Manfred Smartly Open to Change, and Some of These Ideas Might Work.

https://www.si.com/mlb/2015/01/27/rob-manfred-commissioner-major-league-baseball (accessed 26/6/2020).

 

The data on which this assignment is based are taken from Seanlahman.Com: Baseball, Data, and Storytelling.  http://www.seanlahman.com/. The database is copyright 1996-2019 by Sean Lahman and is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/