Mathletics is a remarkably entertaining book that shows readers how to use simple mathematics to analyze a range of statistical and probability-related questions in professional baseball, basketball, and football, and in sports gambling. How does professional baseball evaluate hitters? Is a singles hitter like Wade Boggs more valuable than a power hitter like David Ortiz? Should NFL teams pass or run more often on first downs? Could professional basketball have used statistics to expose the crooked referee Tim Donaghy? Does money buy performance in professional sports?
In Mathletics, Wayne Winston describes the mathematical methods that top coaches and managers use to evaluate players and improve team performance, and gives math enthusiasts the practical tools they need to enhance their understanding and enjoyment of their favorite sports--and maybe even gain the outside edge to winning bets. Mathletics blends fun math problems with sports stories of actual games, teams, and players, along with personal anecdotes from Winston's work as a sports consultant. Winston uses easy-to-read tables and illustrations to illuminate the techniques and ideas he presents, and all the necessary math concepts--such as arithmetic, basic statistics and probability, and Monte Carlo simulations--are fully explained in the examples.
After reading Mathletics, you will understand why baseball teams should almost never bunt, why football overtime systems are unfair, why points, rebounds, and assists aren't enough to determine who's the NBA's best player--and much, much more.
Publisher: Princeton University Press
Product dimensions: 6.40(w) x 9.30(h) x 1.20(d)
Read an Excerpt
MATHLETICS
How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football
By WAYNE WINSTON
PRINCETON UNIVERSITY PRESS
Copyright © 2009 Princeton University Press
All rights reserved.
Chapter One
BASEBALL'S PYTHAGOREAN THEOREM
The more runs a baseball team scores, the more games the team should win. Conversely, the fewer runs a team gives up, the more games the team should win. Bill James, probably the most celebrated advocate of applying mathematics to analysis of Major League Baseball (often called sabermetrics), studied many years of Major League Baseball (MLB) standings and found that the percentage of games won by a baseball team can be well approximated by the formula
$$\frac{(\text{runs scored})^2}{(\text{runs scored})^2 + (\text{runs allowed})^2} = \text{estimate of percentage of games won}. \tag{1}$$
This formula has several desirable properties.
The predicted win percentage is always between 0 and 1.
An increase in runs scored increases predicted win percentage.
A decrease in runs allowed increases predicted win percentage.
Consider a right triangle with a hypotenuse (the longest side) of length c and two other sides of lengths a and b. Recall from high school geometry that the Pythagorean Theorem states that a triangle is a right triangle if and only if $a^2 + b^2 = c^2$. For example, a triangle with sides of lengths 3, 4, and 5 is a right triangle because $3^2 + 4^2 = 5^2$. The fact that equation (1) adds up the squares of two numbers led Bill James to call the relationship described in (1) Baseball's Pythagorean Theorem.
Let's define R = runs scored/runs allowed as a team's scoring ratio. If we divide the numerator and denominator of (1) by $(\text{runs allowed})^2$, then the value of the fraction remains unchanged and we may rewrite (1) as equation (1)'.
$$\frac{R^2}{R^2 + 1} = \text{estimate of percentage of games won}. \tag{1'}$$
Figure 1.1 shows how well (1)' predicts MLB teams' winning percentages for the 1980-2006 seasons.
For example, the 2006 Detroit Tigers (DET) scored 822 runs and gave up 675 runs. Their scoring ratio was R = 822/675 = 1.218. Their predicted win percentage from Baseball's Pythagorean Theorem was $\frac{1.218^2}{1.218^2 + 1} = .597$. The 2006 Tigers actually won 95 of their 162 games, a fraction of 95/162 = .586. Thus (1)' was off by 1.1% in predicting the percentage of games won by the Tigers in 2006.
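The Tigers calculation can be checked with a few lines of Python. This is a minimal sketch: the function name `pythagorean_win_pct` is an illustrative choice, not something from the book, and the inputs are the run and win totals quoted above.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float) -> float:
    """Predicted winning fraction from the scoring ratio R = runs scored / runs allowed."""
    ratio = runs_scored / runs_allowed
    return ratio**2 / (ratio**2 + 1)


# 2006 Detroit Tigers: 822 runs scored, 675 allowed, 95 wins in 162 games
predicted = pythagorean_win_pct(822, 675)
actual = 95 / 162

print(f"predicted {predicted:.3f}, actual {actual:.3f}, error {actual - predicted:+.3f}")
# prints: predicted 0.597, actual 0.586, error -0.011
```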
For each team define error in winning percentage prediction as actual winning percentage minus predicted winning percentage. For example, for the 2006 Arizona Diamondbacks (ARI), error = .469 - .490 = -.021 and for the 2006 Boston Red Sox (BOS), error = .531 - .497 = .034. A positive error means that the team won more games than predicted while a negative error means the team won fewer games than predicted. Column J in figure 1.1 computes the absolute value of the prediction error for each team. Recall that the absolute value of a number is simply the distance of the number from 0. That is, |5| = |-5| = 5. The absolute prediction errors for each team were averaged to obtain a measure of how well the predicted win percentages fit the actual team winning percentages. The average of absolute forecasting errors is called the MAD (Mean Absolute Deviation). For this data set, the predicted winning percentages of the Pythagorean Theorem were off by an average of 2% per team (cell J1).
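To make the error and MAD definitions concrete, here is a small sketch using only the three 2006 teams whose figures are quoted in the text (Detroit, Arizona, Boston). The full 1980-2006 data set behind figure 1.1 is not reproduced here, so the resulting MAD is for this three-team subset only and is not the 2% figure from cell J1.

```python
# (team, actual win fraction, predicted win fraction) as quoted in the text
teams = [
    ("DET", 95 / 162, 0.597),
    ("ARI", 0.469, 0.490),
    ("BOS", 0.531, 0.497),
]

# error = actual minus predicted; positive means the team beat its prediction
errors = {name: actual - predicted for name, actual, predicted in teams}

# MAD = average of the absolute errors
mad = sum(abs(err) for err in errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name}: error {err:+.3f}")
print(f"MAD over these three teams: {mad:.3f}")
```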
Instead of blindly assuming winning percentage can be approximated by using the square of the scoring ratio, perhaps we should try a formula to predict winning percentage, such as
$$\frac{R^{\text{exp}}}{R^{\text{exp}} + 1}. \tag{2}$$
If we vary exp (exponent) in (2) we can make (2) better fit the actual dependence of winning percentage on scoring ratio for different sports. For baseball, we will allow exp in (2) to vary between 1 and 3. Of course, exp = 2 reduces to the Pythagorean Theorem.
Figure 1.2 shows how MAD changes as we vary exp between 1 and 3. We see that indeed exp = 1.9 yields the smallest MAD (1.96%). An exp value of 2 is almost as good (MAD of 1.97%), so for simplicity we will stick with Bill James's view that exp = 2. Therefore, exp = 2 (or 1.9) yields the best forecasts if we use an equation of form (2). Of course, there might be another equation that predicts winning percentage better than the Pythagorean Theorem from runs scored and allowed. The Pythagorean Theorem is simple and intuitive, however, and works very well. After all, we are off in predicting team wins by an average of 162 × .02, which is approximately three wins per team. Therefore, I see no reason to look for a more complicated (albeit slightly more accurate) model.
How Well Does the Pythagorean Theorem Forecast?
To test the utility of the Pythagorean Theorem (or any prediction model), we should check how well it forecasts the future. I compared the Pythagorean Theorem's forecast for each MLB playoff series (1980-2007) against a prediction based just on games won. For each playoff series the Pythagorean method would predict the winner to be the team with the higher scoring ratio, while the "games won" approach simply predicts the winner of a playoff series to be the team that won more games. We found that the Pythagorean approach correctly predicted 57 of 106 playoff series (53.8%) while the "games won" approach correctly predicted the winner of only 50% (50 out of 100) of playoff series. The reader is probably disappointed that even the Pythagorean method only correctly forecasts the outcome of less than 54% of baseball playoff series. I believe that the regular season is a relatively poor predictor of the playoffs in baseball because a team's regular season record depends greatly on the performance of five starting pitchers. During the playoffs teams only use three or four starting pitchers, so much of the regular season data (games involving the fourth and fifth starting pitchers) are not relevant for predicting the outcome of the playoffs.
For anecdotal evidence of how the Pythagorean Theorem forecasts the future performance of a team better than a team's win-loss record, consider the case of the 2005 Washington Nationals. On July 4, 2005, the Nationals were in first place with a record of 50-32. If we extrapolate this winning percentage we would have predicted a final record of 99-63. On the same date, the Nationals' scoring ratio was .991, so (1)' would have predicted a final record of 80-82. Sure enough, the poor Nationals finished 81-81.
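The two Nationals projections can be reproduced as follows. This is a sketch that assumes both forecasts are simply a winning fraction scaled to a 162-game season and rounded to whole wins; the helper name `project_wins` is hypothetical.

```python
GAMES = 162


def project_wins(win_fraction: float, games: int = GAMES) -> int:
    """Scale a winning fraction to a full season and round to whole wins."""
    return round(win_fraction * games)


# Forecast 1: extrapolate the 50-32 record as of July 4, 2005
record_based = project_wins(50 / 82)

# Forecast 2: equation (1)' with the scoring ratio of .991
ratio = 0.991
pythagorean = project_wins(ratio**2 / (ratio**2 + 1))

print(f"record-based: {record_based}-{GAMES - record_based}")  # prints: record-based: 99-63
print(f"Pythagorean:  {pythagorean}-{GAMES - pythagorean}")    # prints: Pythagorean:  80-82
```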
The Importance of the Pythagorean Theorem
Baseball's Pythagorean Theorem is also important because it allows us to determine how many extra wins (or losses) will result from a trade. Suppose a team has scored 850 runs during a season and has given up 800 runs. Suppose we trade a shortstop (Joe) who "created" 150 runs for a shortstop (Greg) who created 170 runs in the same number of plate appearances. This trade will cause the team (all other things being equal) to score 20 more runs (170 - 150 = 20). Before the trade, R = 850/800 = 1.0625, and we would predict the team to have won $162 \times \frac{1.0625^2}{1 + 1.0625^2} = 85.9$ games. After the trade, R = 870/800 = 1.0875 and we would predict the team to win $162 \times \frac{1.0875^2}{1 + 1.0875^2} = 87.8$ games. Therefore, we estimate the trade makes our team 1.9 games better (87.8 - 85.9 = 1.9). In chapter 9, we will see how the Pythagorean Theorem can be used to help determine fair salaries for MLB players.
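The trade arithmetic is easy to script. The sketch below assumes, as the text does, that everything except the shortstop's runs created stays equal; `predicted_wins` is an illustrative helper name.

```python
def predicted_wins(runs_scored: float, runs_allowed: float, games: int = 162) -> float:
    """Expected wins over a season from the Pythagorean formula with exponent 2."""
    ratio = runs_scored / runs_allowed
    return games * ratio**2 / (1 + ratio**2)


team_runs, team_allowed = 850, 800
joe_runs_created, greg_runs_created = 150, 170

before = predicted_wins(team_runs, team_allowed)
# Swap Joe's runs created for Greg's; runs allowed are assumed unchanged
after = predicted_wins(team_runs - joe_runs_created + greg_runs_created, team_allowed)

print(f"before {before:.1f}, after {after:.1f}, gain {after - before:.1f}")
# prints: before 85.9, after 87.8, gain 1.9
```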
Football and Basketball "Pythagorean Theorems"
Does the Pythagorean Theorem hold for football and basketball? Daryl Morey, the general manager for the Houston Rockets, has shown that for the NFL, equation (2) with exp = 2.37 gives the most accurate predictions for winning percentage while for the NBA, equation (2) with exp = 13.91 gives the most accurate predictions for winning percentage. Figure 1.3 gives the predicted and actual winning percentages for the NFL for the 2006 season, while figure 1.4 gives the predicted and actual winning percentages for the NBA for the 2006-7 season.
For the 2005-7 NFL seasons, MAD was minimized by exp = 2.7. Exp = 2.7 yielded a MAD of 5.9%, while Morey's exp = 2.37 yielded a MAD of 6.1%. For the 2004-7 NBA seasons, exp = 15.4 best fit actual winning percentages. MAD for these seasons was 3.36% for exp = 15.4 and 3.40% for exp = 13.91. Since Morey's values of exp are very close in accuracy to the values we found from recent seasons we will stick with Morey's values of exp.
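To get a feel for what these exponents mean, the sketch below applies equation (2) to a single scoring ratio under each league's exponent. The ratio of 1.10 is a hypothetical input chosen for illustration, not a figure from the book. The larger the exponent, the more strongly the same scoring edge translates into wins.

```python
# exp = 2 is Bill James's baseball value; 2.37 and 13.91 are Morey's NFL and NBA values
EXPONENTS = {"MLB": 2.0, "NFL": 2.37, "NBA": 13.91}


def win_pct(ratio: float, exp: float) -> float:
    """Equation (2): predicted winning fraction for scoring ratio R and exponent exp."""
    return ratio**exp / (ratio**exp + 1)


ratio = 1.10  # hypothetical team that outscores opponents by 10%
predictions = {league: win_pct(ratio, exp) for league, exp in EXPONENTS.items()}

for league, pct in predictions.items():
    print(f"{league}: {pct:.3f}")
```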
These predicted winning percentages are based on regular season data. Therefore, we could look at teams that performed much better than expected during the regular season and predict that "luck would catch up with them." This train of thought would lead us to believe that these teams would perform worse during the playoffs. Note that the Miami Heat and Dallas Mavericks both won about 8% more games than expected during the regular season. Therefore, we would have predicted Miami and Dallas to perform worse during the playoffs than their actual win-loss record indicated. Sure enough, both Dallas and Miami suffered unexpected first-round defeats. Conversely, during the regular season the San Antonio Spurs and Chicago Bulls won around 8% fewer games than the Pythagorean Theorem predicts, indicating that these teams would perform better than expected in the playoffs. Sure enough, the Bulls upset the Heat and gave the Detroit Pistons a tough time. Of course, the Spurs won the 2007 NBA title. In addition, the Pythagorean Theorem had the Spurs as by far the league's best team (78% predicted winning percentage). Note the team that underachieved the most was the Boston Celtics, who won nearly 9% (or 7) fewer games than predicted. Many people suggested the Celtics "tanked" games during the regular season to improve their chances of obtaining potential future superstars such as Greg Oden and Kevin Durant in the 2007 draft lottery. The fact that the Celtics won seven fewer games than expected does not prove this conjecture, but it is certainly consistent with the view that the Celtics did not go all out to win every close game.
The Excel Data Table feature enables us to see how a formula changes as the values of one or two cells in a spreadsheet are modified. This appendix shows how to use a One Way Data Table to determine how the accuracy of (2) for predicting team winning percentage depends on the value of exp. To illustrate, let's show how to use a One Way Data Table to determine how varying exp from 1 to 3 changes the average error in predicting a MLB team's winning percentage (see figure 1.2).
Step 1. We begin by entering the possible values of exp (1, 1.1, ... 3) in the cell range N7:N27. To enter these values, simply enter 1 in N7, 1.1 in N8, and select the cell range N7:N8. Now drag the cross in the lower right-hand corner of N8 down to N27.
Step 2. In cell O6 we enter the formula we want to loop through and calculate for different values of exp by entering the formula = J1.
Step 3. In Excel 2003 or earlier, select Table from the Data Menu. In Excel 2007 select Data Table from the What If portion of the ribbon's Data tab (figure 1-a).
Step 4. Do not select a row input cell but select cell L2 (which contains the value of exp) as the column input cell. After selecting OK we see the results shown in figure 1.2. In effect Excel has placed the values 1, 1.1, ... 3 into cell M2 and computed our MAD for each listed value of exp.
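Outside Excel, the same one-way sweep is a short loop. The sketch below is a Python analogue of the Data Table steps, not the book's spreadsheet. Because the full 1980-2006 data set is not reproduced here, it uses only three 2006 teams from the text. Detroit's scoring ratio comes from its run totals, while the Arizona and Boston ratios are backed out of their rounded predicted percentages (.490 and .497), so they are approximations. With so few teams, the exponent that minimizes MAD need not match figure 1.2.

```python
from math import sqrt

# (team, scoring ratio R, actual win fraction)
# Inverting (1)' gives R = sqrt(p / (1 - p)) for a predicted fraction p.
sample = [
    ("DET", 822 / 675, 95 / 162),
    ("ARI", sqrt(0.490 / 0.510), 0.469),
    ("BOS", sqrt(0.497 / 0.503), 0.531),
]


def mad_for_exponent(exp: float, teams) -> float:
    """Mean absolute deviation of equation (2) from actual win fractions."""
    errors = []
    for _, ratio, actual in teams:
        predicted = ratio**exp / (ratio**exp + 1)
        errors.append(abs(actual - predicted))
    return sum(errors) / len(errors)


# Column of input values, like N7:N27 in the spreadsheet: 1.0, 1.1, ..., 3.0
exponents = [round(1 + 0.1 * i, 1) for i in range(21)]

# One MAD per exponent, like the output column of the Data Table
table = {exp: mad_for_exponent(exp, sample) for exp in exponents}

for exp, mad in table.items():
    print(f"exp = {exp:.1f}  MAD = {mad:.4f}")

best = min(table, key=table.get)
print(f"smallest MAD on this sample at exp = {best:.1f}")
```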
Chapter Two
WHO HAD A BETTER YEAR, NOMAR GARCIAPARRA OR ICHIRO SUZUKI?
The Runs-Created Approach
In 2004 Seattle Mariner outfielder Ichiro Suzuki set the major league record for most hits in a season. In 1997 Boston Red Sox shortstop Nomar Garciaparra had what was considered a good (but not great) year. Their key statistics are presented in table 2.1. (For the sake of simplicity, henceforth Suzuki will be referred to as "Ichiro" or "Ichiro 2004" and Garciaparra will be referred to as "Nomar" or "Nomar 1997.")
Recall that a batter's slugging percentage is Total Bases (TB)/At Bats (AB) where
TB = Singles + 2 × Doubles (2B) + 3 × Triples (3B) + 4 × Home Runs (HR).
We see that Ichiro had a higher batting average than Nomar, but because Nomar hit many more doubles, triples, and home runs, he had a much higher slugging percentage. Ichiro walked a few more times than Nomar did. So which player had a better hitting year?
When a batter is hitting, he can cause good things (like hits or walks) to happen or cause bad things (outs) to happen. To compare hitters we must develop a metric that measures how the relative frequency of a batter's good events and bad events influences the number of runs the team scores.
In 1979 Bill James developed the first version of his famous Runs Created Formula in an attempt to compute the number of runs "created" by a hitter during the course of a season. The most easily obtained data we have available to determine how batting events influence Runs Scored are season-long team batting statistics. A sample of this data is shown in figure 2.1.
James realized there should be a way to predict the runs for each team from hits, singles, 2B, 3B, HR, outs, and BB + HBP. Using his great intuition, James came up with the following relatively simple formula.
runs created = (hits + BB + HBP) × (TB)/(AB + BB + HBP). (1)
As we will soon see, (1) does an amazingly good job of predicting how many runs a team scores in a season from hits, BB, HBP, AB, 2B, 3B, and HR. What is the rationale for (1)? To score runs you need to have runners on base, and then you need to advance them toward home plate: (Hits + Walks + HBP) is basically the number of base runners the team will have in a season. The other part of the equation, TB/(AB + BB + HBP), measures the rate at which runners are advanced per plate appearance. Therefore (1) is multiplying the number of base runners by the rate at which they are advanced. Using the information in figure 2.1 we can compute Runs Created for the 2000 Anaheim Angels.
runs created = (1,574 + 655) × (995 + 2(309) + 3(34) + 4(236))/(5,628 + 655) = 943.
Actually, the 2000 Anaheim Angels scored 864 runs, so Runs Created overestimated the actual number of runs by around 9%. The file teams.xls calculates Runs Created for each team during the 2000-2006 seasons and compares Runs Created to actual Runs Scored. We find that Runs Created was off by an average of 28 runs per team. Since the average team scored 775 runs, we find an average error of less than 4% when we try to use (1) to predict team Runs Scored. It is amazing that this simple, intuitively appealing formula does such a good job of predicting runs scored by a team. Even though more complex versions of Runs Created more accurately predict actual Runs Scored, the simplicity of (1) has caused this formula to continue to be widely used by the baseball community.
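The Angels calculation can be reproduced with a short function. This is a sketch of formula (1) using the 2000 Anaheim Angels totals quoted above; the function and parameter names are illustrative, and walks and hit-by-pitches are passed as a single combined count because that is how the text reports them (655).

```python
def runs_created(hits, walks_hbp, at_bats, singles, doubles, triples, home_runs):
    """Basic Runs Created, formula (1): (hits + BB + HBP) x TB / (AB + BB + HBP)."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return (hits + walks_hbp) * total_bases / (at_bats + walks_hbp)


# 2000 Anaheim Angels team totals quoted in the text
angels = runs_created(
    hits=1574, walks_hbp=655, at_bats=5628,
    singles=995, doubles=309, triples=34, home_runs=236,
)
actual_runs = 864

print(f"runs created {angels:.0f}, actual {actual_runs}, "
      f"overestimate {angels / actual_runs - 1:.1%}")
# prints: runs created 943, actual 864, overestimate 9.2%
```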
Beware Blind Extrapolation!
The problem with any version of Runs Created is that the formula is based on team statistics. A typical team has a batting average of .265, hits home runs on 3% of all plate appearances, and has a walk or HBP in around 10% of all plate appearances. Contrast these numbers to those of Barry Bonds's great 2004 season in which he had a batting average of .362, hit a HR on 7% of all plate appearances, and received a walk or HBP during approximately 39% of his plate appearances. One of the first ideas taught in business statistics class is the following: do not use a relationship that is fit to a data set to make predictions for data that are very different from the data used to fit the relationship. Following this logic, we should not expect a Runs Created Formula based on team data to accurately predict the runs created by a superstar such as Barry Bonds or by a very poor player. In chapter 4 we will remedy this problem.
Ichiro vs. Nomar
Despite this caveat, let's plunge ahead and use (1) to compare Ichiro Suzuki's 2004 season to Nomar Garciaparra's 1997 season. Let's also compute Runs Created for Barry Bonds's 2004 season so that we can compare his statistics with those of the other two players. (See figure 2.2.)
Excerpted from MATHLETICS by WAYNE WINSTON Copyright © 2009 by Princeton University Press. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
List of Abbreviations
Part I. Baseball
Chapter 1: Baseball's Pythagorean Theorem
Chapter 2: Who Had a Better Year, Nomar Garciaparra or Ichiro Suzuki?
The Runs-Created Approach
Chapter 3: Evaluating Hitters by Linear Weights
Chapter 4: Evaluating Hitters by Monte Carlo Simulation
Chapter 5: Evaluating Baseball Pitchers and Forecasting Future Pitcher Performance
Chapter 6: Baseball Decision-Making
Chapter 7: Evaluating Fielders
Sabermetrics' Last Frontier
Chapter 8: Player Win Averages
Chapter 9: The Value of Replacement Players
Evaluating Trades and Fair Salary
Chapter 10: Park Factors
Chapter 11: Streakiness in Sports
Chapter 12: The Platoon Effect
Chapter 13: Was Tony Perez a Great Clutch Hitter?
Chapter 14: Pitch Count and Pitcher Effectiveness
Chapter 15: Would Ted Williams Hit .406 Today?
Chapter 16: Was Joe DiMaggio's 56-Game Hitting Streak the Greatest Sports Record of All Time?
Chapter 17: Major League Equivalents
Part II. Football
Chapter 18: What Makes NFL Teams Win?
Chapter 19: Who's Better, Tom Brady or Peyton Manning?
Chapter 20: Football States and Values
Chapter 21: Football Decision-Making 101
Chapter 22: A State and Value Analysis of the 2006 Super Bowl
Chapter 23: If Passing Is Better Than Running, Why Don't Teams Always Pass?
Chapter 24: Should We Go for a One-Point or Two-Point Conversion?
Chapter 25: To Give Up the Ball Is Better Than to Receive
The Case of College Football Overtime
Chapter 26: Why Is the NFL's Overtime System Fatally Flawed?
Chapter 27: How Valuable Are High Draft Picks in the NFL?
Part III. Basketball
Chapter 28: Basketball Statistics 101
The Four-Factor Model
Chapter 29: Linear Weights for Evaluating NBA Players
Chapter 30: Adjusted +/- Player Ratings
Chapter 31: NBA Lineup Analysis
Chapter 32: Analyzing Team and Individual Matchups
Chapter 33: NBA Players' Salaries and the Draft
Chapter 34: Are NBA Officials Prejudiced?
Chapter 35: Are College Basketball Games Fixed?
Chapter 36: Did Tim Donaghy Fix NBA Games?
Chapter 37: End-Game Basketball Strategy
Part IV. Playing with Money, and Other Topics for Serious Sports Fans
Chapter 38: Sports Gambling 101
Chapter 39: Freakonomics Meets the Bookmaker
Chapter 40: Rating Sports Teams
Chapter 41: Which League Has Greater Parity, The NFL or the NBA?
Chapter 42: The Ratings Percentage Index (RPI)
Chapter 43: From Point Ratings to Probabilities
Chapter 44: Optimal Money Management
The Kelly Growth Criteria
Chapter 45: Ranking Great Sports Collapses
Chapter 46: Can Money Buy Success?
Chapter 47: Does Joey Crawford Hate the Spurs?
Chapter 48: Does Fatigue Make Cowards of Us All?
The Case of NBA Back-to-Back Games and NFL Bye Weeks
Chapter 49: Can the Bowl Championship Series Be Saved?
Chapter 50: Comparing Players from Different Eras
Chapter 51: Conclusions
Index of Databases
Annotated Bibliography
What People are Saying About This
I really enjoyed this unique book, as will anyone who is a serious sports fan with some interest in mathematics. Winston is very knowledgeable about baseball, basketball, and football, and about the mathematical techniques needed to analyze a multitude of questions that arise in them. He does a very good job of explaining complex mathematical ideas in a simple way.
George L. Nemhauser, Georgia Institute of Technology
People who want the details on the analysis of baseball need to read Mathletics. This book provides the statistics behind Moneyball.
Pete Palmer, coeditor of "The ESPN Baseball Encyclopedia" and "The ESPN Pro Football Encyclopedia"
Wayne Winston's Mathletics combines rigorous analytical methodologies with a very inquisitive approach. This should be a required starting point for anyone desiring to use mathematics in the world of sports.
KC Joyner, author of "Blindsided: Why the Left Tackle Is Overrated and Other Contrarian Football Thoughts"
Winston has an uncanny knack for bringing the game alive through the fascinating mathematical questions he explores. He gets inside professional sports like no other writer I know. Mathletics is like a seat at courtside.
Mark Cuban, owner of the Dallas Mavericks
Mathletics offers insights into the mathematical analysis of three major sports and sports gambling. The basketball and sports bookies sections are particularly interesting and loaded with in-depth examples and analysis. The author's passion seems to jump right off the page.
Michael Huber, Muhlenberg College
Winston has brought together the latest thinking on sports mathematics in one comprehensive place. This volume is perfect for someone seeking a general overview or who wants to dive into advanced thinking on the latest sports-analytics topics.
Daryl Morey, general manager of the Houston Rockets