Analyzing Baseball Data with R

Paperback (Print)

With its flexible capabilities and open-source platform, R has become a major tool for analyzing detailed, high-quality baseball data. Analyzing Baseball Data with R provides an introduction to R for sabermetricians, baseball enthusiasts, and students interested in exploring the rich sources of baseball data. It equips readers with the necessary skills and software tools to perform all of the analysis steps, from gathering the datasets and entering them in a convenient format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to the types of data structures and the exploratory and data management capabilities of R. They also cover the traditional graphics functions in the base package and introduce more sophisticated graphical displays available through the lattice and ggplot2 packages. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, run expectancy, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and fielding measures. Each chapter contains exercises that encourage readers to perform their own analyses using R. All of the datasets and R code used in the text are available online.

This book helps readers answer questions about baseball teams, players, and strategy using large, publicly available datasets. It offers detailed instructions on downloading the datasets and putting them into formats that simplify data exploration and analysis. Through the book’s various examples, readers will learn about modern sabermetrics and be able to conduct their own baseball analyses.

Editorial Reviews

From the Publisher
"If you are interested in statistics, especially baseball statistics, you will find this book fascinating and very useful. It provides many details, websites, and useful descriptions for using the R programming environment. This is not only a book on statistics; there are many references to famous player statistics, making this a very enjoyable book to read. And even if you don’t like baseball but still find statistics very exciting, then this book provides a great introduction to R that can be used for any other type of statistical data set."
IEEE Insulation Magazine, November/December 2014

"I have spent most of the past decade working in baseball as a statistical analyst for the New York Mets. … This type of employment can be highly valued, especially among quantitatively inclined college students who are coincidentally passionate baseball fans. It is from these students from whom I am most frequently asked, ‘what book would you recommend for someone who wants to get started in sabermetrics?’ Invariably, my response has been [Jim Albert and Jay Bennett’s] Curve Ball. I have a new response. …
I always felt that Curve Ball was the best place for a budding sabermetrician to start … However, it later dawned on me that while Curve Ball provided a sound framework for thinking probabilistically about baseball, I devoted a huge proportion of my time at work to computer programming. …
In their new book, Albert and Max Marchi, a native Italian who now works for the Cleveland Indians, have closed the loop by offering the aspiring sabermetrician a blueprint. … The reader who digests this book alongside her keyboard will emerge as a practicing sabermetrician—having knowledge of the key ideas in sabermetric theory, a historical understanding of from whence those ideas came, and the practical ability to compute with baseball data. It is a sabermetric workshop in paperback."
—Ben S. Baumer, International Statistical Review (2014), 82

Product Details

  • ISBN-13: 9781466570221
  • Publisher: Taylor & Francis
  • Publication date: 11/5/2013
  • Series: Chapman & Hall/CRC The R Series, #14
  • Pages: 352
  • Sales rank: 279,875
  • Product dimensions: 5.90 in. (w) x 9.20 in. (h) x 1.00 in. (d)

Meet the Authors

Max Marchi is a baseball analyst with the Cleveland Indians. He was previously a statistician at the Emilia-Romagna Regional Health Agency. He has been a regular contributor to The Hardball Times and Baseball Prospectus websites and has consulted for MLB clubs.

Jim Albert is a professor of statistics at Bowling Green State University. He has authored or coauthored several books and is the editor of the Journal of Quantitative Analysis of Sports. His interests include Bayesian modeling, statistics education, and the application of statistical thinking in sports.

Table of Contents

The Baseball Datasets
The Lahman Database: Season-by-Season Data
Retrosheet Game-by-Game Data
Retrosheet Play-by-Play Data
Pitch-by-Pitch Data

Introduction to R
Installing R and RStudio
Objects and Containers in R
Collection of R Commands
Reading and Writing Data in R
Data Frames
Splitting, Applying, and Combining Data

Traditional Graphics
Factor Variable
Saving Graphs
Dot Plots
Numeric Variable: Stripchart and Histogram
Two Numeric Variables
A Numeric Variable and a Factor Variable
Comparing Ruth, Aaron, Bonds, and A-Rod
The 1998 Home Run Race

The Relation between Runs and Wins
The Teams Table in Lahman's Database
Linear Regression
The Pythagorean Formula for Winning Percentage
The Exponent in the Pythagorean Formula
Good and Bad Predictions by the Pythagorean Formula
How Many Runs for a Win?

Value of Plays Using Run Expectancy
The Runs Expectancy Matrix
Runs Scored in the Remainder of the Inning
Creating the Matrix
Measuring Success of a Batting Play
Albert Pujols
Opportunity and Success for All Hitters
Position in the Batting Lineup
Run Values of Different Base Hits
Value of Base Stealing

Advanced Graphics
The lattice Package
The ggplot2 Package

Balls and Strikes Effects
Hitter's Counts and Pitcher's Counts
Behaviors by Count

Career Trajectories
Mickey Mantle's Batting Trajectory
Comparing Trajectories
General Patterns of Peak Ages
Trajectories and Fielding Position

Simulating a Half Inning
Simulating a Baseball Season

Exploring Streaky Performances
The Great Streak
Streaks in Individual At-Bats
Local Patterns of Weighted On-Base Average

Learning about Park Effects by Database Management Tools
Installing MySQL and Creating a Database
Connecting R to MySQL
Filling a MySQL Game Log Database from R
Querying Data from R
Baseball Data as MySQL Dumps
Calculating Basic Park Factors

Exploring Fielding Metrics with Contributed R Packages
A Motivating Example: Comparing Fielding Metrics
Comparing Two Shortstops

Appendix A: Retrosheet Files Reference
Appendix B: Accessing and Using MLBAM Gameday and PITCHf/x Data



Further Reading and Exercises appear at the end of each chapter.
