Pub. Date:
O'Reilly Media, Incorporated
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data / Edition 1

by Hadley Wickham, Garrett Grolemund
Original price is $49.99.

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.

You’ll learn how to:

  • Wrangle—transform your datasets into a form convenient for analysis
  • Program—learn powerful R tools for solving data problems with greater clarity and ease
  • Explore—examine your data, generate hypotheses, and quickly test them
  • Model—provide a low-dimensional summary that captures true "signals" in your dataset
  • Communicate—learn R Markdown for integrating prose, code, and results

Product Details

ISBN-13: 9781491910399
Publisher: O'Reilly Media, Incorporated
Publication date: 01/07/2017
Pages: 520
Sales rank: 85,361
Product dimensions: 5.90(w) x 8.90(h) x 1.10(d)

About the Author

Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization. His research focuses on how to make data analysis better, faster, and easier, with a particular emphasis on the use of visualization to better understand data and models.

Garrett Grolemund is a statistician, teacher, and R developer who currently works for RStudio. He sees data analysis as a largely untapped fountain of value for both industry and science. Garrett received his Ph.D. at Rice University in Hadley Wickham's lab, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis.

Garrett is passionate about helping people avoid the frustration and unnecessary learning he went through while mastering data analysis. Even before he finished his dissertation, he started delivering corporate training in R and data analysis for Revolution Analytics. He's taught at Google, eBay, Acxiom, and many other companies, and is currently developing a training curriculum for RStudio that will make useful know-how even more accessible.

Outside of teaching, Garrett spends time doing clinical trials research, legal research, and financial analysis. He also develops R software: he co-authored the lubridate R package, which provides methods to parse, manipulate, and do arithmetic with date-times, and wrote the ggsubplot package, which extends the ggplot2 package.

Table of Contents

Preface

Part I Explore

1 Data Visualization with ggplot2

Introduction

First Steps

Aesthetic Mappings

Common Problems

Facets

Geometric Objects

Statistical Transformations

Position Adjustments

Coordinate Systems

The Layered Grammar of Graphics

2 Workflow: Basics

Coding Basics

What's in a Name?

Calling Functions

3 Data Transformation with dplyr

Introduction

Filter Rows with filter()

Arrange Rows with arrange()

Select Columns with select()

Add New Variables with mutate()

Grouped Summaries with summarize()

Grouped Mutates (and Filters)

4 Workflow: Scripts

Running Code

RStudio Diagnostics

5 Exploratory Data Analysis

Introduction

Questions

Variation

Missing Values

Covariation

Patterns and Models

ggplot2 Calls

Learning More

6 Workflow: Projects

What Is Real?

Where Does Your Analysis Live?

Paths and Directories

RStudio Projects

Summary

Part II Wrangle

7 Tibbles with tibble

Introduction

Creating Tibbles

Tibbles Versus data.frame

Interacting with Older Code

8 Data Import with readr

Introduction

Getting Started

Parsing a Vector

Parsing a File

Writing to a File

Other Types of Data

9 Tidy Data with tidyr

Introduction

Tidy Data

Spreading and Gathering

Separating and Pull

Missing Values

Case Study

Nontidy Data

10 Relational Data with dplyr

Introduction

nycflights13

Keys

Mutating Joins

Filtering Joins

Join Problems

Set Operations

11 Strings with stringr

Introduction

String Basics

Matching Patterns with Regular Expressions

Tools

Other Types of Pattern

Other Uses of Regular Expressions

stringi

12 Factors with forcats

Introduction

Creating Factors

General Social Survey

Modifying Factor Order

Modifying Factor Levels

13 Dates and Times with lubridate

Introduction

Creating Date/Times

Date-Time Components

Time Spans

Time Zones

Part III Program

14 Pipes with magrittr

Introduction

Piping Alternatives

When Not to Use the Pipe

Other Tools from magrittr

15 Functions

Introduction

When Should You Write a Function?

Functions Are for Humans and Computers

Conditional Execution

Function Arguments

Return Values

Environment

16 Vectors

Introduction

Vector Basics

Important Types of Atomic Vector

Using Atomic Vectors

Recursive Vectors (Lists)

Attributes

Augmented Vectors

17 Iteration with purrr

Introduction

For Loops

For Loop Variations

For Loops Versus Functionals

The Map Functions

Dealing with Failure

Mapping over Multiple Arguments

Walk

Other Patterns of For Loops

Part IV Model

18 Model Basics with modelr

Introduction

A Simple Model

Visualizing Models

Formulas and Model Families

Missing Values

Other Model Families

19 Model Building

Introduction

Why Are Low-Quality Diamonds More Expensive?

What Affects the Number of Daily Flights?

Learning More About Models

20 Many Models with purrr and broom

Introduction

Gapminder

List-Columns

Creating List-Columns

Simplifying List-Columns

Making Tidy Data with broom

Part V Communicate

21 R Markdown

Introduction

R Markdown Basics

Text Formatting with Markdown

Code Chunks

Troubleshooting

YAML Header

Learning More

22 Graphics for Communication with ggplot2

Introduction

Label

Annotations

Scales

Zooming

Themes

Saving Your Plots

Learning More

23 R Markdown Formats

Introduction

Output Options

Documents

Notebooks

Presentations

Dashboards

Interactivity

Websites

Other Formats

Learning More

24 R Markdown Workflow

Index
