Table of Contents
Preface ix
1 Introduction 1
Prerequisites 2
Who This Book Is for and How to Use It 2
What Is Efficiency? 4
What Is Efficient R Programming? 4
Why Efficiency? 6
Cross-Transferable Skills for Efficiency 7
Touch Typing 7
Consistent Style and Code Conventions 8
Benchmarking and Profiling 9
Benchmarking 9
Benchmarking Example 10
Profiling 11
Book Resources 14
R Package 14
Online Version 14
References 14
2 Efficient Setup 17
Prerequisites 18
Top Five Tips for an Efficient R Setup 18
Operating System 18
Operating System and Resource Monitoring 19
R Version 21
Installing R 22
Updating R 23
Installing R Packages 23
Installing R Packages with Dependencies 24
Updating R Packages 24
R Startup 25
R Startup Arguments 25
An Overview of R's Startup Files 26
The Location of Startup Files 27
The .Rprofile File 28
Example .Rprofile File 29
The .Renviron File 33
RStudio 35
Installing and Updating RStudio 35
Window Pane Layout 36
RStudio Options 38
Autocompletion 39
Keyboard Shortcuts 40
Object Display and Output Table 41
Project Management 41
BLAS and Alternative R Interpreters 43
Testing Performance Gains from BLAS 44
Other Interpreters 45
Useful BLAS/Benchmarking Resources 46
References 46
3 Efficient Programming 47
Prerequisites 47
Top Five Tips for Efficient Programming 47
General Advice 48
Memory Allocation 49
Vectorized Code 50
Communicating with the User 53
Fatal Errors: stop() 53
Warnings: warning() 54
Informative Output: message() and cat() 55
Invisible Returns 55
Factors 56
Inherent Order 56
Fixed Set of Categories 57
The Apply Family 57
Example: Movies Dataset 59
Type Consistency 60
Caching Variables 61
Function Closures 63
The Byte Compiler 64
Example: The Mean Function 65
Compiling Code 66
References 67
4 Efficient Workflow 69
Prerequisites 69
Top Five Tips for Efficient Workflow 70
A Project Planning Typology 70
Project Planning and Management 72
Chunking Your Work 73
Making Your Workflow SMART 74
Visualizing Plans with R 75
Package Selection 76
Searching for R Packages 78
How to Select a Package 78
Publication 80
Dynamic Documents with R Markdown 81
R Packages 83
Reference 84
5 Efficient Input/Output 85
Prerequisites 86
Top Five Tips for Efficient Data I/O 86
Versatile Data Import with rio 86
Plain-Text Formats 88
Differences Between fread() and read_csv() 90
Preprocessing Text Outside R 92
Binary File Formats 93
Native Binary Formats: Rdata or Rds? 94
The Feather File Format 94
Benchmarking Binary File Formats 94
Protocol Buffers 96
Getting Data from the Internet 96
Accessing Data Stored in Packages 97
References 98
6 Efficient Data Carpentry 99
Prerequisites 100
Top Five Tips for Efficient Data Carpentry 100
Efficient Data Frames with tibble 100
Tidying Data with tidyr and Regular Expressions 102
Make Wide Tables Long with gather() 103
Split Joint Variables with separate() 104
Other tidyr Functions 105
Regular Expressions 106
Efficient Data Processing with dplyr 108
Renaming Columns 110
Changing Column Classes 110
Filtering Rows 111
Chaining Operations 112
Data Aggregation 114
Nonstandard Evaluation 117
Combining Datasets 118
Working with Databases 119
Databases and dplyr 121
Data Processing with data.table 123
References 125
7 Efficient Optimization 127
Prerequisites 128
Top Five Tips for Efficient Optimization 128
Code Profiling 128
Getting Started with profvis 129
Example: Monopoly Simulation 130
Efficient Base R 131
The if() Versus ifelse() Functions 131
Sorting and Ordering 132
Reversing Elements 133
Which Indices are TRUE? 133
Converting Factors to Numerics 134
Logical AND and OR 134
Row and Column Operations 134
is.na() and anyNA() 135
Matrices 135
Example: Optimizing the move_square() Function 138
Parallel Computing 139
Parallel Versions of Apply Functions 140
Example: Snakes and Ladders 140
Exit Functions with Care 141
Parallel Code under Linux and OS X 141
Rcpp 142
A Simple C++ Function 143
The cppFunction() Command 144
C++ Data Types 145
The sourceCpp() Function 145
Vectors and Loops 146
Matrices 149
C++ with Sugar on Top 149
Rcpp Resources 150
References 151
8 Efficient Hardware 153
Prerequisites 153
Top Five Tips for Efficient Hardware 153
Background: What Is a Byte? 154
Random Access Memory 155
Hard Drives: HDD Versus SSD 158
Operating Systems: 32-Bit or 64-Bit 159
Central Processing Unit 160
Cloud Computing 162
Amazon EC2 162
9 Efficient Collaboration 163
Prerequisites 164
Top Five Tips for Efficient Collaboration 164
Coding Style 164
Reformatting Code with RStudio 165
Filenames 165
Loading Packages 166
Commenting 166
Object Names 167
Example Package 167
Assignment 168
Spacing 168
Indentation 168
Curly Braces 169
Version Control 169
Commits 170
Git Integration in RStudio 170
GitHub 171
Branches, Forks, Pulls, and Clones 172
Code Review 173
References 174
10 Efficient Learning 175
Prerequisites 175
Top Five Tips for Efficient Learning 175
Using R's Internal Help 176
Searching R for Topics 177
Finding and Using Vignettes 178
Getting Help on Functions 179
Reading R Source Code 181
Swirl 182
Online Resources 182
Stack Overflow 183
Mailing Lists and Groups 184
Asking a Question 184
Minimal Dataset 184
Minimal Example 185
Learning In Depth 185
Spread the Knowledge 187
References 187
A Package Dependencies 189
B References 191
Index 197