Table of Contents
Preface xi
Part I The Basics
1 Introduction to Scalable Systems 3
What Is Scalability? 3
Examples of System Scale in the Early 2000s 6
How Did We Get Here? A Brief History of System Growth 7
Scalability Basic Design Principles 9
Scalability and Costs 11
Scalability and Architecture Trade-Offs 13
Performance 13
Availability 14
Security 15
Manageability 16
Summary and Further Reading 17
2 Distributed Systems Architectures: An Introduction 19
Basic System Architecture 19
Scale Out 21
Scaling the Database with Caching 23
Distributing the Database 25
Multiple Processing Tiers 27
Increasing Responsiveness 30
Systems and Hardware Scalability 32
Summary and Further Reading 34
3 Distributed Systems Essentials 35
Communications Basics 35
Communications Hardware 36
Communications Software 39
Remote Method Invocation 43
Partial Failures 49
Consensus in Distributed Systems 53
Time in Distributed Systems 56
Summary and Further Reading 58
4 An Overview of Concurrent Systems 61
Why Concurrency? 62
Threads 64
Order of Thread Execution 67
Problems with Threads 68
Race Conditions 69
Deadlocks 73
Thread States 78
Thread Coordination 79
Thread Pools 82
Barrier Synchronization 84
Thread-Safe Collections 86
Summary and Further Reading 88
Part II Scalable Systems
5 Application Services 93
Service Design 93
Application Programming Interface (API) 94
Designing Services 97
State Management 100
Applications Servers 103
Horizontal Scaling 106
Load Balancing 107
Load Distribution Policies 109
Health Monitoring 109
Elasticity 110
Session Affinity 111
Summary and Further Reading 113
6 Distributed Caching 115
Application Caching 115
Web Caching 120
Cache-Control 121
Expires and Last-Modified 121
Etag 122
Summary and Further Reading 124
7 Asynchronous Messaging 127
Introduction to Messaging 128
Messaging Primitives 128
Message Persistence 130
Publish-Subscribe 131
Message Replication 132
Example: RabbitMQ 133
Messages, Exchanges, and Queues 133
Distribution and Concurrency 135
Data Safety and Performance Trade-Offs 138
Availability and Performance Trade-Offs 140
Messaging Patterns 141
Competing Consumers 141
Exactly-Once Processing 142
Poison Messages 143
Summary and Further Reading 144
8 Serverless Processing Systems 147
The Attractions of Serverless 147
Google App Engine 149
The Basics 149
GAE Standard Environment 150
Autoscaling 151
AWS Lambda 152
Lambda Function Life Cycle 153
Execution Considerations 154
Scalability 155
Case Study: Balancing Throughput and Costs 157
Choosing Parameter Values 158
GAE Autoscaling Parameter Study Design 159
Results 160
Summary and Further Reading 161
9 Microservices 163
The Movement to Microservices 164
Monolithic Applications 164
Breaking Up the Monolith 166
Deploying Microservices 168
Principles of Microservices 170
Resilience in Microservices 172
Cascading Failures 173
Bulkhead Pattern 178
Summary and Further Reading 180
Part III Scalable Distributed Databases
10 Scalable Database Fundamentals 185
Distributed Databases 185
Scaling Relational Databases 186
Scaling Up 186
Scaling Out: Read Replicas 188
Scale Out: Partitioning Data 189
Example: Oracle RAC 191
The Movement to NoSQL 192
NoSQL Data Models 196
Query Languages 197
Data Distribution 198
The CAP Theorem 202
Summary and Further Reading 203
11 Eventual Consistency 205
What Is Eventual Consistency? 205
Inconsistency Window 206
Read Your Own Writes 207
Tunable Consistency 209
Quorum Reads and Writes 211
Replica Repair 213
Active Repair 214
Passive Repair 214
Handling Conflicts 215
Last Writer Wins 216
Version Vectors 217
Summary and Further Reading 221
12 Strong Consistency 223
Introduction to Strong Consistency 224
Consistency Models 226
Distributed Transactions 227
Two-Phase Commit 228
2PC Failure Modes 230
Distributed Consensus Algorithms 232
Raft 234
Leader Election 236
Strong Consistency in Practice 238
VoltDB 238
Google Cloud Spanner 241
Summary and Further Reading 244
13 Distributed Database Implementations 247
Redis 248
Data Model and API 248
Distribution and Replication 250
Strengths and Weaknesses 251
MongoDB 253
Data Model and API 254
Distribution and Replication 256
Strengths and Weaknesses 259
Amazon DynamoDB 260
Data Model and API 261
Distribution and Replication 264
Strengths and Weaknesses 266
Summary and Further Reading 267
Part IV Event and Stream Processing
14 Scalable Event-Driven Processing 271
Event-Driven Architectures 272
Apache Kafka 274
Topics 275
Producers and Consumers 276
Scalability 279
Availability 283
Summary and Further Reading 284
15 Stream Processing Systems 287
Introduction to Stream Processing 288
Stream Processing Platforms 291
Case Study: Apache Flink 293
DataStream API 293
Scalability 295
Data Safety 298
Conclusions and Further Reading 300
16 Final Tips for Success 303
Automation 304
Observability 305
Deployment Platforms 306
Data Lakes 307
Further Reading and Conclusions 307
Index 309