Seeking SRE: Conversations About Running Production Systems at Scale
Organizations big and small have started to realize just how crucial system and application reliability is to their business. They've also learned just how difficult it is to maintain that reliability while iterating at the speed demanded by the marketplace. Site Reliability Engineering (SRE) is a proven approach to this challenge.

SRE is a large and rich topic to discuss. Google led the way with Site Reliability Engineering, the wildly successful O'Reilly book that described Google's creation of the discipline and the implementation that's allowed them to operate at a planetary scale. Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.

Listen as engineers and other leaders in the field discuss:

  • Different ways of implementing SRE and SRE principles in a wide variety of settings
  • How SRE relates to other approaches such as DevOps
  • Specialties on the cutting edge that will soon be commonplace in SRE
  • Best practices and technologies that make practicing SRE easier
  • The important but rarely explored human side of SRE

David N. Blank-Edelman is the book's curator and editor.

by David Blank-Edelman

Paperback

$59.99 


Product Details

ISBN-13: 9781491978863
Publisher: O'Reilly Media, Incorporated
Publication date: 09/17/2018
Pages: 587
Product dimensions: 6.90(w) x 9.10(h) x 1.30(d)

About the Author

David N. Blank-Edelman is the Director of Technology at the Northeastern University College of Computer and Information Science. He has spent the last 25 years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was also the program chair of the LISA 2005 conference and one of the LISA 2006 Invited Talks co-chairs.

Table of Contents

Introduction

Part I SRE Implementation

1 Context Versus Control in SRE

2 Interviewing Site Reliability Engineers

3 So, You Want to Build an SRE Team?

4 Using Incident Metrics to Improve SRE at Scale

5 Working with Third Parties Shouldn't Suck

6 How to Apply SRE Principles Without Dedicated SRE Teams

7 SRE Without SRE: The Spotify Case Study

8 Introducing SRE in Large Enterprises

9 From SysAdmin to SRE in 8,963 Words

10 Clearing the Way for SRE in the Enterprise

11 SRE Patterns Loved by DevOps People Everywhere

12 DevOps and SRE: Voices from the Community

13 Production Engineering at Facebook

Part II Near Edge SRE

14 In the Beginning, There Was Chaos

15 The Intersection of Reliability and Privacy

16 Database Reliability Engineering

17 Engineering for Data Durability

18 Introduction to Machine Learning for SRE

Part III SRE Best Practices and Technologies

19 Do Docs Better: Integrating Documentation into the Engineering Workflow

20 Active Teaching and Learning

21 The Art and Science of the Service-Level Objective

22 SRE as a Success Culture

23 SRE Antipatterns

24 Immutable Infrastructure and SRE

25 Scriptable Load Balancers

26 The Service Mesh: Wrangler of Your Microservices?

Part IV The Human Side of SRE

27 Psychological Safety in SRE

28 SRE Cognitive Work

29 Beyond Burnout

30 Against On-Call: A Polemic

31 Elegy for Complex Systems

32 Intersections Between Operations and Social Activism

33 Conclusion

Index
