Web Operations: Keeping the Data On Time

A web application involves many specialists, but it takes people in web ops to ensure that everything works together throughout an application's lifetime. It's the expertise you need when your start-up gets an unexpected spike in web traffic, or when a new feature causes your mature application to fail. In this collection of essays and interviews, web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer insights into this evolving field. You'll learn stories from the trenches--from builders of some of the biggest sites on the Web--on what's necessary to help a site thrive.

  • Learn the skills needed in web operations, and why they're gained through experience rather than schooling
  • Understand why it's important to gather metrics from both your application and infrastructure
  • Consider common approaches to database architectures and the pitfalls that come with increasing scale
  • Learn how to handle the human side of outages and degradations
  • Find out how one company avoided disaster after a huge traffic deluge
  • Learn how to find out what went wrong after a problem occurs, and how to prevent it from happening again

Contributors include:

John Allspaw

Heather Champ

Michael Christian

Richard Cook

Alistair Croll

Patrick Debois

Eric Florenzano

Paul Hammond

Justin Huff

Adam Jacob

Jacob Loomis

Matt Massie

Brian Moon

Anoop Nagwani

Sean Power

Eric Ries

Theo Schlossnagle

Baron Schwartz

Andrew Shafer


eBook

$35.99 


Product Details

ISBN-13: 9781449394158
Publisher: O'Reilly Media, Incorporated
Publication date: 06/21/2010
Sold by: Barnes & Noble
Format: eBook
Pages: 338
File size: 3 MB

About the Author

John Allspaw is currently Operations Engineering Manager at Flickr, the popular photo site. He has worked extensively with growing web sites since 1999, including the online news magazines Salon.com, InfoWorld.com, and Macworld.com, and social networking sites that experienced extreme growth (Friendster and Flickr). During his time at Friendster, traffic increased 5X. He was responsible for their transition from a couple dozen servers in a failing data center to over 400 machines across two data centers, and for the complete redesign of the backing infrastructure. When he joined Flickr, they had 10 servers in a tiny data center in Vancouver; they are now located in multiple data centers across the US. Prior to his web experience, Allspaw worked in modeling and simulation as a mechanical engineer doing car crash simulations for the NHTSA.

Jesse Robbins (@jesserobbins) is CEO of Opscode (makers of Chef) and a recognized expert in Infrastructure, Web Operations, and Emergency Management.

He serves as co-chair of the Velocity Web Performance & Operations Conference and contributes to the O'Reilly Radar. Prior to co-founding Opscode, he worked at Amazon.com with a title of "Master of Disaster" where he was responsible for Website Availability for every property bearing the Amazon brand.

Robbins is a volunteer Firefighter/EMT and Emergency Manager, and led a task force deployed in Operation Hurricane Katrina. His experiences in the fire service profoundly influence his efforts in technology, and he strives to distill his knowledge from these two worlds and apply it in service of both.

Table of Contents

Dedication; Foreword; Preface; How This Book Is Organized; Who This Book Is For; Conventions Used in This Book; Using Code Examples; How to Contact Us; Safari® Books Online; Acknowledgments
Chapter 1: Web Operations: The Career; 1.1 Why Does Web Operations Have It Tough?; 1.2 From Apprentice to Master; 1.3 Conclusion
Chapter 2: How Picnik Uses Cloud Computing: Lessons Learned; 2.1 Where the Cloud Fits (and Why!); 2.2 Where the Cloud Doesn't Fit (for Picnik); 2.3 Conclusion
Chapter 3: Infrastructure and Application Metrics; 3.1 Time Resolution and Retention Concerns; 3.2 Locality of Metrics Collection and Storage; 3.3 Layers of Metrics; 3.4 Providing Context for Anomaly Detection and Alerts; 3.5 Log Lines Are Metrics, Too; 3.6 Correlation with Change Management and Incident Timelines; 3.7 Making Metrics Available to Your Alerting Mechanisms; 3.8 Using Metrics to Guide Load-Feedback Mechanisms; 3.9 A Metrics Collection System, Illustrated: Ganglia; 3.10 Conclusion
Chapter 4: Continuous Deployment; 4.1 Small Batches Mean Faster Feedback; 4.2 Small Batches Mean Problems Are Instantly Localized; 4.3 Small Batches Reduce Risk; 4.4 Small Batches Reduce Overhead; 4.5 The Quality Defenders' Lament; 4.6 Getting Started; 4.7 Continuous Deployment Is for Mission-Critical Applications; 4.8 Conclusion
Chapter 5: Infrastructure As Code; 5.1 Service-Oriented Architecture; 5.2 Conclusion
Chapter 6: Monitoring; 6.1 Story: "The Start of a Journey"; 6.2 Step 1: Understand What You Are Monitoring; 6.3 Step 2: Understand Normal Behavior; 6.4 Step 3: Be Prepared and Learn; 6.5 Conclusion
Chapter 7: How Complex Systems Fail; 7.1 How Complex Systems Fail; 7.2 Further Reading
Chapter 8: Community Management and Web Operations
Chapter 9: Dealing with Unexpected Traffic Spikes; 9.1 How It All Started; 9.2 Alarms Abound; 9.3 Putting Out the Fire; 9.4 Surviving the Weekend; 9.5 Preparing for the Future; 9.6 CDN to the Rescue; 9.7 Proxy Servers; 9.8 Corralling the Stampede; 9.9 Streamlining the Codebase; 9.10 How Do We Know It Works?; 9.11 The Real Test; 9.12 Lessons Learned; 9.13 Improvements Since Then
Chapter 10: Dev and Ops Collaboration and Cooperation; 10.1 Deployment; 10.2 Shared, Open Infrastructure; 10.3 Trust; 10.4 On-call Developers; 10.5 Avoiding Blame; 10.6 Conclusion
Chapter 11: How Your Visitors Feel: User-Facing Metrics; 11.1 Why Collect User-Facing Metrics?; 11.2 What Makes a Site Slow?; 11.3 Measuring Delay; 11.4 Building an SLA; 11.5 Visitor Outcomes: Analytics; 11.6 Other Metrics Marketing Cares About; 11.7 How User Experience Affects Web Ops; 11.8 The Future of Web Monitoring; 11.9 Conclusion
Chapter 12: Relational Database Strategy and Tactics for the Web; 12.1 Requirements for Web Databases; 12.2 How Typical Web Databases Grow; 12.3 The Yearning for a Cluster; 12.4 Database Strategy; 12.5 Database Tactics; 12.6 Conclusion
Chapter 13: How to Make Failure Beautiful: The Art and Science of Postmortems; 13.1 The Worst Postmortem; 13.2 What Is a Postmortem?; 13.3 When to Conduct a Postmortem; 13.4 Who to Invite to a Postmortem; 13.5 Running a Postmortem; 13.6 Postmortem Follow-Up; 13.7 Conclusion
Chapter 14: Storage; 14.1 Data Asset Inventory; 14.2 Data Protection; 14.3 Capacity Planning; 14.4 Storage Sizing; 14.5 Operations; 14.6 Conclusion
Chapter 15: Nonrelational Databases; 15.1 NoSQL Database Overview; 15.2 Some Systems in Detail; 15.3 Conclusion
Chapter 16: Agile Infrastructure; 16.1 Agile Infrastructure; 16.2 So, What's the Problem?; 16.3 Communities of Interest and Practice; 16.4 Trading Zones and Apologies; 16.5 Conclusion
Chapter 17: Things That Go Bump in the Night (and How to Sleep Through Them); 17.1 Definitions; 17.2 How Many 9s?; 17.3 Impact Duration Versus Incident Duration; 17.4 Datacenter Footprint; 17.5 Gradual Failures; 17.6 Trust Nobody; 17.7 Failover Testing; 17.8 Monitoring and History of Patterns; 17.9 Getting a Good Night's Sleep
Contributors; Colophon