Effective Monitoring and Alerting: For Web Operations
With this practical book, you’ll discover how to catch complications in your distributed system before they develop into costly problems. Based on his extensive experience in systems ops at large technology companies, author Slawek Ligus describes an effective data-driven approach for monitoring and alerting that enables you to maintain high availability and deliver a high quality of service.

Learn methods for measuring state changes and data flow in your system, and set up alerts to help you recover quickly from problems when they do arise. If you’re a system operator waging the daily battle to provide the best performance at the lowest cost, this book is for you.

  • Monitor every component of your application stack, from the network to user experience
  • Learn how to draw the right conclusions from the metrics you obtain
  • Develop a robust alerting system that can identify problematic anomalies—without raising false alarms
  • Address system failures by their impact on resource utilization and user experience
  • Plan an alerting configuration that scales with your expanding network
  • Learn how to choose appropriate maintenance times automatically
  • Develop a work environment that fosters flexibility and adaptability

by Slawek Ligus

Paperback

$21.99 

Product Details

ISBN-13: 9781449333522
Publisher: O'Reilly Media, Incorporated
Publication date: 12/05/2012
Pages: 164
Product dimensions: 6.90 (w) x 9.10 (h) x 0.40 (d) in

About the Author

Slawek is a systems and software engineer with a background in web operations and service-oriented architectures. He specializes in implementing solutions to tough problems in large-scale information systems. Slawek has been involved in infrastructure automation and product development, working with leading Internet giants.

Table of Contents

Preface
  • Who Should Read This Book
  • Conventions Used in This Book
  • Using Code Examples
  • Safari® Books Online
  • How to Contact Us
  • Acknowledgements
Chapter 1: Introduction
  • 1.1 Monitoring, Alerting, and What They Can Do for You
  • 1.2 Monitoring and Alerting in a Nutshell
  • 1.3 The Challenges
  • 1.4 Important Terms
Chapter 2: Monitoring
  • 2.1 The Building Blocks
  • 2.2 Drawing Conclusions from Timeseries Plots
Chapter 3: Alerting
  • 3.1 The Challenge
  • 3.2 Prerequisites
  • 3.3 Understanding Failure and Its Impact
  • 3.4 Anatomy of an Alarm
  • 3.5 Case Study: A Data Pipeline
  • 3.6 Types of Alerts
  • 3.7 Setting Up Alarms
  • 3.8 Alerting Suggestions
Chapter 4: At Scale
  • 4.1 Implications of Scale
  • 4.2 Composition of Large-Scale Systems
  • 4.3 Commonalities of Large-Scale Alerting Configurations
  • 4.4 Monitoring Coverage
  • 4.5 Managing Large Alerting Configurations
Chapter 5: Monitoring in System Automation
  • 5.1 Choosing Appropriate Maintenance Times Automatically
  • 5.2 Controlling the Rate of Upgrade
  • 5.3 Recovery-Oriented Admission Control
  • 5.4 Automated Deployment and Rollback
Chapter 6: The Work Environment
  • 6.1 Keeping an Audit Trail
  • 6.2 Working with Tickets
  • 6.3 Dealing with Anomalies
  • 6.4 Learning from Outages
  • 6.5 Using Checklists
  • 6.6 Creating Dashboards
  • 6.7 Service-Level Agreements
  • 6.8 Preventing the Ironies of Automation
  • 6.9 Culture
Chapter 7: Measuring Success
  • 7.1 The Feedback Loop
  • 7.2 Ticket Reporting
  • 7.3 Measuring Detectability
  • 7.4 Transition to Automated Alarms
  • 7.5 Maintenance Overhead
  • 7.6 How (Not) to Measure
Chapter 8: The Principles
  • 8.1 Get in the Habit of Measuring
  • 8.2 Draw Conclusions Reliably
  • 8.3 Monitor Extensively
  • 8.4 Alarm Selectively
  • 8.5 Work Smart, Not Hard
Setting Up OpenTSDB
  • The Software
  • First Steps
  • Gathering Data System-Wide
  • Timeseries Plots
  • Get Involved