Hadoop Operations: A Guide for Developers and Administrators

by Eric Sammer

NOOK Book (eBook)


If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.

Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.

  • Get a high-level overview of HDFS and MapReduce: why they exist and how they work
  • Plan a Hadoop deployment, from hardware and OS selection to network requirements
  • Learn setup and configuration details with a list of critical properties
  • Manage resources by sharing a cluster across multiple groups
  • Get a runbook of the most common cluster maintenance tasks
  • Monitor Hadoop clusters—and learn troubleshooting with the help of real-world war stories
  • Use basic tools and techniques to handle backup and catastrophic failure

Product Details

ISBN-13: 9781449327293
Publisher: O'Reilly Media, Incorporated
Publication date: 09/26/2012
Sold by: Barnes & Noble
Format: NOOK Book
Pages: 298
File size: 6 MB

About the Author

Eric Sammer is currently a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, develop for, and use Hadoop and related projects at scale. His background is in the development and operation of distributed, highly concurrent data ingest and processing systems. He has been involved in the open source community and has contributed to a large number of projects over the last decade.

Table of Contents

Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
Chapter 1: Introduction;
Chapter 2: HDFS;
2.1 Goals and Motivation;
2.2 Design;
2.3 Daemons;
2.4 Reading and Writing Data;
2.5 Managing Filesystem Metadata;
2.6 Namenode High Availability;
2.7 Namenode Federation;
2.8 Access and Integration;
Chapter 3: MapReduce;
3.1 The Stages of MapReduce;
3.2 Introducing Hadoop MapReduce;
3.3 YARN;
Chapter 4: Planning a Hadoop Cluster;
4.1 Picking a Distribution and Version of Hadoop;
4.2 Hardware Selection;
4.3 Operating System Selection and Preparation;
4.4 Kernel Tuning;
4.5 Disk Configuration;
4.6 Network Design;
Chapter 5: Installation and Configuration;
5.1 Installing Hadoop;
5.2 Configuration: An Overview;
5.3 Environment Variables and Shell Scripts;
5.4 Logging Configuration;
5.5 HDFS;
5.6 Namenode High Availability;
5.7 Namenode Federation;
5.8 MapReduce;
5.9 Rack Topology;
5.10 Security;
Chapter 6: Identity, Authentication, and Authorization;
6.1 Identity;
6.2 Kerberos and Hadoop;
6.3 Authorization;
6.4 Tying It Together;
Chapter 7: Resource Management;
7.1 What Is Resource Management?;
7.2 HDFS Quotas;
7.3 MapReduce Schedulers;
Chapter 8: Cluster Maintenance;
8.1 Managing Hadoop Processes;
8.2 HDFS Maintenance Tasks;
8.3 MapReduce Maintenance Tasks;
Chapter 9: Troubleshooting;
9.1 Differential Diagnosis Applied to Systems;
9.2 Common Failures and Problems;
9.3 “Is the Computer Plugged In?”;
9.4 Treatment and Care;
9.5 War Stories;
Chapter 10: Monitoring;
10.1 An Overview;
10.2 Hadoop Metrics;
10.3 Health Monitoring;
Chapter 11: Backup and Recovery;
11.1 Data Backup;
11.2 Namenode Metadata;
Deprecated Configuration Properties;
