Programming Elastic MapReduce: Using AWS services to build an end-to-end application

Overview

Although you don’t need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS).

Authors Kevin Schmidt and Christopher Phillips demonstrate best practices for using EMR and various AWS and Apache technologies by walking you through the construction of a sample MapReduce log analysis application. Using code samples and example configurations, you’ll learn how to assemble the building blocks necessary to solve your biggest data analysis problems.

  • Get an overview of the AWS and Apache software tools used in large-scale data analysis
  • Go through the process of executing a Job Flow with a simple log analyzer
  • Discover useful MapReduce patterns for filtering and analyzing data sets
  • Use Apache Hive and Pig instead of Java to build a MapReduce Job Flow
  • Learn the basics for using Amazon EMR to run machine learning algorithms
  • Develop a project cost model for using Amazon EMR and other AWS tools

Product Details

  • ISBN-13: 9781449363628
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 1/3/2014
  • Edition number: 1
  • Pages: 174
  • Product dimensions: 7.00 (w) x 9.19 (h) x 0.43 (d)

Meet the Author

Kevin J. Schmidt is a senior manager at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of a major part of the company’s SIEM platform. This includes data acquisition, correlation, and analysis of log data. Prior to SecureWorks, Kevin worked for Reflex Security, where he worked on an IPS engine and anti-virus software. Before that, he was a lead developer and architect at GuardedNet, Inc., which built one of the industry’s first SIEM platforms.

He is also a commissioned officer in the United States Navy Reserve (USNR). He has over 19 years of experience in software development and design, 11 of which have been in the network security space. He holds a Bachelor of Science in Computer Science.

Kevin has spent time designing cloud services components at Dell, including virtualized components to run in Dell’s own vCloud. These components are used to protect customers who use Dell’s cloud infrastructure. Additionally, he has been working with Hadoop, machine learning, and other technology in the cloud.

Kevin is co-author of Essential SNMP, second edition (O’Reilly and Associates, ISBN: 978-0-596-00840-6) and also Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).

Christopher Phillips is a manager and senior software developer at Dell SecureWorks, Inc., an industry-leading MSSP, which is part of Dell. He is responsible for the design and development of the company’s Threat Intelligence service platform. He also leads a team that integrates log and event information from many third-party providers, allowing customers to have all of their core security information delivered to and analyzed by the Dell SecureWorks systems and security professionals.

Prior to Dell SecureWorks, Chris worked for McKesson and Allscripts, where he worked with clients on HIPAA compliance, security, and healthcare systems integration. He has over 18 years of experience in software development and design. He holds a Bachelor of Science in Computer Science and an MBA.

Chris has spent time designing and developing virtualization and cloud Infrastructure as a Service strategies at Dell to help our security services scale globally. Additionally, he has been working with Hadoop, Pig scripting languages, and Amazon Elastic MapReduce to develop strategies to gain insights and analyze Big Data issues in the cloud.

Chris is co-author of Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management (Syngress, ISBN: 978-1-597-49635-3).


Table of Contents

Preface;
What Is AWS?;
What’s in This Book?;
Sign Up for AWS;
Code Samples in This Book;
Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
Acknowledgments;
Chapter 1: Introduction to Amazon Elastic MapReduce;
1.1 Amazon Web Services Used in This Book;
1.2 Amazon Elastic MapReduce;
1.3 Amazon EMR and the Hadoop Ecosystem;
1.4 Amazon Elastic MapReduce Versus Traditional Hadoop Installs;
1.5 Application Building Blocks;
Chapter 2: Data Collection and Data Analysis with AWS;
2.1 Log Analysis Application;
2.2 Log Messages as a Data Set for Analytics;
2.3 Understanding MapReduce;
2.4 Collection Stage;
2.5 Simulating Syslog Data;
2.6 Developing a MapReduce Application;
2.7 Custom JAR MapReduce Job;
2.8 Running an Amazon EMR Cluster;
2.9 Viewing Our Results;
2.10 Debugging a Job Flow;
2.11 Our Application and Real-World Uses;
Chapter 3: Data Filtering Design Patterns and Scheduling Work;
3.1 Extending the Application Example;
3.2 Understanding Web Server Logs;
3.3 Finding Errors in the Web Logs Using Data Filtering;
3.4 Building Summary Counts in Data Sets;
3.5 Job Flow Scheduling;
3.6 Scheduling with AWS Data Pipeline;
3.7 Real-World Uses;
Chapter 4: Data Analysis with Hive and Pig in Amazon EMR;
4.1 Amazon Job Flow Technologies;
4.2 What Is Pig?;
4.3 Utilizing Pig in Amazon EMR;
4.4 What Is Hive?;
4.5 Utilizing Hive in Amazon EMR;
4.6 Our Application with Hive and Pig;
Chapter 5: Machine Learning Using EMR;
5.1 A Quick Tour of Machine Learning;
5.2 Python and EMR;
5.3 What’s Next?;
Chapter 6: Planning AWS Projects and Managing Costs;
6.1 Developing a Project Cost Model;
6.2 Optimizing AWS Resources to Reduce Project Costs;
6.3 Amazon Tools for Estimating Your Project Costs;
Amazon Web Services Resources and Tools;
Amazon AWS Online Resources;
Amazon AWS Cost Estimation Tools;
AWS Best Practices and Architecture;
Amazon EMR Distributions;
Cloud Computing, Amazon Web Services, and Their Impacts;
AWS Service Delivery Models;
Performance;
Elasticity and Growth;
Security;
Uptime and Availability;
Installation and Setup;
Prerequisites;
Installing Hadoop;
Building MapReduce Applications;
Running MapReduce Applications Locally;
Installing Pig;
Installing Hive;
Index;
Colophon;
