Programming Hive

NOOK Book (eBook)
$18.99 (list price $33.99; save 44%)


Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

  • Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
  • Customize data formats and storage options, from files to external databases
  • Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
  • Learn best practices for creating user-defined functions (UDFs)
  • Learn Hive patterns you should use and anti-patterns you should avoid
  • Integrate Hive with other data processing programs
  • Use storage handlers for NoSQL databases and other datastores
  • Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

Product Details

  • ISBN-13: 9781449326975
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 9/19/2012
  • Sold by: Barnes & Noble
  • Format: eBook
  • Edition number: 1
  • Pages: 352
  • Sales rank: 499,340
  • File size: 7 MB

Meet the Author

Edward Capriolo is currently System Administrator at Media6degrees where he helps design and maintain distributed data storage systems for the internet advertising industry.

Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator, and enjoys the rich world of open source software.

Dean Wampler is a Principal Consultant at Think Big Analytics, where he specializes in "Big Data" problems and tools like Hadoop and Machine Learning. Besides Big Data, he specializes in Scala, the JVM ecosystem, JavaScript, Ruby, functional and object-oriented programming, and Agile methods. Dean is a frequent speaker at industry and academic conferences on these topics. He has a Ph.D. in Physics from the University of Washington.

Jason Rutherglen is a software architect at Think Big Analytics and specializes in Big Data, Hadoop, search, and security.

Table of Contents

Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
What Brought Us to Hive?;
Chapter 1: Introduction;
1.1 An Overview of Hadoop and MapReduce;
1.2 Hive in the Hadoop Ecosystem;
1.3 Java Versus Hive: The Word Count Algorithm;
1.4 What’s Next;
Chapter 2: Getting Started;
2.1 Installing a Preconfigured Virtual Machine;
2.2 Detailed Installation;
2.3 What Is Inside Hive?;
2.4 Starting Hive;
2.5 Configuring Your Hadoop Environment;
2.6 The Hive Command;
2.7 The Command-Line Interface;
Chapter 3: Data Types and File Formats;
3.1 Primitive Data Types;
3.2 Collection Data Types;
3.3 Text File Encoding of Data Values;
3.4 Schema on Read;
Chapter 4: HiveQL: Data Definition;
4.1 Databases in Hive;
4.2 Alter Database;
4.3 Creating Tables;
4.4 Partitioned, Managed Tables;
4.5 Dropping Tables;
4.6 Alter Table;
Chapter 5: HiveQL: Data Manipulation;
5.1 Loading Data into Managed Tables;
5.2 Inserting Data into Tables from Queries;
5.3 Creating Tables and Loading Them in One Query;
5.4 Exporting Data;
Chapter 6: HiveQL: Queries;
6.1 SELECT … FROM Clauses;
6.2 WHERE Clauses;
6.3 GROUP BY Clauses;
6.4 JOIN Statements;
6.8 Casting;
6.9 Queries that Sample Data;
Chapter 7: HiveQL: Views;
7.1 Views to Reduce Query Complexity;
7.2 Views that Restrict Data Based on Conditions;
7.3 Views and Map Type for Dynamic Tables;
7.4 View Odds and Ends;
Chapter 8: HiveQL: Indexes;
8.1 Creating an Index;
8.2 Rebuilding the Index;
8.3 Showing an Index;
8.4 Dropping an Index;
8.5 Implementing a Custom Index Handler;
Chapter 9: Schema Design;
9.1 Table-by-Day;
9.2 Over Partitioning;
9.3 Unique Keys and Normalization;
9.4 Making Multiple Passes over the Same Data;
9.5 The Case for Partitioning Every Table;
9.6 Bucketing Table Data Storage;
9.7 Adding Columns to a Table;
9.8 Using Columnar Tables;
9.9 (Almost) Always Use Compression!;
Chapter 10: Tuning;
10.1 Using EXPLAIN;
10.3 Limit Tuning;
10.4 Optimized Joins;
10.5 Local Mode;
10.6 Parallel Execution;
10.7 Strict Mode;
10.8 Tuning the Number of Mappers and Reducers;
10.9 JVM Reuse;
10.10 Indexes;
10.11 Dynamic Partition Tuning;
10.12 Speculative Execution;
10.13 Single MapReduce MultiGROUP BY;
10.14 Virtual Columns;
Chapter 11: Other File Formats and Compression;
11.1 Determining Installed Codecs;
11.2 Choosing a Compression Codec;
11.3 Enabling Intermediate Compression;
11.4 Final Output Compression;
11.5 Sequence Files;
11.6 Compression in Action;
11.7 Archive Partition;
11.8 Compression: Wrapping Up;
Chapter 12: Developing;
12.1 Changing Log4J Properties;
12.2 Connecting a Java Debugger to Hive;
12.3 Building Hive from Source;
12.4 Setting Up Hive and Eclipse;
12.5 Hive in a Maven Project;
12.6 Unit Testing in Hive with hive_test;
12.7 The New Plugin Developer Kit;
Chapter 13: Functions;
13.1 Discovering and Describing Functions;
13.2 Calling Functions;
13.3 Standard Functions;
13.4 Aggregate Functions;
13.5 Table Generating Functions;
13.6 A UDF for Finding a Zodiac Sign from a Day;
13.7 UDF Versus GenericUDF;
13.8 Permanent Functions;
13.9 User-Defined Aggregate Functions;
13.10 User-Defined Table Generating Functions;
13.11 Accessing the Distributed Cache from a UDF;
13.12 Annotations for Use with Functions;
13.13 Macros;
Chapter 14: Streaming;
14.1 Identity Transformation;
14.2 Changing Types;
14.3 Projecting Transformation;
14.4 Manipulative Transformations;
14.5 Using the Distributed Cache;
14.6 Producing Multiple Rows from a Single Row;
14.7 Calculating Aggregates with Streaming;
14.9 GenericMR Tools for Streaming to Java;
14.10 Calculating Cogroups;
Chapter 15: Customizing Hive File and Record Formats;
15.1 File Versus Record Formats;
15.2 Demystifying CREATE TABLE Statements;
15.3 File Formats;
15.4 Record Formats: SerDes;
15.5 CSV and TSV SerDes;
15.6 ObjectInspector;
15.7 Think Big Hive Reflection ObjectInspector;
15.8 XML UDF;
15.9 XPath-Related Functions;
15.10 JSON SerDe;
15.11 Avro Hive SerDe;
15.12 Binary Output;
Chapter 16: Hive Thrift Service;
16.1 Starting the Thrift Server;
16.2 Setting Up Groovy to Connect to HiveService;
16.3 Connecting to HiveServer;
16.4 Getting Cluster Status;
16.5 Result Set Schema;
16.6 Fetching Results;
16.7 Retrieving Query Plan;
16.8 Metastore Methods;
16.9 Administrating HiveServer;
16.10 Hive ThriftMetastore;
Chapter 17: Storage Handlers and NoSQL;
17.1 Storage Handler Background;
17.2 HiveStorageHandler;
17.3 HBase;
17.4 Cassandra;
17.5 DynamoDB;
Chapter 18: Security;
18.1 Integration with Hadoop Security;
18.2 Authentication with Hive;
18.3 Authorization in Hive;
Chapter 19: Locking;
19.1 Locking Support in Hive with Zookeeper;
19.2 Explicit, Exclusive Locks;
Chapter 20: Hive Integration with Oozie;
20.1 Oozie Actions;
20.2 A Two-Query Workflow;
20.3 Oozie Web Console;
20.4 Variables in Workflows;
20.5 Capturing Output;
20.6 Capturing Output to Variables;
Chapter 21: Hive and Amazon Web Services (AWS);
21.1 Why Elastic MapReduce?;
21.2 Instances;
21.3 Before You Start;
21.4 Managing Your EMR Hive Cluster;
21.5 Thrift Server on EMR Hive;
21.6 Instance Groups on EMR;
21.7 Configuring Your EMR Cluster;
21.8 Persistence and the Metastore on EMR;
21.9 HDFS and S3 on EMR Cluster;
21.10 Putting Resources, Configs, and Bootstrap Scripts on S3;
21.11 Logs on S3;
21.12 Spot Instances;
21.13 Security Groups;
21.14 EMR Versus EC2 and Apache Hive;
21.15 Wrapping Up;
Chapter 22: HCatalog;
22.1 Introduction;
22.2 MapReduce;
22.3 Command Line;
22.4 Security Model;
22.5 Architecture;
Chapter 23: Case Studies;
23.1 Media6Degrees;
23.2 Outbrain;
23.3 NASA’s Jet Propulsion Laboratory;
23.4 Photobucket;
23.5 SimpleReach;
23.6 Experiences and Needs from the Customer Trenches;
