Programming Hive: Data Warehouse and Query Language for Hadoop

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user-defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

Paperback

$39.99

Product Details

ISBN-13:	9781449319335
Publisher:	O'Reilly Media, Incorporated
Publication date:	10/09/2012
Pages:	328
Product dimensions:	9.00(w) x 7.00(h) x 0.80(d)

About the Author

Edward Capriolo is currently a System Administrator at Media6degrees, where he helps design and maintain distributed data storage systems for the internet advertising industry.

Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator, and he enjoys the rich world of open source software.

Dean Wampler is a Principal Consultant at Think Big Analytics, where he specializes in "Big Data" problems and tools like Hadoop and Machine Learning. Besides Big Data, he specializes in Scala, the JVM ecosystem, JavaScript, Ruby, functional and object-oriented programming, and Agile methods. Dean is a frequent speaker at industry and academic conferences on these topics. He has a Ph.D. in Physics from the University of Washington.

Jason Rutherglen is a software architect at Think Big Analytics and specializes in Big Data, Hadoop, search, and security.

Table of Contents

Preface
Chapter 1: Introduction
Chapter 2: Getting Started
Chapter 3: Data Types and File Formats
Chapter 4: HiveQL: Data Definition
Chapter 5: HiveQL: Data Manipulation
Chapter 6: HiveQL: Queries
Chapter 7: HiveQL: Views
Chapter 8: HiveQL: Indexes
Chapter 9: Schema Design
Chapter 10: Tuning
Chapter 11: Other File Formats and Compression
Chapter 12: Developing
Chapter 13: Functions
Chapter 14: Streaming
Chapter 15: Customizing Hive File and Record Formats
Chapter 16: Hive Thrift Service
Chapter 17: Storage Handlers and NoSQL
Chapter 18: Security
Chapter 19: Locking
Chapter 20: Hive Integration with Oozie
Chapter 21: Hive and Amazon Web Services (AWS)
Chapter 22: HCatalog
Chapter 23: Case Studies
Glossary
References
Colophon
