Apache Hadoop is a big data processing framework that allows large data sets to be stored and analysed across clusters of computers. Owing to its ease of use and high availability, Hadoop is widely used by companies including Facebook, eBay and Google for research and production purposes.
Learn to tame the elephant and master Hadoop with this comprehensive and invaluable guide to developing your Hadoop skills and knowledge.
First, this book will refresh your Hadoop knowledge and introduce you to the enhancements that Hadoop 2 brings to the table. Next, we'll walk you through optimizations with Hadoop MapReduce, Pig and Hive, before teaching you about Hadoop I/O and the different data types available. YARN and Storm are then covered in full, and you'll learn how to use YARN to integrate Storm with Hadoop. HDFS Federation is also covered in depth, and our chapter on HDFS replacements explains why you may need an HDFS replacement and what you can use in its place.
Additionally, you'll learn how to deploy Hadoop in the cloud, how to secure Hadoop and, finally, how to use analytics tools within the Hadoop ecosystem, illustrated with a case study. As a useful bonus, you'll find a practical walkthrough of deploying Hadoop on Microsoft Windows in the Appendix.