This book is an example-based tutorial on optimizing Hadoop for MapReduce job performance.
If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Having prior knowledge of creating MapReduce applications is not necessary, but will help you better understand the concepts and snippets of MapReduce class template code.
Product dimensions: 7.50(w) x 9.25(h) x 0.25(d)
About the Author
Khaled Tannir has been working with computers since 1980. He began programming on the legendary Sinclair ZX81 and later moved on to the full range of Commodore home computers (VIC-20, Commodore 64, Commodore 128D, and Amiga 500).
He has a Bachelor's degree in Electronics and a Master's degree in System Information Architectures, for which he graduated with a professional thesis, and he completed his education with a Research Master's degree.
He is a Microsoft Certified Solution Developer (MCSD) and has more than twenty years of technical experience leading the development and implementation of software solutions and giving technical presentations. He works as an independent IT consultant and has worked as an infrastructure engineer, senior developer, and enterprise/solution architect for many companies in France and Canada.
With significant experience in Microsoft .NET/server and Oracle Java technologies, he has extensive skills in online/offline application design, system conversions, and multi-language applications for both the Internet and the desktop.
He is always researching and learning about new technologies and looking for new adventures in France, North America, and the Middle East. He owns an IT and electronics laboratory with many servers, monitors, open electronics boards such as Arduino, Netduino, Raspberry Pi, and .NET Gadgeteer, and some smartphones running the Windows Phone, Android, and iOS operating systems.
In 2012 he contributed to EGC 2012 (International Complex Data Mining forum at Bordeaux University, France) and presented, in a workshop session, his work on "How to optimize data distribution in a cloud computing environment". This work aims to define an approach for optimizing the use of data mining algorithms such as k-means and Apriori in a cloud computing environment.
He is the author of the RavenDB 2.x Beginner's Guide book (Packt Publishing) and a technical reviewer for the Pentaho+MongoDB transformation & reporting book (Packt Publishing).
He aims to earn a PhD in Cloud Computing and Big Data and wants to keep learning about these technologies.
He enjoys taking landscape and night photos, travelling, playing video games, creating funny electronics gadgets with Arduino/.NET Gadgeteer, and, of course, spending time with his wife and family.
You can reach him at: firstname.lastname@example.org
Most Helpful Customer Reviews
I had a chance to review another book titled “Optimizing Hadoop for MapReduce” and must say this book is a good resource for devops professionals who build MapReduce programs in Hadoop. The book is well organized: it starts by introducing basic concepts, moves on to identifying system bottlenecks and resource weaknesses and suggesting ways to fix and optimize them, and ends with Hadoop best practices and recommendations. Though packed with advanced concepts and information on Hadoop architecture, the author's writing could appeal to all types of audience (from novice to expert), with helpful hints in each chapter.
The first chapter, on MapReduce, is written for people who are new to this paradigm. It contains pictorial representations of how “low-level” MapReduce works. It’s easy to misunderstand the low-level MapReduce process, and this chapter clarifies it. The second chapter discusses performance tuning parameters, such as allocating map/reduce tasks based on the number of cores in the respective Hadoop cluster. It also suggests widely used cluster management tools such as Ambari, Chukwa, etc. The third and fourth chapters discuss identifying system bottlenecks and resource weaknesses, respectively. The author takes an organized approach, introducing the performance tuning process cycle and demystifying how the major components of a given Hadoop cluster (CPU, RAM, storage, and network bandwidth) could cause a bottleneck and how to eliminate it. In the fourth chapter, I particularly liked that the author discusses formulas that can be used when planning a Hadoop cluster and demonstrates them with examples.
The remaining three chapters focus on enhancing and optimizing map/reduce tasks and on best practices and recommendations. The author introduces performance metrics for map/reduce tasks and suggests ways to enhance those tasks and fine-tune parameters to improve the performance of a MapReduce job.
The final chapter, on best practices, is packed with valuable information on hardware tuning for optimal performance of the Hadoop cluster and on Hadoop best practices. A few minor points here and there should be read with caution. For instance, in the first chapter the author says each slave is called a task tracker; it would have been better to say that a slave assumes the responsibilities of a task tracker, while in general it is actually called a data node. That is just my suggestion. In short, this book is a compilation of MapReduce performance-related issues and of ideas for troubleshooting and optimizing that performance, including best practices. It is a must-have book, especially for Hadoop administrators and developers. This book is available at packtpub.
Pavan