Engineering Lakehouses with Open Table Formats: Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake
Jump-start your journey toward mastering open data architectural patterns by learning the fundamentals and applications of open table formats

  • Build open lakehouses with open table formats using popular compute engines such as Apache Spark, Apache Flink, Trino, and Python
  • Optimize lakehouse performance with advanced techniques such as pruning, partitioning, compaction, indexing, and clustering
  • Find out how to enable seamless integration, data management, and interoperability using Apache XTable
  • Purchase of the print or Kindle book includes a free PDF eBook
Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts, and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake. You'll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You'll also work with each table format with hands-on exercises using popular computing engines, such as Apache Spark, Flink, Trino, dbt, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you'll get to grips with the key components of lakehouse architecture and build, maintain, and optimize them. By the end of this book, you'll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, ensuring you make informed decisions in selecting the right architecture for your organization's data needs.
  • Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs
  • Gain a complete understanding of data lifecycle management in lakehouses
  • Integrate lakehouses with Apache Airflow, dbt, and Apache Beam
  • Optimize performance with sorting, clustering, and indexing techniques
  • Use the open table format data with ML frameworks like Spark MLlib, TensorFlow, and MLflow
  • Interoperate across different table formats with Apache XTable and UniForm
  • Secure your lakehouse with access controls and ensure regulatory compliance

This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.


Paperback, $44.99
Product Details

ISBN-13: 9781836207238
Publisher: Packt Publishing
Publication date: 12/26/2025
Pages: 414
Product dimensions: 7.50(w) x 9.25(h) x 0.85(d)

About the Author

Dipankar Mazumdar is currently a Staff Data Engineer Advocate at Onehouse.ai, where he focuses on open source projects such as Apache Hudi and XTable to help engineering teams build and scale robust data analytics platforms. Before this, he worked on critical open source projects such as Apache Iceberg and Apache Arrow at Dremio. For most of his career, he worked at the intersection of data visualization and machine learning. He has also been a speaker at numerous conferences, such as Data+AI, ApacheCon, Scale By the Bay, and Data Day Texas, among others. Dipankar has a master's degree in computer science with research focused on explainable AI techniques.

Vinoth Govindarajan is a seasoned data expert and staff software engineer at Apple Inc., where he spearheads data platforms using open-source technologies like Iceberg, Spark, Trino, and Flink. Before this, he worked on designing incremental ETL frameworks for real-time data processing at Uber. He is a dedicated contributor to the open source community in projects such as Apache Hudi and dbt-spark. As a thought leader, Vinoth has shared his expertise through speaking engagements at conferences such as dbt Coalesce and Hudi OSS community meetups. He has published several blogs on building open lakehouses. Holding a bachelor's degree in information technology, Vinoth has also authored multiple research papers published in IEEE journals.

Table of Contents

  1. Open Data Lakehouse – a New Architectural Paradigm
  2. Transactional Capabilities in Lakehouse
  3. Apache Iceberg Deep Dive
  4. Apache Hudi Deep Dive
  5. Delta Lake Deep Dive
  6. Catalogs and Metadata Management
  7. Interoperability and Data Federation
  8. Performance Optimization and Tuning
  9. Data Governance and Security in Lakehouse
  10. Decisions on Open Table Formats
  11. Real-World Lakehouse Use Cases