- Build open lakehouses with open table formats using popular compute engines such as Apache Spark, Apache Flink, Trino, and Python
- Optimize lakehouse performance with advanced techniques such as pruning, partitioning, compaction, indexing, and clustering
- Find out how to enable seamless integration, data management, and interoperability using Apache XTable
- Purchase of the print or Kindle book includes a free PDF eBook
- Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs
- Gain a complete understanding of data lifecycle management in lakehouses
- Integrate lakehouses with Apache Airflow, dbt, and Apache Beam
- Optimize performance with sorting, clustering, and indexing techniques
- Use open table format data with ML frameworks such as Spark MLlib, TensorFlow, and MLflow
- Interoperate across different table formats with Apache XTable and UniForm
- Secure your lakehouse with access controls and ensure regulatory compliance
This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.
Engineering Lakehouses with Open Table Formats: Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake
Product Details
| ISBN-13: | 9781836207238 |
|---|---|
| Publisher: | Packt Publishing |
| Publication date: | 12/26/2025 |
| Pages: | 414 |
| Product dimensions: | 7.50(w) x 9.25(h) x 0.85(d) |