Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow
Confidently orchestrate your data pipelines with Apache Airflow by applying industry best practices and scalable strategies

Key Features

  • Seamlessly migrate from Airflow 1.x to 2.x and explore the key features and improvements in version 2.x
  • Learn Apache Airflow workflow authoring through practical, real-world use cases
  • Discover strategies to optimize and scale Airflow pipelines for high availability and operational resilience
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Data professionals face the challenge of managing complex data pipelines, orchestrating workflows across diverse systems, and ensuring scalable, reliable data processing. This definitive guide to mastering Apache Airflow, written by experts in engineering, data strategy, and problem-solving across tech, financial, and life sciences industries, is your key to overcoming these challenges. Covering everything from Airflow fundamentals to advanced topics such as custom plugin development, multi-tenancy, and cloud deployment, this book provides a structured approach to workflow orchestration. You’ll start with an introduction to data orchestration and Apache Airflow 2.x updates, followed by DAG authoring, managing Airflow components, and connecting to external data sources. Through real-world use cases, you’ll learn how to implement ETL pipelines and orchestrate ML workflows in your environment, and scale Airflow for high availability and performance. You’ll also learn how to deploy Airflow in cloud environments, tackle operational considerations for scaling, and apply best practices for CI/CD and monitoring. By the end of this book, you’ll be proficient in operating and using Apache Airflow, authoring high-quality workflows in Python, and making informed decisions crucial for production-ready Airflow implementations.
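
For readers new to DAG authoring, the style of workflow the book covers looks roughly like the following minimal sketch using the Airflow 2.x TaskFlow API (illustrative only; the DAG and task names are hypothetical and not drawn from the book):

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule_interval="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def example_etl():
        @task
        def extract():
            # Pull raw records from a source system (stubbed here).
            return [1, 2, 3]

        @task
        def transform(records):
            # Apply a simple transformation to each record.
            return [r * 2 for r in records]

        @task
        def load(records):
            # Persist the transformed records (stubbed as a log line).
            print(f"Loaded {len(records)} records")

        load(transform(extract()))

    example_etl()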

What you will learn

  • Explore the new features and improvements in Apache Airflow 2.0
  • Design and build scalable data pipelines using DAGs
  • Implement ETL pipelines, ML workflows, and advanced orchestration strategies
  • Develop and deploy custom plugins and UI extensions
  • Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure
  • Plan and execute a scalable deployment strategy for long-term growth
  • Apply best practices for monitoring and maintaining Airflow

Who this book is for

This book is ideal for data engineers, developers, IT professionals, and data scientists looking to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow’s potential and want to avoid common implementation pitfalls. Whether you’re new to data engineering, an experienced professional, or a manager seeking insights, this guide will support you. A working understanding of Python, some business experience, and basic DevOps skills are helpful. Prior experience with Airflow is beneficial but not required.

Product Details

ISBN-13: 9781805129332
Publisher: Packt Publishing
Publication date: 10/31/2024
Sold by: Barnes & Noble
Format: eBook
Pages: 188
File size: 4 MB

About the Author

Dylan Intorf is a seasoned technology leader with a B.Sc. in computer science from Arizona State University. With over a decade of experience in software and data engineering, he has delivered tailored solutions to the technology, financial, and insurance sectors. Dylan's expertise in data and infrastructure management has been instrumental in optimizing Airflow deployments and operations for several Fortune 25 companies.
Dylan Storey holds a B.Sc. and M.Sc. in biology from California State University, Fresno, and a Ph.D. in life sciences from the University of Tennessee, Knoxville, where he specialized in leveraging computational methods to study complex biological systems. With over 15 years of experience, Dylan has successfully built, grown, and led teams to drive the development and operation of data products across various scales and industries, including many of the top Fortune-recognized organizations. He is also an expert in leveraging AI and machine learning to automate processes and decisions, enabling businesses to achieve their strategic goals.
Kendrick van Doorn is an accomplished engineering and business leader with a strong foundation in software development, honed through impactful work with federal agencies and consulting technology firms. With over a decade of experience in crafting technology and data strategies for leading brands, he has consistently driven innovation and efficiency. Kendrick holds a B.Sc. in computer engineering from Villanova University, an M.Sc. in systems engineering from George Mason University, and an MBA from Columbia University.

Table of Contents

  1. Getting Started with Airflow 2.0
  2. Core Airflow Concepts
  3. Components of Airflow
  4. Basics of Airflow and DAG Authoring
  5. Connecting to External Sources
  6. Extending Functionality with UI Plugins
  7. Writing and Distributing Custom Providers
  8. Orchestrating a Machine Learning Workflow
  9. Using Airflow as a Driving Service
  10. Airflow Ops: Development and Deployment
  11. Airflow Ops Best Practices: Observation and Monitoring
  12. Multi-Tenancy in Airflow
  13. Migrating Airflow