What is Apache Airflow?

Apache Airflow is a platform to programmatically author, schedule and monitor workflows.

Airflow was created as an internal project at Airbnb in October 2014 and was open sourced in June 2015. It is written in Python, has a modular architecture, and uses a message queue to orchestrate an arbitrary number of workers.

Airflow workflows are defined as DAGs (Directed Acyclic Graphs). A DAG is a collection of tasks together with the dependencies between them, which determine the order in which the tasks run. Each task is an instance of an operator, a template describing a single unit of work. Operators themselves are written in Python, but they can run work in other languages; the BashOperator, for example, executes arbitrary shell commands. Task parameters can also be templated with the Jinja templating engine, so values such as the execution date are filled in at runtime.

The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Each time a DAG is executed, the scheduler records a DAG Run entry in the metadata database; a DAG Run represents a single run of a DAG at a point in time. Airflow also ships with a web interface for viewing DAGs, DAG Runs, and task instances, triggering DAGs, and inspecting logs, and its rich command line utilities make performing complex surgeries on DAGs a breeze.
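
To make this concrete, here is a minimal sketch of a DAG definition, assuming Airflow 2.x; the DAG id, task ids, and the callable are illustrative names chosen for this example, not anything Airflow prescribes:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def print_hello():
    # Plain Python callable executed by the PythonOperator below.
    print("Hello from Airflow!")


# A DAG groups tasks and defines their schedule and dependencies.
with DAG(
    dag_id="example_hello",           # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Operators are task templates; each instantiation becomes a task.
    extract = BashOperator(
        task_id="extract",
        # Jinja templating: {{ ds }} is rendered to the run's logical date.
        bash_command="echo extracting data for {{ ds }}",
    )
    transform = PythonOperator(
        task_id="transform",
        python_callable=print_hello,
    )

    # The >> operator declares the dependency: extract runs before transform.
    extract >> transform
```

Dropping a file like this into the DAGs folder is enough for the scheduler to pick it up; each daily DAG Run then executes the two tasks in dependency order, with the {{ ds }} template filled in per run.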

The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. Airflow is open source and released under the Apache License. It is built on top of other popular open source projects such as Flask, Celery, and SQLAlchemy.

Find more information at https://airflow.apache.org/
