Apache Airflow is an open-source workflow management platform. It started at Airbnb in October 2014 and was later open-sourced and donated to the Apache Software Foundation.
Airflow is used to define and schedule data pipelines.
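To make "schedule" concrete: in Airflow 2.x, a DAG with a daily schedule splits time into one-day data intervals beginning at its `start_date`, and the run for each interval is created only after that interval has ended (the run covering January 1 starts at midnight on January 2). With catchup enabled, every interval that has already ended gets a run. The sketch below reproduces that arithmetic with the standard library only. `daily_runs` is a hypothetical helper, not part of Airflow, and it ignores time zones, `catchup=False`, and non-daily timetables.

```python
from datetime import datetime, timedelta


def daily_runs(start: datetime, now: datetime):
    """Hypothetical helper: list (interval_start, interval_end) pairs for every
    one-day interval beginning at `start` that has fully ended by `now`.

    The run for an interval only becomes due once interval_end has passed,
    mirroring how a daily Airflow 2.x schedule lays out its runs.
    """
    runs = []
    interval_start = start
    # Keep stepping forward one day while the whole interval lies in the past.
    while interval_start + timedelta(days=1) <= now:
        interval_end = interval_start + timedelta(days=1)
        runs.append((interval_start, interval_end))
        interval_start = interval_end
    return runs


runs = daily_runs(datetime(2024, 1, 1), datetime(2024, 1, 3, 12, 0))
for start, end in runs:
    print(f"interval {start:%Y-%m-%d} -> due at {end:%Y-%m-%d %H:%M}")
```

At noon on January 3 only two intervals have finished, so the January 3 interval is not yet due.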
In production, Airflow is commonly used to schedule Apache Spark jobs.
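Airflow's Apache Spark provider package includes a `SparkSubmitOperator` that wraps the `spark-submit` command-line tool; a simpler alternative is to run `spark-submit` from a shell task. The sketch below only assembles that command line as an argument list and does not run Spark. `build_spark_submit` is a hypothetical helper, and the master, application path, and configuration values are placeholders.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(app: str, master: str = "yarn", deploy_mode: str = "cluster",
                       name: Optional[str] = None,
                       conf: Optional[Dict[str, str]] = None,
                       app_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: assemble a spark-submit argument list (nothing is executed)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if name:
        cmd += ["--name", name]
    # Each Spark property becomes its own `--conf key=value` pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)         # the application (JAR or Python file) follows all options
    cmd += app_args or []   # anything after the application is passed to it as arguments
    return cmd


cmd = build_spark_submit(
    "s3://my-bucket/jobs/daily_agg.py",  # placeholder application path
    name="daily_agg",
    conf={"spark.executor.memory": "4g"},
    app_args=["--date", "2024-01-01"],
)
print(shlex.join(cmd))  # shell-safe string, e.g. for a shell-based task
```

Keeping the command as a list avoids quoting mistakes; `shlex.join` is only needed at the point where a single shell string is required.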
Workflows are defined in Python, as DAGs (directed acyclic graphs) of tasks.
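Airflow DAG files declare tasks and chain them with the `>>` operator, as in `extract >> transform >> load`. The stand-alone sketch below imitates that syntax with a hypothetical `Task` class (it is not Airflow's API and needs no Airflow installation) and uses the standard library's `graphlib` (Python 3.9+) to compute one order in which the tasks could run. That is the basic guarantee of a DAG: a task runs only after everything upstream of it.

```python
from graphlib import TopologicalSorter


class Task:
    """Hypothetical stand-in for an Airflow task: a name plus its upstream task ids."""

    def __init__(self, task_id: str, registry: dict):
        self.task_id = task_id
        self.upstream = set()
        registry[task_id] = self  # remember every task defined in this "DAG"

    def __rshift__(self, other: "Task") -> "Task":
        # `a >> b` means "b depends on a", mirroring Airflow's bitshift syntax.
        other.upstream.add(self.task_id)
        return other  # returning `other` allows chaining: a >> b >> c


dag = {}
extract = Task("extract", dag)
transform = Task("transform", dag)
load = Task("load", dag)
notify = Task("notify", dag)

extract >> transform >> load
extract >> notify

# graphlib expects {node: predecessors}; static_order() yields a valid run order
# and raises graphlib.CycleError if the dependencies contain a cycle.
graph = {task_id: task.upstream for task_id, task in dag.items()}
order = list(TopologicalSorter(graph).static_order())
print(order)
```

In a real Airflow DAG file the tasks would be operators created inside a `DAG` context, and the scheduler, not `graphlib`, decides when each one runs.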
Airflow stores its metadata (such as pipeline runs and task states) in a relational database; PostgreSQL is a common choice, and MySQL is also supported.
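Airflow locates its metadata database through the `sql_alchemy_conn` option, which holds a SQLAlchemy connection URL. Since Airflow 2.3 it lives in the `[database]` section of `airflow.cfg` (earlier releases used `[core]`), and any option can be overridden by an environment variable named `AIRFLOW__{SECTION}__{KEY}`. The sketch below builds such a URL for PostgreSQL with the `psycopg2` driver. `postgres_conn_url` and `airflow_env_var` are hypothetical helpers, and the host, user, password, and database names are placeholders.

```python
from urllib.parse import quote


def postgres_conn_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Hypothetical helper: build a SQLAlchemy URL for PostgreSQL via psycopg2."""
    # Percent-encode credentials, as URL syntax requires, so characters such as
    # '@' or ':' inside a password do not break the URL structure.
    return (f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{db}")


def airflow_env_var(section: str, key: str) -> str:
    """Hypothetical helper: Airflow reads option [section] key from AIRFLOW__SECTION__KEY."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


url = postgres_conn_url("airflow", "airflow_pass", "db.internal", "airflow")  # placeholders
name = airflow_env_var("database", "sql_alchemy_conn")
print(f"{name}={url}")
```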
Because workflows are plain Python files, they are easy to keep under version control, for example in a Git repository hosted on GitHub.
Airflow also provides a web UI for monitoring pipelines and triggering them manually.
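Pipelines can also be triggered programmatically. In Airflow 2.x the stable REST API accepts `POST /api/v1/dags/{dag_id}/dagRuns` with an optional JSON `conf` payload; newer releases (Airflow 3) expose a different API version. The sketch below only constructs that request with the standard library and does not send it. `build_trigger_request` is a hypothetical helper, the base URL, DAG id, and credentials are placeholders, and it assumes the webserver has basic authentication enabled for the API (authentication setup varies by deployment).

```python
import base64
import json
from urllib.request import Request


def build_trigger_request(base_url: str, dag_id: str, conf: dict,
                          username: str, password: str) -> Request:
    """Hypothetical helper: build (but do not send) a request that starts a DAG run."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    # Assumes HTTP basic auth is enabled for the API on this deployment.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


req = build_trigger_request("http://localhost:8080", "example_pipeline",
                            {"run_date": "2024-01-01"}, "admin", "admin")  # placeholders
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```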
Airflow can run on a standalone VM, in Docker, or on Kubernetes (k8s).