Apache Airflow is one of the most widely used workflow orchestration tools on the market. Many companies use Apache Airflow to define and schedule their complex data pipelines, and Airflow workflows themselves are defined in Python. In this tutorial, I will explain how to install Airflow on your system.
There are many ways to install Airflow, but for this tutorial I will be using Docker. To follow along, I assume you have basic knowledge of Docker.
Install Airflow using Docker
We will be using Docker to install Airflow. Before proceeding, make sure you have Docker and docker-compose installed on your system. If not, please follow the official Docker documentation to set up Docker and docker-compose.
Awesome, let’s verify the Docker version. Make sure you have the latest version of Docker and docker-compose installed.
➜ ~ docker --version
Docker version 20.10.3, build 48d30b5
Then verify the docker-compose version:
➜ docker-compose --version
docker-compose version 1.28.5, build c4eb3a1f
The other thing we have to make sure of is that Docker has sufficient resources. Click on the Docker icon and go to Preferences.
Go to Resources and assign at least 3 CPU cores and 5 GB of RAM.
Click on Apply & Restart.
Docker will be up after some time. To check that Docker is working fine, run the simple hello-world image:
➜ ~ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
If you are getting the above output, it means your docker setup is working fine.
Now let’s proceed further and finally install Airflow. We will use the official Airflow Docker image to run Airflow as a set of Docker containers.
Create a folder called airflow.
mkdir airflow
---
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:
Let’s go inside the folder and save the above code as a file named docker-compose.yaml.
➜ airflow ll
total 8
-rw-r--r--  1 XXXXXXX  staff  1.8K May 21 19:16 docker-compose.yml
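As an optional sanity check, you can let docker-compose validate the file we just created. The command below (run from inside the airflow folder) only checks the syntax of docker-compose.yaml and prints nothing if the file is valid:

docker-compose config --quiet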
Before proceeding further, we must create a few folders on our local machine, which will be used to mount data between the containers and the local environment. Let’s create the below folders:
mkdir ./dags ./plugins ./logs
Run the below command to ensure the container and host computer have matching file permissions.
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
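As a quick check, you can print the .env file that the command created. The UID shown below (501) is just an example from my machine; yours will almost certainly differ:

➜ airflow cat .env
AIRFLOW_UID=501
AIRFLOW_GID=0

The AIRFLOW_UID value is substituted into the user: field of the compose file, so files written to the mounted dags, logs, and plugins folders end up owned by your local user.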
Let’s verify that all the folders were created.
-rw-r--r--   1 XXXX  staff  4617 May 22 11:09 docker-compose.yaml
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 dags
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 plugins
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 logs
drwxr-xr-x   6 XXXX  staff   192 May 22 11:13 .
-rw-r--r--   1 XXX   staff    30 May 22 10:51 .env
drwxr-xr-x  93 XXXX  staff  2976 May 22 11:16 ..
Let’s run the initialization service, which upgrades the Airflow database and creates the admin user:
➜ docker-compose up airflow-init
Creating network "airflow_default" with the default driver
............................................
airflow-init_1  | Upgrades done
..................
airflow-init_1  | Admin user airflow created
airflow-init_1  | 2.1.0
airflow_airflow-init_1 exited with code 0
If the script ran successfully, you will see the above message. Make sure you get something similar before moving on.
Now, let’s start all the airflow services.
➜ docker-compose up -d
airflow_postgres_1 is up-to-date
airflow_redis_1 is up-to-date
Creating airflow_airflow-scheduler_1 ... done
Creating airflow_airflow-worker_1    ... done
Starting airflow_airflow-init_1      ... done
Creating airflow_airflow-webserver_1 ... done
Creating airflow_flower_1            ... done
Since Airflow uses Redis, Postgres, etc., it will take some time to download all the images. Let’s take a quick coffee break, and once you are back, your Airflow will be ready to use.
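Tip: the next time you set this up, you can pre-pull all the images before running docker-compose up, so the download progress is visible and the startup itself finishes quickly:

docker-compose pull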
Now let’s verify that all containers were created by typing the below command. Make sure all containers are healthy.
docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                    PORTS                              NAMES
1bdb56ed6a3c   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)   0.0.0.0:5555->5555/tcp, 8080/tcp   airflow_flower_1
d228189f263a   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)   0.0.0.0:8080->8080/tcp             airflow_airflow-webserver_1
a4bd0ae3c9a0   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)   8080/tcp                           airflow_airflow-worker_1
f244ca2862bc   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)   8080/tcp                           airflow_airflow-scheduler_1
6e34597d05df   redis:latest           "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes (healthy)   0.0.0.0:6379->6379/tcp             airflow_redis_1
99c99e33967b   postgres:13            "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes (healthy)   5432/tcp                           airflow_postgres_1
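Alternatively, docker-compose ps lists only the containers that belong to this compose project, along with their state, which is a bit easier to read than the full docker ps output:

docker-compose ps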
If containers are not healthy yet, type the below command to check the logs
docker-compose logs -f
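If you only care about one service, you can pass its name from the compose file to docker-compose logs, for example:

docker-compose logs -f airflow-webserver
docker-compose logs -f airflow-scheduler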
Verify the Airflow UI
Go to http://localhost:8080 to access the Airflow UI.
Use the below credentials to log in to Airflow:
username: airflow
password: airflow
If you can log in and see the Airflow UI, your Airflow setup is complete.
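Because the compose file sets AIRFLOW__API__AUTH_BACKEND to the basic-auth backend, you can also sanity-check the installation from the command line against Airflow's stable REST API. This is optional: the health endpoint needs no credentials, while the DAGs endpoint uses the airflow/airflow user created by airflow-init:

# health of the webserver, scheduler and metadata database
curl http://localhost:8080/health

# list the example DAGs through the REST API
curl -u airflow:airflow http://localhost:8080/api/v1/dags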
Verify Airflow version
Now that Airflow is installed on your system as a set of Docker containers, let’s verify the Airflow version by typing the below command.
➜ ~ docker exec a4bd0ae3c9a0 airflow version
2.1.0
Note: a4bd0ae3c9a0 is the container id of my airflow-worker container. Your container id will be different. Type docker ps and look for the airflow-worker container to get its id.
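The same docker exec pattern works for any Airflow CLI command; <worker-container-id> below is a placeholder for your own container id:

# list all DAGs loaded by Airflow
docker exec <worker-container-id> airflow dags list

# show version, paths and configuration details
docker exec <worker-container-id> airflow info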
Test Airflow installation
Now, let’s run a DAG to verify that the installation works correctly.
Unpause the example_bash_operator DAG and wait until it has run.
To know whether the DAG ran successfully, verify the recent tasks' output in the UI.
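If you prefer the terminal over the UI, the same steps can be done with the Airflow CLI inside the worker (or scheduler) container; again, <worker-container-id> is a placeholder for your own container id:

docker exec <worker-container-id> airflow dags unpause example_bash_operator
docker exec <worker-container-id> airflow dags trigger example_bash_operator

# check the state of the runs for that DAG
docker exec <worker-container-id> airflow dags list-runs -d example_bash_operator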
Congrats, you have successfully installed airflow and run your first DAG.
Uninstall Airflow
If you wish to stop the Airflow instance, go to the airflow folder where the docker-compose.yaml file is present and type the below command to stop Airflow completely.
➜ docker-compose down
Stopping airflow_flower_1             ... done
Stopping airflow_airflow-worker_1     ... done
Stopping airflow_airflow-webserver_1  ... done
Stopping airflow_airflow-scheduler_1  ... done
Stopping airflow_redis_1              ... done
Stopping airflow_postgres_1           ... done
Removing airflow_flower_1             ... done
Removing airflow_airflow-worker_1     ... done
Removing airflow_airflow-webserver_1  ... done
Removing airflow_airflow-scheduler_1  ... done
Removing airflow_airflow-init_1       ... done
Removing airflow_redis_1              ... done
Removing airflow_postgres_1           ... done
Removing network airflow_default
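Note that docker-compose down only stops and removes the containers and the network; the Postgres volume holding the Airflow metadata database (and the downloaded images) stays on disk. If you want to remove everything, you can add the extra flags below:

# also remove the postgres-db-volume and the pulled images
docker-compose down --volumes --rmi all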
Conclusion
I hope you have found this article useful. Please do let me know in the comment box if you face any airflow installation issues.
More to Read?
How to send email from airflow