Install Airflow | setup airflow using Docker | how to setup airflow [2022]

Apache Airflow is a widely used tool for defining and scheduling complex data pipelines, and many companies now rely on it. Airflow workflows are defined in Python. In this tutorial, I will explain how to install Airflow on your system.

There are several ways to install Airflow. For this tutorial, I will use Docker, so I assume that you have basic knowledge of Docker.

Install Airflow using Docker.

To proceed, make sure Docker and docker-compose are installed on your system. If not, please follow the documents below to set them up.

Setup Docker

setup docker-compose

Awesome, let’s verify the Docker version. Make sure you have a recent version of Docker and docker-compose installed.

➜  ~ docker --version
Docker version 20.10.3, build 48d30b5

Then verify the docker-compose version:

➜ docker-compose --version
docker-compose version 1.28.5, build c4eb3a1f
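If you would rather script this check than read the numbers by eye, the version strings above can be parsed and compared. The sketch below is only an illustration: the helper names parse_version and is_at_least are hypothetical, and the minimum versions passed in are placeholders I picked for the example, not requirements stated by Docker or Airflow.

```python
import re

def parse_version(output):
    """Extract (major, minor, patch) from `docker --version`-style output."""
    match = re.search(r"version\s+v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in: {output!r}")
    return tuple(int(part) for part in match.groups())

def is_at_least(output, minimum):
    """Return True when the parsed version is >= the given minimum tuple.

    The minimum is supplied by the caller; this sketch does not know
    which versions Airflow actually requires.
    """
    return parse_version(output) >= minimum

# Sample outputs copied from the two commands above.
docker_out = "Docker version 20.10.3, build 48d30b5"
compose_out = "docker-compose version 1.28.5, build c4eb3a1f"

print(parse_version(docker_out))
print(parse_version(compose_out))
```

In a real script you would get the text from the commands themselves, for example with subprocess.run(["docker", "--version"], capture_output=True, text=True).stdout.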

The other thing we have to make sure of is to provide sufficient resources to Docker. Click on the Docker icon and go to preferences.

check resource

Go to Resources and assign at least 3 CPU cores and 5 GB of RAM.

Modify docker resource

Click on apply and restart.
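Once Docker has restarted, the new limits can also be checked from a script. This is a rough sketch, assuming the NCPU and MemTotal fields that docker info reports in its JSON output; the function names and the default thresholds (taken from the values suggested above) are my own. The daemon may report slightly less memory than the value set in preferences, so a small shortfall is not necessarily a problem.

```python
import json
import subprocess

GIB = 1024 ** 3

def check_resources(info, min_cpus=3, min_mem_gib=5):
    """Return a list of problems; an empty list means the limits look sufficient."""
    problems = []
    if info["NCPU"] < min_cpus:
        problems.append(f"only {info['NCPU']} CPUs, need {min_cpus}")
    if info["MemTotal"] < min_mem_gib * GIB:
        problems.append(
            f"only {info['MemTotal'] / GIB:.1f} GiB RAM, need {min_mem_gib}"
        )
    return problems

def docker_info():
    """Ask the Docker daemon for its settings as JSON (needs a running Docker)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Calling check_resources(docker_info()) on your machine returns an empty list when both limits are met.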


Docker will be back up after a short while. To check that it is working, run the simple hello-world image:

➜  ~ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

If you see the above output, your Docker setup is working fine.

Now let’s proceed and install Airflow. We will use the official Airflow Docker image to run Airflow as Docker containers.

Create a folder called airflow.

mkdir airflow
---
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:

Let’s go inside the folder and save the above code in a file named docker-compose.yaml.

➜  airflow ll
total 8
-rw-r--r--  1 XXXXXXX  staff   1.8K May 21 19:16 docker-compose.yml

Before proceeding, we must create some folders on the local machine. The compose file mounts them into the containers, so DAGs, logs, and plugins are shared between the containers and the local environment. Let’s create the below folders:

mkdir ./dags ./plugins ./logs

Run the below command to ensure the container and host computer have matching file permissions.

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
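To make it clearer what that command writes, here is a small Python sketch that builds the same two lines. docker-compose reads the .env file in the project folder and substitutes AIRFLOW_UID and AIRFLOW_GID into the user: entry of the compose file. The function names are hypothetical, and os.getuid() is only available on Unix-like systems.

```python
import os
from pathlib import Path

def env_file_contents(uid, gid=0):
    """Build the same two lines the echo command above writes to .env."""
    return f"AIRFLOW_UID={uid}\nAIRFLOW_GID={gid}\n"

def write_env_file(directory="."):
    """Write .env next to the compose file, using the current user's id."""
    path = Path(directory) / ".env"
    path.write_text(env_file_contents(os.getuid()))
    return path

# Example with a made-up user id of 501.
print(env_file_contents(501), end="")
```

With a three-digit user id the file comes to 30 bytes, which is consistent with the size of .env in the directory listing further down.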

Let’s verify that all the folders were created.

-rw-r--r--   1 XXXX  staff  4617 May 22 11:09 docker-compose.yaml
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 dags
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 plugins
drwxr-xr-x   2 XXXX  staff    64 May 22 11:13 logs
drwxr-xr-x   6 XXXX  staff   192 May 22 11:13 .
-rw-r--r--   1 XXX  staff    30 May 22 10:51 .env

Let’s run the initialization service, which upgrades the Airflow database and creates the admin user.

➜  docker-compose up airflow-init
Creating network "airflow_default" with the default driver
............................................
airflow-init_1       | Upgrades done
..................
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.1.0
airflow_airflow-init_1 exited with code 0

If the initialization succeeded, you will see the above messages and airflow-init exits with code 0. Make sure you are getting the same.
Now, let’s start all the Airflow services.

➜ docker-compose up -d
airflow_postgres_1 is up-to-date
airflow_redis_1 is up-to-date
Creating airflow_airflow-scheduler_1 ... done
Creating airflow_airflow-worker_1    ... done
Starting airflow_airflow-init_1      ... done
Creating airflow_airflow-webserver_1 ... done
Creating airflow_flower_1            ... done

Since Airflow uses Redis, Postgres, etc., it will take time to download all the images. Take a quick coffee break, and once you are back, Airflow will be ready to use.

Now let’s verify that all containers were created by running the below command. Make sure all containers are healthy.

docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                      PORTS                              NAMES
1bdb56ed6a3c   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)     0.0.0.0:5555->5555/tcp, 8080/tcp   airflow_flower_1
d228189f263a   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)     0.0.0.0:8080->8080/tcp             airflow_airflow-webserver_1
a4bd0ae3c9a0   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)     8080/tcp                           airflow_airflow-worker_1
f244ca2862bc   apache/airflow:2.1.0   "/usr/bin/dumb-init …"   13 minutes ago   Up 13 minutes (healthy)     8080/tcp                           airflow_airflow-scheduler_1
6e34597d05df   redis:latest           "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes (healthy)     0.0.0.0:6379->6379/tcp             airflow_redis_1
99c99e33967b   postgres:13            "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes (healthy)     5432/tcp                           airflow_postgres_1

If the containers are not healthy yet, run the below command to check the logs:

docker-compose logs -f
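When there are many containers, it can be easier to let a script pick out the ones that are not healthy yet. The sketch below assumes docker ps is run with a custom format that prints the container name and status separated by a tab; the helper names are my own. Note that it also flags containers that have no health check at all, since their status never contains "(healthy)".

```python
import subprocess

def unhealthy_containers(ps_output):
    """Return names of containers whose status does not include '(healthy)'.

    Expects one container per line: name, a tab, then the status text.
    """
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if "(healthy)" not in status:
            names.append(name)
    return names

def docker_ps_status():
    """Run docker ps with a tab-separated name/status format (needs Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling unhealthy_containers(docker_ps_status()) gives you the names to pass to docker-compose logs.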

Verify airflow UI

Go to localhost:8080 to access the Airflow UI.

airflow UI

Use the below credentials to log in to Airflow:

username: airflow

password: airflow

airflow home page

If you see the above page, your Airflow setup is complete.
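The webserver can also be checked without a browser. The compose file's own health check calls http://localhost:8080/health, and the sketch below reads that same endpoint. I am assuming the response is a JSON object with one entry per component (such as metadatabase and scheduler), each carrying a status field, as in Airflow 2.x; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

def unhealthy_components(payload):
    """Return the components in a /health payload whose status is not 'healthy'."""
    return [
        name for name, details in payload.items()
        if details.get("status") != "healthy"
    ]

def fetch_health(base_url="http://localhost:8080"):
    """Fetch and decode the webserver's /health endpoint (needs Airflow running)."""
    with urlopen(f"{base_url}/health", timeout=10) as response:
        return json.load(response)
```

An empty list from unhealthy_components(fetch_health()) means every reported component is healthy.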

Verify Airflow version

Airflow is now running on your system as Docker containers; let’s verify the Airflow version by running the below command.

➜  ~ docker exec a4bd0ae3c9a0 airflow version
2.1.0

Note: a4bd0ae3c9a0 is the container ID of my Airflow worker. Your container ID will be different from mine. Run docker ps and look for the airflow-worker container to find its ID.
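If you do not want to copy the ID by hand, it can be looked up by container name. This is a minimal sketch, assuming docker ps is run with a format that prints the ID and the name separated by a tab; find_container_id is a hypothetical helper.

```python
import subprocess

def find_container_id(ps_output, service):
    """Return the id of the first container whose name contains `service`.

    Expects one container per line: id, a tab, then the container name.
    Returns None when nothing matches.
    """
    for line in ps_output.splitlines():
        container_id, _, name = line.partition("\t")
        if service in name:
            return container_id
    return None

def docker_ps_ids():
    """Run docker ps with a tab-separated id/name format (needs Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}}\t{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For the containers listed earlier, find_container_id(docker_ps_ids(), "airflow-worker") would return the worker's ID.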

Test Airflow installation

Now, let’s run a DAG to verify that the installation works.

Unpause the example_bash_operator DAG (the compose file loads the example DAGs and creates them paused) and wait until it has run.

airflow DAG

To check whether the DAG ran successfully, look at the status of its most recent tasks.

airflow DAG run
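The same unpause step can be done from the command line instead of the UI toggle, using the Airflow CLI inside one of the containers, in the same way the version check above used docker exec. The sketch below only builds and runs the command; the helper names are my own, and the container ID is whatever docker ps shows on your machine.

```python
import subprocess

def airflow_cli(container_id, *args):
    """Build a `docker exec` command that runs the Airflow CLI in a container."""
    return ["docker", "exec", container_id, "airflow", *args]

def unpause_dag(container_id, dag_id):
    """Run `airflow dags unpause <dag_id>` inside the container (needs Docker)."""
    command = airflow_cli(container_id, "dags", "unpause", dag_id)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

# The command that would be run for the example DAG, with my worker's ID.
print(" ".join(airflow_cli("a4bd0ae3c9a0", "dags", "unpause", "example_bash_operator")))
```

Keeping the command builder separate from the function that runs it makes it easy to print the command first and see exactly what will be executed.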

Congrats, you have successfully installed Airflow and run your first DAG.

To uninstall airflow

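If you wish to stop the Airflow instance, go to the airflow folder where the docker-compose.yml file is present and run the below command. It stops and removes the containers and the network. The postgres-db-volume volume and the downloaded images stay on disk; adding the --volumes and --rmi all options to the command removes those as well.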

➜  docker-compose down
Stopping airflow_flower_1            ... done
Stopping airflow_airflow-worker_1    ... done
Stopping airflow_airflow-webserver_1 ... done
Stopping airflow_airflow-scheduler_1 ... done
Stopping airflow_redis_1             ... done
Stopping airflow_postgres_1          ... done
Removing airflow_flower_1            ... done
Removing airflow_airflow-worker_1    ... done
Removing airflow_airflow-webserver_1 ... done
Removing airflow_airflow-scheduler_1 ... done
Removing airflow_airflow-init_1      ... done
Removing airflow_redis_1             ... done
Removing airflow_postgres_1          ... done
Removing network airflow_default

Conclusion

I hope you have found this article useful. Please do let me know in the comment box if you face any airflow installation issues.
