As data volumes grow, managing compliance and governance becomes correspondingly more complex. In this blog post, we will learn how to install Apache Atlas on your system.
What is Apache Atlas?
Before installing Apache Atlas, let's first understand what it is and why every organization should have a data governance and compliance tool.
Apache Atlas is an open-source tool used for data governance and metadata management. Apache Atlas allows companies to effectively and efficiently meet their compliance requirements.
Apache Atlas's popularity is growing because it integrates easily with well-known big data tools such as Hadoop, Kafka, Spark, Hive, and Impala. It also provides a REST API that we can use to create and update data lineage.
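To make the REST API point concrete, here is a minimal Python sketch that only builds (and does not send) two requests against the Atlas v2 API: one that would create an entity and one that would fetch lineage for an entity. It assumes an Atlas server at http://localhost:21000 with the admin/admin login used later in this post. The type name, the attribute values, and the helper names are hypothetical and chosen purely for illustration.

```python
import base64
import json
import urllib.request

# Assumption: a local Atlas server, as set up later in this post.
ATLAS_URL = "http://localhost:21000"


def basic_auth(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_create_entity_request(type_name, attributes,
                                username="admin", password="admin"):
    """Build (but do not send) a POST to the v2 entity endpoint."""
    payload = {"entity": {"typeName": type_name, "attributes": attributes}}
    return urllib.request.Request(
        f"{ATLAS_URL}/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": basic_auth(username, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_lineage_request(guid, depth=3, username="admin", password="admin"):
    """Build (but do not send) a GET for the lineage of one entity."""
    url = f"{ATLAS_URL}/api/atlas/v2/lineage/{guid}?direction=BOTH&depth={depth}"
    return urllib.request.Request(
        url,
        headers={"Authorization": basic_auth(username, password)},
        method="GET",
    )


# Hypothetical example values; send with urllib.request.urlopen(request)
# once the server is running.
request = build_create_entity_request(
    "DataSet", {"qualifiedName": "sales_raw@demo", "name": "sales_raw"}
)
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before anything touches the server.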
It has many predefined types, and users can add new types based on their requirements. It also supports a SQL-like query engine for searching entities. You can check the Atlas documentation to learn more about its features.
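As a rough illustration of the SQL-like query engine, the sketch below builds the URL for a DSL search through the v2 REST API. The query string `DB where name="Reporting"` is one of the sample queries that the quick-start script in this post runs; the base URL and the function name are assumptions for a local install.

```python
from urllib.parse import quote, urlencode


def dsl_search_url(query: str, limit: int = 25,
                   base_url: str = "http://localhost:21000") -> str:
    """Return the URL of a DSL search, with the query percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    params = urlencode({"query": query, "limit": limit}, quote_via=quote)
    return f"{base_url}/api/atlas/v2/search/dsl?{params}"


url = dsl_search_url('DB where name="Reporting"')
print(url)
```

The resulting URL can be opened with any HTTP client that sends the Atlas login as HTTP Basic credentials.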
How to install Apache Atlas
In this section, we will learn how to install Apache Atlas using Docker. If Docker is not already installed on your system, install it first.
Run the command below to pull the Atlas Docker image:
docker pull sburn/apache-atlas
Using default tag: latest
latest: Pulling from sburn/apache-atlas
d519e2592276: Pull complete
d22d2dfcfa9c: Pull complete
b3afe92c540b: Pull complete
9070b09379d6: Pull complete
968e3feb8e26: Pull complete
4568df43ab62: Pull complete
6cd5206cb36f: Pull complete
7e90f6010249: Pull complete
9646c7ee49f9: Pull complete
57a26972c6b6: Pull complete
4ddabc3ff1ef: Pull complete
Digest: sha256:1eca23ef34204ee9a15ec809b695fb0a1a2a12cf68db18642c9e90875675a5c6
Status: Downloaded newer image for sburn/apache-atlas:latest
docker.io/sburn/apache-atlas:latest
Run the command below to verify that the image was pulled successfully:
docker images
Now start a container from the image with the command below:
docker run -d \
    -p 21000:21000 \
    --name atlas \
    sburn/apache-atlas \
    /opt/apache-atlas-2.1.0/bin/atlas_start.py
Verify that the Docker container is running:
docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                                           NAMES
7092a4f8e34b   sburn/apache-atlas   "/opt/apache-atlas-2…"   3 seconds ago   Up 2 seconds   0.0.0.0:21000->21000/tcp, :::21000->21000/tcp   atlas
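If you would rather check the container from a script than read the table by eye, something like the following Python sketch could work. It relies on the `--filter` and `--format` options of `docker ps` and assumes the container was started with `--name atlas` as above; the function names are made up for this example. Because the name filter matches substrings, the parser also compares the name exactly.

```python
import subprocess
from typing import Optional


def parse_status(ps_output: str, name: str) -> Optional[str]:
    """Find the status of the container called `name` in tab-separated output."""
    for line in ps_output.splitlines():
        container_name, _, status = line.partition("\t")
        if container_name == name:
            return status
    return None


def container_status(name: str = "atlas") -> Optional[str]:
    """Ask Docker for the status of a running container, or None if absent."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"name={name}",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout, name)


# Example (requires Docker): print(container_status("atlas"))
```

A return value of None means no running container with that exact name was found.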
Go to http://localhost:21000 to access the Atlas UI.
Use admin/admin as the credentials to log in to the Atlas UI.
After a successful login, you will land on the Atlas home page, where you can use the different search options to filter for the data you need.
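If the page at http://localhost:21000 does not load right away, the server may still be starting inside the container, even though docker ps already reports it as Up. The following Python sketch polls the URL until it answers. The function names, the number of attempts, and the delay are arbitrary assumptions, not values recommended by Atlas.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:21000", timeout: float = 5.0) -> bool:
    """Return True if the server at `url` answers with a non-5xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server responded; treat client errors (e.g. 401) as "reachable".
        return err.code < 500
    except OSError:
        # Connection refused, reset, or timed out (URLError is an OSError).
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay_seconds: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example (requires the container to be running):
# ready = wait_until_ready(http_probe)
```

Passing the probe and the sleep function as parameters keeps the retry loop easy to try out without a live server.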
Add sample data to Apache Atlas
When you first log in to the Apache Atlas UI, it contains no preloaded data. However, Atlas ships with a script that loads sample data.
To load the data, first open a shell inside the running container with the command below. Replace 7092a4f8e34b with the container ID shown by docker ps on your machine, or use the container name atlas instead:
docker exec -it 7092a4f8e34b /bin/bash
Finally, run the command below to load the sample data into Apache Atlas:
python /opt/apache-atlas-2.1.0/bin/quick_start.py
Use admin/admin as the username and password when prompted.
The script prints the types and entities it creates as it inserts the sample data into Apache Atlas:
python /opt/apache-atlas-2.1.0/bin/quick_start.py
Enter username for atlas :- admin
Enter password for atlas :-
Creating sample types:
Created type [DB]
Created type [Table]
Created type [StorageDesc]
Created type [Column]
Created type [LoadProcess]
Created type [LoadProcessExecution]
Created type [View]
Created type [JdbcAccess]
Created type [ETL]
Created type [Metric]
Created type [PII]
Created type [Fact]
Created type [Dimension]
Created type [Log Data]
Created type [Table_DB]
Created type [View_DB]
Created type [View_Tables]
Created type [Table_Columns]
Created type [Table_StorageDesc]
Creating sample entities:
Created entity of type [DB], guid: c20def06-fb2a-452c-8ef9-bbd57708dede
Created entity of type [DB], guid: e3be9ce6-5eac-4e0b-9f25-01294fd72f34
Created entity of type [DB], guid: ae2d7638-9319-4ea6-b46c-e29ccf2a2079
Created entity of type [Table], guid: 4d8f5fad-0e91-4b7b-876d-7373ce8286c9
Created entity of type [Table], guid: fd4648ce-7e06-4ce2-b12b-24d1c0f38f41
Created entity of type [Table], guid: 85e75fdf-9ef7-4a88-bd97-6c9b1183de31
Created entity of type [Table], guid: ff1a2797-af24-4712-98bc-a7524a67d652
Created entity of type [Table], guid: f4de886d-00be-4664-aa3c-fa6e1f0cc44f
Created entity of type [Table], guid: fd5ec4df-a5d8-401b-91a8-1cc1435de000
Created entity of type [Table], guid: 2956330a-03c0-483e-accb-423edc39d4b7
Created entity of type [Table], guid: cfce1b26-73c5-4e16-819d-cc4d5cebb317
Created entity of type [View], guid: 7220d223-17ac-4042-9dea-9d087314c976
Created entity of type [View], guid: 274afdbe-3a40-4e25-b53c-7f8bc0efa9fa
Created entity of type [LoadProcess], guid: 9f0135c1-8c1e-4f29-a95a-50e539066d8d
Created entity of type [LoadProcessExecution], guid: 821a79af-9942-46ac-94fb-a5c4814f1cde
Created entity of type [LoadProcessExecution], guid: cfd36c58-e1e5-48df-839f-65a50a25d15a
Created entity of type [LoadProcess], guid: d68ed431-47c6-4163-8a47-6e220cee7aaa
Created entity of type [LoadProcessExecution], guid: c632fc7d-c24d-466e-997e-2be170bfd533
Created entity of type [LoadProcessExecution], guid: 9d852052-5314-4b3e-81a7-084c24949e2b
Created entity of type [LoadProcess], guid: 55d3b414-67b2-4384-bc5d-11ac1e2c3b49
Created entity of type [LoadProcessExecution], guid: b1793392-5292-4596-9fac-aeb44fe871eb
Created entity of type [LoadProcessExecution], guid: fafbe5d8-b179-425a-b85a-c6a37c0dd7ed
Sample DSL Queries:
query [from DB] returned [3] rows.
query [DB] returned [3] rows.
query [DB where name=%22Reporting%22] returned [1] rows.
query [DB where name=%22encode_db_name%22] returned [ 0 ] rows.
query [Table where name=%2522sales_fact%2522] returned [1] rows.
query [DB where name="Reporting"] returned [1] rows.
query [DB where DB.name="Reporting"] returned [1] rows.
query [DB name = "Reporting"] returned [1] rows.
query [DB DB.name = "Reporting"] returned [1] rows.
query [DB where name="Reporting" select name, owner] returned [1] rows.
query [DB where DB.name="Reporting" select name, owner] returned [1] rows.
query [DB has name] returned [3] rows.
query [DB where DB has name] returned [3] rows.
query [DB is JdbcAccess] returned [ 0 ] rows.
query [from Table] returned [8] rows.
query [Table] returned [8] rows.
query [Table is Dimension] returned [5] rows.
query [Column where Column isa PII] returned [3] rows.
query [View is Dimension] returned [2] rows.
query [Column select Column.name] returned [10] rows.
query [Column select name] returned [9] rows.
query [Column where Column.name="customer_id"] returned [1] rows.
query [from Table select Table.name] returned [8] rows.
query [DB where (name = "Reporting")] returned [1] rows.
query [DB where DB is JdbcAccess] returned [ 0 ] rows.
query [DB where DB has name] returned [3] rows.
query [DB as db1 Table where (db1.name = "Reporting")] returned [ 0 ] rows.
query [Dimension] returned [9] rows.
query [JdbcAccess] returned [2] rows.
query [ETL] returned [10] rows.
query [Metric] returned [4] rows.
query [PII] returned [3] rows.
query [`Log Data`] returned [4] rows.
query [Table where name="sales_fact", columns] returned [4] rows.
query [Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment] returned [4] rows.
query [from DataSet] returned [10] rows.
query [from Process] returned [3] rows.
Sample Lineage Info:
sales_fact_daily_mv(Table) -> loadSalesMonthly(LoadProcess)
time_dim(Table) -> loadSalesDaily(LoadProcess)
loadSalesDaily(LoadProcess) -> sales_fact_daily_mv(Table)
loadSalesMonthly(LoadProcess) -> sales_fact_monthly_mv(Table)
sales_fact(Table) -> loadSalesDaily(LoadProcess)
Sample data added to Apache Atlas Server.
Now you can use the Atlas UI to explore and analyze the sample data.
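The "Sample Lineage Info" lines printed by the quick-start script can be read as directed edges between tables and load processes. As a small illustration of what lineage means, the Python sketch below parses those five lines into a graph and lists everything downstream of a given table. The parsing pattern and function names are my own assumptions about the printed format, not part of Atlas.

```python
import re
from collections import defaultdict, deque

# The five lineage lines printed by quick_start.py in this post.
LINEAGE_LINES = [
    "sales_fact_daily_mv(Table) -> loadSalesMonthly(LoadProcess)",
    "time_dim(Table) -> loadSalesDaily(LoadProcess)",
    "loadSalesDaily(LoadProcess) -> sales_fact_daily_mv(Table)",
    "loadSalesMonthly(LoadProcess) -> sales_fact_monthly_mv(Table)",
    "sales_fact(Table) -> loadSalesDaily(LoadProcess)",
]

# Matches "name(Type) -> name(Type)".
EDGE_PATTERN = re.compile(r"^\s*(\w+)\((\w+)\)\s*->\s*(\w+)\((\w+)\)\s*$")


def parse_edges(lines):
    """Turn lineage lines into a {source: [targets]} adjacency mapping."""
    graph = defaultdict(list)
    for line in lines:
        match = EDGE_PATTERN.match(line)
        if match is None:
            continue  # skip anything that is not a lineage edge
        source, _source_type, target, _target_type = match.groups()
        graph[source].append(target)
    return dict(graph)


def downstream(graph, start):
    """Breadth-first list of every node reachable from `start`."""
    seen = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


graph = parse_edges(LINEAGE_LINES)
print(downstream(graph, "sales_fact"))
```

For the sample data, this shows that sales_fact feeds the daily load, which in turn feeds the monthly load and its materialized view.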
Conclusion
I hope you liked this tutorial on installing Apache Atlas using Docker. Feel free to ask any questions in the comments section below.