Hadoop or hdfs is the primary storage used by various haddop applications. In this tutorial, we will learn everything about the hdfs client. Let’s get started.
What is hdfs client
Hdfs client is the Hadoop interface that allows users to interact with the Hadoop file system. There are various clients available in haddop. The basic one is hdfs dfs which connects the Hadoop distributed file system.
The other hdfs clients are hdfs dfsadmin, which is used to perform the administration work on the Hadoop file system.
Hdfs client download
Hdfs client can be downloaded from the below repo
http://apache.mirrors.pair.com/hadoop/common/
select the version of the client you need to download. For the sake of this demo, I will be installing the version hadoop-3.3.1
Hdfs client install
To install the hdfs client, type the wget command to download the zip file.
wget http://apache.mirrors.pair.com/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
--2022-12-02 10:31:16-- http://apache.mirrors.pair.com/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Resolving apache.mirrors.pair.com (apache.mirrors.pair.com)... 216.92.2.131
Connecting to apache.mirrors.pair.com (apache.mirrors.pair.com)|216.92.2.131|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 605187279 (577M) [application/x-gzip]
Saving to: ‘hadoop-3.3.1.tar.gz’
100%[==================================================================================================================================================================>] 605,187,279 13.7MB/s in 50s
2022-12-02 10:32:07 (11.5 MB/s) - ‘hadoop-3.3.1.tar.gz’ saved [605187279/605187279]
Once the hdfs client in downloaded, go to the folder where the zip file is present and type the below command to extract the zip.
tar -xvzf hadoop-3.3.1.tar.gz
Hdfs client configuration
In this session, we will configure the hdfs client to interact with the file system.
JAVA configuration for hdfs client
user needs to configure the JAVA_HOME path properly to interact with hdfs. Make sure java is installed in your system. The java version can be found by typing.
java -version
Users can type the below command to check the JAVA_HOME path
echo $JAVA_HOME
Set the correct java path if the path is not set up properly.
Configure Kerberos to interact with the hdfs client
If the haddop cluster is secured, then the user needs to install the KDC library to interact with hdfs. Type the below command to install KDC
yum -y install krb5-server krb5-libs
Configure the ‘/etc/krb5.conf’ file as per your cluster.
Download the hdfs configuration file
The user needs to download the below configuration file from the Cloudera manager.
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site-xml
yarn-site.xml
These files contain all the necessary configurations to interact with the Hadoop cluster. To download the file, follow the below steps:-
- Go to the Cloudera Manager Admin Console page
- Go to the hdfs/hive client configuration
- Select Actions > Download Client Configuration.
Copy the config file under the directory /opt/hadoop_conf/
Setup environment variables for the hdfs client
Now the user needs to export the below environment variables before interacting with the hdfs client.
export KRB5_CONFIG="/etc/krb5.conf"
export HADOOP_CONF_DIR="/opt/hadoop_conf/"
export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.17.0.8-2.el7_9.x86_64
Interact with Hadoop using hdfs client
In the cluster is secured perform the kinit by typing the below command
kinit -kt <keytab_name.keytab> <principal>
Verify if the kinit is done properly by typing
klist
Now type the below command to list files in hdfs
<path_to_hdfs_client>/hadoop-3.3.1/bin/hdfs --config $HADOOP_CONF_DIR dfs -ls /
Conclusion
I hope you have liked this small tutorial about the hdfs client. please do let me know in the comment box if you are facing any issues while following along.