Important and useful HDFS commands
To check the Hadoop version
$ hadoop version
Creating a user home directory
hadoop fs -mkdir -p /user/dpq (-p creates any missing parent directories and does not fail if the directory already exists)
hadoop fs -mkdir /user/retails (without -p, the command throws an error if the directory already exists)
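A quick way to confirm the mkdir behaviour above is to run the command twice; a minimal sketch, assuming you have write access under /user:
hdfs dfs -mkdir -p /user/dpq     # creates the directory (and any missing parents)
hdfs dfs -mkdir -p /user/dpq     # succeeds again: -p silently accepts an existing directory
hdfs dfs -mkdir /user/dpq        # fails, because without -p an existing directory is an error
hdfs dfs -ls /user               # verify the directory is there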
List all directories
hdfs dfs -ls /
hdfs dfs -ls /user
Copy file from local to HDFS
hadoop fs -copyFromLocal /etc/data/retail_data.csv /user/retails
hadoop fs -put /etc/data/retail_data.csv /user/retails (if 'retail_data.csv' is already present in the '/user/retails' HDFS directory, this throws a 'File already present' error)
hadoop fs -put -f /etc/data/retail_data.csv /user/retails (-f forcefully overwrites retail_data.csv if it is already present)
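A minimal upload-and-verify sketch, assuming the local file /etc/data/retail_data.csv exists (-test -e returns exit code 0 when the path exists in HDFS):
hdfs dfs -put -f /etc/data/retail_data.csv /user/retails/             # upload, overwriting any existing copy
hdfs dfs -test -e /user/retails/retail_data.csv && echo "upload OK"   # exit code 0 means the file is in HDFS
hdfs dfs -ls /user/retails/                                           # list the uploaded file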
Checking data in HDFS
hdfs dfs -cat /user/retails/retail_data.csv
Create empty file on HDFS
hdfs dfs -touchz /user/retails/empty.csv
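To confirm the file was created and is empty, -test -z checks for a zero-length file; a small sketch:
hdfs dfs -test -z /user/retails/empty.csv && echo "empty file exists"   # exit code 0 only if the file exists and has zero length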
Copy file from HDFS
hdfs dfs -get /user/retails/retail_data.csv
hdfs dfs -get /user/retails
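-get also accepts an explicit local destination (without one, the file lands in the current working directory); a sketch using /tmp as an example target:
hdfs dfs -get /user/retails/retail_data.csv /tmp/   # copy from HDFS to the local filesystem
ls -l /tmp/retail_data.csv                          # verify on the local side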
Copy files from one HDFS location to another
hdfs dfs -cp <src(on hdfs)> <dest(on hdfs)>
hdfs dfs -cp /user/retails/empty.csv /user/dpq/
To rename file or location
hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>
hdfs dfs -mv /user/retails/empty.csv /user/dpq/emp.csv (renaming the file)
hdfs dfs -mv /user/retails/ /user/stocks/ (moving to a different HDFS location; the contents of /user/retails are now under /user/stocks)
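Within HDFS, -mv is a rename on the NameNode metadata (no blocks are copied). A quick check that the rename took effect, using the paths from the examples above:
hdfs dfs -ls /user/dpq/emp.csv         # the destination now exists
hdfs dfs -ls /user/retails/empty.csv   # fails: the source path is gone after the move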
To remove file or directory
hdfs dfs -rm /user/stocks/emp.csv (removes the emp.csv file; -rmr is the older, deprecated form)
hdfs dfs -rm -r /user/stocks (removes the stocks directory and everything inside it)
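By default -rm and -rm -r move the deleted paths into the user's HDFS trash (when trash is enabled on the cluster); -skipTrash bypasses it. A hedged sketch, use the second form with care:
hdfs dfs -rm -r /user/stocks                 # goes to .Trash if trash is enabled, so it can be restored
hdfs dfs -rm -r -skipTrash /user/stocks      # alternative: deletes immediately and permanently, bypassing the trash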
du: gives the size of each file in a directory
hdfs dfs -du /geeks
dus: gives the total size of a directory/file (deprecated on recent releases in favour of -du -s)
hdfs dfs -dus /geeks
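On recent Hadoop releases the same summary is obtained with -du -s, and -h prints human-readable sizes; a small equivalent sketch:
hdfs dfs -du -s -h /geeks   # total size of the directory, human-readable
hdfs dfs -du -h /geeks      # per-file sizes, human-readable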
stat: prints statistics about a file or directory; with no format string it shows the last modification time. A format string can be used to print other attributes, as in the sketch below.
hdfs dfs -stat /geeks
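A small sketch using a few of the standard stat format specifiers (%n name, %F type, %r replication factor, %b size in bytes, %y modification time); the file path is just an example:
hdfs dfs -stat "%n %F %r %y" /geeks     # name, type, replication factor, modification time
hdfs dfs -stat %b /geeks/somefile.csv   # size in bytes of a single file (example path)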
setrep: This command is used to change the replication factor of a file/directory in HDFS. By default it is 3 for anything stored in HDFS (controlled by dfs.replication in hdfs-site.xml).
Example 1: To change the replication factor to 6 for geeks.txt stored in HDFS.
hdfs dfs -setrep -R -w 6 geeks.txt
Example 2: To change the replication factor to 4 for a directory geeksInput stored in HDFS.
hdfs dfs -setrep -R 4 /geeks
Note: -w waits until the replication is complete, and -R applies the change recursively, which is useful for directories that may contain many files and subdirectories.
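To confirm the new replication factor took effect, the second column of -ls shows the replication of each file, and -stat %r prints it directly; a small sketch, assuming geeks.txt sits in your HDFS home directory as in Example 1:
hdfs dfs -ls /geeks            # replication factor is the second column of the listing
hdfs dfs -stat %r geeks.txt    # prints just the replication factor of the file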
Note: HDFS has many more commands; the ones covered here are those most commonly used when working with Hadoop. You can see the full list of dfs commands by running the following command:
hdfs dfs
A few more commands:
hdfs dfs -ls /data/dpq/dummy/
hdfs dfs -mkdir -p /data/dpq/dummy/temp
hdfs dfs -put /data/sparkSetup/files/*.* /data/dpq/dummy/temp/
hdfs dfs -ls /data/dpq/dummy/temp/ (the default replication factor of 3 appears in the second column of the listing)
hdfs dfs -rm /data/dpq/dummy/temp/GDGA111_1_20180918111340752.csv (the file is moved to the HDFS trash rather than deleted permanently, when trash is enabled)
To set the replication factor while storing data to HDFS, pass it as a configuration property with -put:
hdfs dfs -D dfs.replication=1 -put /data/dpq/dummy/temp/GDGA111_1_20180918111340752.csv /data/dpq/temp
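A quick check that the per-write replication setting was applied to that file; a minimal sketch:
hdfs dfs -stat %r /data/dpq/temp/GDGA111_1_20180918111340752.csv   # should print 1
hdfs dfs -ls /data/dpq/temp/                                       # replication also visible in the second column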
hdfs dfs -get /data/dpq/temp/GDGA111_1_20180918111340752.csv
hdfs dfs -cat /data/dpq/temp/GDGA111_1_20180918111340752.ctrl
hdfs dfs -du -h /data/dpq/temp/ (shows the file size; on recent Hadoop versions a second column shows the space consumed including replication)
hdfs dfs -chmod 777 /data/dpq/temp/GDGA111_1_20180918111340752.csv
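chmod also accepts -R for recursion and symbolic modes in addition to octal ones; a short sketch (the new permissions show up in the first column of -ls):
hdfs dfs -chmod -R 755 /data/dpq/dummy/temp                          # recursive, octal mode
hdfs dfs -chmod g+w /data/dpq/temp/GDGA111_1_20180918111340752.csv   # symbolic mode
hdfs dfs -ls /data/dpq/temp/                                         # permissions appear in the first column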
hdfs dfs -mkdir -p /data/dpq/model_temp
hdfs dfs -ls /data/dpq/
hdfs dfs -chmod 777 /data/dpq/model_temp
hdfs dfs -ls /data/dpq/
hdfs dfs -touchz /data/dpq/model_temp/1.txt (creates an empty file)