HDFS: alternatives when folders under .sparkStaging/ are not deleted

Problem description

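We have an HDP cluster, version 2.6.5, running a number of Spark jobs, and we see that the .sparkStaging directories in HDFS are still there after the jobs have finished.

Under /user/hdfs/.sparkStaging/ there are old folders dating back to 2021-03-30. We want to keep only the last month's folders and delete everything under /user/hdfs/.sparkStaging/ that is older than that.

I have been looking for an HDFS command that can delete these folders.

For now I plan to use the following command (43800 minutes is roughly one month, about 30.4 days):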

hdfs dfs -ls /user/hdfs/.sparkStaging  |   tr -s " "    |    cut -d' ' -f6-8    |     grep "^[0-9]"    |    awk 'BEGIN{ MIN=43800; LAST=60*MIN; "date +%s" | getline Now } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=Now-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'

Reference: Delete files older than 10days on HDFS
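The one-liner above can be hard to audit before pointing it at a directory with thousands of entries. Below is a minimal sketch of the same idea, split into a pure filter and a wrapper. The function names `old_staging_dirs` and `purge_old_staging` are hypothetical, and the sketch assumes the standard 8-column `hdfs dfs -ls` output and application paths without spaces. It compares "YYYY-MM-DD HH:MM" strings directly instead of calling `date` once per line, and it only prints what it would delete unless asked otherwise.

```shell
#!/usr/bin/env bash

# Read `hdfs dfs -ls` output on stdin and print the paths whose modification
# time is earlier than the cutoff given as $1 ("YYYY-MM-DD HH:MM").
# Assumed columns: perms, replication, owner, group, size, date, time, path.
old_staging_dirs() {
  local cutoff="$1"
  # Timestamps in this format sort chronologically, so a plain string
  # comparison is enough. The "Found N items" header has fewer than 8 fields.
  awk -v cutoff="$cutoff" \
    'NF >= 8 && $6 ~ /^[0-9]/ && ($6 " " $7) < cutoff { print $8 }'
}

# Hypothetical wrapper: dry run by default, deletes only when $2 is "delete".
purge_old_staging() {
  local cutoff="$1" mode="${2:-dry-run}" dir
  hdfs dfs -ls /user/hdfs/.sparkStaging | old_staging_dirs "$cutoff" |
    while read -r dir; do
      if [ "$mode" = "delete" ]; then
        # stdin is redirected so the command cannot consume the path list.
        hdfs dfs -rm -r "$dir" < /dev/null
      else
        echo "Would delete: $dir"
      fi
    done
}
```

With GNU `date` (an assumption about the client host), a dry run for everything older than one month could be started with `purge_old_staging "$(date -d '1 month ago' '+%Y-%m-%d %H:%M')"`; passing `delete` as a second argument performs the removal.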

However, since the folders under /user/hdfs/.sparkStaging/ are not being cleaned up for some reason, I am not sure that this manual approach is a good solution.
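One thing that may be worth ruling out before settling on manual cleanup is the Spark-on-YARN property `spark.yarn.preserve.staging.files`. It defaults to false, and when set to true Spark keeps the staged files at the end of the job instead of deleting them. This is only a possible cause, not a diagnosis of this cluster. The sketch below reads the property from a spark-defaults style file; the function name `preserve_staging_setting` is hypothetical, and the sketch assumes the whitespace-separated "key value" form of the file.

```shell
#!/usr/bin/env bash

# Print the value of spark.yarn.preserve.staging.files found in the
# spark-defaults style file given as $1, or "unset" if the property is absent.
# Assumes "key value" lines separated by whitespace; comment lines are skipped
# naturally because their first field does not match the property name.
preserve_staging_setting() {
  local conf_file="$1"
  awk '$1 == "spark.yarn.preserve.staging.files" { v = $2 }
       END { print (v == "" ? "unset" : v) }' "$conf_file"
}
```

It would be called with the path of the client's spark-defaults.conf, whose location depends on the installation. A result of `unset` or `false` suggests the leftovers have another origin, for example applications that were killed or failed before they could clean up.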

 hdfs dfs -ls  /user/hdfs/.sparkStaging/ | more
Found 2324 items
drwx------   - hdfs hdfs          0 2021-03-30 06:40 /user/hdfs/.sparkStaging/application_1617025601058_0195
drwx------   - hdfs hdfs          0 2021-03-30 06:45 /user/hdfs/.sparkStaging/application_1617025601058_0224
drwx------   - hdfs hdfs          0 2021-03-30 06:56 /user/hdfs/.sparkStaging/application_1617025601058_0289
drwx------   - hdfs hdfs          0 2021-03-30 06:56 /user/hdfs/.sparkStaging/application_1617025601058_0290
drwx------   - hdfs hdfs          0 2021-03-30 07:01 /user/hdfs/.sparkStaging/application_1617025601058_0320
drwx------   - hdfs hdfs          0 2021-03-30 07:01 /user/hdfs/.sparkStaging/application_1617025601058_0323
drwx------   - hdfs hdfs          0 2021-03-30 07:06 /user/hdfs/.sparkStaging/application_1617025601058_0348
drwx------   - hdfs hdfs          0 2021-03-30 07:06 /user/hdfs/.sparkStaging/application_1617025601058_0352
drwx------   - hdfs hdfs          0 2021-03-30 07:11 /user/hdfs/.sparkStaging/application_1617025601058_0379
drwx------   - hdfs hdfs          0 2021-03-30 07:11 /user/hdfs/.sparkStaging/application_1617025601058_0383
drwx------   - hdfs hdfs          0 2021-03-30 07:12 /user/hdfs/.sparkStaging/application_1617025601058_0388
drwx------   - hdfs hdfs          0 2021-03-30 07:16 /user/hdfs/.sparkStaging/application_1617025601058_0410
drwx------   - hdfs hdfs          0 2021-03-30 07:16 /user/hdfs/.sparkStaging/application_1617025601058_0412
drwx------   - hdfs hdfs          0 2021-03-30 07:17 /user/hdfs/.sparkStaging/application_1617025601058_0416
drwx------   - hdfs hdfs          0 2021-03-30 07:26 /user/hdfs/.sparkStaging/application_1617025601058_0473
drwx------   - hdfs hdfs          0 2021-03-30 07:31 /user/hdfs/.sparkStaging/application_1617025601058_0505
drwx------   - hdfs hdfs          0 2021-03-30 07:31 /user/hdfs/.sparkStaging/application_1617025601058_0506
drwx------   - hdfs hdfs          0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0533
drwx------   - hdfs hdfs          0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0534
drwx------   - hdfs hdfs          0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0536
drwx------   - hdfs hdfs          0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0537
drwx------   - hdfs hdfs          0 2021-03-30 07:41 /user/hdfs/.sparkStaging/application_1617025601058_0566
drwx------   - hdfs hdfs          0 2021-03-30 07:41 /user/hdfs/.sparkStaging/application_1617025601058_0567
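Each folder in the listing is named after a YARN application ID, so a cleanup can be made safer by skipping any folder whose application is still active. The following is a minimal sketch under that assumption. The function names `running_app_ids` and `exclude_running` are hypothetical; `yarn application -list -appStates` is the standard YARN CLI call, but the parsing assumes the application ID is the first column of its output.

```shell
#!/usr/bin/env bash

# Print the IDs of applications that YARN reports as RUNNING or ACCEPTED,
# one per line. Assumes the ID is the first column of the CLI output.
running_app_ids() {
  yarn application -list -appStates RUNNING,ACCEPTED 2>/dev/null |
    awk '$1 ~ /^application_/ { print $1 }'
}

# Read candidate staging paths on stdin and print only those whose last path
# component (the application ID) is NOT listed in the file given as $1.
exclude_running() {
  local running_file="$1"
  awk -v f="$running_file" '
    BEGIN { while ((getline id < f) > 0) running[id] = 1 }
    { n = split($0, parts, "/"); if (!(parts[n] in running)) print }
  '
}
```

In practice the output of `running_app_ids` would be written to a temporary file and passed to `exclude_running`, with the list of deletion candidates piped in on stdin.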

Solution

No working solution has been found for this question yet.
