Problem description
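We have an HDP cluster, version 2.6.5, running some Spark jobs, and we see that the .sparkStaging directories in HDFS remain after the jobs have finished.
Under /user/hdfs/.sparkStaging/ we have old folders dating back to 2021-03-30.
We only want to keep the folders from the last month and delete everything older than that under /user/hdfs/.sparkStaging/.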
For now I want to use the following command (43800 minutes is roughly one month, about 30.4 days):
hdfs dfs -ls /user/hdfs/.sparkStaging | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk 'BEGIN{ MIN=43800; LAST=60*MIN; "date +%s" | getline Now } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; close(cmd); DIFF=Now-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'
Reference: Delete files older than 10days on HDFS
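As a minimal sketch of the same idea in dry-run form, the filter below only prints the paths that would be removed, so the list can be reviewed before anything is deleted. It assumes the usual 8-column `hdfs dfs -ls` layout (date in column 6, time in column 7, path in column 8) and paths without spaces. The function name `list_older_than` and the `cutoff` variable are hypothetical, and the `date -d` call in the usage comment assumes GNU date.

```shell
# Print paths from `hdfs dfs -ls` output (read on stdin) whose modification
# time is strictly earlier than the cutoff, given as "YYYY-MM-DD HH:MM".
# Timestamps in that format sort lexicographically, so a plain string
# comparison is enough and no `date` process is spawned per line.
list_older_than() {
  local cutoff="$1"
  awk -v cutoff="$cutoff" '
    # Skip the "Found N items" header and anything that is not a listing row.
    NF >= 8 && $6 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      if (($6 " " $7) < cutoff) print $8
    }'
}

# Usage (assumes GNU date; lists candidates only, deletes nothing):
#   cutoff=$(date -d '30 days ago' '+%Y-%m-%d %H:%M')
#   hdfs dfs -ls /user/hdfs/.sparkStaging | list_older_than "$cutoff"
#
# Once the list looks right, it could be piped into a batched delete, e.g.
#   ... | xargs -r -n 50 hdfs dfs -rm -r
```

Comparing timestamps as strings avoids starting one `date` process per directory, which matters when the listing has thousands of entries.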
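However, I am not sure whether manual cleanup like this is a good solution, given that the folders under /user/hdfs/.sparkStaging/ are, for some reason, not being deleted automatically.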
hdfs dfs -ls /user/hdfs/.sparkStaging/ | more
Found 2324 items
drwx------ - hdfs hdfs 0 2021-03-30 06:40 /user/hdfs/.sparkStaging/application_1617025601058_0195
drwx------ - hdfs hdfs 0 2021-03-30 06:45 /user/hdfs/.sparkStaging/application_1617025601058_0224
drwx------ - hdfs hdfs 0 2021-03-30 06:56 /user/hdfs/.sparkStaging/application_1617025601058_0289
drwx------ - hdfs hdfs 0 2021-03-30 06:56 /user/hdfs/.sparkStaging/application_1617025601058_0290
drwx------ - hdfs hdfs 0 2021-03-30 07:01 /user/hdfs/.sparkStaging/application_1617025601058_0320
drwx------ - hdfs hdfs 0 2021-03-30 07:01 /user/hdfs/.sparkStaging/application_1617025601058_0323
drwx------ - hdfs hdfs 0 2021-03-30 07:06 /user/hdfs/.sparkStaging/application_1617025601058_0348
drwx------ - hdfs hdfs 0 2021-03-30 07:06 /user/hdfs/.sparkStaging/application_1617025601058_0352
drwx------ - hdfs hdfs 0 2021-03-30 07:11 /user/hdfs/.sparkStaging/application_1617025601058_0379
drwx------ - hdfs hdfs 0 2021-03-30 07:11 /user/hdfs/.sparkStaging/application_1617025601058_0383
drwx------ - hdfs hdfs 0 2021-03-30 07:12 /user/hdfs/.sparkStaging/application_1617025601058_0388
drwx------ - hdfs hdfs 0 2021-03-30 07:16 /user/hdfs/.sparkStaging/application_1617025601058_0410
drwx------ - hdfs hdfs 0 2021-03-30 07:16 /user/hdfs/.sparkStaging/application_1617025601058_0412
drwx------ - hdfs hdfs 0 2021-03-30 07:17 /user/hdfs/.sparkStaging/application_1617025601058_0416
drwx------ - hdfs hdfs 0 2021-03-30 07:26 /user/hdfs/.sparkStaging/application_1617025601058_0473
drwx------ - hdfs hdfs 0 2021-03-30 07:31 /user/hdfs/.sparkStaging/application_1617025601058_0505
drwx------ - hdfs hdfs 0 2021-03-30 07:31 /user/hdfs/.sparkStaging/application_1617025601058_0506
drwx------ - hdfs hdfs 0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0533
drwx------ - hdfs hdfs 0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0534
drwx------ - hdfs hdfs 0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0536
drwx------ - hdfs hdfs 0 2021-03-30 07:36 /user/hdfs/.sparkStaging/application_1617025601058_0537
drwx------ - hdfs hdfs 0 2021-03-30 07:41 /user/hdfs/.sparkStaging/application_1617025601058_0566
drwx------ - hdfs hdfs 0 2021-03-30 07:41 /user/hdfs/.sparkStaging/application_1617025601058_0567
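One way to make a manual cleanup less risky is to skip any staging folder whose application is still known to YARN as running or accepted. The sketch below is only an illustration under that assumption: the function name `exclude_active_apps` and the file `/tmp/active_apps.txt` are hypothetical, and it relies on each staging folder being named after its application ID, as in the listing above.

```shell
# Read candidate staging paths on stdin and drop every path whose last
# component (the application ID) appears in the file named by $1, which
# holds one application ID per line.
exclude_active_apps() {
  local active_file="$1"
  awk -v active="$active_file" '
    BEGIN {
      # Load the IDs of applications that must not be touched.
      while ((getline id < active) > 0) if (id != "") skip[id] = 1
    }
    {
      n = split($0, parts, "/")
      if (!(parts[n] in skip)) print
    }'
}

# Usage (assumes the yarn CLI is on PATH; prints paths, deletes nothing):
#   yarn application -list -appStates RUNNING,ACCEPTED 2>/dev/null \
#     | awk '$1 ~ /^application_/ { print $1 }' > /tmp/active_apps.txt
#   hdfs dfs -ls /user/hdfs/.sparkStaging | tr -s " " | cut -d' ' -f8 \
#     | grep '^/' | exclude_active_apps /tmp/active_apps.txt
```

With a one-month age threshold a still-running application is unlikely to be hit, so this check is mainly a safeguard for long-lived jobs.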