Problem description
When we run INSERT OVERWRITE on a Hive table, Hive creates a subfolder named -ext-10000, and the data in these tables is not visible to Spark. Only tables with a small number of rows are affected.
Spark version: 3.1.1. Hive version: 3.1.0.3.1.4.0-315.
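For context, the symptom can be reproduced roughly as follows (a minimal sketch; the table name comes from the query below, the app name and everything else are assumptions):

import org.apache.spark.sql.SparkSession

// Hypothetical session, just to illustrate the symptom described above.
val spark = SparkSession.builder()
  .appName("HiveSubdirSymptom")
  .enableHiveSupport()
  .getOrCreate()

// After Hive has run INSERT OVERWRITE, Spark reports no rows...
spark.sql("select count(*) from categories").show()

// ...even though the table location reported here contains a -ext-10000
// subdirectory holding the actual ORC files.
spark.sql("describe formatted categories").show(false)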
We tried setting the following properties:
"hive.input.dir.recursive" = "TRUE"
"hive.mapred.supports.subdirectories" = "TRUE"
"hive.supports.subdirectories" = "TRUE"
"mapred.input.dir.recursive" = "TRUE"
None of them had any effect.
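For what it's worth, these flags are usually applied on the Spark side via the spark.hadoop. prefix, and Spark's built-in ORC reader ignores them unless Spark falls back to the Hive SerDe path. A hedged sketch of that variant (untested against this setup; the app name is an assumption):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RecursiveHiveRead")
  // Force Spark to read the table through the Hive SerDe instead of its
  // native ORC reader; the native reader does not list subdirectories.
  .config("spark.sql.hive.convertMetastoreOrc", "false")
  // Pass the recursive-listing flags into the Hadoop configuration Spark uses.
  .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
  .config("spark.hadoop.mapred.input.dir.recursive", "true")
  .config("spark.hadoop.hive.mapred.supports.subdirectories", "true")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("select count(*) from categories").show()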
Example query:
insert overwrite table categories
select
n2.id as category1_ccode, n2.name as category1_name,
n3.id as category2_ccode, n3.name as category2_name
from nomenclature as n1
left join nomenclature as n2
on n1.id = n2.parent_id
left join nomenclature as n3
on n2.id = n3.parent_id
where
n1.name = 'Goods'
and n1.delete_mark = '00'
and n2.delete_mark = '00'
and n3.delete_mark = '00'
and n1.is_group = '00'
and n2.is_group = '00';
The files are stored in ORC format.
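Since the files are plain ORC, one way to confirm the data is physically present under the -ext-10000 subdirectory is to bypass the metastore and read the table location directly with Spark 3's recursiveFileLookup option (a sketch; the warehouse path is an assumption, substitute the location reported by DESCRIBE FORMATTED):

// `spark` is the SparkSession from the sketch above.
val df = spark.read
  .format("orc")
  .option("recursiveFileLookup", "true")   // descend into -ext-10000
  .load("/warehouse/tablespace/managed/hive/categories")   // hypothetical path

df.count()   // should match the row count Hive itself reports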
Solution
No effective solution to this problem has been found yet.