Problem description
Hi, I am creating a table:
CREATE EXTERNAL TABLE `historyrecordjson`(
  `last_name` string COMMENT 'from deserializer',
  `first_name` string COMMENT 'from deserializer',
  `email` string COMMENT 'from deserializer',
  `country` string COMMENT 'from deserializer',
  `city` string COMMENT 'from deserializer',
  `event_time` bigint COMMENT 'from deserializer'
)
PARTITIONED BY (
  `account_id` string,
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://aguptahistoryrecordcopy/recordshistoryjson/'
TBLPROPERTIES (
  'projection.account_id.type'='injected',
  'projection.day.range'='01,31',
  'projection.day.type'='integer',
  'projection.enabled'='true',
  'projection.month.range'='01,12',
  'projection.month.type'='integer',
  'projection.year.range'='2020,3000',
  'projection.year.type'='integer',
  'storage.location.template'='s3://aguptahistoryrecordcopy/historyrecordjson/${account_id}/${year}/${month}/${day}')
When I run the query below, it returns zero records:
SELECT * FROM "historyrecordjson" WHERE account_id = 'acc-1234' AND year = '2021' AND month = '1' AND day = '1' LIMIT 10;
My S3 directory looks like this:
s3://aguptahistoryrecordcopy/historyrecordjson/account_id=acc-1234/year=2021/month=1/day=1/1b339139-326c-432f-90aa-15bf30f37be2.json
I can see that the partition is being loaded as: account_id=acc-1234/year=2021/month=1/day=1
I am not sure what I am missing. The query result shows Data scanned: 0 KB.
Solution
The DDL you are using describes delimited text files, whereas the actual data in S3 is JSON. See https://github.com/rcongiu/Hive-JSON-Serde and create the table with the correct SerDe and a JSON data definition.
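As a rough sketch of what that could look like, the DDL below replaces the delimited row format with the OpenX JSON SerDe (the `org.openx.data.jsonserde.JsonSerDe` class from the linked project). Two things in it are assumptions and not part of the answer above: the `LOCATION` and `storage.location.template` values are rewritten to mirror the Hive-style `account_id=.../year=.../month=.../day=...` keys shown in the question, and the projection ranges are kept exactly as the asker wrote them. The `--` comments are for the reader and can be removed if your query editor rejects them.

```sql
-- Sketch only: same columns and partitions as the question's table,
-- but read through a JSON SerDe instead of ROW FORMAT DELIMITED.
CREATE EXTERNAL TABLE `historyrecordjson`(
  `last_name` string,
  `first_name` string,
  `email` string,
  `country` string,
  `city` string,
  `event_time` bigint
)
PARTITIONED BY (
  `account_id` string,
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT SERDE
  'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
-- Assumption: the data lives under the prefix shown in the question's S3 path.
LOCATION
  's3://aguptahistoryrecordcopy/historyrecordjson/'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.account_id.type'='injected',
  'projection.year.type'='integer',
  'projection.year.range'='2020,3000',
  'projection.month.type'='integer',
  'projection.month.range'='01,12',
  'projection.day.type'='integer',
  'projection.day.range'='01,31',
  -- Assumption: template written to match the key=value folder names in S3.
  'storage.location.template'='s3://aguptahistoryrecordcopy/historyrecordjson/account_id=${account_id}/year=${year}/month=${month}/day=${day}/')
```

With a table defined along these lines, the query from the question can be rerun unchanged. Note that the JSON SerDe expects each record to be a single JSON object on its own line, so pretty-printed multi-line JSON files would need to be flattened first.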