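Problem description
I have a Snappy-compressed Parquet file in a local directory: /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parque
When I create an external Hive table with the following command, the statement executes without errors, but running select * from parquet_hive123456789 returns no rows.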
CREATE EXTERNAL TABLE parquet_hive123456789 (
`ip` string,`request` string,`status` string,`userid` string,`bytes` string,`agent` string,`timestamp` timestamp
) STORED AS PARQUET
LOCATION '/home/hive/';
parquet-tools show part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet
+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+
| ip | request | status | userid | bytes | agent | timestamp |
|-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------|
| 222.203.236.146 | GET /site/user_status.html HTTP/1.1 | 405 | 13 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.152.45.245 | GET /site/login.html HTTP/1.1 | 407 | 5 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.152.45.45 | GET /site/user_status.html HTTP/1.1 | 302 | 22 | 4096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.245.174.248 | GET /index.html HTTP/1.1 | 404 | 7 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.173.165.203 | GET /index.html HTTP/1.1 | 200 | 39 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.168.57.222 | GET /images/logo-small.png HTTP/1.1 | 404 | 2 | 14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.152.45.245 | GET /images/track.png HTTP/1.1 | 405 | 5 | 278 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | NaT |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 | 407 | 39 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 222.245.174.248 | GET /images/track.png HTTP/1.1 | 302 | 7 | 278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 | 200 | 39 | 14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT |
+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+
Can anyone help?
Solution
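LOCATION should be an HDFS directory, not a local one. A directory such as '/home/hive' may also exist in HDFS, but using it as a table location is a bad idea. Most likely the table now points at an HDFS path that contains no data files, which would explain why the query returns no rows. The location should be table-specific, because each table's data should live in its own directory, separate from other tables. A typical table directory looks like /user/hadoop/mytable, where mytable is the table name.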
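Put your file into the HDFS directory, creating the directory first with hdfs dfs -mkdir -p if it does not exist yet. For example (use your own HDFS path):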
hdfs dfs -put /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parque /user/hadoop/table_dir/
Check that the file now exists in HDFS (use your own HDFS path):
hdfs dfs -ls '/user/hadoop/table_dir/'
Then create the table with the HDFS location '/user/hadoop/table_dir/' (EXTERNAL or MANAGED, it does not matter in this context).
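As a minimal sketch, the statement from the question could be re-created against the HDFS directory roughly like this. The column list is copied from the question, and /user/hadoop/table_dir/ is only the example path used in this answer, so substitute your own. The DROP step assumes the old definition is still an external table, in which case by default only the metadata is removed and no files are touched.

```sql
-- Remove the old definition that pointed at '/home/hive/'.
DROP TABLE IF EXISTS parquet_hive123456789;

-- Re-create the table over the HDFS directory that now holds the Parquet file.
-- '/user/hadoop/table_dir/' is an example path; use your own table directory.
CREATE EXTERNAL TABLE parquet_hive123456789 (
  `ip`        string,
  `request`   string,
  `status`    string,
  `userid`    string,
  `bytes`     string,
  `agent`     string,
  `timestamp` timestamp
)
STORED AS PARQUET
LOCATION '/user/hadoop/table_dir/';

-- Quick check that rows are now visible.
SELECT * FROM parquet_hive123456789 LIMIT 10;
```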
Alternatively, you can create the table first and then load the local file into it with the LOAD DATA LOCAL INPATH command described in this answer.
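A hedged sketch of the LOAD DATA route is shown below. The table name parquet_hive_local and the local file path are hypothetical placeholders, not names from the question. LOAD DATA copies the file as-is and does not convert it, so the table must be declared STORED AS PARQUET to match the file. With LOCAL, the path is read from the local filesystem as seen by the Hive process that executes the statement.

```sql
-- Hypothetical managed table; no LOCATION, so Hive uses its warehouse directory.
CREATE TABLE parquet_hive_local (
  `ip`        string,
  `request`   string,
  `status`    string,
  `userid`    string,
  `bytes`     string,
  `agent`     string,
  `timestamp` timestamp
)
STORED AS PARQUET;

-- Placeholder path: replace with the full local path of your Parquet file.
LOAD DATA LOCAL INPATH '/home/hive/your-file.snappy.parquet'
INTO TABLE parquet_hive_local;

-- Quick check that the loaded rows are visible.
SELECT * FROM parquet_hive_local LIMIT 10;
```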