How to load data from a CSV file into an external table in Impala

Problem description

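I am following this solution to load data into Impala through an external table, because I hit the same error when I try to load the data by referencing the file directly.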

So, if I run:

[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,dispatch_Area string,Received_Date string,Received_Time int,dispatch_Time int,Arrival_Time int,Cleared_Time int,disposition string) row format delimited
                            > fields terminated by ',' 
                            > STORED as TEXTFILE
                            > location '/user/cloudera/rdpdata/rpd_data_all.csv' ;

I get:

Query: create external table Police2 (Priority string,disposition string) row format delimited
fields terminated by ','
STORED as TEXTFILE
location '/user/cloudera/rdpdata/rpd_data_all.csv'
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive metastore: 
CAUSED BY: MetaException: hdfs://quickstart.cloudera:8020/user/cloudera/rdpdata/rpd_data_all.csv is not a directory or unable to create one

And if I run the following instead, nothing is imported:

[quickstart.cloudera:21000] > create external table Police2 (Priority string,disposition string) row format delimited
                            >  fields terminated by ',' 
                            > location '/user/cloudera/rdpdata' ;
Query: create external table Police2 (Priority string,disposition string) row format delimited
 fields terminated by ','
location '/user/cloudera/rdpdata'
Fetched 0 row(s) in 1.01s

Contents of the folder:

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/rdpdata
Found 1 items
-rwxrwxrwx   1 cloudera cloudera   75115191 2020-09-02 19:36 /user/cloudera/rdpdata/rpd_data_all.csv

Contents of the file:

[cloudera@quickstart ~]$ hadoop fs -cat  /user/cloudera/rdpdata/rpd_data_all.csv
1,EMSP,RP,RC,03/21/2013,095454,000000,101659,CANC

[Screenshot of the Cloudera QuickStart VM]

Solution

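The LOCATION option in the Impala CREATE TABLE statement specifies the hdfs_path, that is, the HDFS directory where the table's data files are stored. It cannot point to a single file, which is why the first statement fails with "is not a directory". Try providing the directory location rather than the file name; the table should then use the data that already exists in that directory.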
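
As a rough sketch of what that could look like for the table in the question: the statement below reuses the ten columns from the first attempt and points LOCATION at the directory /user/cloudera/rdpdata instead of the file. It assumes the directory contains only the CSV file, since every file in the directory is read as table data, and that a leftover Police2 table from an earlier attempt can be dropped (dropping an external table should leave the files in HDFS untouched).

```sql
-- Remove a table definition left over from an earlier attempt (assumption:
-- Police2 may already exist; the underlying CSV file is not deleted).
DROP TABLE IF EXISTS Police2;

-- LOCATION names the HDFS directory that holds the CSV, not the file itself.
CREATE EXTERNAL TABLE Police2 (
  Priority       STRING,
  Call_Type      STRING,
  Jurisdiction   STRING,
  dispatch_Area  STRING,
  Received_Date  STRING,
  Received_Time  INT,
  dispatch_Time  INT,
  Arrival_Time   INT,
  Cleared_Time   INT,
  disposition    STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/rdpdata';

-- Check that the rows are actually visible through the table.
SELECT COUNT(*) FROM Police2;
SELECT * FROM Police2 LIMIT 5;

-- If files are added to the directory later, reload the file metadata.
REFRESH Police2;
```

Note that impala-shell printing "Fetched 0 row(s)" after a CREATE TABLE is expected, because a DDL statement returns no result rows. On its own it does not mean the table is empty, so a SELECT like the ones above is the more reliable check.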

For reference: https://impala.apache.org/docs/build/html/topics/impala_tables.html