Problem description
My data is:
(10,1) [70#3300]
(10,2) [71#3300]
(10,1) [70#3300]
(11,1) [71#3300]
(12,1) [72#3300]
(10,3) [74#3300]
The rest of my script is:
grunt> a = LOAD '/user/maria_dev/complex_2.txt' USING PigStorage(' ') AS (T:tuple(driverId:int,week:int),M:[mileslogged:int]);
grunt> medians = FOREACH (GROUP a ALL) GENERATE a.T;
The output of the following command
grunt> describe medians;
is
medians: {{(T: (driverId: int,week: int))}}
But when I run
m1 = FOREACH medians GENERATE T.driverId;
I get the following error:
2020-07-24 00:24:32,094 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1128: Cannot find field driverId in T:tuple(driverId:int,week:int)
Details at logfile: /home/maria_dev/pig_1595549443230.log
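How can I select only driverId?
Solution
GROUP a ALL collects every row into a single bag, so each record of medians holds one bag of T tuples rather than a top-level field T. That is why T.driverId cannot be resolved. Wrapping the projection in FLATTEN unnests the bag into one row per tuple, after which T.driverId can be projected: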
a = LOAD '/user/maria_dev/complex_2.txt' USING PigStorage(' ') AS (T:tuple(driverId:int,week:int),M:[mileslogged:int]);
medians = FOREACH (GROUP a ALL) GENERATE FLATTEN(a.T) AS T:tuple(driverId:int,week:int);
driverIds = FOREACH medians GENERATE T.driverId;
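If the later steps do not actually need the GROUP a ALL bag (for example, if all you want is the list of driver IDs), the bag never has to be built. This is a minimal sketch, assuming the same input file and schema as in the question. It projects driverId straight from the loaded relation. The alias driverId on the output field is added here for readability and is not part of the original script.

```pig
-- Same LOAD as in the question; each row has a top-level tuple field T
a = LOAD '/user/maria_dev/complex_2.txt' USING PigStorage(' ') AS (T:tuple(driverId:int,week:int),M:[mileslogged:int]);
-- T is a top-level field of a, so its driverId component can be dereferenced directly
driverIds = FOREACH a GENERATE T.driverId AS driverId;
DUMP driverIds;
```

Keep the GROUP ... ALL plus FLATTEN version when the grouped bag is needed for an aggregate over all rows, such as a median.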