Apache Pig: cannot get the MIN of a column, "unexpected symbol" error

Problem description

I'm having a hard time figuring out how to use the MIN() function in this case. I have the following Pig script:

A = LOAD '/home/mqp/Documents/p1/data/test_customers.csv' USING CSVExcelStorage (',') AS (custid:int,name:chararray,age:int,gender:chararray,country:int,salary:float);
B = LOAD '/home/mqp/Documents/p1/data/test_transactions.csv' USING CSVExcelStorage (',') AS (transid:int,custid:int,ttotal:float,items:int,tdesc:chararray);

C = JOIN B BY custid,A BY custid USING 'replicated';
D = GROUP C BY $1;
DESCRIBE D;

out = FOREACH D {
    allids = FOREACH C GENERATE B::custid;
    singleids = DISTINCT allids;
    
    allnames = FOREACH C GENERATE name;
    singlenames = DISTINCT allnames;

    allsal= FOREACH C GENERATE salary;
    singlesal = DISTINCT allsal;
    
    alltotals = FOREACH C GENERATE B::ttotal as bt;
    mintotals = FOREACH alltotals GENERATE MIN(alltotals.bt);


    transtotal = FOREACH C GENERATE ttotal;
    GENERATE flatten(singleids),flatten(singlenames),flatten(singlesal),COUNT(C),SUM(transtotal),flatten(mintotals);
};

STORE out INTO '/home/mqp/Documents/p1/pig_test' USING CSVExcelStorage();

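I've tried countless different ways to get the MIN() function to work here, without success. I've tried grouping by different indexes, among other things. I really don't understand what I need to do.

I get "Unexpected symbol at or near foo" and "Invalid scalar projection" errors.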

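Solution

You should call the MIN function in the main GENERATE block, not inside a nested FOREACH as you do now. MIN expects a bag of values, just like COUNT and SUM, which you are already using correctly.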
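To illustrate the point about bags, here is a minimal sketch that is separate from your script. The relation names (tx, by_cust, stats), the file name and the use of PigStorage are assumptions made up for this example. After a GROUP, each output tuple carries a bag of the grouped rows, so projecting a field from that bag gives MIN, SUM and COUNT exactly the input they expect:

```pig
-- Hypothetical input file and relation names, for illustration only.
tx = LOAD 'transactions.csv' USING PigStorage(',') AS (custid:int, ttotal:float);

-- Each tuple of by_cust is (group, tx), where tx is a bag of the matching rows.
by_cust = GROUP tx BY custid;

-- tx.ttotal projects one field out of that bag, producing a bag of floats.
-- The aggregates are called directly in GENERATE, once per group.
stats = FOREACH by_cust GENERATE
    group          AS custid,
    COUNT(tx)      AS n,
    SUM(tx.ttotal) AS total,
    MIN(tx.ttotal) AS smallest;

DUMP stats;
```

In the question's script, `FOREACH alltotals GENERATE MIN(alltotals.bt)` applies MIN inside a per-tuple loop, where `alltotals.bt` is no longer read as a bag projection. That is most likely what produces the "Invalid scalar projection" message.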

Here is a version of the script that should work (I've added comments on the changes I made to the original, for clarity):

A = LOAD 'test_customers.csv' USING CSVExcelStorage (',') AS (
    custid:int,name:chararray,age:int,gender:chararray,country:int,salary:float
);

B = LOAD 'test_transactions.csv' USING CSVExcelStorage (',') AS (
    transid:int,custid:int,ttotal:float,items:int,tdesc:chararray
);

C = JOIN B BY custid,A BY custid USING 'replicated';

-- Removed the DESCRIBE and used field name in GROUP BY for clarity.

D = GROUP C BY B::custid;

out = FOREACH D {
    -- Removed DISTINCT custid 
    -- Because we grouped by this field,we can just generate group
    -- Consider adding name and salary to the GROUP BY if you
    -- expect these to be the same for each custid.
    allnames = FOREACH C GENERATE name;
    singlenames = DISTINCT allnames;

    allsal= FOREACH C GENERATE salary;
    singlesal = DISTINCT allsal;

    alltotals = FOREACH C GENERATE B::ttotal as bt;
    -- Removed transtotal: it was effectively the same as alltotals.
    -- MIN is called here, on the bag, alongside COUNT and SUM.
    GENERATE
        group,FLATTEN(singlenames),FLATTEN(singlesal),COUNT(C),SUM(alltotals.bt),MIN(alltotals.bt);
}

STORE out INTO 'pig_test' USING CSVExcelStorage();
