Hadoop Pig: displaying entries using STARTSWITH

Posted by 4nkexdtk on 2021-05-29 in Hadoop

I'm having trouble using the STARTSWITH string function. I want to display all records whose System_Period starts with 20040.

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:int);

sysGroup = GROUP transactions BY System_Period;

sysFilter = FILTER sysGroup BY STARTSWITH(transactions.System_Period, 20040);

DUMP sysFilter;

The error I get is:

Could not infer the matching function for org.apache.pig.builtin.STARTSWITH as multiple or none of them fit. Please use an explicit cast.

mzsu5hc0 #1

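STARTSWITH compares two strings: it checks whether the first argument begins with the second. You cannot pass it a relation or a bag. Note also that it only accepts strings (chararray), not integers. Filter for System_Period values starting with 20040 before the GROUP BY, load System_Period as chararray, and cast it after the filter if needed.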

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:chararray);
sysFilter = FILTER transactions BY STARTSWITH(System_Period, '20040');
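
If the numeric type is needed downstream, the cast after the filter could look roughly like the sketch below. The relation names sysTyped and sysFilterInline are hypothetical and not part of the original answer, and the commented-out variant assumes System_Period was loaded as int, as in the question.

```pig
-- Sketch: cast System_Period back to int after filtering on the chararray.
-- sysTyped is a hypothetical relation name used only for illustration.
sysTyped = FOREACH sysFilter GENERATE Branch_Number, Contract_Number,
    Customer_Number, Invoice_Date, Invoice_Number, Product_Number,
    Sales_Amount, Employee_Number, Service_Date,
    (int)System_Period AS System_Period;

-- Alternative sketch (assumes System_Period:int in the LOAD schema):
-- cast inline so STARTSWITH receives a chararray.
-- sysFilterInline = FILTER transactions BY STARTSWITH((chararray)System_Period, '20040');
```

The explicit cast is also what the error message in the question asks for: STARTSWITH has no overload for an int or a bag, so the argument has to be a chararray.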

Alternatively, after the GROUP BY, FLATTEN the result and then filter:

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:chararray);
sysGroup = GROUP transactions BY System_Period;
flatres = FOREACH sysGroup GENERATE group,FLATTEN(transactions);
sysFilter = FILTER flatres BY STARTSWITH(System_Period, '20040');
