Apache Pig query: dataset join fails with ERROR 1031

p4rjhz4m posted on 2021-06-01 in Hadoop

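I have the following four tasks to complete, but I can't figure out how to join the two datasets so that any of them works.
a) Find the name of the customer with the fewest transactions; output the customer name and the number of transactions.
b) Use a broadcast (replicated) join to join customers and transactions. Report: customerid, name, salary, numoftransactions, totalsum, minitems (where numoftransactions is the total number of transactions made by the customer, totalsum is the sum of the field "transtotal" for that customer, and minitems is the minimum number of items in any of the customer's transactions).
c) Report the country codes that have more than 5000 or fewer than 2000 customers.
d) Suppose we want to design an analytics task on the data as follows: the age attribute is divided into six groups, [10,20), [20,30), [30,40), [40,50), [50,60), and [60,70]. Within each age range, the data is further split by "gender", i.e. each of the six age groups is divided into two groups. For each group report: age range, gender, mintranstotal, maxtranstotal, avgtranstotal. Note: the bracket "[" means the lower bound of the range is included, whereas ")" means the upper bound is excluded.
This is my starting point: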

hadoop fs -mkdir /piginput
sudo hadoop fs -put customer.txt /piginput
sudo hadoop fs -put transaction.txt /piginput
sudo hadoop fs -put transaction_small.txt /piginput

pig 

customers = LOAD '/piginput/customers.txt' USING PigStorage(',') AS (id:int,name:chararray,age:int,gender:chararray,CountryCode:int,salary:float);

transactions = LOAD '/piginput/transaction.txt' USING PigStorage(',') as (trans_id:int, id:int, age:int, total:float, num_items:int, description:chararray);

alldata = JOIN customers BY id, transactions BY id;

by_clusters_terms_count = FOREACH alldata COUNT(id);

This produces the following error:

Pig Stack Trace

ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"

Failed to parse: Pig script failed to parse:
<line 4, column 26> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1684)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1657)
at org.apache.pig.PigServer.registerQuery(PigServer.java:600)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1069)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: 
<line 4, column 26> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1041)
at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15870)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 15 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "id:NULL,name:NULL,num_items:NULL", right is "customers::id:int"
at org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:158)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:123)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:245)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1039)
... 21 more

Any ideas? Am I joining the datasets incorrectly, and is that what causes the problem?

Answer #1 (h9a6wy2h)

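-- Load customers and keep only the fields needed for the join.
customers = LOAD 'hdfs://hadoop-VirtualBox:8020/piginput/customer.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
A = FOREACH customers GENERATE id, name;
-- Load transactions; the customer key is named cust_id so it does not clash with the customers' id field.
transactions = LOAD 'hdfs://hadoop-VirtualBox:8020/piginput/transaction_small.txt' USING PigStorage(',') AS (trans_id:int, cust_id:int, total:float, num_items:int, description:chararray);
B = FOREACH transactions GENERATE cust_id, num_items;
-- Join on the customer key, then group the joined rows by customer id ($0 is A::id).
alldata = JOIN A BY id, B BY cust_id;
C = GROUP alldata BY $0;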

This is what finally solved the problem.
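
Two things in the statement from the question would need to change regardless of the join: FOREACH requires a GENERATE clause, and COUNT is an aggregate that operates on a bag, so the relation has to be grouped before counting. In addition, both relations in the question have a field called id, so after the JOIN a bare id is ambiguous and would have to be written as customers::id; renaming the transaction key to cust_id, as the script above does, avoids that. The following is a minimal sketch of how the script might continue for task a), assuming the relations A, B, and alldata defined above. The names grouped, counts, ordered, and fewest are made up for illustration.

```pig
-- Continues from: alldata = JOIN A BY id, B BY cust_id;
-- Group the joined rows per customer; A::id and A::name name the joined fields explicitly.
grouped = GROUP alldata BY (A::id, A::name);

-- COUNT works on the bag of rows that GROUP collects for each key.
counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (id:int, name:chararray),
    COUNT(alldata) AS num_transactions;

-- Sort ascending and keep the first row: a customer with the fewest transactions.
ordered = ORDER counts BY num_transactions ASC;
fewest = LIMIT ordered 1;
DUMP fewest;
```

Note that LIMIT 1 returns a single row even when several customers tie for the minimum, and that an inner join drops customers who have no transactions at all. Whether either matters depends on how the task is meant to be read.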

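For task b), Pig supports a replicated (broadcast) join through the USING 'replicated' clause. A rough sketch, assuming the customers and transactions relations are loaded with the schemas used in the answer above; the relation names joined, per_customer, and report are illustrative only:

```pig
-- In a replicated join the relation listed last is loaded into memory,
-- so the smaller relation (customers) goes second.
joined = JOIN transactions BY cust_id, customers BY id USING 'replicated';

-- One group per customer, keyed on the fields that should appear in the report.
per_customer = GROUP joined BY (customers::id, customers::name, customers::salary);

-- Aggregate over the bag of joined rows belonging to each customer.
report = FOREACH per_customer GENERATE
    FLATTEN(group) AS (customer_id:int, name:chararray, salary:float),
    COUNT(joined) AS num_transactions,
    SUM(joined.total) AS total_sum,
    MIN(joined.num_items) AS min_items;

DUMP report;
```

The replicated strategy only makes sense when the customers relation is small enough to fit in memory on each task; otherwise the default join is the safer choice.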