Implementing UPPER, TRIM and REPLACE in Apache Pig

Posted by ha5z0ras on 2021-06-21 in Pig

I am new to the Pig environment. I tried to implement my Pig script in two ways.
Method 1:

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword),display_site,placement,was_clicked,cpc;

val1 = foreach val generate campaign_id,date,time,TRIM(keyword),display_site,placement,was_clicked,cpc;

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/'),time,keyword,display_site,placement,was_clicked,cpc;

dump val2;

I get this error:
2016-09-29 02:45:40,826 INFO org.apache.pig.Main: Apache Pig version 0.10.0-cdh4.2.1 (rexported) compiled Apr 22 2013, 12:04:54
2016-09-29 02:45:40,827 INFO org.apache.pig.Main: Logging error messages to: /home/training/training_materials/analyst/exerces/pig_etl/pig_1475131540824.log
2016-09-29 02:45:42,371 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1025: Invalid field projection. Projected field [keyword] does not exist in schema: campaign_id:chararray,date:chararray,time:chararray,org.apache.pig.builtin.upper_keyword_12:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int.
Details at logfile: /home/hduser/pig_etl/pig_1475131540824.log
But when I combine UPPER, TRIM and REPLACE in a single statement, it works:
Method 2:

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,REPLACE(date, '-', '/'),time,TRIM(UPPER(keyword)),display_site,placement,was_clicked,cpc;
dump val;

So I would like someone to explain why method 1 does not work and what the error message means.

Answer #1 by z5btuh9x

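When you apply TRIM in val1, there is no field called keyword in val. UPPER(keyword) was generated without an alias, so the resulting column is no longer named keyword; the schema in the error message shows it as org.apache.pig.builtin.upper_keyword_12 instead.
Note: whenever you apply a function, give the result an alias with AS to avoid this error.
Better still, run describe on a relation before building a new relation from it, so that its schema is clear to you.
The solution is: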

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword) as keyword,display_site,placement,was_clicked,cpc;

val1 = foreach val generate campaign_id,date,time,TRIM(keyword) as keyword,display_site,placement,was_clicked,cpc;

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/') as date,time,keyword,display_site,placement,was_clicked,cpc;

dump val2;
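
The describe suggestion above can be tried out with a short sketch like the one below. It assumes the same sample2.txt and field layout as in the question; the relation names no_alias, with_alias and trimmed are made up for illustration. The exact name Pig gives an unaliased column may differ between versions, so no output is shown.

```pig
-- Assumes the same input file and schema as in the question.
data = LOAD 'sample2.txt' USING PigStorage(',') AS (campaign_id:chararray, date:chararray, time:chararray, display_site:chararray, placement:chararray, was_clicked:int, cpc:int, keyword:chararray);

-- No alias on UPPER(keyword): the column is no longer called keyword.
no_alias = FOREACH data GENERATE campaign_id, UPPER(keyword);
DESCRIBE no_alias;

-- With AS, the column keeps a name that later statements can use.
with_alias = FOREACH data GENERATE campaign_id, UPPER(keyword) AS keyword;
DESCRIBE with_alias;

-- This projection resolves because with_alias has a field named keyword.
trimmed = FOREACH with_alias GENERATE campaign_id, TRIM(keyword) AS keyword;
DESCRIBE trimmed;
```

Comparing the two DESCRIBE results side by side shows which field names are available to the next FOREACH, which is the check that would have caught the error in method 1 before running dump.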
