I want to zero-pad column values in a Pig Latin script using a Python UDF

kr98yfug  posted on 2021-06-21  in  Pig

My Python script, zeropad.py, looks like this:

#!/usr/bin/python

from org.apache.pig.scripting import *

@outputSchema('time:int')
def zero():
    time.zfill(4)

=======================================
REGISTER 'zeropad.py' USING org.apache.pig.scripting.jython.JythonScriptEngine AS myfuncs;

Airlines_data_schema = LOAD 'AirlinesData_sample-1.csv' USING PigStorage('\t') AS (Year,Month,DayofMonth,DayofWeek,DepTime_actual:int,CRSDeptime:int,Arrtime_actual:int,CRSArrtime:int,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay);

===================================================

airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero.DepTime_actual AS DepTime_actual_new,myfuncs.zero.CRSDeptime AS CRSDeptime_new,myfuncs.zero.Arrtime_actual AS Arrtime_actual_new,myfuncs.zero.CRSArrtime AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;

I get the following error:

2017-02-26 19:37:19,606 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:

Invalid field projection. Projected field [myfuncs] does not exist in schema: Year:bytearray,Month:bytearray,DayofMonth:bytearray,DayofWeek:bytearray,DepTime_actual:int,CRSDeptime:int,Arrtime_actual:int,CRSArrtime:int,UniqueCarrier:bytearray,FlightNum:bytearray,TailNum_Plane:bytearray,ActualElapsedTime:bytearray,CRSElapsedTime:bytearray,Airtime:bytearray,Arrdelay:bytearray,Depdelay:bytearray,Origin:bytearray,Dest:bytearray,Distance:bytearray,Taxiin:bytearray,Taxiout:bytearray,Cancelled:bytearray,CancellationCode:bytearray,Diverted:bytearray,CarrierDelay:bytearray,WeatherDelay:bytearray,NASDelay:bytearray,SecurityDelay:bytearray,LateAircraftDelay:bytearray.

Why can't I use the Python function to manipulate my column values?


8ljdwjyq1#

It worked! The small corrections are shown below.


#!/usr/bin/python

# The time columns are loaded as int, so convert to text before padding,
# and declare the result as chararray so the leading zeros are kept.
@outputSchema("num:chararray")
def zero(time):
    if time is None:
        return None
    return str(time).zfill(4)

REGISTER '/home/Jig13517/zeropad.py' using jython AS func ;

airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,func.zero(DepTime_actual) AS DepTime_actual_new,func.zero(CRSDeptime) AS CRSDeptime_new,func.zero(Arrtime_actual) AS Arrtime_actual_new,func.zero(CRSArrtime) AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;
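
A quick way to check the padding logic before launching a Pig job is to call the function from plain Python. The sketch below assumes a variant of zero that converts its argument with str() first, since the time columns are loaded as int and Python integers have no zfill method. The outputSchema function defined here is only a hypothetical stand-in for the decorator that Pig makes available when the script is registered; it exists so the file can run outside Pig and is not meant to mirror Pig's implementation.

```python
# Hypothetical stand-in for the decorator Pig provides at REGISTER time.
# It only records the schema string so the UDF can be imported outside Pig.
def outputSchema(schema):
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("num:chararray")
def zero(time):
    # Pig passes nulls as None; return them unchanged.
    if time is None:
        return None
    # str() accepts both int and text input; zfill pads on the left with zeros.
    return str(time).zfill(4)


if __name__ == "__main__":
    # Sample values are illustrative, not taken from the airline data set.
    for sample in (5, 930, 1545, "45", None):
        print(repr(sample), "->", repr(zero(sample)))
```

Remove the stand-in decorator before registering the file with Pig, so that the real one is used.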

4uqofj5v2#

Try the following syntax, which calls the UDF with the column as an argument:

airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero(DepTime_actual) AS DepTime_actual_new,myfuncs.zero(CRSDeptime) AS CRSDeptime_new,myfuncs.zero(Arrtime_actual) AS Arrtime_actual_new,myfuncs.zero(CRSArrtime) AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;
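
One thing to watch whichever call syntax is used: a zero-padded value is text, so declaring the UDF result as int would drop the padding again. The plain-Python lines below illustrate the round trip; the variable names are illustrative and nothing here is Pig-specific.

```python
dep_time = 930                      # a departure time stored as an integer

padded = str(dep_time).zfill(4)     # text, padded on the left to four characters
round_trip = int(padded)            # converting back to int discards the zeros

print(padded)
print(round_trip)
```

For that reason the UDF's output schema would normally be a chararray rather than an int when the goal is a fixed-width time string.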
