我使用的是emr5.19hive2.3.3,如果不能从java字符串转换为hadoop文本,或者反之亦然,我会遇到nullif的问题。源代码是来自aws的cloudtrail serde,它似乎写得很扎实。问题似乎来自内置的nullif udf,正如您在错误消息中看到的:
我正在测试regexp提取的结果是否为空字符串,如果是,我希望为null,因此我的列看起来有点像 NULLIF(REGEXP_EXTRACT(key,'([^\/]+)(\/\d+)?(\/.*)', 1), '') AS key_prefix
但我得到的错误如下:
2020-02-11 11:06:34,034 INFO [IPC Server handler 26 on 43627] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1574116917806_1754132_r_000008_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:820)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:834)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:837)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:938)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:264)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
... 7 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.comparePrimitiveObjects(PrimitiveObjectInspectorUtils.java:421)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:93)
2条答案
按热度按时间dwbf0jvd1#
将nullif替换为执行nullif所说的操作的宏,避免了异常,并且工作正常:
导致:
使用不匹配的字符串时:
而不是用空字符串产生不方便的结果
qyuhtwio2#
也许不能直接回答你的问题,但希望这会有所帮助。
regexp_extract
返回空字符串''
如果regex不匹配,它可以返回null
仅当源字符串为null
. 所以,在这里使用nullif看起来不正确使用双反斜杠屏蔽特殊字符,如
\\d
在Hiveregexp中。/
-不是特殊字符,不需要转义/屏蔽。我建议如下:
它将把空字符串转换成空字符串、空字符串和其他所有原样。