How to resolve a broken pipe issue in Hive when streaming to Python?

mm9b1k5b asked on 2021-06-01 in Hadoop

While streaming a large number of rows (33,000) from a table to Python, I hit a broken pipe error in Hive. The same script works fine up to 7,656 rows.

0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2 limit 7656;
INFO  : Table test.transform_inner_temp stats: [numFiles=1, numRows=7656, totalSize=4447368, rawDataSize=4439712]
No rows affected (19.867 seconds)

0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2;
INFO  : Status: Running (Executing on YARN cluster with App id application_xxxxxxxxxxxxx_xxxxx)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:456)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
        ... 18 more
Caused by: java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:425)
        ... 24 more

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:456)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
        ... 18 more
Caused by: java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:425)
        ... 24 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_xxxxxxxxxxx_xxxxx_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

I can see the same issue, though only in combination with a join condition, reported in the link below:
https://community.hortonworks.com/questions/11025/hive-query-issue.html
I tried the workaround suggested there, setting hive.vectorized.execution.enabled=false, but my problem remains. I also see the same error in another thread, "Hive broken pipe error", even though my columns are selected from the single table test.transform_inner_temp1, into which the join result has already been written.
If the script were wrong, it should not have worked for the smaller row counts either. So I hope the script is fine and the problem lies with the Hive or memory settings. I have searched the internet extensively but could not find a similar issue. Please share your suggestions/solutions.
Hive code:

CREATE VIEW IF NOT EXISTS test.transform_inner_temp_view2
as
select * from 
(select transform (*)
USING "scl enable python27 'python TestPython.py'" 
as (Col_1     STRING,
col_2        STRING,
...
..
col_125 STRING
)
FROM
test.transform_inner_temp1 a) b;
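
As a sanity check, the pipe can be reproduced outside Hive by replaying a local extract of the source table through the same script the USING clause invokes. A minimal Python 2 harness (a sketch: sample.tsv is a hypothetical tab-separated export of test.transform_inner_temp1):

import subprocess

# Replay a local extract through the same script Hive runs in the USING
# clause. NOTE: sample.tsv is a hypothetical tab-separated export of
# test.transform_inner_temp1; adjust the path and interpreter as needed.
with open("sample.tsv", "rb") as data:
    proc = subprocess.Popen(
        ["python", "TestPython.py"],
        stdin=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()

print "exit code: %d" % proc.returncode
if err:
    print "script stderr:"
    print err

If the harness reports a traceback or a non-zero exit code somewhere past row 7,656, the script (or a malformed row) is the culprit; Hive itself only surfaces the resulting broken pipe.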

I tried the Python script in three different ways, shown below, but the issue was not resolved.
Script 1:


#!/usr/bin/env python

'''
Created on June 2, 2017

@author: test
'''
import sys
from datetime import datetime
import decimal
import string
D = decimal.Decimal

while True:
    line = sys.stdin.readline()

    if not line:
        break
    line = string.strip(line, "\n ")
    outList = []
    TempList = line.strip().split('\t')
    col_1 = TempList[0]
    ... 
    ....
    col_125 = TempList[34] + TempList[32]

    outList.extend((col_1,....col_125))
    outValue = "\t".join(map(str,outList))
    print "%s"%(outValue)

Script 2:


#!/usr/bin/env python

'''
Created on June 2, 2017

@author: test
'''
import sys
from datetime import datetime
import decimal
import string
D = decimal.Decimal

try:
    for line in sys.stdin:
        line = sys.stdin.readline()
        TempList = line.strip().split('\t')
        col_1 = TempList[0]
        ...
        ....
        col_125 = TempList[34] + TempList[32]

        outList = []
        outList.extend((col_1,....col_125))
        outValue = "\t".join(map(str,outList))
        print "%s"%(outValue)
except:
    print sys.exc_info()
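
One caveat with this attempt: the bare except prints sys.exc_info() to stdout, so Hive consumes the exception tuple as if it were an output row, and the real traceback never reaches the logs. Writing diagnostics to stderr and exiting non-zero would surface them in the YARN task log instead. A minimal sketch of that pattern (process_rows is a hypothetical stand-in for the per-row loop):

import sys
import traceback

try:
    process_rows()  # hypothetical: the per-row loop from the scripts here
except Exception:
    # stderr lands in the YARN task log; stdout would be read by Hive as data.
    traceback.print_exc(file=sys.stderr)
    sys.exit(1)     # a non-zero exit makes the failure explicit to Hive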

Script 3:


#!/usr/bin/env python

'''
Created on June 2, 2017

@author: test
'''
import sys
from datetime import datetime
import decimal
import string
D = decimal.Decimal
for line in sys.stdin:
    line = sys.stdin.readline()   
    TempList = line.strip().split('\t')
    col_1 = TempList[0]
    ... 
    ....
    col_125 = TempList[34] + TempList[32]
    outList = []
    outList.extend((col_1,....col_125))
    outValue = "\t".join(map(str,outList))
    print "%s"%(outValue)

I am using Apache Hive (version 1.2.1000.2.5.3.0-37) with Beeline. Thanks in advance.

No answers yet.
