Use the distributed cache with Hive streaming

Posted by ia2d9nvy on 2021-06-03 in Hadoop


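I want to zip up the files of a Ruby gem and distribute them to my EMR cluster. I also want a simple Ruby script, run in a Hive streaming job, to reference the files in this gem.
I add the file and the archive to the Hadoop distributed cache like this: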

ADD FILE /home/user/mobile.rb; 
ADD ARCHIVE /home/user/browser-master.zip;

In mobile.rb, I use the following code to load the gem:

$:.push File.expand_path("../browser-master/lib", __FILE__)
require "browser"

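On my local machine, with the unzipped archive and mobile.rb in the same directory, I can stream data to the script and it runs fine.
However, when I add these files on the Hadoop cluster, I get the following error: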

FAILED: Execution Error, return code 20003 from org.apache.hadoop.hive.ql.exec.MapRedTask. An error occurred when trying to close the Operator running your custom script.

Does my mobile.rb need to point somewhere else once the archive has been unpacked in the distributed cache?
I am using Hive 0.11.

hadoop Hive streaming ruby distributed-cache

Source: https://stackoverflow.com/questions/19736515/use-distributed-cache-hive-streaming

1 Answer


ef1yzkbh1#

After some testing, adding the unzipped gem directory with ADD FILE instead of ADD ARCHIVE seems to have worked:

ADD FILE /home/user/browser-master
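For what it's worth, here is a minimal sketch of how mobile.rb could look for the gem's lib directory in more than one layout, so the same script runs locally and on the cluster. The helper names `find_gem_lib` and `gem_lib_candidates` are made up for illustration. The `browser-master.zip/...` entries are an assumption: they suppose ADD ARCHIVE unpacks the zip into a directory named after the archive file, which I have not verified on Hive 0.11. The diagnostic goes to stderr, which should end up in the task logs.

```ruby
# Hypothetical helper: candidate locations of the gem's lib directory,
# relative to the directory that contains the script.
def gem_lib_candidates
  [
    "browser-master/lib",                    # ADD FILE on the unzipped directory
    "browser-master.zip/browser-master/lib", # assumed ADD ARCHIVE layout
    "browser-master.zip/lib"                 # assumed layout if the zip has no top-level folder
  ]
end

# Hypothetical helper: return the first candidate that exists as a
# directory under base_dir, or nil if none of them does.
def find_gem_lib(base_dir, candidates)
  candidates
    .map { |rel| File.expand_path(rel, base_dir) }
    .find { |path| File.directory?(path) }
end

script_dir = File.dirname(File.expand_path(__FILE__))
lib_dir = find_gem_lib(script_dir, gem_lib_candidates)

if lib_dir
  # Put the gem's lib directory on the load path, then load it.
  $LOAD_PATH.unshift(lib_dir)
  require "browser"
else
  # Print what the working directory actually contains, to help find
  # the right path when the job fails.
  warn "browser lib not found; cwd=#{Dir.pwd} entries=#{Dir.entries('.').inspect}"
end
```

If none of the candidates match, the directory listing printed by `warn` shows where the cache put the files, and the candidate list can be adjusted to fit.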

Posted 2021-06-04
