如何停用hadoop流中的输出？

0qx6xfy6 于 2021-06-09 发布在 Hbase

关注(0)|答案(1)|浏览(332)

我正在集群上编写python mapreduce程序。我的Map器解析数据并将它们存储在hbase中。没有减速器，没有输出。
如有必要，以下代码可供参考。

class Mapper:
  ...
  def __init__(...)
     ...

  def start(self, file):
    generator = self.read_input(file)
    connection = happybase.Connection(Mapper.IP)
    self.table = connection.table(Mapper.table_name)
    for line in generator:
      self.parse(line)
      self.write()
      self.buffers = []
    self.table = None
    connection.close()

  def read_input(self, file):
    ...
  def parse(self, line):
    ...
  def write(self):
    # write buffers into HBase
    for cell in self.buffers:
      self.table.put(cell[0], cell[1])     <-  Into HBase yay

我的问题是：如果我在集群中使用此命令：

bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
-D mapred.reduce.tasks=1 \
-file /home/hduser/mapper.py    -mapper /home/hduser/mapper.py \
-input /user/hduser/streamingTest/testFile.csv

它将显示：oops，error streaming.streamjob:缺少必需的选项：output
我可以将输出重定向到stdout，或者完全停用它吗？
ps：我是一个糟糕的python程序员，请指出任何让你不舒服的代码。

hbase mapreduce python hadoop-streaming

来源：https://stackoverflow.com/questions/29335375/how-to-deactivate-output-in-hadoop-streaming

1条答案

按热度按时间

agxfikkp1#

您需要生成一些输出。考虑到不输出任何东西的愿望，使用

NullOutputFormat

具体如下：

---outputformat org.apache.mapreduce.lib.NullOutputFormat

赞(0）回复(0）举报 2021-06-09

我来回答

如何停用hadoop流中的输出？

1条答案

相关问题

热门标签

最新问答