Go client for Hadoop Streaming

Posted by hyrbngr7 on 2021-06-03 in Hadoop

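Is there a well-known client for the Go programming language that supports Hadoop Streaming? I've searched everywhere and couldn't find anything worthwhile.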

hadoop go hadoop-streaming

Source: https://stackoverflow.com/questions/16698825/go-client-for-hadoop-streaming

1 answer

3j86kqsm1#

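You can write Hadoop Streaming jobs directly in Go, and I've heard of people doing so: Streaming can run any executable that reads input records from stdin and writes tab-separated key/value lines to stdout. Below is an example from a blog post that implements word count in Go. This is the mapper: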

package main

import (
        "bufio"
        "fmt"
        "os"
        "regexp"
)

func main() {
        // A "word" is a run of ASCII letters and digits.
        re := regexp.MustCompile("[a-zA-Z0-9]+")
        scanner := bufio.NewScanner(os.Stdin)

        // Hadoop Streaming passes input records on stdin, one per line.
        for scanner.Scan() {
                for _, word := range re.FindAll(scanner.Bytes(), -1) {
                        // Emit "word<TAB>1"; the text before the first tab is the key.
                        fmt.Printf("%s\t1\n", word)
                }
        }
        if err := scanner.Err(); err != nil {
                fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
                os.Exit(1)
        }
}
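Because all of the mapper's logic lives in main, it is awkward to test without a cluster. A minimal sketch, assuming you are free to restructure the program, moves the per-line work into a hypothetical helper `mapLine` (not part of the blog's code) so it can be unit-tested; main still reads stdin and writes stdout as Streaming expects.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// wordRe uses the same word pattern as the mapper above.
var wordRe = regexp.MustCompile("[a-zA-Z0-9]+")

// mapLine is a hypothetical helper: it turns one input record into the
// "word\t1" lines the mapper would write to stdout.
func mapLine(line string) []string {
	words := wordRe.FindAllString(line, -1)
	out := make([]string, 0, len(words))
	for _, w := range words {
		out = append(out, w+"\t1")
	}
	return out
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	// Buffer stdout; Streaming only needs the lines, not per-line flushes.
	w := bufio.NewWriter(os.Stdout)
	for scanner.Scan() {
		for _, kv := range mapLine(scanner.Text()) {
			fmt.Fprintln(w, kv)
		}
	}
	if err := w.Flush(); err != nil {
		fmt.Fprintf(os.Stderr, "error: can't write - %s\n", err)
		os.Exit(1)
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
		os.Exit(1)
	}
}
```

With this split, `mapLine("Hello, world 42!")` can be checked directly in a test, independent of Hadoop.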

This is the reducer:

package main

import (
        "bufio"
        "bytes"
        "fmt"
        "os"
        "strconv"
)

func main() {
        counts := make(map[string]uint64)
        scanner := bufio.NewScanner(os.Stdin)

        // Each input line from the mapper is "word<TAB>count".
        for scanner.Scan() {
                line := scanner.Bytes()
                i := bytes.IndexByte(line, '\t')
                if i == -1 {
                        fmt.Fprintln(os.Stderr, "error: can't find tab")
                        continue
                }
                word := string(line[:i])
                count, err := strconv.ParseUint(string(line[i+1:]), 10, 64)
                if err != nil {
                        fmt.Fprintf(os.Stderr, "error: bad number - %s\n", err)
                        continue
                }

                counts[word] += count
        }
        if err := scanner.Err(); err != nil {
                fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
                os.Exit(1)
        }

        /* Output aggregated counts. */
        for word, count := range counts {
                fmt.Printf("%s\t%d\n", word, count)
        }
}

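Alternatively, you can use dmrgo, which makes writing streaming jobs easier; it comes with a word count example.
I've also seen another library called gomrjob, but it doesn't look well maintained or very polished. Still, if you're feeling adventurous, give it a try :)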
