How to extract the key from the log in Python

j5fpnvbx posted on 2021-06-03 in Hadoop


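I wrote some Python code to extract a key from a log. With the same log, it runs fine on a single machine, but when I run it in Hadoop it fails at the regex. Could anyone give me some advice? Is regex not supported in Hadoop?
The purpose of this Python code is to extract qry and rc, sum the values in rc, and then print the result as qry query_count rc_count. When run in Hadoop, it reports java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1.
From searching on Google, the message seems to mean there may be a bug in the mapper code. So how can I fix it?
The log format looks like this:
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
My Python code is: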

import sys
import re

for line in sys.stdin:
    count_result = 0
    line = line.strip()
    match=re.search('.*qry=(.*?)qid0.*rc=(.*?)discount',line).groups()
    if (len(match)<2):
       continue
    counts_tmp = match[1].strip()
    counts=counts_tmp.split('|')
    for count in counts:
       if count.isdigit():
         count_result += int(count)
    key_tmp = match[0].strip()
    if key_tmp.strip():
       key = key_tmp.split('\t')
       key = ' '.join(key)
       print '%s\t%s\t%s' %(key,1,count_result)

hadoop python hadoop-streaming

Source: https://stackoverflow.com/questions/18394844/how-to-extract-the-key-from-the-log-in-python

2 Answers


66bbxpm51#

Most likely the regular expression is capturing more than you expect. I would suggest breaking it into simpler parts, such as:

(?<= qry=).*(?= qid0)

and

(?<= rc=).*(?= discount)
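
As a rough sketch of how the two patterns above might be applied in Python: re.search returns None when a line does not match, so calling .groups() or .group() on the result without a check raises AttributeError, and an uncaught exception in a streaming mapper would make the subprocess exit with a non-zero code. That may be what is happening here, for example if the cluster input contains lines the local test log did not. The function name extract, the non-greedy quantifiers, the trailing = in the lookaheads, and the shortened sample line are my own assumptions, not part of the original answer.

```python
import re

# Lookaround patterns in the spirit of the answer, using qid0 (the field
# name in the sample log). The non-greedy .*? is an assumption so that the
# match stops at the first delimiter.
QRY_RE = re.compile(r"(?<= qry=).*?(?= qid0=)")
# The leading space keeps " rc=" from matching inside "qry_src=".
RC_RE = re.compile(r"(?<= rc=).*?(?= discount=)")


def extract(line):
    """Return (qry, rc) as strings, or None if either field is missing."""
    qry_match = QRY_RE.search(line)
    rc_match = RC_RE.search(line)
    if qry_match is None or rc_match is None:
        return None  # skip this line instead of raising AttributeError
    return qry_match.group(0).strip(), rc_match.group(0).strip()


# Shortened, hypothetical sample line for illustration only.
sample = "NOTICE: a=20 qry=cars qid0=293 qry_src=0 at=60942 rc=3|1|1 discount=20"
print(extract(sample))
print(extract("a line without the expected fields"))
```

The caller can then skip any line for which extract returns None rather than letting the mapper crash.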


zmeyuzjn2#

Making a lot of assumptions and venturing an educated guess, you could perhaps parse your log like this:

from collections import defaultdict

input = """NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=boats qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|5|2 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|somestring|12 discount=20 indv_type=0 rep_query="""

# Accumulated rc total per query string; missing keys start at 0.
d = defaultdict(int)

for line in input.split("\n"):
    tokens = line.split(" ")
    count = 0
    qry = None
    for token in tokens:
        pair = token.split("=")
        if len(pair) != 2:
            continue  # skip tokens that are not a single key=value pair
        key, value = pair
        if key == "qry":
            qry = value
        if key == "rc":
            # rc holds pipe-separated counts; ignore non-numeric parts.
            for part in value.split("|"):
                try:
                    count += int(part)
                except ValueError:
                    pass
    if qry:
        d[qry] += count

print(d)

This assumes (a) that key-value pairs are separated by spaces and (b) that neither keys nor values contain spaces.
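
For a Hadoop Streaming job, the same token-splitting idea could be arranged as a mapper that reads lines from a stream and writes qry, a query count of 1, and the rc sum separated by tabs, which is the output format the question describes. This is only a sketch under the same two assumptions as above; the names parse_line and run_mapper are hypothetical, and lines without a qry field are silently skipped so that a malformed line does not end the subprocess.

```python
import sys


def parse_line(line):
    """Return (qry, rc_sum) for one log line, or None if there is no qry field."""
    qry = None
    rc_sum = 0
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a key=value token
        if key == "qry":
            qry = value
        elif key == "rc":
            # Sum only the numeric parts of a value such as "3|1|1".
            rc_sum += sum(int(p) for p in value.split("|") if p.isdigit())
    if not qry:
        return None
    return qry, rc_sum


def run_mapper(stream, out=sys.stdout):
    """Write qry<TAB>1<TAB>rc_sum for each parsable line; skip the rest."""
    for line in stream:
        parsed = parse_line(line.strip())
        if parsed is None:
            continue
        qry, rc_sum = parsed
        out.write("%s\t%d\t%d\n" % (qry, 1, rc_sum))


# In a mapper script this would be called as: run_mapper(sys.stdin)
```

Because the loop never calls a method on a failed match, an unexpected line produces no output instead of an exception.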

