I have a Python Lambda function that converts syslog records to JSON. When deployed, it fails with an unexpected error.
from __future__ import print_function
import base64
import json
import gzip
import re

print('Loading function')


def lambda_handler(event, context):
    output = []
    succeeded_record_cnt = 0
    failed_record_cnt = 0
    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data'])
        regex_string = (r"^((?:\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?"
                        r"|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b\s+(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])\s+"
                        r"(?:(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9]):(?:(?:[0-5]?[0-9]|60)(?:[:\.,][0-9]+)?)))) (?:<(?:[0-9]+).(?:[0-9]+)> )"
                        r"?((?:[a-zA-Z0-9._-]+)) ([\w\._/%-]+)(?:\[((?:[1-9][0-9]*))\])?: (.*)")
        p = re.compile(regex_string)
        m = p.match(payload)
        if m:
            succeeded_record_cnt += 1
            data_field = {
                'timestamp': m.group(1),
                'hostname': m.group(2),
                'program': m.group(3),
                'processid': m.group(4),
                'message': m.group(5)
            }
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(json.dumps(data_field))
            }
        else:
            print('Parsing failed')
            failed_record_cnt += 1
            output_record = {
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data']
            }
        output.append(output_record)
    print('Processing completed. Successful records {}, Failed records {}.'.format(succeeded_record_cnt, failed_record_cnt))
    return {'records': output}
When I deploy this, I get an error saying that it expects the data as an object. I decoded the record and deployed it again, but got a similar error:
[ERROR] TypeError: cannot use a string pattern on a bytes-like object
Traceback (most recent call last):
  File "/var/task/ec2_logs_parquet.py", line 24, in lambda_handler
    m = p.match(payload)
I tried to fix it with the patch below, which decodes the payload and passes the data through a separate variable, but it does not work.
payload = base64.b64decode(record['data'])
payload_str = payload.decode('utf-8')
p = re.compile(regex_string)
m = p.match(payload_str)
The error still occurs. Am I missing something?
1 Answer
Here is a simpler way to parse syslog records.
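The answer's own code is not reproduced here, so what follows is only a minimal sketch of one simpler approach, not the answerer's method. It assumes traditional BSD-style lines of the form `Mon DD HH:MM:SS host program[pid]: message` and Python 3. The pattern `SYSLOG_RE` and the helpers `parse_syslog_line` and `transform_record` are hypothetical names introduced for illustration. The sketch also handles the bytes/str conversions in both directions: `base64.b64decode` returns bytes, which must be decoded before matching a string pattern, and in Python 3 `base64.b64encode` requires bytes, so passing it the `str` from `json.dumps` would raise another `TypeError` once matching succeeds.

```python
import base64
import json
import re

# Simplified pattern (an assumption, not the regex from the question):
# "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<hostname>\S+)\s+"
    r"(?P<program>[^\s\[:]+)(?:\[(?P<processid>\d+)\])?:\s*"
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of syslog fields, or None if the line does not match."""
    m = SYSLOG_RE.match(line.strip())
    return m.groupdict() if m else None


def transform_record(record):
    """Transform one record of the shape {'recordId': ..., 'data': <base64 str>}."""
    # b64decode returns bytes; decode to str before applying a str pattern.
    text = base64.b64decode(record['data']).decode('utf-8')
    fields = parse_syslog_line(text)
    if fields is None:
        return {
            'recordId': record['recordId'],
            'result': 'ProcessingFailed',
            'data': record['data'],
        }
    # b64encode needs bytes in Python 3; decode the result back to str so
    # the handler's return value can be serialized as JSON.
    encoded = base64.b64encode(json.dumps(fields).encode('utf-8')).decode('utf-8')
    return {'recordId': record['recordId'], 'result': 'Ok', 'data': encoded}


def lambda_handler(event, context):
    return {'records': [transform_record(r) for r in event['records']]}
```

Named groups keep the field mapping next to the pattern, so `groupdict()` replaces the manual `m.group(1)` to `m.group(5)` bookkeeping. The stricter month and day validation from the original regex is deliberately dropped here; add it back if malformed timestamps must be rejected.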