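I have a large set of logs, more than 26,000 files, and each file has content like the sample below. I need to extract all lines that have a 404 status for a JSON request. In the sample below I need every line except the last one, because that one is a 404 but not for JSON content. Any help writing the filter regex? Help from Linux gurus is appreciated.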
- - Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw == www.example.com http 151 0.004-Error 2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.jpg 404
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type
2015-07-28 11:34:00 57 MAD50 658 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520; SD)--Error tdlmnsfrOCxOelbe82y3kIp_QfbBF7S3dDCn4rHR65JOMkOtZu4dz A == www.example.com http 151 0.004-Error
2015-07-28 11:34:53 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004% 124.13.170.152mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520; SD)--Error bvLIe540oNMCeZ0QpOmX1OKoClgNgvSWppGuOmgVS85WnAXKJ1ryDg == www.cnX1000000.example.com http 151 0.002-Error
2015-07-28 11:34:54 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004% d2v2sjgehuhalt.cloudfront.net211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520; SD)--Error hTbk9HE5nyFSla1DmeC1D1jhuMtoUY6E7QQvyf0v1YYJ1GBp-I40bw == www.example.com http 151 0.001-Error
2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004%25202014% pdl.astro.com.my HD)--Error avWgysZyGeGXdt.ZHLfP5uLJ4ie5Hx8pa6ZJC5GHXfvOkyEXXp8o0g == www.example.com http 151@.001-Error
2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520; SD)--Error wBepjCn58o9AiTifvtrCprkjdAdg--zsLTsjDpUBkxnEU5tahmJxxQ == www.wbjCn58o9AiTifvtrCprkjdAdg@.example.com http 151 0.004-Error
2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404-NDS%2520VM % 14.192.214.93(OSD:%252032%2520; SD)--Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw == www.example.com http 151 0.004-Error
2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.json 404 - Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw == www.example.com http 151 0.004-Error
2015-07-28 11:34:55 SIN3 659 www.example.com GET www.example.com/thumbnail/mediaInfo_211.jpg 404
1 Answer
bcs8qyzn #1
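If you want to parse large HTTP logs, you should use visitors. If you want JSON output, then since this community is about coding, you could extend it to produce that.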
Otherwise, for your original question, here is an awk approach:
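The answer's own awk command is not shown here, so what follows is not that command but a minimal sketch of one way to do the filtering. It assumes the request path ending in .json is immediately followed by whitespace and the 404 status, as in the sample lines; the function name filter_json_404 is made up for illustration. It matches on the whole line with a regex rather than on field numbers, so it does not depend on the exact column layout.

```shell
# Hypothetical helper: print only the lines where a path ending in ".json"
# is followed by whitespace and the status 404.
# The trailing ([^0-9]|$) keeps longer numbers such as 4040 from matching.
filter_json_404() {
    awk '/\.json[ \t]+404([^0-9]|$)/' "$@"
}
```

With 26,000 files it is probably easier to let find batch the file names than to rely on a shell glob, for example `find logs/ -type f -exec awk '/\.json[ \t]+404([^0-9]|$)/' {} +`, where logs/ is a placeholder for the real directory. Lines for the .jpg 404 and the header lines starting with # do not match the pattern, so they are dropped.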