Packet count in Hadoop (using MapReduce)

2sbarzqh · posted 2021-05-30 in Hadoop

What has been done so far:
Installed Hadoop from the following link:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/cdh4-installation-guide/cdh4ig_topic_4_4.html
Installed hping3 to generate flood requests, using:

sudo hping3 -c 10000 -d 120 -S -w 64 -p 8000 --flood --rand-source 192.168.1.12

Installed Snort to log the above requests with the following command:

sudo snort -ved -h 192.168.1.0/24 -l .

This generates a log file snort.log.1427021231, which I can read with:

sudo snort -r snort.log.1427021231

It gives output of the form:
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
TCP TTL:64 TOS:0x0 ID:0 IpLen:20 DgmLen:44 DF
***A**S* Seq: 0x6EEE4A6B  Ack: 0x6DF6015B  Win: 0x7210  TcpLen: 24
TCP Options (1) => MSS: 1460

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
I used:

hdfs dfs -put <localsrc> ... <dst>

to copy this log file to HDFS.
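For example (the HDFS destination directory here is just illustrative):

hdfs dfs -put snort.log.1427021231 /user/ratan/snort/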
Now, what I need help with is:
How do I count the totals for source IP addresses, destination IP addresses, port numbers, protocols, and timestamps in the log file?
(Do I have to write my own MapReduce program, or is there a library for this?)
I also found
https://github.com/ssallys/p3
but could not get it to run. I looked at the contents of the jar file, but was unable to run it:

ratan@lenovo:~/Desktop$ hadoop jar ./p3lite.jar p3.pcap.examples.PacketCount

Exception in thread "main" java.lang.ClassNotFoundException:        nflow.runner.Runner
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

Thanks.


li9yvcax #1

After a quick search, this looks like a case where you may need a custom MapReduce job.
The algorithm would look something like the following pseudocode (a minimal Java sketch follows the list):

1. Parse the file line by line (or every n lines if each log record spans more than one line).

2. In the mapper, use a regex to decide whether a piece of text is a source IP, destination IP, etc.

3. Output these with a key-value structure of <Type, count>, where Type is the kind of text that was matched (e.g. source IP) and count is the number of times it was matched in the record.

4. Have the reducer sum all of the values from the mappers to get global totals for each type of information you want.

5. Write the results to a file in the desired format.
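Below is a minimal Java sketch of such a job. It assumes the binary Snort log has first been dumped to readable text (for example by redirecting the output of snort -r snort.log.1427021231 to a text file) before being copied to HDFS with hdfs dfs -put. The class name, regex, and field labels are illustrative rather than taken from any existing library; it only extracts source/destination IPs and ports from lines of the form "src_ip:src_port -> dst_ip:dst_port", and you would add further regexes for protocol and timestamp in the same way.

// PacketFieldCount.java -- illustrative sketch, not an existing library class.
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PacketFieldCount {

    public static class FieldMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Matches "src_ip:src_port -> dst_ip:dst_port" in the readable Snort dump.
        private static final Pattern ADDR = Pattern.compile(
                "(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d+)\\s*->\\s*(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d+)");
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            Matcher m = ADDR.matcher(line.toString());
            while (m.find()) {
                // Emit one <type, 1> pair for every field matched in this record.
                emit(context, "SRC_IP");
                emit(context, "SRC_PORT");
                emit(context, "DST_IP");
                emit(context, "DST_PORT");
            }
        }

        private void emit(Context context, String type) throws IOException, InterruptedException {
            outKey.set(type);
            context.write(outKey, ONE);
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();   // total occurrences of this field type across all mappers
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "packet field count");
        job.setJarByClass(PacketFieldCount.class);
        job.setMapperClass(FieldMapper.class);
        job.setCombinerClass(SumReducer.class);   // safe: the reducer is an associative sum
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // the Snort text dump in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir, must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would run it with something like hadoop jar packetcount.jar PacketFieldCount /user/ratan/snort/snort.txt /user/ratan/snort-out (paths illustrative). If you want counts per distinct address rather than one total per field type, append the matched value to the key in the mapper (for example "SRC_IP\t" + m.group(1)); the same reducer then yields a count for each distinct address.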
