Python: how to find the most frequent values in a pandas column

bpzcxfmw  posted on 2023-02-02  in Python
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 202
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 209
10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET / HTTP/1.1" 200 9157
10.211.47.159 - - [10/Aug/2009:20:52:19 -0700] "GET /assets/js/lowpro.js HTTP/1.1" 304 -
10.216.113.172 - - [12/Aug/2009:06:04:50 -0700] "GET /release-schedule/ HTTP/1.1" 200 9306
10.216.113.172 - - [12/Aug/2009:06:04:50 -0700] "GET /release-schedule/ HTTP/1.1" 200 9306
10.216.113.172 - - [12/Aug/2009:06:04:52 -0700] "GET /displaytitle.php?id=10 HTTP/1.1" 200 10234
10.216.113.172 - - [12/Aug/2009:06:04:52 -0700] "GET /displaytitle.php?id=10 HTTP/1.1" 200 10234

Suppose I have a column containing all the hosts (for example 10.223.157.186), and I want to find the most frequent host.

result = Mainpanda['host'].mode()
print("Mode:\n",result)

I know mode() can find the most frequent host, but it only shows the top 1. I need the hosts in a list ranked from 1 to N. Can anyone help?

yyyllmsg  #1

Assuming you have a single column of strings, first pull out the IP with str.extract, then call mode:

result = Mainpanda['host'].str.extract(r'(\d+\.\d+\.\d+\.\d+)', expand=False).mode()

Output:

0    10.216.113.172
Name: host, dtype: object
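
To check this end to end, here is a minimal sketch that rebuilds the column from the eight sample log lines in the question. The names Mainpanda and host come from the question; the way the frame is constructed here is an assumption, since the question does not show how the data was loaded. Note that mode() returns a Series rather than a single value, because several hosts can tie for most frequent.

```python
import pandas as pd

# Sample log lines copied from the question
log_lines = [
    '10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 202',
    '10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.223.157.186 - - [15/Jul/2009:15:50:35 -0700] "GET / HTTP/1.1" 200 9157',
    '10.211.47.159 - - [10/Aug/2009:20:52:19 -0700] "GET /assets/js/lowpro.js HTTP/1.1" 304 -',
    '10.216.113.172 - - [12/Aug/2009:06:04:50 -0700] "GET /release-schedule/ HTTP/1.1" 200 9306',
    '10.216.113.172 - - [12/Aug/2009:06:04:50 -0700] "GET /release-schedule/ HTTP/1.1" 200 9306',
    '10.216.113.172 - - [12/Aug/2009:06:04:52 -0700] "GET /displaytitle.php?id=10 HTTP/1.1" 200 10234',
    '10.216.113.172 - - [12/Aug/2009:06:04:52 -0700] "GET /displaytitle.php?id=10 HTTP/1.1" 200 10234',
]

# Assumed layout: one raw log line per row in a column named "host"
Mainpanda = pd.DataFrame({"host": log_lines})

# Keep only the first dotted-quad found in each line
ips = Mainpanda["host"].str.extract(r"(\d+\.\d+\.\d+\.\d+)", expand=False)

# mode() returns every value tied for the highest count
most_common = ips.mode()
print(most_common.tolist())  # → ['10.216.113.172']
```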

If you want the top N counts, use value_counts:

N = 10
result = (Mainpanda['host']
          .str.extract(r'(\d+\.\d+\.\d+\.\d+)', expand=False)
          .value_counts().head(N)
         )

Output:

10.216.113.172    4
10.223.157.186    3
10.211.47.159     1
...
Name: host, dtype: int64
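
As a small illustration of how value_counts behaves, the sketch below uses a hypothetical Series holding the eight hosts from the sample log, already extracted. value_counts sorts by count in descending order by default, so head(N) gives the top N; passing normalize=True returns each host's share of all rows instead of the raw count.

```python
import pandas as pd

# Hypothetical column: the eight hosts from the sample log, already extracted
hosts = pd.Series(
    ["10.223.157.186"] * 3 + ["10.211.47.159"] + ["10.216.113.172"] * 4,
    name="host",
)

N = 2

# Raw counts, most frequent first
top_counts = hosts.value_counts().head(N)
print(top_counts.to_dict())  # → {'10.216.113.172': 4, '10.223.157.186': 3}

# Same ranking expressed as a fraction of all rows
top_share = hosts.value_counts(normalize=True).head(N)
print(top_share.to_dict())  # → {'10.216.113.172': 0.5, '10.223.157.186': 0.375}
```

If N is larger than the number of distinct hosts, head(N) simply returns all of them.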

If you only need the list of IPs:

N = 10
result = (Mainpanda['host']
          .str.extract(r'(\d+\.\d+\.\d+\.\d+)', expand=False)
          .value_counts().index[:N].tolist()
         )

Output:

['10.216.113.172', '10.223.157.186', '10.211.47.159', ...]
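
Since the question asks for a list ranked from 1 to N, one possible way to attach the rank is to enumerate the list starting at 1. This is a sketch on a hypothetical Series of already-extracted hosts, not part of the original answer; the relative order of hosts with equal counts should not be relied on.

```python
import pandas as pd

# Hypothetical column: the eight hosts from the sample log, already extracted
hosts = pd.Series(
    ["10.223.157.186"] * 3 + ["10.211.47.159"] + ["10.216.113.172"] * 4,
    name="host",
)

N = 3

# Hosts ordered from most to least frequent
top_hosts = hosts.value_counts().index[:N].tolist()

# Pair each host with its rank, counting from 1
ranked = list(enumerate(top_hosts, start=1))
for rank, host in ranked:
    print(rank, host)
# → 1 10.216.113.172
# → 2 10.223.157.186
# → 3 10.211.47.159
```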
