pandas: running value_counts() on multiple DataFrame columns shifts the count column to the right after each iteration

mnemlml8, posted on 2022-12-16 in Other

Given this subset of a DataFrame:

>>> df[['Source','Destination','Attack Name']].head()

               Source                     Destination    Attack Name
0              10.x.x.116                 10.x.x.71      RDP Enforcement Violation
1              43.x.x.233                 152.x.x.148    Scanner Enforcement Violation
2  hn.kd.dhcp (61.x.x.192)                152.x.x.148    NaN
3             104.x.x.241                 152.x.x.116    Scanner Enforcement Violation
4              117.x.x.61                 152.x.x.52     NaN

For each destination, I want to count the attacks coming from the top 10 sources.
I tried this:

import pandas as pd

outReport='test.xlsx'
df = pd.read_csv("IPSLogs2.csv")

def statsPerAttacker():
        topSrc = df['Source'].value_counts()[:10]
        mastaSR = pd.Series()
        for ip in topSrc.to_dict():
                df_statsPerAttacker = df[df['Source']==ip][['Source', 'Destination', 'Attack Name']].value_counts().to_frame()
                mastaSR = pd.concat([mastaSR, df_statsPerAttacker], axis=1)

        with pd.ExcelWriter(outReport, engine='openpyxl') as writer:
                mastaSR.to_excel(writer, startcol=2, startrow=2, header=False)

if __name__ == '__main__':
    statsPerAttacker()

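I do get a result, but the last column shifts one position to the right after each source IP iteration (see screenshot):
https://postimg.cc/8FBnq07g
What am I doing wrong? Thanks.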

332nm8kg #1

The problem was probably caused by my unfamiliarity with Series and DataFrame objects. I solved it with a different approach:

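def statsPerAttacker():
    # The 10 most frequent source IPs
    topSrc = df['Source'].value_counts()[:10]
    # Count each attack name per (Source, Destination) pair
    stats = df.groupby(['Source','Destination'])['Attack Name'].value_counts()
    ips = topSrc.index.tolist()

    # Keep only the rows whose first index level (Source) is one of the top IPs
    statsPerSourceIP = stats.loc[ips]
    # mode='a' appends to an existing workbook, so outReport must already exist
    with pd.ExcelWriter(outReport, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
        statsPerSourceIP.to_excel(writer, sheet_name='StatisticsByCriticality', startcol=2, startrow=35, header=False)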
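
For anyone wondering why the original loop produced the shifting column, the likely cause is `pd.concat(..., axis=1)`: each iteration adds a new column, and because every source IP has its own index rows, its counts sit alone in that new column. The sketch below reproduces this on made-up toy data (the values, the `top_sources` name, and the use of the top 2 instead of the top 10 are assumptions for illustration) and shows that stacking with `axis=0` keeps a single count column.

```python
import pandas as pd

# Toy data standing in for IPSLogs2.csv; the values are invented for illustration.
df = pd.DataFrame({
    "Source": ["a", "a", "b", "b", "b", "c"],
    "Destination": ["x", "x", "y", "z", "z", "x"],
    "Attack Name": ["RDP", "RDP", "Scan", "Scan", "Scan", "RDP"],
})

# Top 2 sources here (the question uses the top 10).
top_sources = df["Source"].value_counts().index[:2].tolist()

# One Series of counts per source, indexed by (Source, Destination, Attack Name).
per_source = [
    df.loc[df["Source"] == ip, ["Source", "Destination", "Attack Name"]]
    .value_counts()
    .rename("count")
    for ip in top_sources
]

# axis=1: every Series becomes its own column and the rows are the union of all
# indexes, so each source's counts land one column further to the right.
wide = pd.concat(per_source, axis=1, keys=top_sources)

# axis=0: the Series are stacked on top of each other, leaving one count column.
tall = pd.concat(per_source, axis=0)

print(wide)
print(tall)
```

Printing `wide` shows the staircase of NaN values that appears in the spreadsheet as a shifting column, while `tall` has the same shape as the `groupby` result used in the answer above.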
