pandas: get the row count of a DataFrame based on a condition

qncylg1j posted on 2023-11-15 in Other

I want to get the count of DataFrame rows selected by a condition. I tried the code below.

print(df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count())

Output:

IP          57
Time        57
Method      57
Resource    57
Status      57
Bytes       57
Referrer    57
Agent       57
dtype: int64


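The output shows a count for every column in the DataFrame. Instead, I need a single count of the rows where all of the above conditions are satisfied. How can I do that? If you need more explanation about my DataFrame, please let me know.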

#1 dl5txlt9

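You are asking for the rows where all conditions are true, so the len of the filtered frame is the answer, unless I have misunderstood your question.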

In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))

In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]: 
           A         B         C         D
12  0.491683  0.137766  0.859753 -1.041487
13  0.376200  0.575667  1.534179  1.247358
14  0.428739  1.539973  1.057848 -1.254489

In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]: 
A    3
B    3
C    3
D    3
dtype: int64

In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3
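Applied to the columns named in the question, the same idea might look like the sketch below. The sample rows are hypothetical, and the definition of `head` is an assumption (the question does not show it); here it is taken to be a per-IP count of HEAD requests.

```python
import pandas as pd

# Hypothetical access-log rows; only the column names come from the question.
df = pd.DataFrame({
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "Method": ["HEAD", "HEAD", "HEAD", "GET"],
    "Referrer": ['"-"', '"-"', '"-"', '"http://example.com"'],
})

# Assumption: `head` is the number of HEAD requests per IP.
head = df[df.Method == "HEAD"].IP.value_counts()

# Combine the three conditions into one boolean mask.
mask = (df.IP == head.idxmax()) & (df.Method == "HEAD") & (df.Referrer == '"-"')

# len() of the filtered frame is a single number, not one count per column.
n_rows = len(df[mask])
print(n_rows)
```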


#2 wmtdaxz3

In pandas, I like to use the shape attribute to get the number of rows.

df[df.A > 0].shape[0]

This gives the number of rows that match the condition A > 0, as desired.
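One reason to prefer `shape[0]` (or `len`) over `count()` for a row count: `count()` reports non-missing cells per column, so the two can disagree when the data contains NaN. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in column B.
df = pd.DataFrame({"A": [1.0, 2.0, -1.0, 3.0],
                   "B": [np.nan, 5.0, 6.0, 7.0]})

selected = df[df.A > 0]

# count() returns a Series: non-missing cells in each column.
per_column = selected.count()

# shape[0] is the number of rows, regardless of missing values.
n_rows = selected.shape[0]

print(per_column)
print(n_rows)
```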

#3 9vw9lbht

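For better performance, you should not build the filtered DataFrame from the predicate just to count its rows. You can sum the predicate's boolean result directly with sum(predicate), as shown below: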

In [1]: import pandas as pd
        import numpy as np
        df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))
 

In [2]: df.head()
Out[2]:
          A         B         C         D
0 -2.019868  1.227246 -0.489257  0.149053
1  0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956  0.517803
3 -0.589489  0.400487  0.107856  0.194890
4  1.309088 -0.596996 -0.623519  0.020400

In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 µs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4

In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 µs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4

Keep in mind that this technique only works for counting the rows that match the predicate.
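A related variant, offered as a sketch rather than a benchmark: the boolean Series has its own `sum()` method, which avoids iterating element by element the way the builtin `sum` does. No timings are claimed here; the seed and data are arbitrary.

```python
import numpy as np
import pandas as pd

# Arbitrary random data with a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 4)), columns=list("ABCD"))

# The predicate evaluates to a boolean Series (True where both hold).
mask = (df["A"] > 0) & (df["B"] > 0)

# True counts as 1, so summing the mask counts the matching rows.
n_sum = int(mask.sum())

# Same number, obtained by materializing the filtered frame first.
n_len = len(df[mask])

print(n_sum, n_len)
```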

#4 cvxl0en2

You can use the query method and take the shape of the result. For example:

   A  B  C
0  1  1  x
1  2  2  y
2  3  3  z

df.query("A == 2 & B > 1 & C != 'z'").shape[0]

Output: 1
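For reference, a self-contained version of this example, rebuilding the small table shown in the answer and comparing `query` with an equivalent boolean mask:

```python
import pandas as pd

# The three-row table from the answer.
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3], "C": ["x", "y", "z"]})

# Row count via query(): only the row with A == 2 satisfies all three conditions.
n_query = df.query("A == 2 & B > 1 & C != 'z'").shape[0]

# The same conditions written as a boolean mask and summed.
n_mask = int(((df.A == 2) & (df.B > 1) & (df.C != "z")).sum())

print(n_query, n_mask)
```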

#5 ddrv8njm

import pandas as pd
data = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Description': [
'''a man or woman who controls an organization or part of an organization,a person who looks after the business affairs of a singer, actor, etc''',
'''Technical analysts, also known as chartists or technicians, employ technical analysis in their trading and research. Technical analysis looks for price patterns and trends based on historical performance to identify signals based on market sentiment and psychology.''',
'''A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term programmer is sometimes used as a synonym, but may also lack connotations of engineering education or skills.''',
'''A sales manager is someone who leads and supervises sales agents and runs the day-to-day sales operations of a business. They oversee the sales strategy, set sales goals, and track sales performance'''
]}
df = pd.DataFrame(data)
data2 = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Keywords': [
['organization','business','people','arrange']
,['technicians','analysis','research','business']
,['engineering', 'design', 'develop', 'maintain']
,['supervises', 'agents','business','performance','target']
]}
df2 = pd.DataFrame(data2)
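print(df2)
# Expand each keyword list into one row per (title, keyword) pair.
df2 = df2.explode('Keywords')

print(df2)
print("checking df3")
# Attach every keyword to the description of its title.
df3 = df.merge(df2, how='left', on='title')
print(df3)
# Flag rows whose keyword occurs as a substring of the description.
df3['match'] = df3.apply(lambda x: x.Keywords in x.Description, axis=1)
print(df3)
# Keep only the matching rows and count them per description.
df4 = df3.loc[df3['match']].groupby(['Description']).count()
print(df4)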
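If only a single count per group is wanted, summing the boolean `match` column is one possible alternative to filtering and calling `count()`. The sketch below uses a shortened, hypothetical version of the merged frame (`df3`), not the answer's full data.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the merged frame built above.
df3 = pd.DataFrame({
    "title": ["Manager", "Manager", "Sales Manager", "Sales Manager"],
    "Description": ["controls an organization", "controls an organization",
                    "supervises sales agents", "supervises sales agents"],
    "Keywords": ["organization", "people", "supervises", "agents"],
})

# True where the keyword is a substring of the description.
df3["match"] = [kw in desc for kw, desc in zip(df3["Keywords"], df3["Description"])]

# Summing booleans per group gives one count per title, not one per column.
matches_per_title = df3.groupby("title")["match"].sum()
print(matches_per_title)
```

Unlike filtering first, this keeps titles with zero matches in the result.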
