我试图在pandas中复制一些excel功能,我们有一个巨大的数据透视表,我们在上面做了许多非常慢的操作。
这就是我正在努力做的
import pandas as pd
Data = [["NonLin1", "NestleBig-100", "daily", "solved", "NestleBig", "v10_10", 435, 1.4],
["NonLin1", "NestleBig-100", "daily", "solved", "NestleBig", "v10_11", 743, 1.3],
["NonLin1", "NestleBig-101", "daily", "solved", "NestleBig", "v10_10", 542, 1.5],
["NonLin1", "NestleBig-101", "daily", "solved", "NestleBig", "v10_11", 324, 1.2],
["NonLin1", "NestleBig-102", "daily", "solved", "NestleBig", "v10_10", 243, 1.8],
["NonLin1", "NestleBig-102", "daily", "solved", "NestleBig", "v10_11", 444, 1.2],
["NonLin2", "NestleSmall-100", "daily", "solved", "NestleBig", "v10_10", 655, 1.0],
["NonLin2", "NestleSmall-100", "daily", "solved", "NestleBig", "v10_11", 252, 1.3],
["NonLin2", "NestleSmall-101", "daily", "solved", "NestleBig", "v10_10", 435, 1.1],
["NonLin2", "NestleSmall-101", "daily", "solved", "NestleBig", "v10_11", 542, 1.3],
["NonLin2", "NestleSmall-102", "daily", "solved", "NestleBig", "v10_10", 645, 1.5],
["NonLin2", "NestleSmall-102", "daily", "solved", "NestleBig", "v10_11", 435, 1.1],
["NonLin3", "NestleBig-100", "daily", "solved", "NestleBig", "v10_10", 653, 1.2],
["NonLin3", "NestleBig-100", "daily", "solved", "NestleBig", "v10_11", 435, 1.4],
["NonLin3", "NestleBig-101", "daily", "unsolved", "NestleBig", "v10_10", 875, 1.4],
["NonLin3", "NestleBig-101", "daily", "solved", "NestleBig", "v10_11", 214, 1.5],
["NonLin3", "NestleBig-102", "daily", "solved", "NestleBig", "v10_10", 890, 1.2],
["NonLin3", "NestleBig-102", "daily", "unsolved", "NestleBig", "v10_11", 432, 1.5]]
df = pd.DataFrame(Data, columns = ["ProblemClass", "inputID", "profile", "Status", "TestID","runID", "NofIters", "time"])
pivottab = pd.pivot_table(data=df, index=["ProblemClass", "inputID", "Status", "TestID","runID"], values=['time'], aggfunc={'count'})
pivottab.columns = list("#") #Can we rename this column white creating the pivot table itself?
数据透视表本身很好,看起来是这样的
但我想突出显示v10_11和v10_10为Status.产生不同结果的行,如进一步过滤数据透视表以仅查看状态不同的那些行。在excel数据透视表中,我们可以在Status上设置一个值过滤器,然后选择value filter on(#)!= 2但是我想知道如何进一步过滤这个数据透视表。我想在这个数据透视表上做操作,因为它会更容易,我想从数据透视表中排除这些,以便进一步分析。但任何其他选择都是受欢迎的。
1条答案
按热度按时间cyvaqqii1#
要获取包含(或不包含)所选数据的数据透视表,您可以先预过滤 Dataframe ,然后再创建数据透视表。一种可能的方法是:
['ProblemClass', 'inputID', 'TestID', 'Status']
示例
输出:
~
恢复过滤条件)输出
解释
df.duplicated(subset=['ProblemClass', 'inputID', 'TestID', 'Status'], keep=False)
用于查找给定列中重复的所有行。对于您的数据,只有['ProblemClass', 'inputID', 'TestID']
的值相同但'Status'
的值不同的行不会具有此类重复'Status'
列获得了多个结果