pandas: finding common data across tables

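I'm using Jupyter and trying to find the data my tables have in common, so that it tells me how many times each bull has been in the top 3 or bottom 3.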

LowWeaningData = {}
LowWeightData = {}
LowP8Data = {}
LowRibData = {}
LowEmaData = {}
LowImfData = {}
LowGrowthData = {}

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)'])

LowWeightData = df_bulls.nsmallest(3,['Weight (kg)'])

LowP8Data = df_bulls.nsmallest(3,['P8'])

LowRibData = df_bulls.nsmallest(3,['RIB'])

LowEmaData = df_bulls.nsmallest(3,['EMA(cm)'])

LowImfData = df_bulls.nsmallest(3,['IMF'])

LowGrowthData = df_bulls.nsmallest(3,['Growth %'])

Printing this gives:

This list is of the Lowest Growth % Data 
   Bull         Sire  Dam  Weaning Weight(kg)  Weight (kg)  P8  RIB  EMA(cm)  \
5  S10  Black Magic  L16               522.0          818   7    6      124   
1  S24          P42  L11               469.0          774   7    6      116   
2  S32          P41   M6               401.0          662   6    5      105   

   IMF   Growth %  
5  6.3  56.704981  
1  5.6  65.031983  
2  4.3  65.087282

The next part is trying to work out how many times each bull appears in these lists and in which ones, so I used this code:

lower_elements_in_all = list(set.intersection(*map(set, [
    LowWeaningData, LowWeightData, LowP8Data, LowRibData,
    LowEmaData, LowImfData, LowGrowthData])))

I keep getting this instead of the actual bull names:

['Bull', 'Sire', 'P8', 'EMA(cm)', 'RIB', 'Growth %', 'Dam', 'Weaning Weight(kg)', 'IMF', 'Weight (kg)']

I would like the data to be returned like this:

S10 has appeared 3 times in category Low Weaning Data, Low P8 Data and Low Rib Data
S24 has appeared 3 times in etc
S32 has appeared 1 times in etc

In descending order, so it is easy to read.
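
On why the attempt above returns column names: calling set() on a DataFrame iterates over its column labels, not its rows, so the intersection only ever compares headers. The sketch below illustrates this with a small invented frame (df_demo and its values are hypothetical, not the real df_bulls), assuming the bull identifier lives in the 'Bull' column as in the printout.

```python
import pandas as pd

# Hypothetical stand-in for df_bulls; the values are invented for illustration.
df_demo = pd.DataFrame({
    "Bull": ["S10", "S24", "S32", "S40"],
    "P8": [7, 7, 6, 9],
    "RIB": [6, 6, 5, 8],
})

low_p8 = df_demo.nsmallest(3, "P8")
low_rib = df_demo.nsmallest(3, "RIB")

# set() iterates a DataFrame's column labels, so the rows are never inspected.
labels_in_common = set.intersection(set(low_p8), set(low_rib))

# Intersecting the 'Bull' column values compares the bulls themselves.
bulls_in_common = set(low_p8["Bull"]) & set(low_rib["Bull"])

print(sorted(labels_in_common))
print(sorted(bulls_in_common))
```

Even with this fix, an intersection only reports bulls that appear in every list; it cannot say how many lists a bull appears in or which ones, which is why a different approach is used here.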

First, I would add a new column called "Category" to each DataFrame, holding a string that names the corresponding category.

# The keyword must be spelled 'Category' so it matches the column selected after pd.concat.
LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)']).assign(Category="Low Weaning Data")
LowWeightData = df_bulls.nsmallest(3,['Weight (kg)']).assign(Category="Low Weight Data")
LowP8Data = df_bulls.nsmallest(3,['P8']).assign(Category="Low P8 Data")
LowRibData = df_bulls.nsmallest(3,['RIB']).assign(Category="Low Rib Data")
LowEmaData = df_bulls.nsmallest(3,['EMA(cm)']).assign(Category="Low EMA Data")
LowImfData = df_bulls.nsmallest(3,['IMF']).assign(Category="Low IMF Data")
LowGrowthData = df_bulls.nsmallest(3,['Growth %']).assign(Category="Low Growth Data")

Then use pd.concat to combine all of these DataFrames:

low_bulls_by_category_df = pd.concat([
    LowWeaningData,
    LowWeightData,
    LowP8Data,
    LowRibData,
    LowEmaData,
    LowImfData,
    LowGrowthData
])[['Bull','Category']]
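
The same tagging and concatenation can also be written as a loop over a column-to-label mapping, which avoids repeating one line per category. This is only a sketch: df_bulls here is an invented stand-in with two measurement columns, and category_labels is a hypothetical helper name.

```python
import pandas as pd

# Invented stand-in for df_bulls; only two measurement columns for brevity.
df_bulls = pd.DataFrame({
    "Bull": ["S10", "S24", "S32", "S40", "S51"],
    "P8": [7, 7, 6, 9, 10],
    "RIB": [6, 8, 5, 6, 9],
})

# Map each measurement column to the label stored in the 'Category' column.
category_labels = {"P8": "Low P8 Data", "RIB": "Low Rib Data"}

# Take the 3 smallest rows per column and tag them with their category label.
frames = [
    df_bulls.nsmallest(3, column).assign(Category=label)
    for column, label in category_labels.items()
]
low_bulls_by_category_df = pd.concat(frames)[["Bull", "Category"]]
print(low_bulls_by_category_df)
```

Adding another category then only requires one more entry in the mapping.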

Using some data I made up, low_bulls_by_category_df should look like this (some categories are left out to keep the example short):

  Bull          Category
0  S10  Low Weaning Data
1  S11  Low Weaning Data
2  S12  Low Weaning Data
0  S10   Low Weight Data
1  S13   Low Weight Data
2  S14   Low Weight Data
0  S12       Low P8 Data
1  S11       Low P8 Data
2  S10       Low P8 Data

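The reason for adding a "Category" column to each of the smallest-value DataFrames is that, once they are all concatenated, you still know which low category each bull came from.
We can then use a groupby to loop over the portion of the DataFrame belonging to each unique bull, and fill a dictionary with how many times each bull appears and which categories it appears in.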

low_bull_info = {}
for bull,df_group in low_bulls_by_category_df.groupby("Bull"):
    low_bull_info[bull] = {'count':len(df_group), 'categories':df_group['Category'].tolist()}

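The dictionary now holds everything we need. Note that groupby orders the groups by the key (the bull name), not by how often each bull appears; with this sample data the two orders happen to coincide. To print in descending order of appearances, sort the dictionary items by their 'count' value.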

{
    'S10': 
        {'count': 3, 
         'categories': ['Low Weaning Data', 'Low Weight Data', 'Low P8 Data']}, 
     'S11': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S12': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S13': 
        {'count': 1, 
         'categories': ['Low Weight Data']}, 
     'S14': {'count': 1, 'categories': ['Low Weight Data']}
}

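We can then loop over this dictionary, sorted by count in descending order, and print the information as a formatted string on each iteration:

# Sort by number of appearances, most frequent first (ties keep bull-name order).
bulls_by_count = sorted(low_bull_info.items(), key=lambda item: item[1]['count'], reverse=True)
for bull, bull_info in bulls_by_count:
    count = bull_info['count']
    categories_str = ', '.join(bull_info['categories'])
    print(f"{bull} has appeared {count} times in category {categories_str}")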

This produces the following output:

S10 has appeared 3 times in category Low Weaning Data, Low Weight Data, Low P8 Data
S11 has appeared 2 times in category Low Weaning Data, Low P8 Data
S12 has appeared 2 times in category Low Weaning Data, Low P8 Data
S13 has appeared 1 times in category Low Weight Data
S14 has appeared 1 times in category Low Weight Data
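
As an alternative to building a dictionary by hand, the counting and the joining of category names can be done in a single groupby aggregation and then sorted explicitly. This is a sketch under the assumption that the data is in the long Bull/Category layout shown earlier; the frame below is invented to match that sample, and summary is a hypothetical name.

```python
import pandas as pd

# Invented long-format data matching the sample table in this answer.
low_bulls_by_category_df = pd.DataFrame({
    "Bull": ["S10", "S11", "S12", "S10", "S13", "S14", "S12", "S11", "S10"],
    "Category": ["Low Weaning Data"] * 3 + ["Low Weight Data"] * 3 + ["Low P8 Data"] * 3,
})

# One row per bull: number of appearances and a comma-separated category list.
summary = (
    low_bulls_by_category_df
    .groupby("Bull")["Category"]
    .agg(count="size", categories=", ".join)
    .sort_values("count", ascending=False, kind="mergesort")  # mergesort is a stable sort
)

for bull, row in summary.iterrows():
    print(f"{bull} has appeared {row['count']} times in category {row['categories']}")
```

Keeping the result as a DataFrame also makes it easy to filter, for example to bulls that appear in at least two categories.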
