pandas: finding common data across tables

h7wcgrx3  posted on 2023-01-11  in Other

I'm using Jupyter and trying to find the common data, so that it tells me how many times each bull has been in the top 3 or bottom 3.

LowWeaningData = {}
LowWeightData = {}
LowP8Data = {}
LowRibData = {}
LowEmaData = {}
LowImfData = {}
LowGrowthData = {}

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)'])

LowWeightData = df_bulls.nsmallest(3,['Weight (kg)'])

LowP8Data = df_bulls.nsmallest(3,['P8'])

LowRibData = df_bulls.nsmallest(3,['RIB'])

LowEmaData = df_bulls.nsmallest(3,['EMA(cm)'])

LowImfData = df_bulls.nsmallest(3,['IMF'])

LowGrowthData = df_bulls.nsmallest(3,['Growth %'])

Printing one of these gives:

This list is of the Lowest Growth % Data 
   Bull         Sire  Dam  Weaning Weight(kg)  Weight (kg)  P8  RIB  EMA(cm)  \
5  S10  Black Magic  L16               522.0          818   7    6      124   
1  S24          P42  L11               469.0          774   7    6      116   
2  S32          P41   M6               401.0          662   6    5      105   

   IMF   Growth %  
5  6.3  56.704981  
1  5.6  65.031983  
2  4.3  65.087282

The next part is to work out how many times each bull appears in these lists, and in which ones, so I used this code:

lower_elements_in_all = list(set.intersection(*map(set, [
    LowWeaningData, LowWeightData, LowP8Data, LowRibData,
    LowEmaData, LowImfData, LowGrowthData])))

I keep getting this instead of the actual bull names:

['Bull', 'Sire', 'P8', 'EMA(cm)', 'RIB', 'Growth %', 'Dam', 'Weaning Weight(kg)', 'IMF', 'Weight (kg)']

I'd like the data to be returned like this:

S10 has appeared 3 times in category Low Weaning Data, Low P8 Data and Low Rib Data
S24 has appeared 3 times in etc
S32 has appeared 1 times in etc

in descending order, so it's easy to read.

jm81lzqq


First, I would add a new column named "Category" to each DataFrame, holding a string that names the corresponding category.

# The new column is named "Category" (capitalised) so it matches the column selected after pd.concat
LowWeaningData = df_bulls.nsmallest(3, ['Weaning Weight(kg)']).assign(Category="Low Weaning Data")
LowWeightData = df_bulls.nsmallest(3, ['Weight (kg)']).assign(Category="Low Weight Data")
LowP8Data = df_bulls.nsmallest(3, ['P8']).assign(Category="Low P8 Data")
LowRibData = df_bulls.nsmallest(3, ['RIB']).assign(Category="Low Rib Data")
LowEmaData = df_bulls.nsmallest(3, ['EMA(cm)']).assign(Category="Low EMA Data")
LowImfData = df_bulls.nsmallest(3, ['IMF']).assign(Category="Low IMF Data")
LowGrowthData = df_bulls.nsmallest(3, ['Growth %']).assign(Category="Low Growth Data")

Then use pd.concat to combine all of these DataFrames, keeping only the "Bull" and "Category" columns:

low_bulls_by_category_df = pd.concat([
    LowWeaningData,
    LowWeightData,
    LowP8Data,
    LowRibData,
    LowEmaData,
    LowImfData,
    LowGrowthData
])[['Bull','Category']]

Using some data I made up, low_bulls_by_category_df should look like the following (some categories are left out to keep the example short):

  Bull          Category
0  S10  Low Weaning Data
1  S11  Low Weaning Data
2  S12  Low Weaning Data
0  S10   Low Weight Data
1  S13   Low Weight Data
2  S14   Low Weight Data
0  S12       Low P8 Data
1  S11       Low P8 Data
2  S10       Low P8 Data
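
The same combined frame can be built without repeating the nsmallest call for every column. This is only a minimal sketch: the df_bulls values below are made up for illustration (they are not the asker's data), and category_columns is a hypothetical helper mapping that is not part of the original code.

```python
import pandas as pd

# Made-up sample data for illustration only (not the asker's real table).
df_bulls = pd.DataFrame({
    "Bull": ["S10", "S11", "S12", "S13", "S14"],
    "Weaning Weight(kg)": [400.0, 410.0, 420.0, 500.0, 510.0],
    "Weight (kg)": [650, 800, 810, 660, 670],
    "P8": [6, 5, 4, 9, 10],
})

# Hypothetical mapping: category label -> column to rank on.
category_columns = {
    "Low Weaning Data": "Weaning Weight(kg)",
    "Low Weight Data": "Weight (kg)",
    "Low P8 Data": "P8",
}

# Take the 3 smallest rows per column, tag each with its category, then stack them.
low_bulls_by_category_df = pd.concat(
    [
        df_bulls.nsmallest(3, col).assign(Category=label)
        for label, col in category_columns.items()
    ]
)[["Bull", "Category"]]

print(low_bulls_by_category_df)
```

Adding another category then only means adding one entry to the mapping.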

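The reason for adding a "Category" column to each of the smallest-value DataFrames is that, once they are all combined, you still know which low category each bull came from.
We can then use a groupby to loop over the part of the DataFrame belonging to each unique bull, and fill a dictionary with how many times each bull appears and which categories it appears in.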

low_bull_info = {}
for bull,df_group in low_bulls_by_category_df.groupby("Bull"):
    low_bull_info[bull] = {'count':len(df_group), 'categories':df_group['Category'].tolist()}

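The dictionary now holds everything we need. Note that groupby orders the groups by their key (here the bull name), not by how often each bull appears, so the dictionary is ordered by bull name. With this sample data that happens to coincide with descending counts; in general, sort by count explicitly if you want to print in descending order of appearances.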

{
    'S10': 
        {'count': 3, 
         'categories': ['Low Weaning Data', 'Low Weight Data', 'Low P8 Data']}, 
     'S11': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S12': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S13': 
        {'count': 1, 
         'categories': ['Low Weight Data']}, 
     'S14': {'count': 1, 'categories': ['Low Weight Data']}
}
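
As an alternative to filling the dictionary by hand, the counts and category lists can be computed in one groupby call and then sorted by count. This is a rough sketch with made-up rows (not the asker's data); summary and key_order are illustrative names. It also shows that iterating a groupby yields groups in key order rather than in order of size.

```python
import pandas as pd

# Made-up rows in the same shape as low_bulls_by_category_df.
low_bulls_by_category_df = pd.DataFrame({
    "Bull": ["S24", "S10", "S32", "S24", "S32", "S24"],
    "Category": [
        "Low Weaning Data", "Low Weaning Data", "Low Weaning Data",
        "Low Weight Data", "Low Weight Data", "Low P8 Data",
    ],
})

# Iterating a groupby visits groups sorted by key (bull name), not by count.
key_order = [bull for bull, _ in low_bulls_by_category_df.groupby("Bull")]
print(key_order)

# One row per bull: number of appearances and the categories it appeared in,
# sorted so the most frequent bulls come first.
summary = (
    low_bulls_by_category_df
    .groupby("Bull")["Category"]
    .agg(count="size", categories=list)
    .sort_values("count", ascending=False)
)
print(summary)
```

Here S10 comes first in key order even though it appears only once, which is why the explicit sort_values step is needed.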

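We can then loop over the dictionary, sorted by count in descending order, and print the information as a formatted string on each iteration:

# Sort by count, highest first, so the most frequent bulls are printed first
ranked = sorted(low_bull_info.items(), key=lambda item: item[1]['count'], reverse=True)
for bull, bull_info in ranked:
    count = bull_info['count']
    categories_str = ', '.join(bull_info['categories'])
    print(f"{bull} has appeared {count} times in category {categories_str}")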

This produces the following output:

S10 has appeared 3 times in category Low Weaning Data, Low Weight Data, Low P8 Data
S11 has appeared 2 times in category Low Weaning Data, Low P8 Data
S12 has appeared 2 times in category Low Weaning Data, Low P8 Data
S13 has appeared 1 times in category Low Weight Data
S14 has appeared 1 times in category Low Weight Data
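
As for why the original intersection returned column names: iterating over a DataFrame yields its column labels, so set(some_dataframe) is a set of column names, not of rows. The sketch below uses small made-up frames (df, low_a and low_b are illustrative names) to show the difference. Even with the "Bull" column selected, an intersection only gives the bulls present in every list, not per-bull counts, which is why the concat and groupby approach above is used instead.

```python
import pandas as pd

df = pd.DataFrame({"Bull": ["S10", "S24"], "P8": [7, 6]})

# Iterating a DataFrame yields its column labels, not its rows.
column_labels = set(df)
bull_names = set(df["Bull"])
print(column_labels)
print(bull_names)

# Intersecting the "Bull" columns gives bulls present in every frame.
low_a = pd.DataFrame({"Bull": ["S10", "S24", "S32"]})
low_b = pd.DataFrame({"Bull": ["S10", "S32", "S40"]})
in_all = set.intersection(*(set(d["Bull"]) for d in [low_a, low_b]))
print(sorted(in_all))
```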
