Create a nested dictionary from a pandas DataFrame

Posted by nc1teljy 12 months ago in Other


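I have this table:
| Team | X or Y | Percentage |
| -- | -- | -- |
| A | X | 80% |
| A | Y | 20% |
| B | X | 70% |
| B | Y | 30% |
| C | X | 60% |
| C | Y | 40% |
I want to create a nested dictionary so that if I input a team name and X or Y, I get the percentage as the return value.
In Python, I used the .tolist() method to create a list from each column.
My initial strategy was to first create a dict from the last two columns, dict_1 = dict(zip(list2, list3)), and then dict_2 = dict(zip(list1, dict_1)), but this did not work because the "X or Y" column contains repeated values and dictionary keys cannot be duplicated.
The output I want is: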

{'A':{'X':80%, 'Y':20%}, 'B':{'X':70%,'Y':30%}, ...}

How can I do this? Is there a better approach?
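To make the failure of the zip-based attempt concrete, here is a minimal sketch. The lists `teams`, `xy`, and `pct` are assumed stand-ins for the three `.tolist()` results (called list1, list2, and list3 in the question); they are illustrative names only.

```python
# Assumed stand-ins for the three column lists from .tolist().
teams = ['A', 'A', 'B', 'B', 'C', 'C']
xy = ['X', 'Y', 'X', 'Y', 'X', 'Y']
pct = ['80%', '20%', '70%', '30%', '60%', '40%']

# Later pairs overwrite earlier ones, so only the last 'X' and 'Y' survive.
dict_1 = dict(zip(xy, pct))

# Iterating over a dict yields its keys, and zip stops at the shorter input,
# so this only pairs the first two teams with the keys 'X' and 'Y'.
dict_2 = dict(zip(teams, dict_1))

print(dict_1)
print(dict_2)
```

The team information is lost at the first step, which is why the inner dictionaries have to be built per team instead.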

pandas

Source: https://stackoverflow.com/questions/77669166/create-nested-dictionary-from-a-pandas-dataframe

5 Answers

qni6mghb1#

Use pd.DataFrame.pivot:

>>> df.pivot(columns='Team', index='X or Y', values='Percentage').to_dict()
{'A': {'X': '80%', 'Y': '20%'}, 'B': {'X': '70%', 'Y': '30%'}, 'C': {'X': '60%', 'Y': '40%'}}
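The call above assumes a DataFrame named df with the columns Team, X or Y, and Percentage. A minimal, self-contained sketch under that assumption follows; the sample data mirrors the question's table, and `nested` is an illustrative name.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'X or Y': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Percentage': ['80%', '20%', '70%', '30%', '60%', '40%'],
})

# After the pivot, columns (teams) become the outer keys of to_dict()
# and the index (X or Y) becomes the inner keys.
nested = df.pivot(columns='Team', index='X or Y', values='Percentage').to_dict()

# Look up a percentage by team first, then by X or Y.
print(nested['B']['Y'])
```

Note that pivot expects each (Team, X or Y) pair to appear only once; with repeated pairs it raises a ValueError rather than silently overwriting.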


92vpleto2#

The fastest way to do this is to loop over the DataFrame with itertuples() and build the dictionary on the fly.

result = {}
for Team, XorY, Percentage in df.itertuples(index=False):
    result.setdefault(Team, {})[XorY] = Percentage

result now holds the expected value:

{'A': {'X': '80%', 'Y': '20%'},
 'B': {'X': '70%', 'Y': '30%'},
 'C': {'X': '60%', 'Y': '40%'}}

A more "pandas" way to write this is to call to_dict inside a groupby:

result = (
    df.groupby('Team')
    .apply(lambda g: g.set_index('X or Y')['Percentage'].to_dict())
    .to_dict()
)

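This is slower than the itertuples loop.
It is not exactly the same problem, but this answer also constructs a nested dictionary from a DataFrame and includes a benchmark.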
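One rough way to check the speed claim on your own data is to time both variants with timeit. This is only a sketch: the function names `with_itertuples` and `with_groupby` are made up for illustration, the sample frame is tiny, and the groupby variant selects its two columns explicitly (a small deviation from the snippet above) so that apply only sees the columns it needs. Timings will vary with pandas version and data size.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'X or Y': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Percentage': ['80%', '20%', '70%', '30%', '60%', '40%'],
})


def with_itertuples(frame):
    """Build the nested dict row by row."""
    result = {}
    for team, x_or_y, percentage in frame.itertuples(index=False):
        result.setdefault(team, {})[x_or_y] = percentage
    return result


def with_groupby(frame):
    """Build one inner dict per team via groupby.apply."""
    return (
        frame.groupby('Team')[['X or Y', 'Percentage']]
        .apply(lambda g: g.set_index('X or Y')['Percentage'].to_dict())
        .to_dict()
    )


# Both variants should agree before their timings are compared.
assert with_itertuples(df) == with_groupby(df)

loop_time = timeit.timeit(lambda: with_itertuples(df), number=100)
groupby_time = timeit.timeit(lambda: with_groupby(df), number=100)
print(f"itertuples: {loop_time:.4f}s, groupby: {groupby_time:.4f}s")
```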


blpfk2vs3#

Solution 1

One possible solution uses pandas.DataFrame.stack followed by unstack:

(df.set_index(['Team', 'X or Y'])
.stack().droplevel(2).unstack('X or Y').T.to_dict())

@cottontail suggested, in a comment below, a shorter and more efficient version of this solution:

df.set_index(['Team', 'X or Y'])['Percentage'].unstack('Team').to_dict()

Solution 2

Another possible solution uses groupby.apply to build the dictionaries:

(df.groupby('Team').apply(lambda x: 
    {'X': x.loc[x['X or Y'].eq('X'), 'Percentage'].iloc[0], 
     'Y': x.loc[x['X or Y'].eq('Y'), 'Percentage'].iloc[0]})
.to_dict())

Output

{'A': {'X': '80%', 'Y': '20%'},
 'B': {'X': '70%', 'Y': '30%'},
 'C': {'X': '60%', 'Y': '40%'}}


798qvoo84#

Loop over all the columns together (using the zip() function):

nd = {}  # nested dict
for team, xy, percentage in zip(data['Team'], data['X or Y'], data['Percentage']):
    if team not in nd:
        nd[team] = {}
    nd[team][xy] = percentage

Example code:

import pandas as pd
data = {
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'X or Y': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Percentage': ['80%', '20%', '70%', '30%', '60%', '40%']
}
df = pd.DataFrame.from_dict(data)
print(df)

nd = {}  # nested dict
for team, xy, percentage in zip(data['Team'], data['X or Y'], data['Percentage']):
    if team not in nd:
        nd[team] = {}
    nd[team][xy] = percentage
print(nd)

DataFrame and nested dict output:

  Team X or Y Percentage
0    A      X        80%
1    A      Y        20%
2    B      X        70%
3    B      Y        30%
4    C      X        60%
5    C      Y        40%
{'A': {'X': '80%', 'Y': '20%'}, 'B': {'X': '70%', 'Y': '30%'}, 'C': {'X': '60%', 'Y': '40%'}}


w8f9ii695#

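I am answering your end goal, that you want to "input a team name and X or Y, [and] get the percentage as the return value", but the output will look different from what you proposed above. If this does not work for you, please ignore this answer.
I would re-index the table with the inputs as the index and then take the dictionary from there: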

import pandas as pd
data = {
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'X or Y': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Percentage': ['80%', '20%', '70%', '30%', '60%', '40%']
}
df = pd.DataFrame.from_dict(data)
df1 = df.set_index(['Team', 'X or Y'])
df1.to_dict()

{'Percentage': {('A', 'X'): '80%', ('A', 'Y'): '20%', ('B', 'X'): '70%', ('B', 'Y'): '30%', ('C', 'X'): '60%', ('C', 'Y'): '40%'}}
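With a tuple-keyed dictionary like the one above, a lookup passes the team and the X/Y value together as a single key. A minimal sketch, assuming the same sample data; `lookup` is an illustrative name, and selecting the Percentage column before to_dict() is a small variation that drops the outer 'Percentage' key.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
    'X or Y': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Percentage': ['80%', '20%', '70%', '30%', '60%', '40%'],
})

# Selecting the single column first yields a flat dict keyed by
# (team, x_or_y) tuples instead of a dict nested under 'Percentage'.
lookup = df.set_index(['Team', 'X or Y'])['Percentage'].to_dict()

print(lookup[('A', 'X')])
print(lookup['C', 'Y'])  # same kind of lookup; the parentheses are optional
```

This trades the nested shape for a flat one, which may be enough when the team and the X/Y value are always supplied together.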
