pandas 使用np.where访问字典中的值-矢量化

jjjwad0x  于 2023-01-24  发布在  其他
关注(0)|答案(1)|浏览(94)

给我一本字典

coaching_hours_per_level = {1:30, 2: 55, 3:80, 4:115}
coaching_hours_per_level

和 Dataframe :

df1 = {'skill_0': {'jay': 1, 'roy': 4, 'axel': 5, 'billy': 1, 'charlie': 2},
 'skill_1': {'jay': 5, 'roy': 3, 'axel': 2, 'billy': 5, 'charlie': 1},
 'skill_2': {'jay': 4, 'roy': 1, 'axel': 2, 'billy': 1, 'charlie': 4},
 'skill_3': {'jay': 1, 'roy': 3, 'axel': 5, 'billy': 4, 'charlie': 3},
 'skill_4': {'jay': 3, 'roy': 4, 'axel': 2, 'billy': 3, 'charlie': 4},
 'skill_5': {'jay': 5, 'roy': 2, 'axel': 4, 'billy': 2, 'charlie': 4},
 'skill_6': {'jay': 5, 'roy': 5, 'axel': 2, 'billy': 5, 'charlie': 1},
 'skill_7': {'jay': 3, 'roy': 3, 'axel': 4, 'billy': 2, 'charlie': 1},
 'skill_8': {'jay': 1, 'roy': 4, 'axel': 2, 'billy': 1, 'charlie': 2},
 'skill_9': {'jay': 4, 'roy': 3, 'axel': 4, 'billy': 2, 'charlie': 1}}

我的目标是:

target = {'skill_0': {'jim': 3},
 'skill_1': {'jim': 5},
 'skill_2': {'jim': 1},
 'skill_3': {'jim': 2},
 'skill_4': {'jim': 1},
 'skill_5': {'jim': 2},
 'skill_6': {'jim': 3},
 'skill_7': {'jim': 5},
 'skill_8': {'jim': 3},
 'skill_9': {'jim': 3}}

我想做的是了解一个人需要多少小时的辅导才能赶上某个技能的水平。例如,对于技能为0的周杰伦来说,周杰伦需要提升2级技能(即30 + 55,总共85小时)。如果技能已经达到或超过同一级别,则应为0。
我已经尝试了如下的np.where,它可以获得差值

np.where(df1>=target.values, 0, target.values-df1)

但是,当我尝试访问字典以获得指导所需的总小时数时,就像np一样。where不再进行矢量化,即使我尝试简单地访问dict中的值

np.where(df1>=target.values, 0, coaching_hours_per_level[target.values+1])
8i9zcol2

8i9zcol21#

您可以构建一个小时矩阵来指示从级别x到级别y所需的时间。
首先是一些示例数据:

current = {
    "skill_0": {"jay": 1, "roy": 4, "axel": 5, "billy": 1, "charlie": 2},
    "skill_1": {"jay": 5, "roy": 3, "axel": 2, "billy": 5, "charlie": 1},
    "skill_2": {"jay": 4, "roy": 1, "axel": 2, "billy": 1, "charlie": 4},
    "skill_3": {"jay": 1, "roy": 3, "axel": 5, "billy": 4, "charlie": 3},
    "skill_4": {"jay": 3, "roy": 4, "axel": 2, "billy": 3, "charlie": 4},
    "skill_5": {"jay": 5, "roy": 2, "axel": 4, "billy": 2, "charlie": 4},
    "skill_6": {"jay": 5, "roy": 5, "axel": 2, "billy": 5, "charlie": 1},
    "skill_7": {"jay": 3, "roy": 3, "axel": 4, "billy": 2, "charlie": 1},
    "skill_8": {"jay": 1, "roy": 4, "axel": 2, "billy": 1, "charlie": 2},
    "skill_9": {"jay": 4, "roy": 3, "axel": 4, "billy": 2, "charlie": 1},
}

# We will up the challenge a bit by saying not everyone
# wants to level up every skill
target = {
    "skill_0": {"jay": 3, "charlie": 5},
    "skill_1": {"jay": 5, "charlie": 5},
    "skill_2": {"jay": 1, "charlie": 1},
    "skill_3": {"jay": 2, "charlie": 1},
    "skill_4": {"jay": 1, "charlie": 1},
    "skill_5": {"jay": 2},
    "skill_6": {"jay": 3},
    "skill_7": {"jay": 5},
    "skill_8": {"jay": 3},
    "skill_9": {"jay": 3},
}

算法:

coaching_hours_per_level = {1:30, 2: 55, 3:80, 4:115}
hours = [0] + list(coaching_hours_per_level.values())

# The value in hours_matrix[i, j] is the total time it takes
# to go from level (i + 1) to level (j + 1). Notice that
# hours_matrix[i, j] = 0 if i < j -- no time is needed to
# down-level.
hours_matrix = np.triu(
    np.tile(hours, (len(hours), 1)),
    k=1,
).cumsum(axis=1)

# Now line up the data
result = (
    pd.concat(
        [pd.DataFrame(current).unstack(), pd.DataFrame(target).unstack()],
        axis=1,
        keys=["current", "target"],
    )
    .dropna()
    .astype("int")
)

# And the final step is just taking data from hours_matrix
result["hours"] = hours_matrix[result["current"] - 1, result["target"] - 1]

结果:

current  target  hours
skill_0 jay            1       3     85
        charlie        2       5    250
skill_1 jay            5       5      0
        charlie        1       5    280
skill_2 jay            4       1      0
        charlie        4       1      0
skill_3 jay            1       2     30
        charlie        3       1      0
skill_4 jay            3       1      0
        charlie        4       1      0
skill_5 jay            5       2      0
skill_6 jay            5       3      0
skill_7 jay            3       5    195
skill_8 jay            1       3     85
skill_9 jay            4       3      0

相关问题