pandas: loop in Python and reallocate quantities based on limits

vsnjm48y  posted on 2023-01-24  in Python
The bounty expires in 5 days. Answers to this question are eligible for a +50 reputation bounty. Havard Kleven wants to draw more attention to this question.

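Community!
I have been working on a problem and I think it comes down to a simple mistake in a for loop, but I cannot find where it fails.
"What I want to achieve"
1. Iterate over a dataset (from a .csv) in the format shown below.
2. Evaluate the limits given in the # Capacity constraints section.
3. For each row in the data, check the level category (level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}). The "Level" column contains one of the three keys defined in the dict level_capacity.
4. If the item has not been assigned a week yet (the "Week" column is None) and the level_capacity for the row's level has not been reached, assign it to current_week and write that value to the "Week" column.
5. Repeat this loop until all keys in level_capacities meet the requirement specified in level_capacity.
6. When level_capacities.values() == level_capacity, increment current_week by 1 (move to the next week) and repeat the process.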

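What is not working - the output

I tried to implement this, but it seems to be incomplete. Somehow the code stops and breaks out of the loop once the DataFrame has been iterated over completely. My goal is to keep looping until every row has a week assigned in the "Week" column and every week is filled to the maximum quantity (i.e. level_capacity == level_capacities.values()).
To keep the question compact, the first 8 lines of my .csv file are shown below: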

Week    Quantity    Level
1       1           LOW
19      4           LOW
39      1           LOW
4       2           HIGH
9       18          MED
12      23          HIGH
51      11          MED
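
For anyone who wants to reproduce this without the .csv file, here is a minimal sketch that loads the sample rows from an in-memory string. The variable names `sample` and `totals` are only illustrative, and the whitespace separator is an assumption about how the excerpt above is laid out, not about the real file.

```python
import io

import pandas as pd

# The sample rows from the question, pasted as whitespace-separated text
sample = """Week Quantity Level
1 1 LOW
19 4 LOW
39 1 LOW
4 2 HIGH
9 18 MED
12 23 HIGH
51 11 MED
"""

# sep=r"\s+" splits on any run of whitespace
data = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Total quantity per level, to compare against the per-level limits
totals = data.groupby("Level")["Quantity"].sum()
print(totals)
```

In this small excerpt each level's total stays under its limit (HIGH 25, LOW 6, MED 29), so the excerpt alone cannot show a capacity violation. The full dataset is needed for that.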

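The actual dataset contains 1703 rows. I ran the code on it and exported the result to Excel to look at the distribution:

However, as you can see, the distribution does not respect the limits specified above. Any help would be greatly appreciated!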

Code
import pandas as pd

# Define capacity constraints
level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}
weekly_capacity = sum(level_capacity.values())

# Init variables
current_week = 1
current_weekly_capacity = 0
level_capacities_met = {"HIGH": False, "MED": False, "LOW": False}

# Load data to DataFrame
d = {'Week': [1, 19, 39, 4, 9, 12, 51], 'Quantity': [1, 4, 1, 2, 18, 23, 11], 'Level': ['LOW','LOW','LOW','HIGH','MED','HIGH','MED']}
data = pd.DataFrame(data=d)
max_week = data["Week"].max()

while current_week <= max_week:
    for index, row in data.iterrows():
        # Check if the level capacity is met
        if not level_capacities_met.get(row["Level"], False):
            if current_weekly_capacity + row["Quantity"] <= weekly_capacity:
                # Assign the current week
                data.at[index, "Week"] = current_week
                current_weekly_capacity += row["Quantity"]
                if current_weekly_capacity == weekly_capacity:
                    level_capacities_met[row["Level"]] = True
            else:
                # Move to next week and reset capacities
                current_week += 1
                current_weekly_capacity = 0
        elif current_weekly_capacity + row["Quantity"] <= weekly_capacity:
            # Assign the current week
            data.at[index, "Week"] = current_week
            current_weekly_capacity += row["Quantity"]
    # check if all level capacities are met
    if all(level_capacities_met.values()):
        current_week += 1
        current_weekly_capacity = 0
        level_capacities_met = {"HIGH": False, "MED": False, "LOW": False}

print(data)
Answer #1 (czq61nw1)

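My impression is that the code in the question has (at least one) error in its reasoning and also makes things more complicated than necessary.
The main issue is probably that current_weekly_capacity is compared with the total weekly_capacity, but this total does not take into account whether the capacity of an individual level (HIGH, MED, LOW) has been reached.
Take the following scenario as a (fictitious) example: you iterate over the data frame, weekly_capacity has not been reached yet (say the current value is 100), level_capacities_met = {"HIGH": False, ...} has not been reached either, and the current row has a Week value of None, level HIGH and a quantity of 50. The row is then planned for this week, even though it exceeds the available capacity for level HIGH.
I suggest tracking the capacity of each level separately with current_weekly_capacities = {"HIGH": 0, "MED": 0, "LOW": 0} instead of using a single current_weekly_capacity (see also the other comments in the code).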
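
The scenario can be checked with a few lines. This is only an illustration of the two comparisons, not part of the planning code. The split of the 100 already-planned units between MED and LOW is an assumption made up for the example.

```python
level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}
weekly_capacity = sum(level_capacity.values())  # 230

# Hypothetical state: 100 units are already planned this week, none of them HIGH
current_weekly_capacity = 100
current_weekly_capacities = {"HIGH": 0, "MED": 50, "LOW": 50}

# The row from the scenario: level HIGH, quantity 50
level, quantity = "HIGH", 50

# Check used in the question: compares against the summed capacity of all levels
fits_total = current_weekly_capacity + quantity <= weekly_capacity

# Per-level check: compares against the capacity of the row's own level
fits_level = current_weekly_capacities[level] + quantity <= level_capacity[level]

print(fits_total)  # True  (150 <= 230)
print(fits_level)  # False (50 > 30)
```

The aggregated check accepts the row while the per-level check rejects it, which is how a week can end up with more than 30 units of HIGH.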

import pandas as pd
# Define capacity constraints
level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}
#weekly_capacity = sum(level_capacity.values())  # not required

# Load data to DataFrame    
data = {
        'Week': [1, 19, 39, 4, 9, 12, 51, 39, 39, 39, 39, 39, 40, 40, 41, 41],
        'Quantity': [1, 4, 1, 2, 18, 23, 11, 1, 10, 8, 10, 5, 3, 1, 40, 2],
        'Level': ["LOW", "LOW", "LOW", "HIGH", "MED", "HIGH", "MED", "HIGH", "LOW", "LOW", "LOW", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH" ]
        }
df = pd.DataFrame(data)
#df.at["Week"]  = None  # ?? --> replaced by df["Week"] = None (after getting value for max_week)
max_week = df["Week"].max()
df["Week"] = None

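Check whether the sum of Quantity exceeds level_capacity for any level in any week (this is only meaningful once the Week column has been filled in by the planning loop):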

g = df.groupby(by=['Week', 'Level']).sum().reset_index()
g.rename(columns={'Quantity':'sum'}, inplace=True)

def set_level_capacity(row):
    # Look up the capacity of the row's level; -1 marks an unknown level
    if row['Level'] == 'HIGH':
        return 30
    if row['Level'] == 'MED':
        return 100
    if row['Level'] == 'LOW':
        return 100
    return -1

g['level_capacity'] = g.apply(set_level_capacity, axis=1)
# Rows printed here are (Week, Level) combinations that exceed their capacity
print(g[g['sum'] > g['level_capacity']])
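
The same check can also be written more compactly by mapping the Level column onto the capacity dict. This is a sketch only: the helper name `find_overbooked` and the small `planned` frame are made up for illustration.

```python
import pandas as pd

level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}


def find_overbooked(df, capacity):
    """Return the (Week, Level) groups whose summed Quantity exceeds the capacity."""
    g = df.groupby(["Week", "Level"], as_index=False)["Quantity"].sum()
    g = g.rename(columns={"Quantity": "sum"})
    # Series.map looks every level up in the dict; unknown levels become NaN
    g["level_capacity"] = g["Level"].map(capacity)
    return g[g["sum"] > g["level_capacity"]]


# Hypothetical planning result: week 1 holds 33 units of HIGH (limit 30)
planned = pd.DataFrame({
    "Week": [1, 1, 1, 2],
    "Quantity": [23, 10, 4, 2],
    "Level": ["HIGH", "HIGH", "LOW", "HIGH"],
})

overbooked = find_overbooked(planned, level_capacity)
print(overbooked)
```

One difference from the function above: an unknown level ends up with NaN as its capacity, and a comparison against NaN is False, so such a group would not be reported here.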

Now the planning logic (it had to be modified because of some additional information in the question):

# Init variables for loop
current_week = 1
current_weekly_capacities = {"HIGH": 0, "MED": 0, "LOW": 0}  #  instead of current_weekly_capacity = 0
level_capacities_met = {"HIGH": False, "MED": False, "LOW": False}  # (1)

while current_week <= max_week:
    for index, row in df.iterrows():
        # Check if already assigned to a week
        if pd.isna(row["Week"]):
            level = row["Level"]
            # Check if the level capacity is met
            if not level_capacities_met[level]:  # (2)
                if current_weekly_capacities[level] + row["Quantity"] <= level_capacity[level]: # (3) 
                    # Assign the current week
                    df.at[index, "Week"] = current_week
                    current_weekly_capacities[level] += row["Quantity"]  # (4)
                    if current_weekly_capacities[level] == level_capacity[level]:
                        level_capacities_met[level] = True
                        if all(level_capacities_met.values()):  # (5)
                            # Every level is full: stop scanning for this week.
                            # The week increment and the resets happen once, in (7);
                            # doing them here as well would skip a week number.
                            break
                # removed/not required
                #else:
                #    # Move to next week and reset capacities
                #    current_week += 1
                #    current_weekly_capacity = 0
            # (?? - not required) Level capacities are met but current weekly capacity + row capacity < weekly capacity
            #elif current_weekly_capacity + row["Quantity"] <= weekly_capacity:
            #    # Assign the current week
            #    df.at[index, "Week"] = current_week
            #    current_weekly_capacity += row["Quantity"]
    # (6) moved logic up (see 5) 
    #if all(level_capacities_met.values()):
    #    current_week += 1
    #    current_weekly_capacity = 0
    #    level_capacities_met = {"HIGH": False, "MED": False, "LOW": False}
    # (7) current_week should always be increased after iterating over the complete data frame to avoid an infinite loop
    current_week += 1
    current_weekly_capacities = {"HIGH": 0, "MED": 0, "LOW": 0} 
    level_capacities_met = {"HIGH": False, "MED": False, "LOW": False}

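(1) I kept level_capacities_met, although the variable is not actually needed (-> it can be replaced by checking whether current_weekly_capacities[level] is below the capacity of the given level).
(2) I suggest using level_capacities_met[level] instead of level_capacities_met.get(level, False), so that the code raises an error if the level is unknown (-> an error in the data).
(3) Check the capacity of the current level (not the sum of the capacities of all levels).
(4) Add the quantity to the total of the current level only.
(5) If all weekly capacities are met, start iterating over the DataFrame again.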
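
Following note (1), here is a rough sketch of the same greedy idea without level_capacities_met, wrapped in a function. It is not the code above: the name `assign_weeks`, the `start_week` parameter and the two extra HIGH rows in the sample are made up for illustration. It also keeps going until every row has a week instead of stopping at max_week, which is an assumption about what the question is after. It raises an error if a single row is larger than its level's capacity, because such a row could never be placed.

```python
import pandas as pd


def assign_weeks(df, level_capacity, start_week=1):
    """Greedy sketch: fill week after week until every row has a week."""
    out = df.reset_index(drop=True)  # new frame with a unique 0..n-1 index
    out["Week"] = None

    # Guard: a row that exceeds its level's capacity can never be planned
    too_big = out["Quantity"] > out["Level"].map(level_capacity)
    if too_big.any():
        raise ValueError("some rows exceed the capacity of their level")

    week = start_week
    while out["Week"].isna().any():
        used = dict.fromkeys(level_capacity, 0)  # per-level totals for this week
        # Visit only the rows that are still unassigned at the start of the pass
        for index in out.index[out["Week"].isna()]:
            level = out.at[index, "Level"]  # unknown level -> KeyError below
            quantity = out.at[index, "Quantity"]
            if used[level] + quantity <= level_capacity[level]:
                out.at[index, "Week"] = week
                used[level] += quantity
        week += 1
    return out


level_capacity = {"HIGH": 30, "MED": 100, "LOW": 100}

# The seven rows from the question plus two hypothetical HIGH rows (10 and 5)
sample = pd.DataFrame({
    "Quantity": [1, 4, 1, 2, 18, 23, 11, 10, 5],
    "Level": ["LOW", "LOW", "LOW", "HIGH", "MED", "HIGH", "MED", "HIGH", "HIGH"],
})

planned = assign_weeks(sample, level_capacity)
print(planned["Week"].tolist())  # [1, 1, 1, 1, 1, 1, 1, 2, 1]
```

The HIGH row with quantity 10 does not fit into week 1 (25 + 10 > 30) and moves to week 2, while the later HIGH row with quantity 5 still fits and fills week 1 to exactly 30. Whether skipping ahead like this is acceptable depends on whether the row order matters for the planning.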
