pandas: Is there a more efficient approach than iterrows() for my code?

Posted by 4c8rllxm on 2022-11-27 in Other

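This code takes a very long time to run because I have 1 million rows and 43 columns. The idea is to find rows that have the same values in a specific set of columns but opposite values in the "CA" column, and to remove each such pair, since the two rows are treated as a reversal of each other.
For example, I have a DataFrame = df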
| Column A | Column B | Column C | Column D |
| --- | --- | --- | --- |
| 'Brown' | 'Bottle' | 1234555 | 100 |
| 'Yellow' | 'Cup' | 1234555 | 80 |
| 'Red' | 'Bottle' | 1234555 | -100 |
| 'Red' | 'Bottle' | 1234555 | -100 |
| 'Brown' | 'Bottle' | 1234533 | 100 |
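If I decide to consider columns B and C, the program should remove the first and third rows, because they have the same values in columns B and C and opposite values in column D (one positive, one negative). Those two rows are treated as a reversal pair, so only that pair is removed.
Desired output: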
| Column A | Column B | Column C | Column D |
| --- | --- | --- | --- |
| 'Yellow' | 'Cup' | 1234555 | 80 |
| 'Red' | 'Bottle' | 1234555 | -100 |
| 'Brown' | 'Bottle' | 1234533 | 100 |
The code I currently have is below, but it is very inefficient:

df_dupes = data[data.duplicated(subset = criteria_, keep=False)]
df_dupes_list = np.array(df_dupes.to_numpy().tolist())

df_1 = df_dupes_list[:,[0,1,7,9,8,23,35]]

df_2 = df_1.tolist()

for i, row in df_dupes.iterrows():
    if row.ConvertedAUD < 0 and [row.BA, row.OA, row.BN, row.DN, row.DT,row.D, -row.CA] in df_2:
        try:
            c = np.where(
                (data['BA'] == row.BA)
                & (data['OA'] == row.OA)
                & (data['BN'] == row.BN)
                & (data['DT'] == row.DT)
                & (data['DN'] == row.DN)
                & (data['D'] == row.D)
                & (data['CA'] == -row.CA)
            )[0][0]

            data.drop(labels=[i,data.index.values[c]], axis=0, inplace=True)
        except:
            pass

pandas

Source: https://stackoverflow.com/questions/74526965/more-efficient-ways-other-than-itterrows-on-my-code

1 Answer

ahy6op9u1#

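My solution is this: add a lookup structure so the opposite row of a pair can be found quickly, and build a boolean mask for filtering instead of calling drop() inside the loop.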

import pandas as pd

data = pd.DataFrame(
    [
        ["Brown", "Bottle", 1234555, 100],
        ["yellow", "Cup", 1234555, 80],
        ["Red", "Bottle", 1234555, -100],
        ["Red", "Bottle", 1234555, -100],
        ["Brown", "Bottle", 1234533, 100],
    ],
    columns=["A", "B", "C", "D"],
)

# "lookup table"
seen = {} # {(key1, key2): (index, value)}
# which rows to keep?
mask = pd.Series(True, index=data.index)

# itertuples is faster than iterrows
for row in data.itertuples():
    # create a lookup key
    key = (row.B, row.C)
    if key not in seen:
        # store Index and Value in the "lookup table"
        # if we haven't seen this key before
        seen[key] = (row.Index, row.D)
    else:
        prev_index, prev_value = seen[key]
        # if the stored value is the opposite of the current one
        if prev_value == -row.D:
            # we don't want to keep both rows
            mask.loc[prev_index] = False
            mask.loc[row.Index] = False
            # and remove the key from the lookup table
            del seen[key]
        # else:
            # undefined case:
            # the key exists, but the value is not
            # the opposite of the previous one

# remove "collapsed" rows from the data
result = data[mask]
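
If the Python-level loop is still too slow on a million rows, the same pairing idea can probably be expressed with groupby operations only. The following is a minimal sketch, not a drop-in replacement: the function name `drop_reversal_pairs` and the helper columns `_abs`, `_neg` and `_n` are made up for illustration, and it assumes the key and value columns contain no missing values. It also behaves slightly differently from the loop above: within each key it pairs the n-th positive value with the n-th negative value of the same magnitude regardless of row order, and rows whose value is 0 are never treated as a pair.

```python
import pandas as pd


def drop_reversal_pairs(df, key_cols, value_col):
    """Drop rows that pair up with an opposite-valued row sharing the same keys.

    Hypothetical helper; assumes no NaN in key_cols or value_col.
    """
    keys = list(key_cols)
    work = df[keys].copy()
    work["_abs"] = df[value_col].abs()   # magnitude of the value
    work["_neg"] = df[value_col] < 0     # sign of the value
    # Number the rows inside each (keys, magnitude, sign) group: 0, 1, 2, ...
    work["_n"] = work.groupby(keys + ["_abs", "_neg"]).cumcount()
    # The n-th positive and the n-th negative row of a group share the same
    # (keys, magnitude, n) tuple, so a tuple that occurs twice is a pair.
    pair_size = work.groupby(keys + ["_abs", "_n"])["_neg"].transform("size")
    return df[pair_size < 2]


data = pd.DataFrame(
    [
        ["Brown", "Bottle", 1234555, 100],
        ["yellow", "Cup", 1234555, 80],
        ["Red", "Bottle", 1234555, -100],
        ["Red", "Bottle", 1234555, -100],
        ["Brown", "Bottle", 1234533, 100],
    ],
    columns=["A", "B", "C", "D"],
)

result = drop_reversal_pairs(data, ["B", "C"], "D")
print(result)
```

On the sample data this keeps the rows with index 1, 3 and 4, which matches the desired output in the question. For the real data, the key columns from the question (BA, OA, BN, DN, DT, D) would be passed as `key_cols` and CA as `value_col`; whether order-independent pairing is acceptable depends on the business rule.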

Answered on 2022-11-27
