The truth value of a DataFrame is ambiguous (Python, Pandas)

Posted by 9rnv2umw on 2023-02-28 in Python

I ran into the error "The truth value of a DataFrame is ambiguous". I tried boolean indexing and operators such as '&', but it didn't help.
I have the following table:
| minuts | seconds | total_sec | cost | total_cost |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 49 | 109 | 1.5 | |
| 0 | 57 | 57 | 0.0 | |
| 0 | 34 | 34 | 0.0 | |
| 2 | 0 | 120 | 2.0 | |
| 0 | 55 | 55 | 0.0 | |
| 6 | 47 | 407 | 4.0 | |
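To experiment with the answers below, the table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the empty total_cost column should start out as NaN; the name df matches the question's code.

```python
import numpy as np
import pandas as pd

# Rebuild the sample table from the question; total_cost starts out empty (NaN).
df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
    "total_sec": [109, 57, 34, 120, 55, 407],
    "cost": [1.5, 0.0, 0.0, 2.0, 0.0, 4.0],
    "total_cost": np.nan,  # a scalar is broadcast to every row
})

# Sanity check: total_sec should equal minuts * 60 + seconds on every row.
assert (df["minuts"] * 60 + df["seconds"] == df["total_sec"]).all()
print(df)
```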
I need to fill in the last column (total_cost) based on the following logic:

  • if minuts >= 1 and seconds >= 1 then total_cost = cost + 0.5
  • if minuts < 1 and seconds >= 1 then total_cost = cost + 1.5
  • if minuts < 1 and seconds < 1 then total_cost = cost
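Each rule can be written as a boolean mask: a Series with one True/False per row rather than a single True/False, which is why putting it (or a DataFrame filtered by it) directly after if fails. The sketch below uses the question's column names; the mask names are hypothetical. It also shows that the three rules leave one case undefined, minuts >= 1 with seconds < 1, which is row 3 in the sample data and the reason the answers end up with NaN there.

```python
import pandas as pd

df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
    "cost": [1.5, 0.0, 0.0, 2.0, 0.0, 4.0],
})

# One boolean Series (mask) per rule; & combines them element-wise.
rule_1 = (df["minuts"] >= 1) & (df["seconds"] >= 1)  # total_cost = cost + 0.5
rule_2 = (df["minuts"] < 1) & (df["seconds"] >= 1)   # total_cost = cost + 1.5
rule_3 = (df["minuts"] < 1) & (df["seconds"] < 1)    # total_cost = cost

# Rows matched by none of the three rules (minuts >= 1 and seconds < 1).
uncovered = ~(rule_1 | rule_2 | rule_3)
print(df[uncovered])
```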

I tried this code, but it doesn't work:

def check (minuts, seconds):
    if df.loc[(df['minuts'] >= 1) & (df['seconds'] >= 1)]:
        return df['total_cost'] + 0.5

    if df.loc[(df['minuts'] < 1) & (df['seconds'] >= 1)]:
        return df['total_cost'] + 1.5

    else: return df['cost']

Answer #1 (qvsjd97n)

Here is an approach using np.where(); it is more efficient than a custom function:

import numpy as np

df['total_cost'] = np.where((df['minuts'] >= 1) & (df['seconds'] >= 1), df['cost'] + 0.5,
                   np.where((df['minuts'] < 1) & (df['seconds'] >= 1), df['cost'] + 1.5,
                   np.where(df['minuts'] < 1, df['cost'], np.nan)))
                   
print(df)
   minuts  seconds  total_sec  cost  total_cost
0       1       49        109   1.5         2.0
1       0       57         57   0.0         1.5
2       0       34         34   0.0         1.5
3       2        0        120   2.0         NaN
4       0       55         55   0.0         1.5
5       6       47        407   4.0         4.5
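Nested np.where calls get harder to read as more rules are added. A flatter alternative (a different NumPy function from the one this answer uses) is np.select, which takes a list of conditions and a parallel list of choices. A sketch on the sample data, with conditions, choices and the default chosen to mirror the nested version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
    "cost": [1.5, 0.0, 0.0, 2.0, 0.0, 4.0],
})

# Conditions are checked in order; for each row the first True one wins.
conditions = [
    (df["minuts"] >= 1) & (df["seconds"] >= 1),
    (df["minuts"] < 1) & (df["seconds"] >= 1),
    (df["minuts"] < 1) & (df["seconds"] < 1),
]
choices = [df["cost"] + 0.5, df["cost"] + 1.5, df["cost"]]

# Rows matching no condition get the default (NaN), as in the nested np.where.
df["total_cost"] = np.select(conditions, choices, default=np.nan)
print(df)
```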

Answer #2 (roejwanj)

Apply the required conditions row by row to compute the total_cost values:

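def f(row):
    if row['minuts'] < 1:
        # under one minute: +1.5 if there are any seconds, otherwise the plain cost
        if row['seconds'] >= 1:
            return row['cost'] + 1.5
        else:
            return row['cost']
    elif row['seconds'] >= 1:
        # at least one minute and at least one second: +0.5
        return row['cost'] + 0.5
    # minuts >= 1 and seconds < 1 is not covered by the rules: keep the existing value (NaN here)
    return row['total_cost']

# axis=1 passes each row to f as a Series
df['total_cost'] = df.apply(f, axis=1)
print(df)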
   minuts  seconds  total_sec  cost  total_cost
0       1       49        109   1.5         2.0
1       0       57         57   0.0         1.5
2       0       34         34   0.0         1.5
3       2        0        120   2.0         NaN
4       0       55         55   0.0         1.5
5       6       47        407   4.0         4.5
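The row-wise function and the vectorized np.where version from the first answer should agree on every row, including the NaN for the uncovered case. The sketch below checks that on the sample data; the names row_wise and vectorized are illustrative. The design difference is that apply makes one Python call per row, while np.where evaluates whole columns at once, which is why the vectorized form is usually preferred on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
    "total_sec": [109, 57, 34, 120, 55, 407],
    "cost": [1.5, 0.0, 0.0, 2.0, 0.0, 4.0],
    "total_cost": np.nan,
})

def f(row):
    # Same branching as the row-wise answer.
    if row["minuts"] < 1:
        return row["cost"] + 1.5 if row["seconds"] >= 1 else row["cost"]
    if row["seconds"] >= 1:
        return row["cost"] + 0.5
    return row["total_cost"]  # uncovered case: keep the existing value (NaN)

# axis=1: f is called once per row, receiving that row as a Series.
row_wise = df.apply(f, axis=1)

# Vectorized version of the same rules, evaluated on whole columns.
vectorized = pd.Series(
    np.where((df["minuts"] >= 1) & (df["seconds"] >= 1), df["cost"] + 0.5,
             np.where((df["minuts"] < 1) & (df["seconds"] >= 1), df["cost"] + 1.5,
                      np.where(df["minuts"] < 1, df["cost"], np.nan))),
    index=df.index,
)

# Series.equals treats NaNs in the same position as equal.
print(row_wise.equals(vectorized))
```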

Answer #3 (6uxekuva)

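When you run df.loc[(df['minuts'] >= 1) & (df['seconds'] >= 1)], you get the subset of rows that satisfy the first condition. But when you put an if in front of it:
if df.loc[(df['minuts'] >= 1) & (df['seconds'] >= 1)]:
you are now trying to evaluate if <dataframe>:, and, as the error says, the truth value of a DataFrame is ambiguous: pandas cannot tell whether you mean "is it non-empty", "is any value true" or "are all values true".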
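The failure can be reproduced in isolation. The sketch below (variable names are illustrative) shows that if on a DataFrame raises ValueError, and that an explicit reduction such as DataFrame.empty, Series.any() or Series.all() produces the single boolean that if needs. Which reduction is right depends on what the condition is supposed to test.

```python
import pandas as pd

df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
})

mask = (df["minuts"] >= 1) & (df["seconds"] >= 1)  # boolean Series, one value per row
subset = df.loc[mask]                                # DataFrame with the matching rows

raised = False
try:
    if subset:  # Python calls bool(subset), which pandas refuses to decide
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Explicit reductions that do yield a single bool:
print(subset.empty)  # False, because rows 0 and 5 matched
print(mask.any())    # True: at least one row matches
print(mask.all())    # False: not every row matches
```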
A solution that needs no extra imports is to get the index labels matching each condition and then assign with .loc:

# .loc selects by label, so this also works when the index is not 0..n-1
q1 = df.query('minuts >= 1 & seconds >= 1').index
df.loc[q1, 'total_cost'] = df.loc[q1, 'cost'] + 0.5

q2 = df.query('minuts < 1 & seconds >= 1').index
df.loc[q2, 'total_cost'] = df.loc[q2, 'cost'] + 1.5

q3 = df.query('minuts < 1 & seconds < 1').index
df.loc[q3, 'total_cost'] = df.loc[q3, 'cost']

Note that I used query, but it should also work with q1 = df.loc[(df['minuts'] >= 1) & (df['seconds'] >= 1)].index.
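Going one step further, the boolean masks can be passed to .loc directly, skipping the intermediate index. This is a sketch, not the answer's exact code; the rules list and the surcharge name are hypothetical, chosen to match the question's three rules.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "minuts": [1, 0, 0, 2, 0, 6],
    "seconds": [49, 57, 34, 0, 55, 47],
    "cost": [1.5, 0.0, 0.0, 2.0, 0.0, 4.0],
    "total_cost": np.nan,
})

# (mask, surcharge) pairs; a boolean mask can go straight into .loc.
rules = [
    ((df["minuts"] >= 1) & (df["seconds"] >= 1), 0.5),
    ((df["minuts"] < 1) & (df["seconds"] >= 1), 1.5),
    ((df["minuts"] < 1) & (df["seconds"] < 1), 0.0),
]
for mask, surcharge in rules:
    # Only rows where mask is True are written; all other rows keep their value.
    df.loc[mask, "total_cost"] = df.loc[mask, "cost"] + surcharge

print(df)
```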
