How to update specific rows in one DataFrame that exist in another DataFrame - Python/Pandas

yk9xbfzb  posted on 2023-05-21  in Other

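I'm trying to migrate some SQL queries we have into a Python script that inserts all of the most recently updated data into the DB.
I've been trying to update the rows of a dataframe at the specific positions where an id exists. For this I used numpy's intersect1d and then updated the values, but I'm not sure how to update the values only at those positions rather than looping over every position.
Looking at the pandas documentation, I found the df.at syntax, but that updates a value by index label, which is different from updating based on another kind of id.
How can I update the values using .at or some other syntax?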
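For reference, here is a minimal sketch of the difference between the two accessors mentioned above. The small frame and the id list are made up for illustration and are not the data from this question: `.at` addresses a single cell by row label and column label, while `.loc` with a boolean mask addresses every row whose id column matches.

```python
import pandas as pd

# Hypothetical toy frame, not the data from the question
df = pd.DataFrame({"user_id": [10, 30, 50], "status": ["Unknown"] * 3})

# .at addresses exactly one cell, by row label and column label
df.at[0, "status"] = "Active Customer"

# .loc with a boolean mask addresses every row whose user_id matches
mask = df["user_id"].isin([30, 99])
df.loc[mask, "status"] = "Active Customer"

print(df["status"].tolist())
```

Because the mask is built from the `user_id` column rather than from the index, no loop over positions is needed.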
The SQL queries are simple updates that check whether the id is in a specific table and then update the user's status. For example:

UPDATE users
SET status = 'Active Customer'
WHERE id IN (SELECT id FROM logs)
AND id IS NOT NULL

UPDATE users
SET status = 'Non Active Customer'
WHERE id NOT IN (SELECT id FROM logs)
AND id IS NOT NULL
AND status = 'Unknown'

UPDATE users
SET status = 'Active Customer (Calls)'
WHERE id IN (SELECT user_id FROM calls)
AND id > 0
AND status = 'Unknown'

Here is the example I tried in Jupyter:

import pandas as pd
import numpy as np

### creating a test DF
users_df = pd.DataFrame(
{'name':['Bob','Olivia','Gustav','Bella', 'Fran', 'Marco'],
 'user_id':['10',np.nan,'30','50','32',np.nan]
}
)

users_df["user_id"] = pd.to_numeric(users_df["user_id"])
users_df['status'] = 'Unknown'
users_df

### creating a log test DF
logs = pd.DataFrame(
{'user_id':['10','20','30','32','55']}
)
logs["user_id"] = pd.to_numeric(logs["user_id"])

### Updating the values in the status column -- updates everything instead of only certain rows
intersect = np.intersect1d(users_df['user_id'], logs['user_id'])
users_df['status'] = ['Active Customer' if x in intersect else 'Unknown' for x in users_df['user_id']]
users_df

zzoitvuj  #1

Here is another way to do it, using the Pandas loc indexer and the isin method:

# Updating the values in the status column
users_df.loc[
    users_df["user_id"].isin(logs["user_id"].values), "status"
] = "Active Customer"

Then:

print(users_df)
# Output

     name  user_id           status
0     Bob     10.0  Active Customer
1  Olivia      NaN          Unknown
2  Gustav     30.0  Active Customer
3   Bella     50.0          Unknown
4    Fran     32.0  Active Customer
5   Marco      NaN          Unknown
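The same pattern can probably be extended to the other two UPDATE statements in the question by combining boolean masks with `&` and `~`. The sketch below is one possible translation, not a verified equivalent of the SQL. The `calls` frame is hypothetical (the question does not show that table), and the three assignments run in the same order as the SQL statements.

```python
import numpy as np
import pandas as pd

# Same test data as in the question
users_df = pd.DataFrame({
    "name": ["Bob", "Olivia", "Gustav", "Bella", "Fran", "Marco"],
    "user_id": [10, np.nan, 30, 50, 32, np.nan],
})
users_df["status"] = "Unknown"
logs = pd.DataFrame({"user_id": [10, 20, 30, 32, 55]})

# Hypothetical calls table: an assumption, not part of the question's data
calls = pd.DataFrame({"user_id": [32, 50]})

has_id = users_df["user_id"].notna()                  # id IS NOT NULL
in_logs = users_df["user_id"].isin(logs["user_id"])   # id IN (SELECT ... FROM logs)

# UPDATE 1: id is in logs
users_df.loc[has_id & in_logs, "status"] = "Active Customer"

# UPDATE 2: id is not in logs and status is still 'Unknown'
still_unknown = users_df["status"].eq("Unknown")
users_df.loc[has_id & ~in_logs & still_unknown, "status"] = "Non Active Customer"

# UPDATE 3: id is in calls, id > 0, and status is still 'Unknown'
still_unknown = users_df["status"].eq("Unknown")      # recomputed after UPDATE 2
in_calls = users_df["user_id"].isin(calls["user_id"])
positive_id = users_df["user_id"].gt(0)               # NaN compares as False
users_df.loc[in_calls & positive_id & still_unknown, "status"] = "Active Customer (Calls)"

print(users_df)
```

With this sample data the third mask selects no rows, because the second update has already relabeled every non-null id that was still 'Unknown'. The rows with a NaN user_id keep the status 'Unknown' throughout.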
