More efficient way to test a large DataFrame and add values based on another value (different sizes, no merge)

Posted by hpcdzsge on 2022-10-23 in Other

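There are plenty of answers about merging and full joins, but I could not find a more efficient method for my situation.
I am on current versions of Python, pandas and NumPy, and the file format is Parquet.
Simply put: if col1 == x, then col10 = 1, col11 = 2, col... and so on.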

look1 = 'EMPLOYEE'
look2 = 'CHESTER'
look3 = "TONY'S"
look4 = "VICTOR'S"

tgt1 = 'inv_group'
tgt2 = 'acc_num'

# Row-by-row loop; the chained indexing df[col][x] = ... is what raises the warning below
for x in range(len(df['ph_name'])):
    if df['ph_name'][x] == look1:
        df[tgt1][x] = 'MEMORIAL'
        df[tgt2][x] = 12345
    elif df['ph_name'][x] == look2:
        df[tgt1][x] = 'WALMART'
        df[tgt2][x] = 45678
    elif df['ph_name'][x] == look3:
        df[tgt1][x] = 'TONYS'
        df[tgt2][x] = 27359
    elif df['ph_name'][x] == look4:
        df[tgt1][x] = 'VICTOR'
        df[tgt2][x] = 45378

Basic sample:
  unit_name        tgt1        tgt2
0 EMPLOYEE         NaN         NaN
1 EMPLOYEE         NaN         NaN
2 TONY'S           NaN         NaN
3 CHESTER          NaN         NaN
4 VICTOR'S         NaN         NaN
5 EMPLOYEE         NaN         NaN

GOAL:
  unit_name        tgt1        tgt2
0 EMPLOYEE         MEMORIAL    12345
1 EMPLOYEE         MEMORIAL    12345
2 TONY'S           TONYS       27359
3 CHESTER          WALMART     45678
4 VICTOR'S         VICTOR      45378
5 EMPLOYEE         MEMORIAL    12345

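So this works... I am adding custom column values. It is not the fastest thing under the sun, but it works.
It takes 6.2429744 on 28896 rows. I am worried that once I put it into production it will start to bog me down.
The other downside is that I get this annoyance... Yes, I could silence it, but I feel it is probably caused by a bad practice that I should learn to avoid.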

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

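In short:
1. Is there a more optimized way to do this?
2. Is this warning due to a bad habit, my own ignorance, or something I should just silence?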

exdqitrt1#

Given: (having columns that are entirely NaN is pointless)

unit_name
0  EMPLOYEE
1  EMPLOYEE
2    TONY'S
3   CHESTER
4  VICTOR'S
5  EMPLOYEE

import pandas as pd
df = pd.DataFrame({'unit_name': {0: 'EMPLOYEE', 1: 'EMPLOYEE', 2: "TONY'S", 3: 'CHESTER', 4: "VICTOR'S", 5: 'EMPLOYEE'}})

Doing: (let's use pd.Series.map and build a dictionary so it is easy to modify later)

looks = ['EMPLOYEE', 'CHESTER', "TONY'S", "VICTOR'S"]

new_cols = {
   'inv_group': ["MEMORIAL", "WALMART", "TONYS", "VICTOR"],
   'acc_num': [12345, 45678, 27359, 45378]
}

for col, values in new_cols.items():
    df[col] = df['unit_name'].map(dict(zip(looks, values)))

print(df)

Output: (I assume you mistyped the column name)

unit_name inv_group  acc_num
0  EMPLOYEE  MEMORIAL    12345
1  EMPLOYEE  MEMORIAL    12345
2    TONY'S     TONYS    27359
3   CHESTER   WALMART    45678
4  VICTOR'S    VICTOR    45378
5  EMPLOYEE  MEMORIAL    12345
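
One thing the map approach leaves open is what happens to names that are not in the lookup. The following is a minimal sketch, not part of the original answer: it keeps the lookup in a small DataFrame indexed by unit name (a hypothetical `lookup` table), and it assumes an extra `OTHER` row and an `UNKNOWN` placeholder purely for illustration.

```python
import pandas as pd

# Sample data; the "OTHER" row is an assumption added to show an unmatched name.
df = pd.DataFrame(
    {"unit_name": ["EMPLOYEE", "EMPLOYEE", "TONY'S", "CHESTER", "VICTOR'S", "OTHER"]}
)

# Hypothetical lookup table: one row per unit name, one column per target column.
lookup = pd.DataFrame(
    {
        "inv_group": ["MEMORIAL", "WALMART", "TONYS", "VICTOR"],
        "acc_num": [12345, 45678, 27359, 45378],
    },
    index=["EMPLOYEE", "CHESTER", "TONY'S", "VICTOR'S"],
)

for col in lookup.columns:
    # Series.map accepts a Series and looks each value up in its index;
    # names missing from the lookup come back as NaN.
    df[col] = df["unit_name"].map(lookup[col])

# Optional: replace the NaN for unmatched names with a placeholder (assumed value).
df["inv_group"] = df["inv_group"].fillna("UNKNOWN")

print(df)
```

Because unmatched names produce NaN, the `acc_num` column ends up as a float column here; if every name is guaranteed to be in the lookup, it stays integer.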

rekjcdws2#

Flying blind here, since I cannot see your data:

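import numpy as np

cond_list = [df["ph_name"] == look for look in [look1, look2, look3, look4]]

# Pass the existing column as `default` so that rows whose ph_name is not in
# the list keep their original values (np.select's own default is 0).
# astype(object) lets the string choices and a NaN-filled column share a dtype.
df[tgt1] = np.select(cond_list, ["MEMORIAL", "WALMART", "TONYS", "VICTOR"], default=df[tgt1].astype(object))
df[tgt2] = np.select(cond_list, [12345, 45678, 27359, 45378], default=df[tgt2])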
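
On the second part of the question: the SettingWithCopyWarning most likely comes from the chained indexing in the original loop (`df[tgt1][x] = ...`), where pandas cannot tell whether the assignment lands on the frame or on a temporary copy. If a per-value assignment is still preferred, a minimal sketch with a boolean mask and a single `.loc` call per column could look like this. The `rules` dictionary and the `SOMEONE ELSE` row are assumptions made up for the illustration.

```python
import pandas as pd

# Sample data; "SOMEONE ELSE" is an assumed row that matches no rule.
df = pd.DataFrame({"ph_name": ["EMPLOYEE", "TONY'S", "CHESTER", "SOMEONE ELSE"]})
df["inv_group"] = None          # object column, filled in below
df["acc_num"] = float("nan")    # float column, filled in below

# Hypothetical rule table: ph_name -> (inv_group, acc_num)
rules = {
    "EMPLOYEE": ("MEMORIAL", 12345),
    "CHESTER": ("WALMART", 45678),
    "TONY'S": ("TONYS", 27359),
    "VICTOR'S": ("VICTOR", 45378),
}

for name, (group, acc) in rules.items():
    mask = df["ph_name"] == name
    # One .loc call selects rows and column together, so the assignment
    # goes to df itself instead of a possible intermediate copy.
    df.loc[mask, "inv_group"] = group
    df.loc[mask, "acc_num"] = acc

print(df)
```

This loops over the handful of rules instead of over every row, so the work per rule is vectorized. Rows that match no rule keep their initial values.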
