pandas: iteratively replace each cell in a DataFrame using values from the original DataFrame

jjjwad0x posted on 2023-04-18 in Other

Here is an example DataFrame:

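I need to be able to take "2023-01-01" *(edit: a string of random numbers, not a real Date object)* and "Python is awesome" and pass them to a function (do_calculations(date, phrase)), which returns a new value to be put in place of "Python is awesome". Then I pass "2023-01-01" and "Is the pizza" to the function, and the new return value is put in place of "Is the pizza". Finally, I take "2023-01-01" and "Pizza" and do the same thing.
Then I move down the column and do the same for "2023-01-02", then "2023-01-03", and so on, until every cell has been replaced.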
I have tried the following:

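for i, row in new_df.iterrows():
    print('index: ', i)
    # note: the sample generator below names this column 'title1' (lowercase)
    print('row: ', row['Date'], row['Title1'], row.index)
    if row['Title1']:
        # if the cell holds a plain string, [0] passes only its first character
        text = do_calculations(row['Date'], row['Title1'][0])
        #print("TEXT:", text)
        value = new_df.at[i, row.index[1]]
        print("VALUE:", value)

        # row.index[2] is the next column over, so the result lands in the
        # neighbouring cell, and only one title column per row is processed
        new_df.at[i, row.index[2]] = text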

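But I can't get it to work. I think another for loop is needed here, along with better use of the i index.
It doesn't matter whether a new DataFrame is created or the existing one is updated in place; whichever is faster is preferred.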
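Because either a new DataFrame or an in-place update is acceptable, one alternative that is typically faster than iterrows is to zip the Date column with each title column inside a list comprehension, which avoids building a Series object for every row. The sketch below builds a new DataFrame; do_calculations is a hypothetical stand-in for the real function, and the two-row frame is made up for illustration. Note that it calls the function column by column rather than row by row, which only matters if the real function depends on call order.

```python
import pandas as pd

def do_calculations(date, phrase):
    # Hypothetical stand-in for the real per-cell function
    return f"{phrase} @ {date}"

df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02"],
    "title1": ["Python is awesome", "Hello world"],
    "title2": ["Is the pizza", None],
    "title3": ["Pizza", "krab"],
})

title_cols = ["title1", "title2", "title3"]

# Build a new DataFrame; empty (None/NaN) cells are passed through unchanged
new_df = df.copy()
for col in title_cols:
    new_df[col] = [
        phrase if pd.isna(phrase) else do_calculations(date, phrase)
        for date, phrase in zip(df["Date"], df[col])
    ]

print(new_df)
```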
Here is the code that generates the example DataFrame:

import pandas as pd
import random

# Create a list of dates
date_rng = pd.date_range(start='1/1/2023', end='1/10/2023', freq='D')

# Candidate phrases (None simulates an empty cell)
phrases = ['Hello world', 'Python is awesome', None, 'Data science is fun', 'I love coding', 'Pandas is powerful', 'Pineapples', 'Pizza', 'Krusty', 'krab', 'Is the pizza']

# Build one row per date: the date followed by 3 distinct random phrases
rows = []
for date in date_rng:
    row = [date]
    row.extend(random.sample(phrases, 3))
    rows.append(row)

# DataFrame.append was removed in pandas 2.0, so build the frame in one call
df = pd.DataFrame(rows, columns=['Date', 'title1', 'title2', 'title3'])

# Print DataFrame
print(df)

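Edit: I have clarified that one of the arguments passed is a numeric string, not a real date object; most of the answers seem to have taken this into account.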
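To make the generated sample closer to the real data, the Date column can be turned into plain digit strings. The %Y%m%d format used here is an assumption chosen only for illustration; with string dates, any code that calls .date() on the value will fail, and the raw string should be passed instead.

```python
import pandas as pd

# Same date range as the sample generator above
date_rng = pd.date_range(start="1/1/2023", end="1/10/2023", freq="D")
df = pd.DataFrame({"Date": date_rng})

# Replace real datetimes with plain digit strings (format is an assumption)
df["Date"] = df["Date"].dt.strftime("%Y%m%d")

print(df["Date"].head(3).tolist())
# ['20230101', '20230102', '20230103']
```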


Answer #1 by qzlgjiam

IIUC (if I understand correctly), you can do this with two loops:

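for i, row in new_df.iterrows():
    # visit every title cell of the current row, left to right
    for col in ['title1', 'title2', 'title3']:
        if row[col]:  # skips None and empty-string cells
            text = do_calculations(row['Date'], row[col])
            # write the result back into the same cell
            new_df.loc[i, col] = text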
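One caveat with the if row[col]: check: it skips None and empty strings, but a missing value stored as NaN is truthy (bool(np.nan) is True), so NaN cells would still be sent to the function. Below is a sketch of the same loop using pd.notna instead, with a hypothetical do_calculations and a made-up two-row frame.

```python
import numpy as np
import pandas as pd

def do_calculations(date, phrase):
    # Hypothetical stand-in for the real per-cell function
    return f"{phrase} @ {date}"

new_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02"],
    "title1": ["Pizza", np.nan],  # NaN instead of None
})

for i, row in new_df.iterrows():
    for col in ["title1"]:
        # `if row[col]:` would NOT skip NaN because bool(np.nan) is True;
        # pd.notna() skips both None and NaN
        if pd.notna(row[col]):
            new_df.loc[i, col] = do_calculations(row["Date"], row[col])

print(new_df)
```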

Answer #2 by zpf6vheq

Here is how to do what you asked by calling the function cell by cell with apply:

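df[df.columns[1:]] = df.apply(
    # one do_calculations call per phrase cell; row.iloc[1:] skips the Date column
    lambda row: [do_calculations(row.Date.date(), val) for val in row.iloc[1:]],
    axis=1,
    result_type='expand',  # spread each returned list across columns
)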

Example function:

callnum = [0]
def do_calculations(date, phrase):
    callnum[0] += 1
    return f'{phrase} {date} {callnum[0]}'

Output:

                  Date                            title1                          title2                             title3
0  2023-01-01 00:00:00               Krusty 2023-01-01 1      I love coding 2023-01-01 2    Pandas is powerful 2023-01-01 3
1  2023-01-02 00:00:00                 krab 2023-01-02 4  Python is awesome 2023-01-02 5   Data science is fun 2023-01-02 6
2  2023-01-03 00:00:00                 None 2023-01-03 7      I love coding 2023-01-03 8                 Pizza 2023-01-03 9
3  2023-01-04 00:00:00   Python is awesome 2023-01-04 10        Pineapples 2023-01-04 11                 krab 2023-01-04 12
4  2023-01-05 00:00:00       I love coding 2023-01-05 13        Pineapples 2023-01-05 14                 krab 2023-01-05 15
5  2023-01-06 00:00:00               Pizza 2023-01-06 16              None 2023-01-06 17               Krusty 2023-01-06 18
6  2023-01-07 00:00:00          Pineapples 2023-01-07 19             Pizza 2023-01-07 20    Python is awesome 2023-01-07 21
7  2023-01-08 00:00:00        Is the pizza 2023-01-08 22        Pineapples 2023-01-08 23                 None 2023-01-08 24
8  2023-01-09 00:00:00  Pandas is powerful 2023-01-09 25             Pizza 2023-01-09 26  Data science is fun 2023-01-09 27
9  2023-01-10 00:00:00  Pandas is powerful 2023-01-10 28              None 2023-01-10 29           Pineapples 2023-01-10 30

Alternatively, if your function can accept Series arguments for the date and the phrase, you can use apply on a column-by-column basis:

df[df.columns[1:]] = ( df[df.columns[1:]]
    .apply(lambda col: vect_calculations(df.Date.infer_objects(), col)) )

Example function:

vect_callnum = [0]
def vect_calculations(date, phrase):
    vect_callnum[0] += 1
    return phrase + ' ' + date.astype(str) + f' {vect_callnum[0]}'

Output:

                  Date                            title1                            title2                            title3
0  2023-01-01 00:00:00          Hello world 2023-01-01 1               Krusty 2023-01-01 2                 krab 2023-01-01 3
1  2023-01-02 00:00:00         Is the pizza 2023-01-02 1               Krusty 2023-01-02 2                               NaN
2  2023-01-03 00:00:00          Hello world 2023-01-03 1    Python is awesome 2023-01-03 2                 krab 2023-01-03 3
3  2023-01-04 00:00:00           Pineapples 2023-01-04 1               Krusty 2023-01-04 2    Python is awesome 2023-01-04 3
4  2023-01-05 00:00:00          Hello world 2023-01-05 1               Krusty 2023-01-05 2         Is the pizza 2023-01-05 3
5  2023-01-06 00:00:00           Pineapples 2023-01-06 1  Data science is fun 2023-01-06 2    Python is awesome 2023-01-06 3
6  2023-01-07 00:00:00   Pandas is powerful 2023-01-07 1          Hello world 2023-01-07 2                Pizza 2023-01-07 3
7  2023-01-08 00:00:00           Pineapples 2023-01-08 1                               NaN  Data science is fun 2023-01-08 3
8  2023-01-09 00:00:00  Data science is fun 2023-01-09 1                Pizza 2023-01-09 2                               NaN
9  2023-01-10 00:00:00                Pizza 2023-01-10 1         Is the pizza 2023-01-10 2                 krab 2023-01-10 3

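Note that in the first solution above, the number of function calls (indicated by the callnum values visible in the output df) equals the number of cells, whereas in the second solution the function is called only once per phrase column (indicated by the vect_callnum values in the output).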
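The call counts can be checked on a small made-up frame. The two counting stubs below are hypothetical, modeled on the example functions above, and use global counters instead of one-element lists. They assume a recent pandas version (older releases evaluated the first row or column of apply twice).

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=3, freq="D"),
    "title1": ["Pizza", "krab", "Krusty"],
    "title2": ["Hello world", "Pineapples", "I love coding"],
    "title3": ["Is the pizza", "Pizza", "krab"],
})
cols = df.columns[1:]

cell_calls = 0
def do_calculations(date, phrase):
    # Per-cell stub: one call per cell
    global cell_calls
    cell_calls += 1
    return f"{phrase} {date}"

col_calls = 0
def vect_calculations(date, phrase):
    # Vectorized stub: receives whole Series, one call per column
    global col_calls
    col_calls += 1
    return phrase + " " + date.astype(str)

cellwise = df.apply(
    lambda row: [do_calculations(row.Date.date(), v) for v in row.iloc[1:]],
    axis=1, result_type="expand",
)
colwise = df[cols].apply(lambda col: vect_calculations(df.Date, col))

print(cell_calls, col_calls)  # 9 3
```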


Answer #3 by js5cn81o

If you want a vectorized approach, one possible solution is:

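def f(x):
    # placeholder for a vectorized calculation: it receives the whole 2-D
    # NumPy array of title cells at once and must return the same shape
    return x

df.iloc[:, 1:] = f(df.iloc[:, 1:].values)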
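The placeholder f above only receives the title cells. If the vectorized calculation also needs the date, one option is to pass the Date column as an (n, 1) column vector so that NumPy broadcasting pairs each row's date with every title column. The string concatenation below is a hypothetical stand-in for the real calculation and assumes there are no empty cells. With object arrays NumPy still concatenates element by element in Python, so the gain is mainly that f is called once rather than once per cell.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["20230101", "20230102"],
    "title1": ["Python is awesome", "Hello world"],
    "title2": ["Is the pizza", "krab"],
    "title3": ["Pizza", "Krusty"],
})

def f(dates, phrases):
    # Hypothetical vectorized calculation on object arrays: `+` concatenates
    # the Python strings element-wise, and dates (n, 1) broadcasts across
    # the k title columns of phrases (n, k)
    return phrases + " @ " + dates

dates = df["Date"].to_numpy(dtype=object)[:, None]   # shape (n, 1)
phrases = df.iloc[:, 1:].to_numpy(dtype=object)      # shape (n, 3)
df.iloc[:, 1:] = f(dates, phrases)

print(df)
```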
