pandas: problem creating new columns from an existing DataFrame column - PerformanceWarning: DataFrame is highly fragmented

iqxoj9l9  posted on 2023-04-19  in Other

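I want to create X new columns in a pandas DataFrame, all based on one existing column of that DataFrame. Each new column should hold the values of the original column shifted by one more row than the previous one.
I wrote the following code for this: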

import pandas as pd

x = range(1,10000)
df = pd.DataFrame({'QObs':x})

for i in range(1,120):
    nameQ = 'QObs' + str(i)
    df[nameQ] = df['QObs'].shift(i)

However, I get the following message:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
  df[nameQ] = df['QObs'].shift(i)

I tried using pd.concat and DataFrame.join, but I ran into a similar problem:

df_new = pd.DataFrame()
for i in range(1,120):
    nameQ = 'QObs' + str(i)
    df_new[nameQ] = df['QObs'].shift(i)
    df = pd.concat([df,df_new], axis=1)

This version takes even longer to run.
Any help would be much appreciated!

lokaqttq #1

Build the list of shifted columns first, then concatenate once at the end:

qobs = [df['QObs'].shift(i).rename(f'QObs{i}') for i in range(1, 120)]
out = pd.concat([df['QObs'], *qobs], axis=1)

Output (here for a 200-row sample DataFrame):

>>> out
     QObs  QObs1  QObs2  QObs3  QObs4  QObs5  ...  QObs114  QObs115  QObs116  QObs117  QObs118  QObs119
0      23    NaN    NaN    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
1      89   23.0    NaN    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
2      40   89.0   23.0    NaN    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
3      60   40.0   89.0   23.0    NaN    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
4      30   60.0   40.0   89.0   23.0    NaN  ...      NaN      NaN      NaN      NaN      NaN      NaN
..    ...    ...    ...    ...    ...    ...  ...      ...      ...      ...      ...      ...      ...
195    74   94.0   77.0    1.0   68.0    6.0  ...     28.0      7.0     19.0     74.0     46.0     46.0
196     2   74.0   94.0   77.0    1.0   68.0  ...     50.0     28.0      7.0     19.0     74.0     46.0
197    71    2.0   74.0   94.0   77.0    1.0  ...     77.0     50.0     28.0      7.0     19.0     74.0
198    52   71.0    2.0   74.0   94.0   77.0  ...     94.0     77.0     50.0     28.0      7.0     19.0
199    48   52.0   71.0    2.0   74.0   94.0  ...     69.0     94.0     77.0     50.0     28.0      7.0

[200 rows x 120 columns]
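
For reference, here is a minimal self-contained sketch of the same idea applied to the data from the question. The helper name add_lags and its parameters are made up for illustration and are not part of pandas. The warnings filter is only there to show that, under these assumptions, the single concat does not emit the PerformanceWarning.

```python
import warnings

import pandas as pd

# Same setup as in the question: one column holding 1..9999.
df = pd.DataFrame({'QObs': range(1, 10000)})


def add_lags(frame, column, n_lags):
    """Hypothetical helper: return `column` plus n_lags shifted copies of it."""
    # Collect every shifted Series in a plain list first ...
    lagged = [frame[column].shift(i).rename(f'{column}{i}')
              for i in range(1, n_lags + 1)]
    # ... then build the result with a single concat along the column axis.
    return pd.concat([frame[column], *lagged], axis=1)


with warnings.catch_warnings():
    # Turn the fragmentation warning into an error so it cannot go unnoticed.
    warnings.simplefilter('error', pd.errors.PerformanceWarning)
    out = add_lags(df, 'QObs', 119)

print(out.shape)  # (9999, 120)
```

The shifted columns come back as float64 because shift fills the leading rows with NaN. The original QObs column keeps its integer dtype.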
