在for循环中合并pandas Dataframe 而不添加列后缀?

l7mqbcuq  于 2023-04-04  发布在  其他
关注(0)|答案(1)|浏览(156)

我正在尝试编写一个函数来收集传感器数据,我已经 * 几乎 * 让它工作-所有数据似乎都在表中结束,但Python添加了后缀(即_x)添加到最后一个合并列中的列名。我希望,比如说,所有 Bx 的值都在一个列中结束,这样我就可以通过工具将它们分开-我下一个目标是将其转换为多索引 Dataframe 。

代码:

import os
import pandas as pd

# For pulling data from CDAweb:
from ai import cdas
import datetime

def magdf(
    start = datetime.datetime(2016, 1, 24, 0, 0, 0), 
    end = datetime.datetime(2016, 1, 25, 0, 0, 0), 
    maglist_a = ['upn', 'umq', 'gdh', 'atu', 'skt', 'ghb'],  # Arctic magnetometers
    maglist_b = ['pg0', 'pg1', 'pg2', 'pg3', 'pg4', 'pg5'],  # Antarctic magnetometers
    is_saved = False, 
):
        # Magnetometer parameter dict so that we don't have to type the full string: 
        d = {'Bx':'MAGNETIC_NORTH_-_H', 'By':'MAGNETIC_EAST_-_E','Bz':'VERTICAL_DOWN_-_Z'}
        d_i = dict((v, k) for k, v in d.items()) # inverted mapping for col renaming later
        if is_saved:
            fname = 'output/' +str(start) + '_' + '.csv'
            if os.path.exists(fname):
                print('Looks like ' + fname + ' has already been generated.')
                return 
        UT = pd.date_range(start, end, freq ='S')    # preallocate time range
        full_df = pd.DataFrame(UT, columns=['UT'])   # preallocate dataframe
        full_df['UT'] = full_df['UT'].astype('datetime64[s]') # enforce 1s precision
        full_df['Magnetometer'] = ""
        for mags in [maglist_a, maglist_b]:
            for idx, magname in enumerate(mags):   # For each magnetometer, pull data and merge into full_df:
                print('Pulling data for magnetometer: ' + magname.upper())
                try:                
                    data = cdas.get_data(
                        'sp_phys',
                        'THG_L2_MAG_'+ magname.upper(),
                        start,
                        end,
                        ['thg_mag_'+ magname]
                    )
                    data['UT'] = pd.to_datetime(data['UT'])# unit='s')
                    df = pd.DataFrame(data)
                    df.rename(columns=d_i, inplace=True)    # mnemonic column names
                    # df['UT_1'] = [] # discard superfluous column
                    
                    # # multi index attempt 1: 
                    df['Magnetometer'] = magname.upper()
                    df.set_index('Magnetometer', append=True, inplace=True)
                    
                    # merge dataframes
                    full_df = full_df.merge(df, on=['UT', 'Magnetometer'], how='outer')
                    
                    print(df)
                except Exception as e:
                    print(e)
                    continue
        full_df['UT'] = full_df['UT'].astype('datetime64[s]') # enforce 1s precision
        if is_saved:
            print('Saving as a CSV.')
            full_df.to_csv(fname)
        # print(full_df)
        return full_df # TODO: Figure out how to suppress output here
        
foo = magdf(maglist_a = ['upn'], maglist_b = ['pg1'])
foo

**输出:**单个数据框看起来不错,但合并后的数据框有后缀:

Pulling data for magnetometer: UPN
100% [................................................] 141200 / 141200
                                   UT          UT_1       Bx       By       Bz
     Magnetometer                                                             
0    UPN          2016-01-24 00:00:00  1.453590e+09  7841.68  320.623  55419.2
1    UPN          2016-01-24 00:01:00  1.453590e+09  7844.55  314.371  55418.7
2    UPN          2016-01-24 00:02:00  1.453590e+09  7844.76  306.730  55418.5
3    UPN          2016-01-24 00:03:00  1.453590e+09  7845.35  294.430  55418.3
4    UPN          2016-01-24 00:04:00  1.453590e+09  7844.61  291.630  55420.9
...                               ...           ...      ...      ...      ...
1436 UPN          2016-01-24 23:56:00  1.453680e+09  7833.95  336.020  55406.7
1437 UPN          2016-01-24 23:57:00  1.453680e+09  7832.59  336.220  55407.0
1438 UPN          2016-01-24 23:58:00  1.453680e+09  7831.56  336.523  55407.2
1439 UPN          2016-01-24 23:59:00  1.453680e+09  7830.79  336.817  55407.5
1440 UPN          2016-01-25 00:00:00  1.453680e+09  7831.16  336.085  55407.5

[1441 rows x 5 columns]
Pulling data for magnetometer: PG1
100% [..............................................] 8211858 / 8211858
                                        UT          UT_1     Bx      By     Bz
      Magnetometer                                                            
0     PG1          2016-01-24 00:00:00.001  1.453590e+09  10.25 -570.86  61.98
1     PG1          2016-01-24 00:00:01.001  1.453590e+09  10.50 -571.11  61.98
2     PG1          2016-01-24 00:00:02.001  1.453590e+09  10.75 -571.11  61.98
3     PG1          2016-01-24 00:00:03.001  1.453590e+09  11.00 -571.24  61.98
4     PG1          2016-01-24 00:00:04.001  1.453590e+09  11.13 -571.36  61.85
...                                    ...           ...    ...     ...    ...
86395 PG1          2016-01-24 23:59:55.001  1.453680e+09  -9.61 -624.49  77.16
86396 PG1          2016-01-24 23:59:56.001  1.453680e+09  -9.87 -624.49  77.29
86397 PG1          2016-01-24 23:59:57.001  1.453680e+09  -9.87 -624.36  77.29
86398 PG1          2016-01-24 23:59:58.001  1.453680e+09  -9.87 -624.36  77.29
86399 PG1          2016-01-24 23:59:59.001  1.453680e+09  -9.99 -624.36  77.29

[86400 rows x 5 columns]
UT  Magnetometer    UT_1_x  Bx_x    By_x    Bz_x    UT_1_y  Bx_y    By_y    Bz_y
0   2016-01-24 00:00:00     NaN NaN NaN NaN NaN NaN NaN NaN
1   2016-01-24 00:00:01     NaN NaN NaN NaN NaN NaN NaN NaN
2   2016-01-24 00:00:02     NaN NaN NaN NaN NaN NaN NaN NaN
3   2016-01-24 00:00:03     NaN NaN NaN NaN NaN NaN NaN NaN
4   2016-01-24 00:00:04     NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
174237  2016-01-24 23:59:55 PG1 NaN NaN NaN NaN 1.453680e+09    -9.61   -624.49 77.16
174238  2016-01-24 23:59:56 PG1 NaN NaN NaN NaN 1.453680e+09    -9.87   -624.49 77.29
174239  2016-01-24 23:59:57 PG1 NaN NaN NaN NaN 1.453680e+09    -9.87   -624.36 77.29
174240  2016-01-24 23:59:58 PG1 NaN NaN NaN NaN 1.453680e+09    -9.87   -624.36 77.29
174241  2016-01-24 23:59:59 PG1 NaN NaN NaN NaN 1.453680e+09    -9.99   -624.36 77.29
174242 rows × 10 columns

**注:**我认为问题出在我合并 Dataframe 的那一行:我不知道我应该做什么不同的,虽然.任何帮助是感激!

eyh26e7m

eyh26e7m1#

成功了!删除了df.set_index('Magnetometer', append=True, inplace=True)行,并将合并行替换为full_df = pd.concat([full_df, df])

相关问题