我正在尝试编写一个函数来收集传感器数据,我已经 * 几乎 * 让它工作-所有数据似乎都在表中结束,但Python添加了后缀(即_x
)添加到最后一个合并列中的列名。我希望,比如说,所有 Bx 的值都在一个列中结束,这样我就可以通过工具将它们分开-我下一个目标是将其转换为多索引 Dataframe 。
代码:
import os
import pandas as pd
# For pulling data from CDAweb:
from ai import cdas
import datetime
def magdf(
start = datetime.datetime(2016, 1, 24, 0, 0, 0),
end = datetime.datetime(2016, 1, 25, 0, 0, 0),
maglist_a = ['upn', 'umq', 'gdh', 'atu', 'skt', 'ghb'], # Arctic magnetometers
maglist_b = ['pg0', 'pg1', 'pg2', 'pg3', 'pg4', 'pg5'], # Antarctic magnetometers
is_saved = False,
):
# Magnetometer parameter dict so that we don't have to type the full string:
d = {'Bx':'MAGNETIC_NORTH_-_H', 'By':'MAGNETIC_EAST_-_E','Bz':'VERTICAL_DOWN_-_Z'}
d_i = dict((v, k) for k, v in d.items()) # inverted mapping for col renaming later
if is_saved:
fname = 'output/' +str(start) + '_' + '.csv'
if os.path.exists(fname):
print('Looks like ' + fname + ' has already been generated.')
return
UT = pd.date_range(start, end, freq ='S') # preallocate time range
full_df = pd.DataFrame(UT, columns=['UT']) # preallocate dataframe
full_df['UT'] = full_df['UT'].astype('datetime64[s]') # enforce 1s precision
full_df['Magnetometer'] = ""
for mags in [maglist_a, maglist_b]:
for idx, magname in enumerate(mags): # For each magnetometer, pull data and merge into full_df:
print('Pulling data for magnetometer: ' + magname.upper())
try:
data = cdas.get_data(
'sp_phys',
'THG_L2_MAG_'+ magname.upper(),
start,
end,
['thg_mag_'+ magname]
)
data['UT'] = pd.to_datetime(data['UT'])# unit='s')
df = pd.DataFrame(data)
df.rename(columns=d_i, inplace=True) # mnemonic column names
# df['UT_1'] = [] # discard superfluous column
# # multi index attempt 1:
df['Magnetometer'] = magname.upper()
df.set_index('Magnetometer', append=True, inplace=True)
# merge dataframes
full_df = full_df.merge(df, on=['UT', 'Magnetometer'], how='outer')
print(df)
except Exception as e:
print(e)
continue
full_df['UT'] = full_df['UT'].astype('datetime64[s]') # enforce 1s precision
if is_saved:
print('Saving as a CSV.')
full_df.to_csv(fname)
# print(full_df)
return full_df # TODO: Figure out how to suppress output here
foo = magdf(maglist_a = ['upn'], maglist_b = ['pg1'])
foo
**输出:**单个数据框看起来不错,但合并后的数据框有后缀:
Pulling data for magnetometer: UPN
100% [................................................] 141200 / 141200
UT UT_1 Bx By Bz
Magnetometer
0 UPN 2016-01-24 00:00:00 1.453590e+09 7841.68 320.623 55419.2
1 UPN 2016-01-24 00:01:00 1.453590e+09 7844.55 314.371 55418.7
2 UPN 2016-01-24 00:02:00 1.453590e+09 7844.76 306.730 55418.5
3 UPN 2016-01-24 00:03:00 1.453590e+09 7845.35 294.430 55418.3
4 UPN 2016-01-24 00:04:00 1.453590e+09 7844.61 291.630 55420.9
... ... ... ... ... ...
1436 UPN 2016-01-24 23:56:00 1.453680e+09 7833.95 336.020 55406.7
1437 UPN 2016-01-24 23:57:00 1.453680e+09 7832.59 336.220 55407.0
1438 UPN 2016-01-24 23:58:00 1.453680e+09 7831.56 336.523 55407.2
1439 UPN 2016-01-24 23:59:00 1.453680e+09 7830.79 336.817 55407.5
1440 UPN 2016-01-25 00:00:00 1.453680e+09 7831.16 336.085 55407.5
[1441 rows x 5 columns]
Pulling data for magnetometer: PG1
100% [..............................................] 8211858 / 8211858
UT UT_1 Bx By Bz
Magnetometer
0 PG1 2016-01-24 00:00:00.001 1.453590e+09 10.25 -570.86 61.98
1 PG1 2016-01-24 00:00:01.001 1.453590e+09 10.50 -571.11 61.98
2 PG1 2016-01-24 00:00:02.001 1.453590e+09 10.75 -571.11 61.98
3 PG1 2016-01-24 00:00:03.001 1.453590e+09 11.00 -571.24 61.98
4 PG1 2016-01-24 00:00:04.001 1.453590e+09 11.13 -571.36 61.85
... ... ... ... ... ...
86395 PG1 2016-01-24 23:59:55.001 1.453680e+09 -9.61 -624.49 77.16
86396 PG1 2016-01-24 23:59:56.001 1.453680e+09 -9.87 -624.49 77.29
86397 PG1 2016-01-24 23:59:57.001 1.453680e+09 -9.87 -624.36 77.29
86398 PG1 2016-01-24 23:59:58.001 1.453680e+09 -9.87 -624.36 77.29
86399 PG1 2016-01-24 23:59:59.001 1.453680e+09 -9.99 -624.36 77.29
[86400 rows x 5 columns]
UT Magnetometer UT_1_x Bx_x By_x Bz_x UT_1_y Bx_y By_y Bz_y
0 2016-01-24 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1 2016-01-24 00:00:01 NaN NaN NaN NaN NaN NaN NaN NaN
2 2016-01-24 00:00:02 NaN NaN NaN NaN NaN NaN NaN NaN
3 2016-01-24 00:00:03 NaN NaN NaN NaN NaN NaN NaN NaN
4 2016-01-24 00:00:04 NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
174237 2016-01-24 23:59:55 PG1 NaN NaN NaN NaN 1.453680e+09 -9.61 -624.49 77.16
174238 2016-01-24 23:59:56 PG1 NaN NaN NaN NaN 1.453680e+09 -9.87 -624.49 77.29
174239 2016-01-24 23:59:57 PG1 NaN NaN NaN NaN 1.453680e+09 -9.87 -624.36 77.29
174240 2016-01-24 23:59:58 PG1 NaN NaN NaN NaN 1.453680e+09 -9.87 -624.36 77.29
174241 2016-01-24 23:59:59 PG1 NaN NaN NaN NaN 1.453680e+09 -9.99 -624.36 77.29
174242 rows × 10 columns
**注:**我认为问题出在我合并 Dataframe 的那一行:我不知道我应该做什么不同的,虽然.任何帮助是感激!
1条答案
按热度按时间eyh26e7m1#
成功了!删除了
df.set_index('Magnetometer', append=True, inplace=True)
行,并将合并行替换为full_df = pd.concat([full_df, df])
。