Splitting a CSV file with multiple headers into separate dataframes

Posted by a0x5cqrl on 2023-05-26 in Other

I have a large CSV file. It contains several headers, as shown below.

"Level and Distortion",,,,,,,,,,,,,,,
"Ch1 (F)",,"Ch1 (H2)",,"Ch1 (H3)",,"Ch1 (Total)",,"Ch2 (F)",,"Ch2 (H2)",,"Ch2 (H3)",,"Ch2 (Total)",
X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y
Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms
20,0.00772013164376534,20,5.60982648239952E-05,20,0.000389709733151927,20,0.011492581958802,20,0.00699792689186063,20,0.000151471712877565,20,0.000389940899485093,20,0.010080448380793
21.1179638886716,0.00747175133180212,21.1179638886716,8.83327496082501E-05,21.1179638886716,0.000426696028852445,21.1179638886716,0.0122462876404656,21.1179638886716,0.00756340531214287,21.1179638886716,0.000181697169530165,21.1179638886716,0.000443499862648762,21.1179638886716,0.0108494276048029
"THD Ratio",,,,,,,,,,,,,,,
Ch1,,Ch2,,,,,,,,,,,,,
X,Y,X,Y,,,,,,,,,,,,
Hz,%,Hz,%,,,,,,,,,,,,
20,83.009797319554,20,82.1460991930652,,,,,,,,,,,,
21.1179638886716,85.3656629417084,21.1179638886716,82.0338466400102,,,,,,,,,,,,
22.2984199401618,90.6674826441566,22.2984199401618,85.7190774666039,,,,,,,,,,,,
"Reference Waveform",,,,,,,,,,,,,,,
Ch1,,Ch2,,,,,,,,,,,,,
X,Y,X,Y,,,,,,,,,,,,
s,V,s,V,,,,,,,,,,,,
0,0,0,0,,,,,,,,,,,,
2.08333333333333E-05,6.47890208369956E-08,2.08333333333333E-05,6.47890208369956E-08,,,,,,,,,,,,
4.16666666666667E-05,5.18304721721536E-07,4.16666666666667E-05,5.18304721721536E-07,,,,,,,,,,,,
6.25E-05,1.74923655865586E-06,6.25E-05,1.74923655865586E-06,,,,,,,,,,,,

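Level and Distortion, THD Ratio, and Reference Waveform are the measurement types I would like to separate, so I need to split the CSV above at these titles. For example, calling variable a should show the Level and Distortion dataframe with its data, and calling variable b should show the THD Ratio dataframe with its data. Does anyone know how to separate them?
The dataframe should look like the following:

[image of the expected Level and Distortion dataframe]

The same applies to THD Ratio and Reference Waveform.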

Answer #1 by gojuced7

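Assuming each section always starts with a title row (the title of the dataframe, in the first column), followed by three header rows and then the values, and assuming you know the titles, you can use the following: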

import pandas as pd

# read without a header so the title and header rows stay in the data
df = pd.read_csv("so_code.csv", header=None)

# titles that mark the start of each section
df_titles = ["Level and Distortion", "THD Ratio", "Reference Waveform"]
# create groups for each section: the counter increases at every title row
groupings = df.iloc[:, 0].str.contains("|".join(df_titles), na=False).cumsum()

# split into new dataframes as dictionary
d = {}
for _, j in df.groupby(groupings):
    # drop padding columns that are empty for the whole section
    j = j.dropna(axis=1, how="all")
    # create MultiIndex from the 3 header rows below the title row
    columns = pd.MultiIndex.from_arrays(j.iloc[1:4, :].ffill(axis=1).values)
    # rows after the headers are the data (values are still strings here)
    section = pd.DataFrame(data=j.values[4:, :], columns=columns)
    # define name of dictionary key as title
    d[j.iloc[0, 0]] = section
    # suggested not to use, but you can set the variables directly (outside of the dictionary)
    globals()[j.iloc[0, 0]] = section

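globals() returns a dictionary of all defined global variables, and globals()["a"] = 1 defines a variable a = 1. It is advisable not to use it and to define everything explicitly instead, but if you really don't want to use a dictionary, this is how you would set the variables.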
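As a rough illustration of the dictionary approach, the sketch below runs the same idea on a tiny inline sample and then looks up one section and one column. The sample text, the use of `isin` for exact title matching, and the `pd.to_numeric` conversion are assumptions added for this example, not part of the answer above.

```python
import io
import pandas as pd

# Hypothetical miniature of the file from the question (two sections only).
SAMPLE = """\
"Level and Distortion",,,
"Ch1 (F)",,"Ch1 (H2)",
X,Y,X,Y
Hz,Vrms,Hz,Vrms
20,0.0077,20,0.000056
21.1,0.0074,21.1,0.000088
"THD Ratio",,,
Ch1,,Ch2,
X,Y,X,Y
Hz,%,Hz,%
20,83.0,20,82.1
21.1,85.3,21.1,82.0
"""

df = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Exact match on the known titles; the counter increases at each title row.
titles = ["Level and Distortion", "THD Ratio"]
groups = df[0].isin(titles).cumsum()

d = {}
for _, part in df.groupby(groups):
    # Rows 1-3 of each section are the header rows; fill the blank
    # channel names to the right so every column gets a full label.
    header = part.iloc[1:4].ffill(axis=1)
    # Remaining rows are data; they were read as strings, so convert them.
    body = part.iloc[4:].apply(pd.to_numeric)
    body.columns = pd.MultiIndex.from_arrays(header.values)
    d[part.iloc[0, 0]] = body.reset_index(drop=True)

# Look up a whole section by its title ...
thd = d["THD Ratio"]
# ... or a single column by its three header levels.
ch1_thd = thd[("Ch1", "Y", "%")]
print(ch1_thd.tolist())
```

Keeping the sections in a dictionary also sidesteps the fact that a title such as "THD Ratio" contains a space and so could never be typed as an ordinary variable name.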
If the titles aren't known in advance, and every column after the first is always NaN (empty) in a title row, you can find the titles with:

df_titles = df.loc[df.iloc[:, 1:].isna().all(axis=1), 0].to_list()

# split into parts:
# return boolean for whether all columns (2nd to last) contain "NA" for each row
df.iloc[:, 1:].isna().all(axis=1)
# return values in column `0` where above is `True`
df.loc[df.iloc[:, 1:].isna().all(axis=1), 0]
# Add `.to_list()` to convert to a list of values
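To show how the detected titles could feed the split itself, here is a minimal sketch on a small inline sample. The sample text and the names `is_title`, `section_id`, and `sections` are made up for illustration. It also relies on the stated assumption that only title rows are empty after the first column, so a data row with a single populated cell would be mistaken for a title.

```python
import io
import pandas as pd

# Hypothetical two-section sample with the same layout as the question's file.
SAMPLE = """\
"THD Ratio",,,
Ch1,,Ch2,
X,Y,X,Y
Hz,%,Hz,%
20,83.0,20,82.1
"Reference Waveform",,,
Ch1,,Ch2,
X,Y,X,Y
s,V,s,V
0,0,0,0
"""

df = pd.read_csv(io.StringIO(SAMPLE), header=None)

# True for rows where every column except the first is empty (title rows).
is_title = df.iloc[:, 1:].isna().all(axis=1)
df_titles = df.loc[is_title, 0].to_list()
print(df_titles)

# Reuse the same mask as the section counter instead of matching title text.
section_id = is_title.cumsum()
sections = {
    title: part
    for title, (_, part) in zip(df_titles, df.groupby(section_id))
}
```

Each value in `sections` is still the raw block (title row, three header rows, then data), so the header handling from the first snippet would be applied to it afterwards.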
