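I have a large CSV file that contains several title (header) rows, one per section. I have split out three of those sections using the code below.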
    import pandas as pd

    df = pd.read_csv("three measurement.csv", header=None)
    # find header rows
    df_titles = ["Level and Distortion", "THD Ratio", "Reference Waveform"]
    # create groups for each section
    groupings = df.iloc[:, 0].str.contains("|".join(df_titles)).cumsum()
    # split into new dataframes as dictionary
    d = {}
    for i, j in df.groupby(groupings):
        # define name of dictionary key as title, and set data of DF as values
        d[j.iloc[0, 0]] = pd.DataFrame(
            data=j.values[4:, :],
            # create MultiIndex from the first 4 rows (title row plus 3 header rows)
            columns=pd.MultiIndex.from_arrays(
                j.iloc[0:4, :].ffill(axis=1).values))
        # suggested not to use, but you can set the variables directly (outside of the dictionary)
        globals()[j.iloc[0, 0]] = pd.DataFrame(
            data=j.values[4:, :],
            columns=pd.MultiIndex.from_arrays(
                j.iloc[0:4, :].ffill(axis=1).values))
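Level and Distortion, THD Ratio, and Reference Waveform are the measurement types I separated. However, the measurement types can change (there may be more or fewer of them). How can I detect the measurement types without specifying them as in the code above (df_titles = [...])? As written, finding the title rows requires naming every measurement type that appears in the CSV shown below. If the measurement types differ or their number grows, how can I find the title rows without defining the df_titles variable? Here is the CSV file: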
"Level and Distortion",,,,,,,,,,,,,,,
"Ch1 (F)",,"Ch1 (H2)",,"Ch1 (H3)",,"Ch1 (Total)",,"Ch2 (F)",,"Ch2 (H2)",,"Ch2 (H3)",,"Ch2 (Total)",
X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y
Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms
20,0.00772013164376534,20,5.60982648239952E-05,20,0.000389709733151927,20,0.011492581958802,20,0.00699792689186063,20,0.000151471712877565,20,0.000389940899485093,20,0.010080448380793
21.1179638886716,0.00747175133180212,21.1179638886716,8.83327496082501E-05,21.1179638886716,0.000426696028852445,21.1179638886716,0.0122462876404656,21.1179638886716,0.00756340531214287,21.1179638886716,0.000181697169530165,21.1179638886716,0.000443499862648762,21.1179638886716,0.0108494276048029
"THD Ratio",,,,,,,,,,,,,,,
Ch1,,Ch2,,,,,,,,,,,,,
X,Y,X,Y,,,,,,,,,,,,
Hz,%,Hz,%,,,,,,,,,,,,
20,83.009797319554,20,82.1460991930652,,,,,,,,,,,,
21.1179638886716,85.3656629417084,21.1179638886716,82.0338466400102,,,,,,,,,,,,
22.2984199401618,90.6674826441566,22.2984199401618,85.7190774666039,,,,,,,,,,,,
"Reference Waveform",,,,,,,,,,,,,,,
Ch1,,Ch2,,,,,,,,,,,,,
X,Y,X,Y,,,,,,,,,,,,
s,V,s,V,,,,,,,,,,,,
0,0,0,0,,,,,,,,,,,,
2.08333333333333E-05,6.47890208369956E-08,2.08333333333333E-05,6.47890208369956E-08,,,,,,,,,,,,
4.16666666666667E-05,5.18304721721536E-07,4.16666666666667E-05,5.18304721721536E-07,,,,,,,,,,,,
6.25E-05,1.74923655865586E-06,6.25E-05,1.74923655865586E-06,,,,,,,,,,,,
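The real CSV file contains many more rows of data.
There are 11 measurement types in total, and the set present in a file can change (more or fewer). I would like Python to detect the measurement types without my having to define them as in df_titles = ["Level and Distortion", "THD Ratio", "Reference Waveform"].
Thanks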
1 Answer
eblbsuwk1#
You can use pd.read_csv with the right parameters to read the sections, and then concatenate all the DataFrames.
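One way to detect the sections without listing their names is sketched below. It is not necessarily the code the answer originally showed. It assumes that a title row is any row whose first cell is filled and whose remaining cells are all empty (true for the CSV in the question), and that each title is followed by three header rows (channel, X/Y, unit). The function name split_sections, the parameter n_header_rows, and the short inline sample with made-up values are hypothetical and only there for illustration.

```python
import io

import pandas as pd


def split_sections(source, n_header_rows=3):
    """Return {title: DataFrame} for a CSV made of stacked, titled sections."""
    # Read everything as text so header cells and data share one dtype.
    raw = pd.read_csv(source, header=None, dtype=str)
    # Assumed rule: a title row has text in the first cell and nothing else.
    is_title = raw.iloc[:, 0].notna() & raw.iloc[:, 1:].isna().all(axis=1)
    sections = {}
    # The running count of title rows labels every row with its section.
    for key, block in raw.groupby(is_title.cumsum()):
        if key == 0:
            continue  # rows before the first title, if any
        block = block.dropna(axis=1, how="all")  # drop padding columns
        # Forward-fill so "Ch1,,Ch2," becomes Ch1, Ch1, Ch2, Ch2.
        header = block.iloc[1:1 + n_header_rows].ffill(axis=1)
        data = block.iloc[1 + n_header_rows:].reset_index(drop=True)
        data.columns = pd.MultiIndex.from_arrays(header.to_numpy().tolist())
        sections[block.iloc[0, 0]] = data.apply(pd.to_numeric)
    return sections


# Illustrative sample only: same layout as the question, made-up values.
sample = io.StringIO(
    '"THD Ratio",,,\n'
    "Ch1,,Ch2,\n"
    "X,Y,X,Y\n"
    "Hz,%,Hz,%\n"
    "20,83.5,20,82.25\n"
    "25,85.5,25,82.75\n"
    '"Reference Waveform",,,\n'
    "Ch1,,Ch2,\n"
    "X,Y,X,Y\n"
    "s,V,s,V\n"
    "0,0,0,0\n"
)
sections = split_sections(sample)
print(list(sections))  # the section titles found in the file

# Concatenate all DataFrames side by side; the title becomes the top column level.
combined = pd.concat(sections, axis=1)
print(combined)
```

For the real file, pass the path instead of the sample, for example split_sections("three measurement.csv"). If a data row could ever have only its first cell filled, the title rule above would misfire and would need to be tightened, for instance by also requiring that the cell is not numeric.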