我必须使用python读取和分析一些日志文件,这些文件通常包含以下所需格式的字符串:
date Don Dez 10 21:41:41.747 2020
base hex timestamps absolute
no internal events logged
// version 13.0.0
//28974.328957 previous log file: 21-41-41_Voltage.asc
// Measurement UUID: 9e0029d6-43a0-49e3-8708-3ec70363124c
28976.463987 LoggingString := "Log,5:45 AM, Friday, December 11, 2020,05:45:20.6,65.48,11.99,0.009843,12,0.01078,11.99,0.01114,11.99,0.01096,12,0.009984,4.595,0,1.035,0,0.1745,0,2,OM_2_1,0"
28978.600018 LoggingString := "Log,5:45 AM, Friday, December 11, 2020,05:45:22.7,65.47,11.99,0.009896,12,0.01079,11.99,0.01117,11.99,0.01097,12,0.009965,4.628,0,1.044,0,0.1698,0,2,OM_2_1,0"
但是,有时会发生创建的文件具有不需要的格式,如以下:
date Die Jul 13 08:40:22.878 2021
base hex timestamps absolute
no internal events logged
// version 13.0.0
//1035.595166 previous log file: 08-40-22_Voltage.asc
// Measurement UUID: 2baf3f3f-300a-4f0a-bcbf-0ba5679d8be2
"1203.997816 LoggingString := ""Log" 9:01 am Tuesday July 13 2021 09:01:58.3 24.53 13.38 0.8948 13.37 0.8801 13.37 0.89 13.37 0.9099 13.47 0.8851 4.551 0.00115 0.8165 0 0.2207 0 5 OM_3_2 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 "0"""
"1206.086064 LoggingString := ""Log" 9:02 am Tuesday July 13 2021 09:02:00.4 24.53 13.37 0.8945 13.37 0.8801 13.37 0.8902 13.37 0.9086 13.46 0.8849 5.142 0.001185 1.033 0 0.1897 0 5 OM_3_2 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 "0"""
或
date Mit Jun 16 10:11:43.493 2021
base hex timestamps absolute
no internal events logged
// version 13.0.0
// Measurement UUID: fe4a6a97-d907-4662-89f9-bd246aa54a33
10025.661597 LoggingString := """""""Log""" 12:59 PM Wednesday June 16 2021 12:59:01.1 66.14 0.00423 0 0.001206 0 0.001339 0 0.001229 0 0.001122 0 0.05017 0 0.01325 0 0.0643 0 0 OM_2_1_transition 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 """0"""""""
10030.592652 LoggingString := """""""Log""" 12:59 PM Wednesday June 16 2021 12:59:06.1 66.14 11.88 0.1447 11.88 0.1444 11.88 0.1442 11.87 0.005552 11.9 0.00404 2.55 0 0.4712 0 0.09924 0 0 OM_2_1_transition 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 """0"""""""
因为我只关心“// Measurement UUID“行下面的数据,所以我使用以下代码从所需格式的字符串中提取数据:
files = os.listdir(directory)
files = natsorted(files)
for file in files:
base, ext = os.path.splitext(file)
if file not in processed_files and ext == '.asc':
print("File added:", file)
file_path = os.path.join(directory, file)
count = 0
with open(file_path, 'r') as file_in:
processed_files.append(file)
Output_list = [] # Each string from file is read into this list
Final = [] # Required specific data from each string is isolated & stored here
for line in map(str.strip, file_in):
if "LoggingString" in line:
first_quote = line.index(
'"') # returns the column number where " first appears in the whole string
last_quote = line.index('"', first_quote + 1)
# returns the column value where " appears last in the whole string ( end of line )
# print(first_quote)
Output_list.append(
line[:first_quote].split(maxsplit=1)
+ line[first_quote + 1: last_quote].split(","),
)
Final.append(Output_list[count][7:27])
不需要的格式包含一个或多个空格之间的每个字符串字符如上所示。我猜这是因为日志文件生成器有时生成一个非逗号分隔的文件或逗号分隔的文件与错误可能,我不确定。
我试着把条件放在后面:
if "LoggingString" in line :
if ',' in line:
first_quote = line.index('"')
last_quote = line.index('"', first_quote + 1)
Output_list.append(line[:first_quote].split(maxsplit=1)
+ line[first_quote + 1: last_quote].split(","),)
Final.append(Output_list[count][7:27])
else:
raise Exception("Error in File")
然而,这并没有达到目的,因为如果在任何其他不需要的格式中,即使字符串中有一个',',程序也会认为它有效并处理它,这会导致错误的结果。
如何确保在处理完包含所需格式字符串的文件后,如果处理了其他文件,则会打印出错误消息?这里可以实现什么类型的条件检查?
1条答案
按热度按时间dbf7pr2w1#
您可以使用带有 * regex * 分隔符的
pandas.read_csv
:输出: