Extracting the contents of a file as variables in Python

yws3nbqq  posted on 2023-04-04  in  Python

I have a file on Linux like the one below.

file_name: batch_file.txt
sub_directory: code_base/workflow_1
script_name: code_base/workflow_1/session_1.py

The contents of batch_file.txt are:

1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2

I want to read the contents of batch_file.txt from within session_1.py and create variables based on file_name and sub_directory. The variables are as follows:

batch_id = value before 1st #
workflow_id = value between 1st and 2nd #
workflow_name = value between 2nd and 3rd #
session_id = value between 3rd and 4th #
session_name = value between 4th and 5th #
run_date = value between 5th and 6th #
flow_name = value between 6th and 7th #
flow_id = value after 7th #

This is what I have so far:

batch_content = open('batch_file.txt', 'r')
batch_content.readlines()

But I don't know how to proceed from here.

sg24os4d  #1

If you want variables whose names are decided at runtime, you can do that, but you shouldn't.
Instead, I would use a list of dictionaries.

# Read the whole file; each line holds eight '#'-separated fields.
with open('batch_file.txt') as f:
    text = f.read()

records = [
  {'batch_id': x[0], 'workflow_id': x[1], 'workflow_name': x[2],
   'session_id': x[3], 'session_name': x[4], 'run_date': x[5],
   'flow_name': x[6], 'flow_id': x[7]}
  for line in text.splitlines()
  for x in (line.split('#'),)  # bind the split fields to x once per line
]

Result:

[
  {'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
]
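
The same list of dictionaries can also be built with zip, which keeps the field order in one place instead of indexing each field by hand. This is only a sketch: it assumes the field order given in the question, the names FIELDS and parse_batch are made up for illustration, and the sample text is inlined rather than read from batch_file.txt.

```python
# Field names in the order stated in the question
# (assumption: every line carries all eight fields).
FIELDS = ('batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id')


def parse_batch(text):
    """Return one dict per non-empty line of '#'-separated text."""
    return [dict(zip(FIELDS, line.split('#')))
            for line in text.splitlines() if line.strip()]


# Sample lines copied from the question, inlined so the sketch runs on its own.
sample = """1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2"""

records = parse_batch(sample)
print(records[0]['workflow_name'])
print(records[2]['session_name'])
```

Each value is then reached by key, for example records[0]['run_date'], so no runtime-named variables are needed.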

5t7ly7z5  #2

Use the csv module to read the data into dictionaries (or, optionally, use pandas to read it into a DataFrame).
For example:

import csv

# Column names, in the order the fields appear on each line.
fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']

with open('batch_file.txt', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter='#')
    for line in csv_reader:
        print(line)

Result:

{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}

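For every line read from the file you get a dictionary, with the "variable" names as keys and the file contents as values. With that dictionary you can do whatever you need.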
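
As one illustration of working with these row dictionaries, the sketch below converts the ID fields to int and run_date to a date object. It reads from an in-memory io.StringIO holding the sample lines in place of batch_file.txt so that it runs on its own; the helper name convert_row is hypothetical, and the choice of which fields to convert is an assumption based on the sample data.

```python
import csv
import io
from datetime import date

FIELDNAMES = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']


def convert_row(row):
    """Return a copy of row with the IDs as int and run_date as datetime.date."""
    converted = dict(row)
    for key in ('batch_id', 'workflow_id', 'session_id', 'flow_id'):
        converted[key] = int(row[key])
    converted['run_date'] = date.fromisoformat(row['run_date'])
    return converted


# Stand-in for open('batch_file.txt', newline=''); same lines as in the question.
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)

reader = csv.DictReader(sample, fieldnames=FIELDNAMES, delimiter='#')
rows = [convert_row(row) for row in reader]

for row in rows:
    print(row['workflow_id'], row['workflow_name'], row['run_date'])
```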
If you need to process this data further or perform any data transformations, pandas may be a better fit.
A silly example:

import pandas as pd

names = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
         'session_name', 'run_date', 'flow_name', 'flow_id']
df = pd.read_csv('batch_file.txt', header=None, names=names, sep='#')

df['message'] = df.apply(lambda line: f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}", axis=1)

# display() is available in Jupyter/IPython; use print() in a plain script.
display(df['message'])

Result:

0     The workflow name of workflow id 1 is workflow_1
1     The workflow name of workflow id 2 is workflow_2
2    The workflow name of workflow id 3 is workflow...

8yparm6h  #3

You can use split to get the output you want:

with open("batch_file.txt") as search:
    for line in search:
        line = line.rstrip()  # remove '\n' at end of line
        # Test each substring separately; `'a' and 'b' in line` only checks 'b'.
        if 'workflow_1' in line and 'session_1' in line:
            batch_id, workflow_id, workflow_name, session_id, session_name, run_date, flow_name, flow_id = line.split('#')

print(batch_id)
print(workflow_id)
print(workflow_name)
print(session_id)
print(session_name)
print(run_date)
print(flow_name)
print(flow_id)
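
A substring test such as 'workflow_1' in line also matches workflow_1_2, so comparing whole fields after the split is safer. The sketch below assumes, as the paths in the question suggest, that the workflow name is the script's directory name and the session name is the script's file name without .py. find_row and FIELDS are hypothetical names, and the sample lines are inlined rather than read from the file.

```python
from pathlib import PurePosixPath

FIELDS = ('batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id')


def find_row(lines, script_path):
    """Return the first row matching the script's workflow and session, else None."""
    path = PurePosixPath(script_path)
    workflow_name = path.parent.name  # e.g. 'workflow_1'
    session_name = path.stem          # e.g. 'session_1'
    for line in lines:
        row = dict(zip(FIELDS, line.rstrip('\n').split('#')))
        # Compare complete field values, not substrings of the line.
        if (row.get('workflow_name') == workflow_name
                and row.get('session_name') == session_name):
            return row
    return None


# Sample lines copied from the question.
lines = [
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n",
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n",
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n",
]

row = find_row(lines, 'code_base/workflow_1/session_1.py')
print(row['batch_id'], row['workflow_id'], row['run_date'])
```

Inside session_1.py itself, __file__ could probably be passed in place of the literal path, and an open file object in place of the list, since both yield lines one at a time.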
