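I am writing some Python code that extracts data from a CoNLL-U format file, and I want the data stored in a .json file.
I would like the output to have the following format (x is a key, y is a value):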
{
"lemma": {"x": "y","x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x": "" },
"lemma1":{"x": "y", "x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x": "y" }...
}
Here is the last part of my code (it is probably inefficient, but for now I only care about the format of the JSON output):
import json

token_info = {}
...
sentences = []
tokens = []
idn_dep_dict = {}
for line in lines:
    if line == '\n':
        # A blank line ends the current sentence
        sentences.append(tokens)
        tokens = []
    else:
        fields = line.strip().split('\t')
        if len(fields) >= 1:
            if fields[0].isdigit():
                # Note: CoNLL-U columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, ...
                # so fields[1] is the word form and fields[6] is the head's ID
                idn = fields[0]
                lemma = fields[1]
                upos = fields[3]
                xpos = fields[4]
                feats = fields[5]
                dep = fields[6]
                pos_pair = (upos, xpos)
                tokens.append((idn, lemma, pos_pair, feats, dep))
                idn_dep_dict[idn] = [dep]
            else:
                continue
for sentence in sentences:
    dependencies_dict = {}  # dictionary for the dependencies of the current sentence
    for token in sentence:
        idn, lemma, pos_pair, feats, dep = token
        if dep == '0':
            dependencies_dict[idn] = 'root'
        if dep in idn_dep_dict:
            for head_token in sentence:
                if head_token[0] == dep:
                    dependencies_dict[idn] = head_token[2]
        # Create a dictionary for the current token's information
        # (pos_pair[0] is this token's upos; the bare `upos` variable would be stale here)
        current_token = {'x1': [pos_pair[0]], 'x2': [{'0': pos_pair}], 'x3': [{'0': dependencies_dict[idn]}], 'x4': feats}
        token_info[lemma] = current_token
# Write the JSON data to a file
with open('token_info.json', 'w', encoding='utf-8') as f:
    json.dump(token_info, f, ensure_ascii=False, indent=2, separators=(',', ': '))
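The current code puts a line break after every [ or ], every { or }, and every comma in the JSON file. I would like each line to contain one lemma together with its corresponding dictionary. Is that possible? Thanks in advance, everyone.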
1 Answer
ubby3x7f #1
Serialize one level of the dictionary structure by hand, as shown below.
This will output the following.
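The answer's own code and output did not survive on this page, so what follows is only a minimal sketch of the idea, not the original snippet. It assumes the approach is to write the outer braces by hand and call `json.dumps` (without `indent`) on each top-level value. The helper name `dumps_one_entry_per_line` and the sample data are made up for illustration.

```python
import json


def dumps_one_entry_per_line(mapping):
    """Serialize a dict so each top-level key/value pair sits on its own line.

    Hypothetical helper: only the outermost level is written by hand;
    every value is serialized compactly by json.dumps (no indent).
    """
    entries = [
        f"{json.dumps(key, ensure_ascii=False)}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in mapping.items()
    ]
    if not entries:
        return "{}"
    return "{\n" + ",\n".join(entries) + "\n}"


# Made-up sample data in the same shape as the question's token_info
token_info = {
    "dog": {"x1": ["NOUN"], "x2": [{"0": ["NOUN", "NN"]}], "x3": [{"0": "root"}], "x4": "Number=Sing"},
    "bark": {"x1": ["VERB"], "x2": [{"0": ["VERB", "VBZ"]}], "x3": [{"0": ["NOUN", "NN"]}], "x4": "_"},
}

text = dumps_one_entry_per_line(token_info)
print(text)
```

With the sample data above, this prints:

```
{
"dog": {"x1": ["NOUN"], "x2": [{"0": ["NOUN", "NN"]}], "x3": [{"0": "root"}], "x4": "Number=Sing"},
"bark": {"x1": ["VERB"], "x2": [{"0": ["VERB", "VBZ"]}], "x3": [{"0": ["NOUN", "NN"]}], "x4": "_"}
}
```

To save the result, write the returned string to the file with `f.write(text)` in place of the `json.dump` call. The result is still valid JSON, so `json.load` reads it back unchanged. Tuples such as `pos_pair` come out as JSON arrays either way.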