Managing line breaks and indentation in JSON

qyzbxkaa  posted on 2023-01-27  in  Other

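I am writing some Python code (to extract data from a CoNLL-U format file), and I want my data stored in a .json file. I would like to get the following output format (x is a key, y is a value):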

{
    "lemma": {"x": "y","x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x":  "" },
    "lemma1":{"x": "y", "x": [{"x":"y"}], "x": "y", "x": [{"x":"y"}], "x":  "y" }...
}

The last part of my code (it is probably very inefficient, but for now I only care about the format of the JSON output):

import json

token_info = {}

...

sentences = []
tokens = []
idn_dep_dict = {}

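for line in lines:
    if line == '\n':
        # A blank line ends the current sentence
        sentences.append(tokens)
        tokens = []
    else:
        fields = line.strip().split('\t')
        # Keep only lines whose ID is a plain integer (skips comments and ranges such as 1-2)
        if len(fields) >= 1 and fields[0].isdigit():
            idn = fields[0]
            lemma = fields[1]  # note: in standard CoNLL-U, column 1 is FORM and column 2 is LEMMA
            upos = fields[3]
            xpos = fields[4]
            feats = fields[5]
            dep = fields[6]  # HEAD column: ID of the head token ('0' for the root)

            pos_pair = (upos, xpos)
            tokens.append((idn, lemma, pos_pair, feats, dep))
            idn_dep_dict[idn] = [dep]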

for sentence in sentences:
    dependencies_dict = {} #dictionary for the dependencies of the current sentence
    for token in sentence:
        idn, lemma, pos_pair, feats, dep = token 
        if dep == '0':
            dependencies_dict[idn] = 'root'
        if dep in idn_dep_dict:
            for head_token in sentence: 
                if head_token[0] == dep: 
                    dependencies_dict[idn] = head_token[2] 

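        # Create a dictionary for the current token's information.
        # pos_pair[0] is this token's upos; the bare name upos would still hold the value from the last parsed line.
        current_token = {'x1': [pos_pair[0]], 'x2': [{'0': pos_pair}], 'x3': [{'0': dependencies_dict[idn]}], 'x4': feats}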
        token_info[lemma] = current_token
        
# Write the JSON data to a file
with open('token_info.json', 'w', encoding='utf-8') as f:
    json.dump(token_info, f, ensure_ascii=False, indent = 2, separators=(',', ': '))

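The current code produces a line break after every [, ], {, } or comma in the JSON file. I would like each line to contain one lemma = {corresponding dictionary}. Is that possible? Thanks in advance, everyone.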
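A minimal reproduction of this behaviour may help. The sketch below uses a small made-up dictionary (`sample` is hypothetical, not the real `token_info`) to show that `indent` pretty-prints every nesting level, while omitting it puts the whole document on one line. Neither gives one line per lemma.

```python
import json

# Hypothetical sample data standing in for the real token_info
sample = {"lemma": {"x1": ["NOUN"], "x2": [{"0": ["NOUN", "S"]}], "x4": "_"}}

# indent=2 spreads every nested list and dict over multiple lines
pretty = json.dumps(sample, ensure_ascii=False, indent=2)

# Without indent, the whole document is emitted on a single line
compact = json.dumps(sample, ensure_ascii=False)

print(pretty)
print(compact)
```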

ubby3x7f  #1

Serialize one level of the dictionary structure by hand, as shown below.

import json
token_info = json.loads('''
{
    "lemma": {"x": "y","x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5":  "" },
    "lemma1":{"x": "y", "x2": [{"x":"y"}], "x3": "y", "x4": [{"x":"y"}], "x5":  "y" }
}
''')

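lines = []
for k, v in token_info.items():
    # Serialize each top-level key and its value separately, without indent,
    # so that every entry stays on a single line
    ks = json.dumps(k, ensure_ascii=False)
    vs = json.dumps(v, ensure_ascii=False, separators=(',', ': '))
    lines.append(ks + ': ' + vs)
# Join the entries by hand, adding the outer braces and the 4-space indentation
src = '{\n    ' + (',\n    '.join(lines)) + '\n}'
print(src)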

This outputs the following.

{
    "lemma": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": ""},
    "lemma1": {"x": "y","x2": [{"x": "y"}],"x3": "y","x4": [{"x": "y"}],"x5": "y"}
}
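Since the question writes to a file rather than printing, the same idea could be wrapped in a small helper and checked with a round trip. This is only a sketch under some assumptions: the function name `dumps_one_entry_per_line` is made up, the sample data is hypothetical, all top-level keys are assumed to be strings, and the default separators are used so that a space follows each comma.

```python
import json
import os
import tempfile


def dumps_one_entry_per_line(data, indent=4):
    """Serialize a dict so that each top-level key/value pair occupies one line."""
    if not data:
        return '{}'
    pad = ' ' * indent
    # Each value is dumped without indent, so it stays on a single line
    lines = [
        json.dumps(key, ensure_ascii=False) + ': ' + json.dumps(value, ensure_ascii=False)
        for key, value in data.items()
    ]
    return '{\n' + pad + (',\n' + pad).join(lines) + '\n}'


# Hypothetical sample data standing in for the real token_info
token_info = {
    "lemma": {"x1": ["NOUN"], "x2": [{"0": ["NOUN", "S"]}], "x4": ""},
    "lemma1": {"x1": ["VERB"], "x2": [{"0": ["VERB", "V"]}], "x4": "Mood=Ind"},
}

text = dumps_one_entry_per_line(token_info)

# Written to a temporary directory here; the question's code uses 'token_info.json'
path = os.path.join(tempfile.mkdtemp(), 'token_info.json')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text + '\n')

# Round-trip check: the hand-built text must parse back to the original data
with open(path, encoding='utf-8') as f:
    assert json.load(f) == token_info

print(text)
```

Because the outer braces and commas are assembled by hand, the round-trip check is a cheap way to confirm the file is still valid JSON.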
