How can I split the contents of a CSV column into separate columns using Python?

biswetbf  posted on 2022-12-27 in Python

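I have a CSV file containing the output of a machine learning model. Ideally it should have three columns (source, relation type, target). When I extract the output, however, each of the n rows is stored as the entire content of a single cell. I don't need the entities; I need the contents of the relations in separate columns.
I have attached my output and my expected output.
Can someone guide me on extracting the cell contents into separate columns using Python?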

{'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'}, {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}], 'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}]}
{'entities': [{'title': 'Europe', 'wikild': 'Q46', 'label': 'Location'}, {'title': 'London', 'wikild': 'Q84', 'label': 'Organization'}, {'title': 'Federal Reserve', 'wikild': 'Q53536', 'label': 'Organization'}, {'title': 'United States', 'wikild': 'Q30', 'label': 'Organization'}, {'title': 'Federal government of the United States', 'wikild': 'Q48525', 'label': 'Organization'}, {'title': 'Bank of America', 'wikild': 'Q487907', 'label': 'Organization'}, {'title': 'Group of Seven', 'wikild': 'Q1764511', 'label': 'Organization'}, {'title': 'United States dollar', 'wikild': 'Q4917', 'label': 'Organization'}, {'title': 'New York (state)', 'wikild': 'Q1384', 'label': 'Organization'}, {'title': 'Alan Greenspan', 'wikild': 'Q193635', 'label': 'Person'}, {'title': 'Euro', 'wikild': 'Q4916', 'label': 'Organization'}, {'title': 'Germany', 'wikild': 'Q183', 'label': 'Organization'}], 'relations': [{'source': 'Federal Reserve', 'target': 'London', 'type': 'headquarters location'}, {'source': 'Bank of America', 'target': 'New York (state)', 'type': 'headquarters location'}, {'source': 'London', 'target': 'Federal Reserve', 'type': 'headquarters location'}, {'source': 'New York (state)', 'target': 'Bank of America', 'type': 'headquarters location'}]}

The expected output should be:

pbwdgjma  #1

Is this what you need? You did not say what the second dictionary is for, since the sample output only refers to the first one.

import pandas as pd

inp = {'entities': [{'title': 'WarnerMedia', 'wikild': 'Q191715', 'label': 'Organization'},
                    {'title': 'Time (magazine)', 'wikild': 'Q43297', 'label': 'Organization'}, 
                    {'title': 'AOL', 'wikild': 'Q27585', 'label': 'Organization'}, 
                    {'title': 'Google', 'wikild': 'Q95', 'label': 'Organization'}, 
                    {'title': 'Warner Bros.', 'wikild': 'Q126399', 'label': 'Organization'}, 
                    {'title': 'U.S. Securities and Exchange Commission', 'wikild': 'Q953944', 'label': 'Organization'}
                   ], 
       'relations': [{'source': 'Time (magazine)', 'target': 'WarnerMedia', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'Time (magazine)', 'type': 'owned by'}, 
                     {'source': 'WarnerMedia', 'target': 'U.S. Securities and Exchange Commission', 'type': 'subsidiary'}, 
                     {'source': 'U.S. Securities and Exchange Commission', 'target': 'WarnerMedia', 'type': 'subsidiary'}, 
                     {'source': 'WarnerMedia', 'target': 'AOL', 'type': 'subsidiary'}, 
                     {'source': 'AOL', 'target': 'WarnerMedia', 'type': 'subsidiary'}
                    ]
      }

df = pd.DataFrame(inp['relations'])         # Convert the list of relation dicts to a DataFrame (requires pandas imported as pd)
output = df[['source', 'type', 'target']]   # Reorder the columns: source, relation type, target
print(output)
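
The snippet above hard-codes a single dictionary. For a CSV file in which every row holds one such dictionary as text, a minimal sketch could look like the following. It assumes that each cell is a valid Python literal (so `ast.literal_eval` can parse it) and that the column is named `output`. That column name, the helper `relations_from_cells`, and the small in-memory sample standing in for the real file are all hypothetical.

```python
import ast
import io

import pandas as pd


def relations_from_cells(cells):
    """Parse each cell (a dict written as a Python literal) and collect its relations."""
    rows = []
    for cell in cells:
        record = ast.literal_eval(cell)           # parses literals only, no code execution
        rows.extend(record.get("relations", []))  # the entities are ignored
    # Selecting columns here also fixes the order: source, relation type, target
    return pd.DataFrame(rows, columns=["source", "type", "target"])


# Hypothetical stand-in for the real CSV: one dict per row in a column named "output".
sample = pd.DataFrame({"output": [
    str({"entities": [{"title": "AOL", "label": "Organization"}],
         "relations": [
             {"source": "Time (magazine)", "target": "WarnerMedia", "type": "owned by"},
             {"source": "WarnerMedia", "target": "AOL", "type": "subsidiary"}]}),
    str({"entities": [],
         "relations": [
             {"source": "Federal Reserve", "target": "London",
              "type": "headquarters location"}]}),
]})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # write the sample as CSV text
buffer.seek(0)

raw = pd.read_csv(buffer)           # with a real file, pass its path instead of buffer
relations = relations_from_cells(raw["output"])
print(relations)
```

If this fits the data, the result could then be saved with `relations.to_csv(...)`, passing `index=False` to leave out the row index. Cells that are not valid Python literals would make `ast.literal_eval` raise an error, so malformed rows would need to be cleaned or skipped first.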
