How to turn a single column of a CSV file into rows with pandas (Python)

Posted by ybzsozfc on 2023-05-26 in Python

I have a CSV file like this:

Authors,Title,Link,Author Keywords
"Bennacer S.A., Sabiri K., Aaroud A., Akodadi K., Cherradi B.","A comprehensive survey on blockchain-based healthcare industry: applications and challenges","https://www.scopus.com/inward/record.uri?eid=2-s2.0-85152097785&doi=10.11591%2fijeecs.v30.i3.pp1558-1571&partnerID=40&md5=abddc224666bbc71d4bfe0a69f4d425f%22,%22Blockchain technology; Data management; Data security and privacy; Data sharing; Electronic health record; Healthcare"

I want to turn it into a CSV file like the following, so that each author gets a row of their own with the keywords:

author,keyword1,keyword2,keyword3,...
Bennacer S.A,Blockchain technology,Data management, Data security and privacy, Data sharing, Electronic health record, Healthcare
Sabiri K.,Blockchain technology,Data management, Data security and privacy, Data sharing, Electronic health record, Healthcare

and so on.
I tried this code, but it only drops the Title and Link columns:

import pandas as pd

df = pd.read_csv("scopus.csv")

df1 = df[["Authors", "Author Keywords"]]

df1
Answer #1 (oyjwcjzk)

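It looks like the CSV sample you shared may not be formatted as intended. The problem can still be solved, but it is worth checking whether some double quotes are missing somewhere after the URL and before "Blockchain". For example, the file may actually look like this:

Authors,Title,Link,Author Keywords
"Bennacer S.A., Sabiri K., Aaroud A., Akodadi K., Cherradi B.","A comprehensive survey on blockchain-based healthcare industry: applications and challenges","https://www.scopus.com/inward/record.uri?eid=2-s2.0-85152097785&doi=10.11591%2fijeecs.v30.i3.pp1558-1571&partnerID=40&md5=abddc224666bbc71d4bfe0a69f4d425f%22%22","Blockchain technology; Data management; Data security and privacy; Data sharing; Electronic health record; Healthcare"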

If the CSV looks like that, you can do the following:

import pandas as pd

df = pd.read_csv('scopus.csv', usecols=['Authors', 'Author Keywords'])  # Read only the two needed columns

# Work on the first row only: split its author and keyword strings into clean lists
auth_lst = [ele.strip() for ele in df['Authors'].str.split(',')[0]]  # Get list of authors
kwd_lst = [ele.strip() for ele in df['Author Keywords'].str.split(';')[0]]  # Get list of keywords
kwd_dict = {f'keyword{i+1}': kwd for i, kwd in enumerate(kwd_lst)}  # Map keyword1, keyword2, ... to the keyword values

auth_df = pd.DataFrame({'Author': auth_lst})  # One row per author
kw_df = pd.DataFrame([kwd_dict])  # A single row holding all keywords

# Cross join authors and keywords through a constant dummy key,
# so every author row receives the full set of keyword columns
auth_df['dummy'] = 1
kw_df['dummy'] = 1
auth_kwd_df = auth_df.merge(kw_df, on='dummy')
auth_kwd_df.drop(columns=['dummy'], inplace=True)

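This solution works for the example above, which has a single data row. If your CSV file is formatted a little differently (or if you are iterating over many such files or entries), you will need to adapt the suggested solution.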
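One possible adaptation for a file with many rows is sketched below. It is not part of the original answer: the helper name authors_with_keywords, the in-memory sample, and the second sample row ("Doe J.") are made up for illustration, and it assumes authors are separated by commas and keywords by semicolons, as in the sample above.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for scopus.csv; the second row is invented
sample = io.StringIO(
    'Authors,Author Keywords\n'
    '"Bennacer S.A., Sabiri K.","Blockchain technology; Data management"\n'
    '"Doe J.","Healthcare"\n'
)
df = pd.read_csv(sample)


def authors_with_keywords(df):
    """Return one row per author, with that paper's keywords as keyword1..keywordN."""
    # Missing cells become empty strings so that .str.split always yields a list
    authors = df['Authors'].fillna('').str.split(',')
    keywords = df['Author Keywords'].fillna('').str.split(';')
    rows = []
    for auth_lst, kwd_lst in zip(authors, keywords):
        # Strip whitespace and drop empty fragments before numbering the keywords
        clean_kwds = [k.strip() for k in kwd_lst if k.strip()]
        kwd_dict = {f'keyword{i + 1}': k for i, k in enumerate(clean_kwds)}
        for author in auth_lst:
            if author.strip():
                rows.append({'Author': author.strip(), **kwd_dict})
    return pd.DataFrame(rows)


result = authors_with_keywords(df)
print(result)
```

Rows with fewer keywords than the longest row end up with NaN in the unused keyword columns. The result can then be written out with result.to_csv('authors_keywords.csv', index=False), where the output file name is again only a placeholder.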
