This question already has answers here:
how to merge two data frames based on particular column in pandas python? (3 answers)
Closed 7 days ago.
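So I'm new to Python, and I want to explore its potential and learn more about what I can do with it. I wrote this code to compare CSVs. Basically, you give it two CSVs: CSV 1 has some id column and a column of values that you want to add to the other CSV (CSV 2).
Note: this script does exactly what I want and seems to work fine, and I hope it is useful to someone else too. My question is what I can do to improve its performance, or even make the code cleaner.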
# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)
with open('csv1.csv', encoding="utf8") as check_file:
    # Get the column used to compare values and store it in a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])
with open('csv1.csv', encoding="utf8") as check_file:
    # Get the column whose values will be added to each row and store it in a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])
with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each row in CSV2
    for line in in_file:
        # Write headers
        if i == 0:
            out_file.write(line)
        else:
            # Get the CSV2 column containing the value that will be compared against CSV1
            value = line.split(',')[1].strip()
            # Check whether the CSV2 value appears in the CSV1 comparison column
            if value in columnToCompare:
                # Check for duplicates in the list
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate the values of all occurrences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Blank the value so list.index doesn't find the same row again
                        columnToCompare[index] = ""
                else:
                    # Single match: take its value directly
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]
                # Concat to last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # Still concat a value to the last column of CSV2 if not found in CSV1
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1
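I know it can be improved, and maybe shortened with some libraries. Let me know what you think!
I tried using pd merge, but I don't quite understand how to add the concatenation and the values with it.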
1 Answer
zhte4eai1#
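You can use the pandas library to read both CSV files into DataFrames, merge the column from the first CSV into the second, and then write out a new CSV that includes the merged column.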
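A minimal sketch of that idea follows; it is one possible approach, not a definitive implementation. It assumes both files have a header row, that the ids sit in the first column of CSV1 and the second column of CSV2, and that the values to add sit in the third column of CSV1, mirroring the indices in the question's script. The function name `add_column_from_csv1` and the sample data are hypothetical. Instead of `pd.merge`, it groups CSV1 by id, joins duplicate values with " + ", and maps the result onto CSV2.

```python
import io

import pandas as pd


def add_column_from_csv1(csv1, csv2, out):
    """Fill the last column of csv2 with values looked up in csv1.

    Assumed layout (taken from the indices in the question's script):
    csv1 column 0 = id, csv1 column 2 = value to add,
    csv2 column 1 = id, csv2 last column = empty column to fill.
    Both files are assumed to have a header row.
    """
    # Read everything as text so ids such as "007" keep their leading zeros.
    df1 = pd.read_csv(csv1, dtype=str)
    df2 = pd.read_csv(csv2, dtype=str)

    ids = df1.iloc[:, 0].str.strip()
    values = df1.iloc[:, 2].fillna("").str.strip()

    # One entry per id; duplicate ids are joined as "a + b" in file order.
    lookup = values.groupby(ids, sort=False).agg(" + ".join)

    # Look up each CSV2 id; ids missing from CSV1 become "not found".
    key = df2.iloc[:, 1].str.strip()
    df2[df2.columns[-1]] = key.map(lookup).fillna("not found")

    df2.to_csv(out, index=False)
    return df2


# Hypothetical sample data standing in for csv1.csv and csv2.csv.
csv1 = io.StringIO("id,name,value\nA1,x,red\nB2,y,green\nA1,z,blue\n")
csv2 = io.StringIO("row,id,extra\n1,A1,\n2,C3,\n3,B2,\n")
out = io.StringIO()

result = add_column_from_csv1(csv1, csv2, out)
print(out.getvalue())
```

With real files, pass the paths instead of the `StringIO` objects, for example `add_column_from_csv1('csv1.csv', 'csv2.csv', 'out.csv')`. For the sample data, the last column comes out as "red + blue", "not found" and "green". The lookup is built once, so each CSV2 row costs a single index lookup rather than a scan of the whole list. One difference from the original script: every CSV2 row with a given id receives the same joined value, whereas the script blanks out duplicated ids after their first use.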