A more efficient way to compare CSV columns and append to a row in a second CSV (Python) [duplicate]

qaxu7uf2 posted on 2023-05-20 in Python

This question already has answers here:

how to merge two data frames based on particular column in pandas python? (3 answers)
Closed 7 days ago.
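I'm new to Python and want to explore what it can do. I wrote this code to compare CSVs. You give it two CSV files: CSV 1 has an id column plus a column of values that you want to add to the other file (CSV 2).
Note: this script does exactly what I want and seems to work fine, and I hope it is useful to someone else too. My question is what I can do to improve its performance, or even just make the code cleaner.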

# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to Compare values and add it to a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to add to a row values and add it to a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])

with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each Row in CSV2
    for line in in_file:
        # Write Headers
        if i == 0:
            out_file.write(line)
        else:
            # Get the CSV2 column holding the value to look up in CSV1
            value = line.split(',')[1].strip()
            # Check whether that CSV2 value appears in the CSV1 compare column
            if value in columnToCompare:
                # Count how many CSV1 rows share this value (duplicates)
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate the values of all occurrences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Blank out the value so list.index doesn't find the same row again
                        columnToCompare[index] = ""
                else:
                    # Single match: take the value from that one row
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]

                # Concat to last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # No match in CSV1: append "not found" to the last column instead
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1

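I know it can be improved, and maybe shortened with some libraries. Let me know what you think!
I tried using pd merge, but I didn't quite understand how to add the concatenation and the values with it.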

zhte4eai1#

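You can use the Pandas library to read both CSV files into DataFrames, merge the column you need into the second CSV's data, and then write out a new CSV that includes the merged column.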

import pandas as pd

# read first CSV
df1 = pd.read_csv('first.csv')

# read second CSV
df2 = pd.read_csv('second.csv')

# merge on the 'id' column, bringing over the column whose values
# you want to add to the other CSV (CSV2);
# in this example that column is named 'data'.
merged_df = pd.merge(df2, df1[['id', 'data']], on='id', how='left')

# save new dataframe to csv.
merged_df.to_csv('merged.csv', index=False)
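
A plain left merge like the one above returns one output row per match, so an id that appears several times in the first CSV would give several rows rather than the single " + "-joined cell the question describes, and ids with no match would come out empty. Here is a minimal sketch of one way to cover both cases. It assumes both files have a header row and share a key column named 'id', that the values are in a column named 'data' (the same example names as above), and that the second CSV does not already have a 'data' column. The helper name `add_joined_column` and the sample data are made up for illustration.

```python
import io

import pandas as pd


def add_joined_column(df1, df2, key="id", value="data",
                      sep=" + ", missing="not found"):
    """Left-merge df2 with df1, joining duplicate df1 matches into one cell."""
    # Collapse df1 to one row per key; values of duplicate keys are
    # joined with `sep` in the order they appear in the file.
    joined = df1.groupby(key, sort=False)[value].agg(sep.join).reset_index()
    # A left merge keeps every row of df2, in its original order.
    merged = df2.merge(joined, on=key, how="left")
    # Keys absent from df1 come back as NaN; replace them with a marker.
    merged[value] = merged[value].fillna(missing)
    return merged


# Tiny in-memory stand-ins for the two CSV files (hypothetical data).
csv1 = io.StringIO("id,name,data\n1,a,x\n2,b,y\n1,c,z\n")
csv2 = io.StringIO("other,id\nfoo,1\nbar,2\nbaz,3\n")

# dtype=str keeps ids as text so they compare as strings in both frames;
# keep_default_na=False keeps empty cells as "" rather than NaN.
df1 = pd.read_csv(csv1, dtype=str, keep_default_na=False)
df2 = pd.read_csv(csv2, dtype=str, keep_default_na=False)

result = add_joined_column(df1, df2)
print(result.to_csv(index=False))
```

With real files, the two `io.StringIO` objects would be replaced by file paths, and the result could be saved with `result.to_csv('merged.csv', index=False)` as in the snippet above. Grouping first means each id is looked up once, instead of rescanning a list for every row as the original loop does.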
