A more efficient way to compare CSV columns and add to a certain row on a second CSV (Python) [duplicate]

Posted by qaxu7uf2 on 2023-05-20 in Python


This question already has answers here:

how to merge two data frames based on particular column in pandas python? (3 answers)
Closed 7 days ago.
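I'm new to Python and want to explore its potential and learn more about what I can do with it. I wrote this code to compare CSVs. Basically, you give it two CSVs: CSV 1 has an ID column and a column of values that you want to add to the other CSV (CSV 2).
Note: this script does exactly what I want and seems to work fine, and I hope it is useful to someone else too. My question is what I can do to improve its performance, or even make the code cleaner.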

# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to Compare values and add it to a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to add to a row values and add it to a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])

with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each Row in CSV2
    for line in in_file:
        # Write Headers
        if i == 0:
            out_file.write(line)
        else:
            # Get the column in CSV2 holding the value to look up in CSV1
            value = line.split(',')[1].strip()
            # Check whether that value appears in the CSV1 comparison column
            if value in columnToCompare:
                # Check for duplicates in the list 
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate the values of all occurrences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Blank out the value so list.index doesn't find the same row again
                        columnToCompare[index] = ""
                else:
                    # Single match: use its value directly
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]

                # Concat to last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # Still concat value in CSV2 to last column if not found in csv1 
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1
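
For comparison, the matching logic above can be sketched with the standard `csv` module and a dictionary, so each lookup no longer scans a list. This is only a sketch under stated assumptions: the function names `build_lookup` and `append_matches` and the in-memory sample data are hypothetical, and the column positions (key in column 0 and value in column 2 of CSV1, key in column 1 of CSV2) are taken from the script. Unlike the script, it gives every CSV2 row with the same key the same joined value, because it never blanks out matched entries.

```python
import csv
import io
from collections import defaultdict


def build_lookup(csv1_file, key_col=0, value_col=2):
    """Map each key in CSV1 to the list of values found for it (hypothetical helper)."""
    lookup = defaultdict(list)
    for row in csv.reader(csv1_file):
        # Skip rows that are too short to hold both columns.
        if len(row) > max(key_col, value_col):
            lookup[row[key_col].strip()].append(row[value_col].strip())
    return lookup


def append_matches(csv2_file, out_file, lookup, key_col=1):
    """Copy CSV2 to out_file, appending matched values to the last column."""
    reader = csv.reader(csv2_file)
    writer = csv.writer(out_file, lineterminator="\n")
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        if len(row) <= key_col:
            # Blank or short line: pass it through unchanged.
            writer.writerow(row)
            continue
        values = lookup.get(row[key_col].strip())
        # Append to the (assumed empty) last column, as the script does.
        row[-1] += " + ".join(values) if values else "not found"
        writer.writerow(row)


# Made-up sample data standing in for csv1.csv and csv2.csv.
csv1 = io.StringIO("id,name,data\n1,a,x\n2,b,y\n1,c,z\n")
csv2 = io.StringIO("row,id,extra\nr1,1,\nr2,2,\nr3,3,\n")
out = io.StringIO()

append_matches(csv2, out, build_lookup(csv1))
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by files opened with `newline=""`, which is what the `csv` module documentation recommends.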

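I know it can be improved, and perhaps shortened with some library. Let me know what you think!
I tried using pd.merge, but I couldn't quite figure out how to add the concatenation and the values to it.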


Source: https://stackoverflow.com/questions/76236782/a-more-efficient-way-to-compare-csv-columns-and-adding-to-a-certain-row-on-secon

1 Answer


zhte4eai1#

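You can use the pandas library to read both CSV files into DataFrames, merge the column you need from the first CSV into the second, and then write out a new CSV that includes the merged column.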

import pandas as pd

# read first CSV
df1 = pd.read_csv('first.csv')

# read second CSV
df2 = pd.read_csv('second.csv')

# Merge on the id column, bringing over the column whose values
# you want to add to the other CSV (CSV2).
# In this example that column is named 'data'.
merged_df = pd.merge(df2, df1[['id', 'data']], on='id', how='left')

# save new dataframe to csv.
merged_df.to_csv('merged.csv', index=False)
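
The merge above does not cover the two extras in the question's script: joining the values of duplicate ids with " + " and writing "not found" for ids missing from the first CSV. One possible way to add them is sketched below. It assumes the same column names as the example ('id' and 'data'), and the in-memory sample data is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data standing in for first.csv and second.csv.
first = io.StringIO("id,name,data\n1,a,x\n2,b,y\n1,c,z\n")
second = io.StringIO("row,id\nr1,1\nr2,2\nr3,3\n")

# Read everything as strings so ids compare exactly as written.
df1 = pd.read_csv(first, dtype=str)
df2 = pd.read_csv(second, dtype=str)

# Collapse duplicate ids into a single " + "-joined string per id.
# Rows with a missing id or value are dropped first so join() only sees strings.
joined = (
    df1.dropna(subset=["id", "data"])
    .groupby("id", sort=False)["data"]
    .agg(" + ".join)
    .reset_index()
)

# A left merge keeps every row of df2; ids without a match get NaN,
# which is then replaced by the "not found" marker.
merged = df2.merge(joined, on="id", how="left")
merged["data"] = merged["data"].fillna("not found")

print(merged.to_csv(index=False))
```

Grouping before merging keeps one output row per row of the second CSV. Merging directly against a frame with duplicate ids would instead repeat the matching rows of the second CSV once per duplicate.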

Answered 2023-05-20
