csv - How can I iterate through two columns in Python?

ffscu2ro posted on 2023-05-04 in Python

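I'm trying to iterate through two columns in a csv file using Python. I heard you have to import pandas for this, but I'm struggling with the coding part.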

import csv as csv
import numpy as np
import pandas as pd

csv_file_object = csv.reader(open('train.csv', 'rb'))  # Load in the csv file
header = csv_file_object.next()                   # Skip the first line as it is a header
data=[]                                     # Create a variable to hold the data

for row in csv_file_object:                      # Skip through each row in the csv file,
    data.append(row[0:])                        # adding each row to the data variable
data = np.array(data)   


def number_of_female_in_class_3(data):
    for row in data.iterow:
        if row[2] == 'female' and row[4] == '3':
            sum += 1

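The problem is the function number_of_female_in_class_3. I want to iterate through two columns: go through column 2 to check whether the row contains the string 'female', and through column 4 to check whether the class is '3'. If both are true, I want to increment sum by 1.
Could someone post some simple code showing how to do this?
This is the train.csv file I'm trying to read: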

**PassengerID** | **Survived** | **Pclass**   | **Name**  |  **Sex**   |
          1     |          0   |         3    |  mary     |  Female    |
          2     |          1   |         2    |  james    |  Male      |
          3     |          1   |         3    |  Tanya    |  Female    |

Thanks

Answer #1 (lsmepo6l)

Actually, pandas can help you here.
I started with a cleaner CSV:

PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female

If your CSV really looks like what you posted (which isn't a real CSV), you'll have some wrangling to do (see below). But if you can get pandas to eat it:

>>> import pandas as pd
>>> df = pd.read_csv('data.csv', index_col=0)
>>> result = df[(df.Sex == 'female') & (df.Pclass == 3)]

This produces a new DataFrame:

>>> result
             Survived  Pclass   Name     Sex
PassengerID
1                   0       3   mary  female
3                   1       3  tanya  female

You can then use len(result) to get the count you want.
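
If you only need the number and not the matching rows, you can also sum the boolean mask directly, since True counts as 1 and False as 0. A minimal sketch, assuming the cleaned-up CSV above (inlined here with io.StringIO instead of reading data.csv):

```python
import io

import pandas as pd

# The cleaned-up data from above, inlined so the sketch runs without a file.
CSV_TEXT = """PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col='PassengerID')

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df['Sex'] == 'female') & (df['Pclass'] == 3)

# Summing the mask counts the rows where both conditions hold.
count = int(mask.sum())
print(count)  # → 2
```

This avoids building the intermediate DataFrame when all you want is the count.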

Loading that CSV

If you're stuck with that nasty CSV, you can get your df like this:

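import pandas as pd

# Load using '|' as the delimiter; the first column (PassengerID) becomes the index.
df = pd.read_csv('data.csv', sep='|', index_col=0)

# Rename the index.
df.index.names = ['PassID']

# Rename the columns, using X for the bogus empty one created by the trailing '|'.
df.columns = ['Survived', 'Pclass', 'Name', 'Sex', 'X']

# Remove the 'extra' column.
del df['X']

# Strip the padding around the text values; this file also capitalizes 'Female'.
df['Name'] = df['Name'].str.strip()
df['Sex'] = df['Sex'].str.strip().str.lower()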
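
Alternatively, a regular-expression separator can absorb the padding around each '|' while parsing, so the values need no stripping afterwards. A sketch, assuming the file looks exactly like the table in the question (inlined here with io.StringIO); the separator pattern and the "Unnamed" column filter are my own choices, not part of the answer above:

```python
import io

import pandas as pd

# The pipe-separated table from the question, inlined so the sketch runs without a file.
RAW = """\
**PassengerID** | **Survived** | **Pclass**   | **Name**  |  **Sex**   |
          1     |          0   |         3    |  mary     |  Female    |
          2     |          1   |         2    |  james    |  Male      |
          3     |          1   |         3    |  Tanya    |  Female    |
"""

# The regex swallows the spaces around each '|'; regex separators require engine='python'.
df = pd.read_csv(io.StringIO(RAW), sep=r'\s*\|\s*', engine='python')

# Strip the ** markers from the headers and drop the empty column from the trailing '|'.
df.columns = df.columns.str.strip('*')
df = df.loc[:, ~df.columns.str.startswith('Unnamed')]

# Compare Sex case-insensitively, since this file uses 'Female'.
count = int(((df['Sex'].str.lower() == 'female') & (df['Pclass'] == 3)).sum())
print(count)  # → 2
```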
Answer #2 (8ulbf1ek)

I think this is what you need:

import csv

def number_of_female_in_class_3(data):
    # initialize the sum variable
    sum = 0
    for row in data:
        # Sex is the 5th column (index 4), Pclass the 3rd (index 2)
        if row[4] == 'Female' and row[2] == '3':
            # match
            sum += 1
    # return the result
    return sum

# Load in the csv file (Python 3: text mode with newline='', as the csv module expects)
with open('train.csv', newline='') as f:
    csv_file_object = csv.reader(f, delimiter='|')
    # skip the header
    header = next(csv_file_object)
    data = []
    for row in csv_file_object:
        # add each row of data to the data list, stripping excess whitespace
        data.append(list(map(str.strip, row)))

# print the result
print(number_of_female_in_class_3(data))

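Some explanation:
First, in your file 'Female' is written with a capital F. Second, you have your column numbers backwards: sex is in column 5 and class is in column 3 (indices 4 and 2). You also need to initialize the sum variable to 0 before you start incrementing it. You don't need numpy or pandas here, but you do need to apply strip to every element of each row to remove the excess whitespace (map(str.strip, row)), and to pass delimiter='|' to csv.reader, because the default delimiter is a comma. Finally, you need return sum at the end of the function.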
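
Looking columns up by header name instead of by position avoids the swapped-index mistake entirely. A minimal standard-library sketch, assuming the pipe-delimited layout from the question; count_female_in_class and its pclass parameter are hypothetical names, and the data is inlined via io.StringIO rather than read from train.csv:

```python
import csv
import io


def count_female_in_class(lines, pclass='3'):
    """Count rows whose Sex is 'female' (any case) and whose Pclass equals pclass."""
    reader = csv.reader(lines, delimiter='|')
    # Clean the header cells: drop the padding and the ** bold markers.
    header = [cell.strip().strip('*') for cell in next(reader)]
    count = 0
    for raw in reader:
        if not raw:
            continue  # skip blank lines
        # Pair each stripped value with its column name, e.g. {'Sex': 'Female', ...}.
        row = dict(zip(header, (cell.strip() for cell in raw)))
        if row['Sex'].lower() == 'female' and row['Pclass'] == pclass:
            count += 1
    return count


RAW = """\
**PassengerID** | **Survived** | **Pclass**   | **Name**  |  **Sex**   |
          1     |          0   |         3    |  mary     |  Female    |
          2     |          1   |         2    |  james    |  Male      |
          3     |          1   |         3    |  Tanya    |  Female    |
"""

print(count_female_in_class(io.StringIO(RAW)))  # → 2
```

With a real file you would pass the open file object (opened with newline='') instead of the StringIO.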

Answer #3 (rpppsulh)

import pandas as pd

# Load CSV file 1 into a DataFrame
df1 = pd.read_csv("file1.csv")

# Load CSV file 2 into a DataFrame
df2 = pd.read_csv("file2.csv")

# Iterate over each row in df1
for i, row1 in df1.iterrows():
    # Check if there is a matching row in df2 where col1+col2 equals row1['col1']+row1['col2']
    match = df2[(df2['col1'] + df2['col2']) == (row1['col1'] + row1['col2'])]

    # Check if there is a matching row in df2 where col1+col3 equals row1['col1']+row1['col3']
    match2 = df2[(df2['col1'] + df2['col3']) == (row1['col1'] + row1['col3'])]

    # Flag the matching rows in df1
    if not match.empty:
        df1.loc[i, 'flag'] = 'match'
    elif not match2.empty:
        df1.loc[i, 'flag'] = 'match2'
    else:
        df1.loc[i, 'flag'] = 'no match'

# Save the updated df1 to a new CSV file
df1.to_csv("file1_flagged.csv", index=False)
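
Calling iterrows() and filtering df2 once per row scales quadratically with the number of rows. A vectorized sketch that should give the same flags for non-missing values, assuming the same col1/col2/col3 string columns as above (placeholder names from the answer): build each combined key once, test membership with Series.isin, and let np.select apply the same match / match2 / no match priority as the if/elif/else. flag_matches and the sample frames are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_matches(df1, df2):
    """Return a copy of df1 with a 'flag' column: 'match', 'match2' or 'no match'."""
    # Build each combined key once per DataFrame instead of once per row of df1.
    in_key12 = (df1['col1'] + df1['col2']).isin(df2['col1'] + df2['col2'])
    in_key13 = (df1['col1'] + df1['col3']).isin(df2['col1'] + df2['col3'])
    out = df1.copy()
    # np.select picks the first condition that is True, mirroring if/elif/else.
    out['flag'] = np.select([in_key12, in_key13], ['match', 'match2'], default='no match')
    return out


# Tiny hypothetical inputs with string columns.
df1 = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['x', 'y', 'z'], 'col3': ['p', 'q', 'r']})
df2 = pd.DataFrame({'col1': ['a', 'c'], 'col2': ['x', 'w'], 'col3': ['s', 'r']})
print(flag_matches(df1, df2)['flag'].tolist())  # → ['match', 'no match', 'match2']
```

Rows containing missing values may be flagged differently, because isin treats NaN as equal to NaN while == does not.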
