Delete rows from a csv based on a specific column value - Python without pandas

wtlkbnrh  posted on 2023-02-14  in  Python

I have a large csv file with the following header columns: id, type, state, location, number of students
and the following values:

124, preschool, Pennsylvania, Pittsburgh, 1242
421, secondary school, Ohio, Cleveland, 1244
213, primary school, California, Los Angeles, 3213
155, secondary school, Pennsylvania, Pittsburgh, 2141
etc...

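The file is not sorted, and I want a new csv file that contains all the schools with more than 2000 students.
The answers I found deal with sorted csv files, or with splitting the file after a specific number of rows.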

syqv5f0l #1

Here is a solution using the csv module:

import csv

with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:

    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')

    # write headers
    writer.writerow(next(reader))

    # iterate and write rows based on condition
    # (csv.reader yields an empty list for blank lines, so skip those)
    for row in reader:
        if row and int(row[-1]) > 2000:
            writer.writerow(row)

Result:

id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
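The same filter can also be written against the column name instead of its position, which may be more robust if columns are ever reordered. The following is only a sketch: `filter_rows` and its parameters are hypothetical names introduced for illustration, and it assumes the file has a header row containing a column called `number of students`.

```python
import csv


def filter_rows(src, dst, column="number of students", minimum=2000):
    """Copy rows whose `column` value is greater than `minimum`; return how many were kept."""
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        # skipinitialspace drops the blank that follows each comma in the sample data
        reader = csv.DictReader(fin, skipinitialspace=True)
        # extrasaction="ignore" avoids an error if a row has more fields than the header
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            try:
                value = int(row[column])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric value
            if value > minimum:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling `filter_rows('fin.csv', 'fout.csv')` on the sample data should produce the same output file as the positional version above.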

72qzrwbm #2

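If you only want to read the file and avoid any other processing, you can use a regex (assuming the count is the last column and the values are positive integers):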

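import re

# Capture the digits at the end of the line (the last column)
last_number = re.compile(r'(\d+)\s*$')

# Open the output in text mode ('w'); 'wb' would reject str lines in Python 3
with open('Test.txt') as f, open('Test1.txt', 'w') as f1:
    for line in f:
        match = last_number.search(line)
        # Compare as an integer rather than by digit pattern;
        # the header line has no trailing number, so it is not copied
        if match and int(match.group(1)) > 2000:
            f1.write(line)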
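A pure digit-shape pattern such as `[2-9][0-9]{3,}$` only approximates "greater than 2000", so it is worth checking boundary values before relying on it. The sketch below compares that pattern with a plain integer comparison; the helper names `regex_says_keep` and `int_says_keep` are made up for this illustration.

```python
import re

# Tests the shape of the trailing digits, not their numeric value
PATTERN = re.compile(r"[2-9][0-9]{3,}$")


def regex_says_keep(value):
    """True if the digit-shape pattern matches the end of `value`."""
    return PATTERN.search(value) is not None


def int_says_keep(value, minimum=2000):
    """True if `value`, read as an integer, is strictly greater than `minimum`."""
    return int(value) > minimum


for sample in ["1242", "2000", "2141", "10500"]:
    print(sample, regex_says_keep(sample), int_says_keep(sample))
```

For these samples the two checks disagree on `2000` (the pattern matches, but 2000 is not more than 2000) and on `10500` (the pattern does not match, because no run of four or more trailing digits starts with 2-9).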

Doing the same thing in bash can be much faster -

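# IFS= and -r keep each line intact (no trimming, no backslash processing)
while IFS= read -r line; do
  K='([0-9]+)[[:space:]]*$'
  # 10# forces base 10 so a value with leading zeros is not read as octal
  if [[ $line =~ $K ]] && (( 10#${BASH_REMATCH[1]} > 2000 )); then printf '%s\n' "$line"; fi
done <Test.txt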
