Create a new CSV file using selected columns from another CSV file

0s0u357o  posted on 2023-09-28  in  Other

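I am trying to write a script that selects some columns from a CSV file and saves them to another file (ideally specifying the column headers). This is the script I started with; it copies all columns. How can I change it to copy only some of them?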

# importing openpyxl module
import openpyxl as xl;
  
# opening the source excel file
filename ="C:\\Users\\...\\input.clv"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
  
# opening the destination excel file 
filename1 ="C:\\Users\\...\\output.clv"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active
  
# calculate total number of rows and 
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
  
# copying the cell values from source 
# excel file to destination excel file
for i in range (1, mr + 1):
    for j in range (1, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row = i, column = j)
  
        # writing the read value to destination excel file
        ws2.cell(row = i, column = j).value = c.value
  
# saving the destination excel file
wb2.save(str(filename1))

Thanks in advance!

kqhtkvqz1#

This is how I do it with plain Python file reading and writing.

def readCsv(fileName):
  # Read the file into a 2D list: one list of fields per line.
  # Note: a plain split(",") does not handle quoted fields that contain commas.
  data = []
  with open(fileName, "r") as myFile:
    for line in myFile:
      data.append(line.rstrip("\n").split(","))
  return data

def writeCsv(data):
  # Join each row with commas and write all rows to output.csv.
  with open("output.csv", "w") as myNewFile:
    for line in data:
      myNewFile.write(",".join(line) + "\n")

data = readCsv("yourCsv.csv")
# Remove the data you don't need
dataAfterRemovingColumns = data  # placeholder: replace with the filtered rows
writeCsv(dataAfterRemovingColumns)

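My readCsv function produces a 2D list in which each item is the list of values from one row of the CSV file. So, at the spot I marked with the comment # Remove the data you don't need, you would iterate over the 2D list and delete from each row the items that belong to the columns you want to drop. Hope that makes sense!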
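The column-removal step described above could be sketched as follows. This is a minimal sketch, assuming the first row of the 2D list holds the headers; the helper name keepColumns and the sample data are hypothetical and not part of the original answer.

```python
def keepColumns(data, wantedHeaders):
    # data is a 2D list such as the one readCsv returns;
    # data[0] is assumed to be the header row.
    header = data[0]
    # Translate the wanted header names into column indexes.
    indexes = [header.index(name) for name in wantedHeaders]
    # Rebuild every row (header included) with only those columns.
    return [[row[i] for i in indexes] for row in data]

# Illustrative data standing in for the result of readCsv("yourCsv.csv")
data = [
    ["id", "name", "email"],
    ["1", "Ann", "ann@example.com"],
    ["2", "Bob", "bob@example.com"],
]
dataAfterRemovingColumns = keepColumns(data, ["id", "email"])
print(dataAfterRemovingColumns)
```

Building new rows instead of deleting items in place avoids index shifts while iterating.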

ecr0jaav2#

You can use the csv module from the standard library:

#!/usr/bin/env python

import csv

inputCsvFilePath  = 'input.csv'
outputCsvFilePath = 'output.csv'

inputCsvColumnNumbers  = [1,3,5]
outputCsvColumnHeaders = ['one', 'three', 'five']

# reading/writing row by row (high IO, low memory):
# newline='' lets the csv module handle line endings itself
with open(inputCsvFilePath, newline='') as inputCsv:
    inputCsvReader = csv.reader(inputCsv)

    with open(outputCsvFilePath, 'w', newline='') as outputCsv:
        outputCsvWriter = csv.writer(outputCsv)

        # write custom csv header:
        outputCsvWriter.writerow(outputCsvColumnHeaders)
        # skip input file header:
        next(inputCsvReader)

        # copy only the selected columns (the column numbers are 0-based indexes):
        for inputRow in inputCsvReader:
            outputCsvWriter.writerow([inputRow[i] for i in inputCsvColumnNumbers])

Personally, I would use sqlite:

#!/bin/bash

sqlite3 <<EOF
-- input:
.separator ',' "\n"

.import 'input.csv' inputData

-- output:
.mode csv
.header on
.once 'output.csv'

select
  user_id  as "one"
, login_id as "three"
, password as "five"
from inputData
;
EOF

4xy9mtcn3#

Since your goal is to save certain columns from one CSV file into another file, you can use the pandas library, as follows:

import pandas as pd

def save_csv(df, path, cols):
    # Write only the listed columns, without the index column
    df[cols].to_csv(path, index=False)

# read_csv accepts a path directly, so no explicit open() is needed
df = pd.read_csv('path/to/csv')

# Assuming you want to save columns colA and colB
save_csv(df, 'path/to/dest/csv', ['colA', 'colB'])
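If the output should also carry custom headers, as the question asks, one possible variation is to select the columns while reading and rename them before writing. This is only a sketch: the in-memory CSV text and the new header names first and second are made up for illustration, while usecols, rename and to_csv are regular pandas calls.

```python
import io

import pandas as pd

# Illustrative CSV content; in practice pass a file path to read_csv instead.
source = io.StringIO("colA,colB,colC\n1,2,3\n4,5,6\n")

# usecols limits parsing to the listed columns.
df = pd.read_csv(source, usecols=["colA", "colB"])

# rename assigns the headers for the output (hypothetical names).
df = df.rename(columns={"colA": "first", "colB": "second"})

# Without a path argument, to_csv returns the CSV text as a string.
result = df.to_csv(index=False)
print(result)
```

Passing a destination path to to_csv instead would write the same content to a file.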

As another approach, you can use csv.DictReader and csv.DictWriter. The code is longer, but it ran faster in my own timing:

import csv

def use_csv():
    def new_dict(d, cols):
        # Keep only the requested keys of a row
        return {col: d[col] for col in cols}

    with open('path/to/csv', 'r', newline='') as f:
        reader = csv.DictReader(f)

        with open('path/to/dest/csv', 'w', newline='') as csvfile:
            fieldnames = ['colA', 'colB']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

            writer.writeheader()
            for row in reader:
                data = new_dict(row, fieldnames)
                writer.writerow(data)

use_csv()
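The per-row filtering can also be left to DictWriter itself through its extrasaction parameter. The following is a minimal sketch, assuming in-memory sample data in place of the real files; the column names match the example above, everything else is illustrative.

```python
import csv
import io

# Illustrative input; for a real file use open('path/to/csv', newline='').
source = io.StringIO("colA,colB,colC\n1,2,3\n4,5,6\n")
dest = io.StringIO()

reader = csv.DictReader(source)
# extrasaction='ignore' silently drops keys that are not in fieldnames,
# so no helper is needed to trim each row.
writer = csv.DictWriter(dest, fieldnames=["colA", "colB"], extrasaction="ignore")
writer.writeheader()
writer.writerows(reader)

print(dest.getvalue())
```

With the default extrasaction='raise', the writer would raise a ValueError for the extra colC key instead.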

vs91vp4v4#

I found this shorter approach using DictReader and DictWriter:

import csv
with open('original.csv', newline='') as originalfile:
    reader = csv.DictReader(originalfile)
    
    with open('new.csv', 'w', newline='') as newfile:
        fieldnames = ['asset_id', 'treatments']
        writer = csv.DictWriter(newfile, fieldnames=fieldnames)
        writer.writeheader()

        for row in reader:   
            writer.writerow({'asset_id':row['Asset ID'], 'treatments':row['Treatments Identified']})
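The same idea can be wrapped in a small function that takes a mapping from source headers to output headers. This is a sketch under assumptions: the function name copy_columns, the temporary files, and the sample rows are hypothetical; only the header names come from the snippet above.

```python
import csv
import os
import tempfile

def copy_columns(src_path, dest_path, column_map):
    # column_map maps a header in the source file to the header to write.
    with open(src_path, newline='') as src, open(dest_path, 'w', newline='') as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=list(column_map.values()))
        writer.writeheader()
        for row in reader:
            # Pick each wanted value and store it under its new header.
            writer.writerow({new: row[old] for old, new in column_map.items()})

# Demo with a throwaway source file containing illustrative rows.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "original.csv")
dest_path = os.path.join(tmp_dir, "new.csv")
with open(src_path, "w", newline="") as f:
    f.write("Asset ID,Treatments Identified,Notes\nA1,paint,x\nA2,seal,y\n")

copy_columns(src_path, dest_path,
             {"Asset ID": "asset_id", "Treatments Identified": "treatments"})

with open(dest_path, newline="") as f:
    output_rows = list(csv.reader(f))
print(output_rows)
```

Keeping the mapping in one dictionary means the selected columns and their new names are changed in a single place.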
