Remove columns starting with the same special characters from a CSV file using Python

Posted by pbwdgjma on 2022-12-24 in Python


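My CSV file has the following columns: AFM_reversal_indicator, Alert_Message, axiom_key, _timediff, player, __mv_Splunk_Alert_Id, __mv_nbr_plastic, __mv_code_acct_stat_demo.
I want to remove the columns starting with "__mv". I have seen some posts that use pandas to filter out columns. Is it possible to do this with the csv module in Python, and if so, how? Also, with pandas, what regex should I give: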

df.filter(regex='')
df.to_csv(output_file_path)

Note: I am using Python 3.8.

python-3.x

Source: https://stackoverflow.com/questions/74898724/remove-columns-starting-with-same-special-character-in-a-csv-file-using-python

2 Answers


pdtvr36n #1

You don't need .filter for this. Just work out which columns those are and drop them from the DataFrame.

import pandas as pd

# Load the dataframe (In our case create a dummy one with your columns)
df = pd.DataFrame(columns=["AFM_reversal_indicator", "Alert_Message", "axiom_key", "_timediff", "player", "__mv_Splunk_Alert_Id", "__mv_nbr_plastic", "__mv_code_acct_stat_demo"])

# Get a list of all column names starting with "__mv"
mv_columns = [col for col in df.columns if col.startswith("__mv")]

# Drop the columns
df = df.drop(columns=mv_columns)

# Save the updated dataframe to a CSV file
df.to_csv("cleaned_data.csv", index=False)

The mv_columns list comprehension iterates over the columns of the DataFrame and selects those that start with "__mv"; .drop then removes them.
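To answer the regex part of the question directly, here is a minimal sketch. DataFrame.filter keeps the labels that the regex matches, so a negative lookahead can be used to keep everything that does not start with "__mv". The sample data below is made up for illustration; only the column names come from the question. The second call shows an alternative that skips the columns while reading, using the callable form of usecols.

```python
import io
import pandas as pd

# Hypothetical one-row sample using the column names from the question.
csv_text = (
    "AFM_reversal_indicator,Alert_Message,axiom_key,_timediff,player,"
    "__mv_Splunk_Alert_Id,__mv_nbr_plastic,__mv_code_acct_stat_demo\n"
    "1,msg,k1,5,p1,a,b,c\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# filter() keeps column labels that the regex matches.
# "^(?!__mv)" matches only names that do NOT start with "__mv".
kept = df.filter(regex=r"^(?!__mv)")

# Alternative: never load the unwanted columns in the first place.
kept_on_read = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: not name.startswith("__mv"),
)

print(list(kept.columns))
```

Note that "_timediff" is kept in both cases, since it starts with a single underscore rather than "__mv".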
If for some reason you want to use only the csv package, the solution may not be as elegant as with pandas.

import csv

# newline="" lets the csv module handle line endings itself
with open("original_data.csv", "r", newline="") as input_file, open("cleaned_data.csv", "w", newline="") as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)

    # The first row is assumed to be the header
    header_row = next(reader)

    # Indices of the columns whose names start with "__mv"
    mv_column_indices = {i for i, col in enumerate(header_row) if col.startswith("__mv")}

    # Write the header without the "__mv" columns
    writer.writerow([col for i, col in enumerate(header_row) if i not in mv_column_indices])

    # Copy every remaining row, skipping the same column positions
    for row in reader:
        writer.writerow([value for i, value in enumerate(row) if i not in mv_column_indices])

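So first, read the first row, which should be your header. Using similar logic as before, find the columns that start with "__mv" and get their indices. Write the new header, made up of the columns that are not "__mv" columns, to the output file. Then iterate through the rest of the CSV and drop those columns from each row as you go.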
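If tracking indices by hand feels fragile, the same steps can be sketched with csv.DictReader and csv.DictWriter from the standard library, which work by column name instead. This is only one possible variant, not part of the code above: the function name strip_mv_columns and the in-memory sample are made up for illustration. Passing extrasaction="ignore" tells the writer to silently skip any keys that are not in its fieldnames.

```python
import csv
import io

def strip_mv_columns(src, dst, prefix="__mv"):
    """Copy CSV data from src to dst, leaving out columns whose name starts with prefix."""
    reader = csv.DictReader(src)
    # fieldnames is None for an empty input, hence the "or []"
    kept = [name for name in (reader.fieldnames or []) if not name.startswith(prefix)]
    # extrasaction="ignore" drops the keys we did not list in fieldnames
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Hypothetical in-memory sample; with real files, open both with newline="".
source = io.StringIO("player,__mv_nbr_plastic,_timediff\nalice,7,12\nbob,9,30\n")
out = io.StringIO()
strip_mv_columns(source, out)
print(out.getvalue())
```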

Answered 2022-12-24

ljsrvy3e #2

Do you mean with standard Python? You can use list comprehensions, e.g.:

import csv

with open("data.csv", "r", newline="") as f:
    data_generator = csv.reader(f)
    # Header row, with surrounding whitespace stripped from each name
    header = [col.strip() for col in next(data_generator)]
    # Read all remaining rows at once, skipping blank lines
    data = [row for row in data_generator if row]

# Keep only the values whose column name does not start with "__mv"
data = [[value for col, value in zip(header, row) if not col.startswith("__mv")] for row in data]
header = [col for col in header if not col.startswith("__mv")]

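However, this is just a simple example, and you will probably have more things to consider: for example, what types your CSV columns have, and whether you want to read all the data at once, as I do here, or read the rows one by one from the generator to save memory.
You could also use the built-in filter function instead of the inner list comprehension.
Also, if you have numpy installed and you want something more "numerical", you can use "structured numpy arrays" (https://numpy.org/doc/stable/user/basics.rec.html). They are quite nice. (Personally I prefer them to pandas.) numpy also has its own CSV reading functions (see: https://www.geeksforgeeks.org/how-to-read-csv-files-with-numpy/).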
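The two variations mentioned above, reading row by row to save memory and using the built-in filter, might be combined roughly like this. It is only a sketch: the generator name drop_prefixed_columns and the sample data are made up for illustration.

```python
import csv
import io

def drop_prefixed_columns(rows, prefix="__mv"):
    """Yield rows one at a time, without the columns whose header starts with prefix."""
    rows = iter(rows)
    header = [col.strip() for col in next(rows)]
    # One boolean per column: True means "keep this column"
    keep = [not col.startswith(prefix) for col in header]
    yield [col for col, flag in zip(header, keep) if flag]
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines
            continue
        # Built-in filter over (flag, value) pairs, then unpack the values
        yield [value for _, value in filter(lambda pair: pair[0], zip(keep, row))]

# Hypothetical in-memory sample; a real file would be opened with newline="".
sample = io.StringIO("a,__mv_b,c\n1,2,3\n\n4,5,6\n")
cleaned = list(drop_prefixed_columns(csv.reader(sample)))
print(cleaned)
```

Because the function is a generator, each cleaned row can be passed straight to csv.writer without holding the whole file in memory; the list() call here is only for the demonstration.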

Answered 2022-12-24
