How do I extract strings from a CSV file and write them as a list of strings?

Posted by dkqlctbz on 2023-07-31 in Other


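I want to extract strings from certain columns of a CSV file when a condition on another column is met, and then write the extracted strings as a list to a .txt file.
I'm new to pandas, so there may be an obvious solution, but the file produced by the code below is empty. If I print the variable extracted_rows on line 12, I only get this: "Series([], dtype:". Any ideas?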

import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)

    # Assuming the columns are named as 'Column5', 'Column4' and 'Column3'
    # Convert 'Column5' to numeric
    df['Column5'] = pd.to_numeric(df['Column5'], errors='coerce')

    # Extract rows where 'Column5' is >= 18
    extracted_rows = df[df['Column5'] >= 18]

    # Create new strings by concatenating 'Column4' and 'Column3'
    # (they need to be in reverse order in the generated string for my purpose)
    combined_strings = extracted_rows['Column4'] + " " + extracted_rows['Column3']
    
    print(combined_strings)

    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write('%s\n' % item)

process_csv('file.csv')

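Update: following a suggestion, I tried using apply to handle the case where the fifth column contains two numbers separated by '-'. But now I only get the rows that actually contain '-'. It's driving me a bit crazy: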

import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)

    # Check if strings in column 5 contain '-'
    # If so split at '-' and take the first part
    # Otherwise, keep the original string
    df.iloc[:, 4] = df.iloc[:, 4].apply(lambda x: x.split('-')[0] if len(str(x)) > 3 and '-' in str(x) else x)

    # Convert column 5 to numeric, set invalid parsing as NaN
    df.iloc[:, 4] = pd.to_numeric(df.iloc[:, 4], errors='coerce')

    # Replace NaNs (resulted from invalid parsing) with a negative number
    df.iloc[:, 4].fillna(-1, inplace=True)

    # Extract rows where column 5 is >= 18
    extracted_rows = df[df.iloc[:, 4] >= 18]

    # Create new strings by concatenating column 4 and column 3
    combined_strings = extracted_rows.iloc[:, 3] + " " + extracted_rows.iloc[:, 2]

    print(combined_strings)

    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write("%s\n" % item)

process_csv('file.csv')


Source: https://stackoverflow.com/questions/76753610/how-do-i-extract-strings-from-a-csv-file-and-write-them-as-a-list-of-strings

1 Answer

c9qzyr3d1#

You can use apply. For more information, see the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})

def do_something(row):
    # In this function, the first input parameter is the "row"
    # of the DataFrame. You could have more input parameters,
    # but this could get quite complicated.
    if row['Col1'] == row['Col2']:
        return row['Col1'] + ' ' + row['Col3']
    # Rows that fail the condition fall through and return None

df.apply(do_something, axis=1)

Here is the output:

>>> 
0     a e
1     b f
2    None
dtype: object
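
The None in row 2 appears because do_something returns nothing when the condition is not met. Since the question asks for a list of strings written to a text file, one possible follow-up is to drop those empty results before writing. This is a minimal sketch reusing the example DataFrame above; the file name output.txt is a placeholder chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})

def do_something(row):
    # Return the combined string only when the condition holds;
    # otherwise the function implicitly returns None.
    if row['Col1'] == row['Col2']:
        return row['Col1'] + ' ' + row['Col3']

result = df.apply(do_something, axis=1)

# dropna() removes the None entries, tolist() gives a plain Python list
strings = result.dropna().tolist()

# 'output.txt' is a hypothetical file name
with open('output.txt', 'w') as f:
    for item in strings:
        f.write(f"{item}\n")
```

Here strings ends up as a plain list, so it can be inspected or reused before it is written out.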

Of course, you can store the output in a column of the DataFrame like this:

df.loc[:, 'output'] = df.apply(do_something, axis=1)
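
For the case described in the question, where the fifth column may hold either a single number or two numbers joined by '-', a vectorized variant may be simpler than apply. The sketch below is only one way to do it and rests on several assumptions: the sample data, the column names, and the range format "18-25" are invented for illustration, and the first number of a range is taken as the value to compare.

```python
import io
import pandas as pd

# Hypothetical sample standing in for file.csv; the real layout is an assumption
csv_text = """Column1,Column2,Column3,Column4,Column5
x,y,Smith,Anna,18-25
x,y,Jones,Bob,17
x,y,Lee,Cara,30
x,y,Kim,Dan,n/a
"""

def extract_strings(source):
    # dtype=str keeps every cell as text, so .str methods behave uniformly
    df = pd.read_csv(source, dtype=str)

    # Take the leading run of digits in the fifth column ("18-25" -> "18")
    first_number = df.iloc[:, 4].str.extract(r"^\s*(\d+)", expand=False)

    # Missing or unparseable values become NaN, which compares as False below
    values = pd.to_numeric(first_number, errors="coerce")
    selected = df[values >= 18]

    # Fourth column first, then third column
    combined = selected.iloc[:, 3] + " " + selected.iloc[:, 2]
    return combined.tolist()

strings = extract_strings(io.StringIO(csv_text))

with open("file.txt", "w") as f:
    for item in strings:
        f.write(f"{item}\n")
```

Building the boolean mask from a separate Series leaves the original column untouched, which avoids assigning back through iloc. If the result is still empty on real data, printing df.dtypes and df.columns is a quick way to check whether the fifth column is the one expected.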

Hope this helps! :)

Answered on 2023-07-31
