How to extract strings from a CSV file and write them out as a list of strings

dkqlctbz  posted on 2023-07-31  in Other

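I want to extract strings from certain columns of a CSV file when a condition on another column is met, and then write the extracted strings as a list to a .txt file.
I'm new to pandas, so there may be an obvious solution, but the file produced by the code below is empty. If I print the variable "extracted rows" at line 12, all I get is: "Series([],dtype:". Any ideas?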

import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)

    # Assuming the columns are named as 'Column5', 'Column4' and 'Column3'
    # Convert 'Column5' to numeric
    df['Column5'] = pd.to_numeric(df['Column5'], errors='coerce')

    # Extract rows where 'Column5' is >= 18
    extracted_rows = df[df['Column5'] >= 18]

    # Create new strings by concatenating 'Column4' and 'Column3' (they need to be in reverse order in the generated string for my purpose)
    combined_strings = extracted_rows['Column4'] + " " + extracted_rows['Column3']
    
    print(combined_strings)

    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write('%s\n' % item)

process_csv('file.csv')
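
One way to narrow down an empty result like this is to check which raw values fail the numeric conversion: `errors='coerce'` turns anything unparseable into NaN, and NaN >= 18 is False, so those rows are dropped without any error. A minimal sketch, assuming made-up sample data (the real file's contents are not shown here, so the values below are hypothetical):

```python
import io
import pandas as pd

# Hypothetical sample data; the real file's values are unknown.
csv_text = """Column3,Column4,Column5
Smith,Anna,17
Jones,Ben,18-25
Lee,Cara,30
Park,Dan,unknown
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert a text copy of the column so the original values stay visible.
raw = df["Column5"].astype(str)
converted = pd.to_numeric(raw, errors="coerce")

# Raw values that could not be parsed as numbers (they became NaN).
failed = df.loc[converted.isna(), "Column5"].tolist()
# Number of rows that would survive the >= 18 filter.
kept_count = int((converted >= 18).sum())

print(df["Column5"].dtype)
print(failed)
print(kept_count)
```

If `failed` lists most of the column, the filter has nothing left to keep, which would be consistent with an empty Series.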

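Update: following a suggestion, I tried using apply to handle the case where the fifth column contains two numbers separated by '-'. But now I only get the rows that actually contain '-'. It's driving me a bit crazy: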

import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)

    # Check if strings in column 5 contain '-'
    # If so split at '-' and take the first part
    # Otherwise, keep the original string
    df.iloc[:, 4] = df.iloc[:, 4].apply(lambda x: x.split('-')[0] if len(str(x)) > 3 and '-' in str(x) else x)

    # Convert column 5 to numeric, set invalid parsing as NaN
    df.iloc[:, 4] = pd.to_numeric(df.iloc[:, 4], errors='coerce')

    # Replace NaNs (resulted from invalid parsing) with a negative number
    df.iloc[:, 4].fillna(-1, inplace=True)

    # Extract rows where column 5 is >= 18
    extracted_rows = df[df.iloc[:, 4] >= 18]

    # Create new strings by concatenating column 4 and column 3
    combined_strings = extracted_rows.iloc[:, 3] + " " + extracted_rows.iloc[:, 2]

    print(combined_strings)

    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write("%s\n" % item)

process_csv('file.csv')
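
For values such as "18-25", one possible alternative to the lambda is to do the split with the vectorized `.str` accessor and keep the numeric result in a separate Series instead of assigning back through `iloc`. This is only a sketch under assumptions: the sample data and the helper name `combine_rows` are hypothetical, and the values are assumed to be non-negative (a leading '-' would leave an empty first part).

```python
import io
import pandas as pd

# Hypothetical sample data with the condition in the fifth column.
csv_text = """Col1,Col2,Column3,Column4,Column5
x,y,Smith,Anna,17
x,y,Jones,Ben,18-25
x,y,Lee,Cara,30
x,y,Park,Dan,unknown
"""

def combine_rows(df, threshold=18):
    # Treat the fifth column as text, whatever dtype read_csv inferred.
    raw = df.iloc[:, 4].astype(str)
    # Keep the part before the first '-', e.g. "18-25" -> "18".
    first_part = raw.str.split("-").str[0]
    # Unparseable text becomes NaN, and NaN >= threshold is False.
    lower = pd.to_numeric(first_part, errors="coerce")
    kept = df[lower >= threshold]
    # Column 4 first, then column 3, as in the question.
    joined = kept.iloc[:, 3].astype(str) + " " + kept.iloc[:, 2].astype(str)
    return joined.tolist()

df = pd.read_csv(io.StringIO(csv_text))
combined = combine_rows(df)
print(combined)
```

Because the DataFrame itself is never modified, printing `raw`, `first_part`, and `lower` side by side shows what happens to each row at every step.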

c9qzyr3d  #1

You can use apply. For more information, see the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'Col2': ['a', 'b', 'e'], 'Col3': ['e', 'f', 'g']})

def do_something(row):
    # In this function, the first input parameter is the "row"
    # of the DataFrame. You could have more input parameters,
    # but this could be quite complicated.
    if row['Col1'] == row['Col2']:
        return row['Col1'] + ' ' + row['Col3']
    # Rows that do not match fall through and return None.
    return None

df.apply(do_something, axis=1)

Here is the output:

>>> 
0     a e
1     b f
2    None
dtype: object


Of course, you can also assign the output to a column of the DataFrame like this:

df.loc[:, 'output'] = df.apply(do_something, axis=1)
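
Applied to the case in the question, the same row-wise pattern might look roughly like the sketch below. The sample data, the function name `combine_if_at_least_18`, and the temporary output path are all assumptions made for illustration, and Column5 is assumed to be numeric already.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "Column3": ["Smith", "Jones", "Lee"],
    "Column4": ["Anna", "Ben", "Cara"],
    "Column5": [17, 18, 30],
})

def combine_if_at_least_18(row):
    # Return the combined string when the condition holds, otherwise None.
    if row["Column5"] >= 18:
        return row["Column4"] + " " + row["Column3"]
    return None

result = df.apply(combine_if_at_least_18, axis=1)
# Rows that did not match came back as missing values; drop them.
lines = result.dropna().tolist()

# Write one string per line to a text file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(out_path, "w", encoding="utf-8") as f:
    for item in lines:
        f.write(item + "\n")

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as f:
    written = f.read().splitlines()
print(written)
```

Dropping the missing values before writing keeps lines such as "None" out of the text file.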


Hope this helps! :)
