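I want to extract strings from certain columns of a CSV file when a condition in another column is met. I then want to write the extracted strings as a list to a .txt file.
I'm new to pandas, so there may be an obvious solution, but the file generated by the code below is empty. If I print the variable "extracted rows" at line 12, I only get this: "Series([], dtype:". Any ideas?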
import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)
    # Assuming the columns are named 'Column5', 'Column4' and 'Column3'
    # Convert 'Column5' to numeric
    df['Column5'] = pd.to_numeric(df['Column5'], errors='coerce')
    # Extract rows where 'Column5' is >= 18
    extracted_rows = df[df['Column5'] >= 18]
    # Create new strings by concatenating 'Column4' and 'Column3'
    # (which need to be in reverse order in the generated string for my purpose)
    combined_strings = extracted_rows['Column4'] + " " + extracted_rows['Column3']
    print(combined_strings)
    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write('%s\n' % item)

process_csv('file.csv')
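Update: Following a suggestion, I worked with apply to try to handle the case where the fifth column contains two numbers and a '-'. But now I only get the rows that actually contain '-'. It's driving me a bit crazy: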
import pandas as pd

def process_csv(file_name):
    # Read the CSV file
    df = pd.read_csv(file_name)
    # Check if strings in column 5 contain '-'
    # If so, split at '-' and take the first part
    # Otherwise, keep the original string
    df.iloc[:, 4] = df.iloc[:, 4].apply(lambda x: x.split('-')[0] if len(str(x)) > 3 and '-' in str(x) else x)
    # Convert column 5 to numeric, set invalid parsing as NaN
    df.iloc[:, 4] = pd.to_numeric(df.iloc[:, 4], errors='coerce')
    # Replace NaNs (resulting from invalid parsing) with a negative number
    df.iloc[:, 4].fillna(-1, inplace=True)
    # Extract rows where column 5 is >= 18
    extracted_rows = df[df.iloc[:, 4] >= 18]
    # Create new strings by concatenating column 4 and column 3
    combined_strings = extracted_rows.iloc[:, 3] + " " + extracted_rows.iloc[:, 2]
    print(combined_strings)
    # Write the combined strings to a txt file
    with open('file.txt', 'w') as f:
        for item in combined_strings:
            f.write("%s\n" % item)

process_csv('file.csv')
1 Answer
You can use apply. For more information, see the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
Here is the output:
Of course, you can redirect the output to part of the DataFrame by doing this:
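A minimal sketch of how that might look, not necessarily the exact snippet intended here. It assumes made-up sample data in place of file.csv, a hypothetical helper named first_number, and a new column named Column5_num (both names are invented for illustration). The helper takes the part before any '-' and converts it to a number, and the result of apply is stored in its own column so the original values stay untouched:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv; the real columns are unknown.
csv_text = """Column1,Column2,Column3,Column4,Column5
a,b,Smith,Anna,17
c,d,Jones,Ben,18-20
e,f,Brown,Cara,25
g,h,Green,Dan,unknown
"""


def first_number(value):
    """Return the number before any '-' as a float, or NaN if it cannot be parsed."""
    text = str(value).split('-')[0].strip()
    try:
        return float(text)
    except ValueError:
        return float('nan')


df = pd.read_csv(io.StringIO(csv_text))

# Store the result of apply in a new column instead of overwriting Column5.
df['Column5_num'] = df['Column5'].apply(first_number)

# NaN >= 18 evaluates to False, so rows that could not be parsed are filtered out.
extracted_rows = df[df['Column5_num'] >= 18]

# Column4 first, then Column3, as in the question.
combined_strings = (extracted_rows['Column4'] + " " + extracted_rows['Column3']).tolist()
print(combined_strings)  # prints ['Ben Jones', 'Cara Brown']
```

Keeping the parsed numbers in a separate column makes it easy to print both columns side by side and see which rows failed to parse.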
Hope this helps! :)