Problem matching strings with Python and CSV

lf5gs5x2  posted on 2022-12-15  in Python

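I'm trying to write a Python script that reads the symptoms from a medical task given on the command line, compares them with the symptoms listed in dataset.csv, and then reports which disease the patient in the task is most likely suffering from.
The problem I'm running into is that it doesn't seem to read dataset.csv at all and just gives me: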

The patient is likely suffering from d.

dataset.csv looks like this:

Asthma, Wheezing, coughing, chest tightness, and shortness of breath
Atelectasis, Shortness of breath, chest pain or discomfort, and a cough
Atypical pneumonia, Fever, chills, chest pain or discomfort, and shortness of breath
Basal cell carcinoma, Flat, pale, or yellowish patch of skin
Bell's palsy, Facial droop or weakness, numbness, pain around the jaw
Biliary colic, Pain in the upper abdomen that may spread to the shoulder or back
Bladder cancer, Blood in the urine, pain or burning with urination, and frequent urination
Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness

My script looks like this:

#!/usr/bin/env python3

import argparse
import csv

# Parse the command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-t', '--task', help='The symptoms to search for in the dataset')
parser.add_argument('-d', '--dataset', help='The dataset to search in')
args = parser.parse_args()

# Get the task symptoms
task_symptoms = args.task.split(', ')

# Initialize a dictionary to store disease counts
disease_counts = {}

# Open the dataset
try:
    # Open the dataset
    with open(args.dataset, 'r') as csv_file:
        csv_reader = csv.reader('dataset.csv')

# Iterate through each row
    for row in csv_reader:
        
        # Get the disease and symptoms
        disease = row[0].strip()
        symptoms = row[1:]
        
        # Initialize the count
        count = 0
        
        # Iterate through each symptom in the task
        for task_symptom in task_symptoms:
            
            # Iterate through each symptom in the dataset
            for symptom in symptoms:

                # If the symptom matches a symptom in the task
                if task_symptom == symptom:
                    
                    # Increment the count
                    count += 1

        # Store the disease name and count in the dictionary
        disease_counts[disease] = count
# Get the maximum count
    max_count = max(disease_counts.values())

    # Get the most probable disease from the counts
    most_probable_disease = [k for k, v in disease_counts.items() if v == max_count][0]

    print(f'The patient is likely suffering from {most_probable_disease}.')

except FileNotFoundError:
    print("Error: Could not open the file.")

What am I doing wrong?
An example of the output I expect (depending on the symptoms) is:

The patient is likely suffering from Asthma

It's been three weeks and I still can't figure it out.
Thanks for your help.

m4pnthwp  #1

I think the problem is with the format of the CSV file.

Asthma, Wheezing, coughing, chest tightness, and shortness of breath

Because there is a space after each comma, this line of the CSV file produces the following fields:

row[0] = "Asthma"
row[1] = " Wheezing"
row[2] = " coughing"
row[3] = " chest tightness"
row[4] = " and shortness of breath"

See how every field after the first one starts with a space? The string " coughing" does not match the string "coughing".
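To check this quickly, here is a minimal sketch that parses one line of the dataset with csv.reader(). The in-memory io.StringIO buffer and the variable names are stand-ins chosen for illustration, not part of the original script.

```python
import csv
import io

# In-memory stand-in for one line of dataset.csv (assumption for illustration).
sample = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"

# Parse the line with the default csv.reader settings.
row = next(csv.reader(io.StringIO(sample)))
print(row)

# An exact comparison fails because of the leading space...
found_raw = "coughing" in row
# ...but succeeds once each field is stripped.
found_stripped = "coughing" in [field.strip() for field in row]
print(found_raw, found_stripped)
```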

nzkunb0c  #2

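By default, when you read a CSV file with csv.reader(), values are split on the comma alone. Your CSV contains extra spaces after the commas, and those spaces end up in the values. To test this, you could use a CSV file like: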

Asthma,Wheezing,coughing,chest tightness,and shortness of breath

However, you can pass skipinitialspace=True to csv.reader(), which ensures that no symptom starts with a space character.
For example:

csv_reader = csv.reader(csv_file, skipinitialspace=True)
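Note that csv.reader() expects an iterable of lines, such as the open file object, rather than a filename string. Given a string, it iterates over it character by character, which would explain why the script prints "d" (the first character of 'dataset.csv'). A small sketch of the difference, using an io.StringIO buffer as an assumed stand-in for the real file:

```python
import csv
import io

# Passing the filename string: every character becomes its own one-field row.
rows_from_string = list(csv.reader("dataset.csv"))
print(rows_from_string[0])

# Passing a file-like object: each line becomes a row of fields.
# io.StringIO stands in for the opened dataset here (assumption for illustration).
csv_file = io.StringIO("Asthma, Wheezing, coughing\n")
rows_from_file = list(csv.reader(csv_file, skipinitialspace=True))
print(rows_from_file[0])
```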

Alternatively, you can make sure there is no extra whitespace by calling .strip() on each symptom:

if task_symptom == symptom.strip():

You may also want to make the comparison case-insensitive by converting both sides to lowercase:

if task_symptom.lower() == symptom.strip().lower():
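Putting these suggestions together, the matching step might look roughly like the sketch below. The function name most_probable_disease and the in-memory sample data are hypothetical and only for illustration. The function takes any iterable of lines, so it can be given the open file object from the original script.

```python
import csv
import io


def most_probable_disease(lines, task_symptoms):
    """Return the disease sharing the most symptoms with the task, or None."""
    # Normalise the task symptoms: no surrounding whitespace, lowercase.
    wanted = {s.strip().lower() for s in task_symptoms}
    best_disease, best_count = None, 0
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        disease = row[0].strip()
        symptoms = {s.strip().lower() for s in row[1:]}
        count = len(wanted & symptoms)  # number of exact matches
        if count > best_count:
            best_disease, best_count = disease, count
    return best_disease


# In-memory stand-in for dataset.csv (assumption for illustration).
sample = io.StringIO(
    "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"
    "Bell's palsy, Facial droop or weakness, numbness, pain around the jaw\n"
)
result = most_probable_disease(sample, "coughing, Chest Tightness".split(","))
print(f"The patient is likely suffering from {result}.")
```

With a real file, the call would sit inside the with open(args.dataset, newline='') as csv_file: block so the file is still open while the rows are read. The comparison is still exact, so a task symptom of "shortness of breath" would not match the dataset field "and shortness of breath".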
