Python script to reformat a CSV file outputs an empty file

Posted by o2g1uqev on 2023-04-27 in Python

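Can someone help me with my Python script, which should convert an input CSV into a correctly formatted output CSV? At the moment it only outputs an empty CSV file. The formatting requirements are as follows:
Input:
1. Each line in the input CSV contains one of the 7 columns of an output CSV row.
2. Note that some empty lines contain only quotation marks.
3. Each dataset spans multiple lines in the input file; some datasets have extra lines and some have missing lines.
Output:
1. Each row in the output CSV should have 7 columns.
2. Any line in the input CSV that contains empty data (such as two quotation marks) should be dropped from the output CSV.
3. Use "Investment focuses:" in the input CSV as the anchor that marks the last column of a dataset's row.
4. For any dataset in the input CSV that contains no "@" symbol, insert "Email:NA" into column 4 of that row.
5. For any dataset in the input CSV that does not include "Phone:", insert "Phone:NA" into column 5 of that row.
6. Remove any line from the input CSV that contains only "Email:" or "Placeholder image".
7. In the output file, each cell's content should be enclosed in a set of quotation marks and separated by commas, with one more comma outside the quotes at the end of the row, for correct CSV format.
8. The input file will always be input.csv and the output will always be output.csv.

Here is an example of the input data format: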
"",
"Username 001",
"",
"Partner",
"London, United Kingdom",
"Emails:",
"",
"user@example.eu",
"Phone:00-000-000-000",
"Investor types: Private Equity Firm, Venture Capital",
"Investment focuses: Financial Services, Artificial Intelligence, Machine Learning, Software, Advice, Professional Services, Semiconductor, Agriculture, AgTech, Biotechnology, Medical, Medical Device, Veterinary, CRM, Information Technology, E-Commerce, Internet, Robotics, Security, Public Safety, Service Industry Show past investments",
"Placeholder image",
"",
"Username 002",
"",
"Co-Founder and General Partner",
"London, United Kingdom",
"Emails:",
"",
"Phone:00-00-0000-0000",
"Investor type: Micro VC",
"Investment focuses: Semiconductor, Software, SaaS, Health Care, CleanTech, Data Visualization, Hardware, Internet of Things, Smart Cities, Financial Services, FinTech, CMS, IaaS, Information Technology, Internet, PaaS, Web Design, Web Development, Web Hosting, Machine Learning, Robotics, Artificial Intelligence, Computer, Software Engineering Show past investments",
"",
"Username 003",
"",
"Chief Executive",
"Cardiff, United Kingdom",
"Emails:",
"",
"user@example.wales",
"Investor types: Government Office, Micro VC",
"Investment focuses: Cosmetics, Men's, Sensor, Building Material, Financial Services, FinTech, Marketing, Manufacturing, Artificial Intelligence, Biotechnology, Health Care, Life Science, Software, Blockchain, Enterprise Applications, Industrial Automation, Industrial Manufacturing, Internet, SaaS, Supply Chain Management, Information Technology, Wellness Show past investments",
"",
"Username 004",
"",
"Partner",
"London, United Kingdom",
"Emails:",
"",
"user@example.com",
"Phone:00-00-0000-0000",
"Investor type: Private Equity Firm",
"Investment focuses: Construction, Aerospace, Financial Services, Hospital, Pharmaceutical, Industrial, Machinery Manufacturing, Coffee, Food and Beverage Show past investments",
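
As a rough illustration of how Python's `csv.reader` sees lines shaped like the sample above: because every line ends with a comma after the closing quote, each one parses as two fields, with the data in the first field and an empty string in the second. A truly blank line parses as an empty list. The `sample` string below is a shortened stand-in for input.csv, not the real file.

```python
import csv
import io

# Three lines copied from the sample input, plus one truly blank line
sample = '"",\n"Username 001",\n"London, United Kingdom",\n\n'

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    # Prints the field count and the parsed fields for each line
    print(len(row), row)
```

So any code that indexes `row[0]` should first check that the row is not empty, and should not expect one-column rows.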

This is how the sample data should be output:

"Username 001","Partner","London, United Kingdom","user001@example.eu","Phone:00-000-000-000","Investor types: Private Equity Firm, Venture Capital","Investment focuses: Financial Services, Artificial Intelligence, Machine Learning, Software, Advice, Professional Services, Semiconductor, Agriculture, AgTech, Biotechnology, Medical, Medical Device, Veterinary, CRM, Information Technology, E-Commerce, Internet, Robotics, Security, Public Safety, Service Industry Show past investments"
"Username 002","Co-Founder and General Partner","London, United Kingdom","Email:NA","Phone:00-00-0000-0000","Investor type: Micro VC","Investment focuses: Semiconductor, Software, SaaS, Health Care, CleanTech, Data Visualization, Hardware, Internet of Things, Smart Cities, Financial Services, FinTech, CMS, IaaS, Information Technology, Internet, PaaS, Web Design, Web Development, Web Hosting, Machine Learning, Robotics, Artificial Intelligence, Computer, Software Engineering Show past investments",
"Username 003","Chief Executive","Cardiff, United Kingdom","user003@example.wales","Phone:NA","Investor types: Government Office, Micro VC","Investment focuses: Cosmetics, Men's, Sensor, Building Material, Financial Services, FinTech, Marketing, Manufacturing, Artificial Intelligence, Biotechnology, Health Care, Life Science, Software, Blockchain, Enterprise Applications, Industrial Automation, Industrial Manufacturing, Internet, SaaS, Supply Chain Management, Information Technology, Wellness Show past investments",
"Username 004","Partner","London, United Kingdom","user004@example.com","Phone:00-00-0000-0000","Investor type: Private Equity Firm","Investment focuses: Construction, Aerospace, Financial Services, Hospital, Pharmaceutical, Industrial, Machinery Manufacturing, Coffee, Food and Beverage Show past investments",
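
One possible way to get this layout from the standard `csv` module, sketched under the assumption that every row should end with a comma: `quoting=csv.QUOTE_ALL` wraps each cell in quotes, and a custom `lineterminator` of `",\n"` appends the trailing comma. The helper name `format_rows` is hypothetical and only used for this illustration.

```python
import csv
import io

def format_rows(records):
    """Return CSV text with every cell quoted and a trailing comma per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator=",\n")
    writer.writerows(records)
    return buffer.getvalue()

text = format_rows([["Username 002", "London, United Kingdom", "Email:NA"]])
print(text)
```

When writing to a real file, the same two keyword arguments can be passed to `csv.writer(f_output, ...)` with the file opened using `newline=''`.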

Current script:

import csv

# Open the input file and create a reader object
with open('input.csv', newline='') as f_input:
    reader = csv.reader(f_input)

    # Open the output file and create a writer object
    with open('output.csv', 'w', newline='') as f_output:
        writer = csv.writer(f_output)

        # Initialize variables to hold data for the current row
        current_name = ""
        current_title = ""
        current_location = ""
        current_email = ""
        current_phone = ""
        current_investor_types = ""
        current_investment_focuses = ""

        # Loop through each row in the input file
        for row_number, row in enumerate(reader):

            # Skip any rows that contain only empty data
            if not any(row):
                continue

            # Check that each row has the correct number of columns
            if len(row) != 1 and len(row) != 3 and len(row) != 4 and len(row) != 7:
                print(f"Error: Row {row_number} has {len(row)} columns. Skipping row.")
                continue

            # Extract the data from the row and store it in the appropriate variables
            if row[0] != "":
                current_name = row[0]
            elif row[2] != "":
                current_title = row[2]
            elif row[3] != "":
                current_location = row[3]
            elif "@" in row[4]:
                current_email = row[4]
            elif "Phone:" in row[4]:
                current_phone = row[4]
            elif row[5] == "Investor types:":
                current_investor_types = row[6]
            elif row[5] == "Investment focuses:":
                current_investment_focuses = row[6]

                # Check if the current row contains all necessary data
                if current_name != "" and current_title != "" and current_location != "":
                    # If the current row does not contain an email, add "Email:NA" to the appropriate column
                    if current_email == "":
                        current_email = "Email:NA"

                    # If the current row does not contain a phone number, add "Phone:NA" to the appropriate column
                    if current_phone == "":
                        current_phone = "Phone:NA"

                    # Remove any rows that just include "Email:" or "Placeholder image"
                    if current_email == "Email:" or current_email == "Placeholder image":
                        current_email = ""

                    # Write the data for the current row to the output file
                    writer.writerow([current_name, current_title, current_location, current_email, current_phone, current_investor_types, current_investment_focuses])

                    # Reset the variables for the next row
                    current_name = ""
                    current_title = ""
                    current_location = ""
                    current_email = ""
                    current_phone = ""
                    current_investor_types = ""
                    current_investment_focuses = ""

        # Check if the last row in the input file contains all necessary data
        if current_name != "" and current_title != "" and current_location != "":
            # If the last row does not contain an email, add "Email:NA" to the appropriate column
            if current_email == "":
                current_email = "Email:NA"

            # If the last row does not contain a phone number, add "Phone:NA" to the appropriate column
            if current_phone == "":
                current_phone = "Phone:NA"

            # Remove any rows that just include "Email:" or "Placeholder image"
            if current_email == "Email:" or current_email:
                current_email = ""

            # Write the data for the last row to the output file
            writer.writerow([current_name, current_title, current_location, current_email, current_phone, current_investor_types, current_investment_focuses])

Answer #1, by 8cdiaqws

import csv

# Label and placeholder lines that carry no data and must be dropped.
# The sample data uses "Emails:", so both spellings are skipped.
SKIP_LINES = {"Email:", "Emails:", "Placeholder image"}

# Open the input and output files and create the reader and writer objects
with open('input.csv', newline='') as f_input, \
        open('output.csv', 'w', newline='') as f_output:
    reader = csv.reader(f_input)
    # QUOTE_ALL wraps every cell in quotes, as in the expected output
    writer = csv.writer(f_output, quoting=csv.QUOTE_ALL)

    # Free-text lines (name, title, location) collected for the current record
    details = []
    current_email = "Email:NA"
    current_phone = "Phone:NA"
    current_investor_types = ""

    # Loop through each line in the input file
    for row in reader:
        # The data is in the first field; a blank line parses as an empty list
        row_content = row[0].strip() if row else ""

        # Skip empty cells and label/placeholder lines
        if not row_content or row_content in SKIP_LINES:
            continue

        if row_content.startswith("Phone:"):
            current_phone = row_content
        elif row_content.startswith("Investor type"):
            # Matches both "Investor type:" and "Investor types:"
            current_investor_types = row_content
        elif row_content.startswith("Investment focuses:"):
            # The anchor line closes the record: pad or trim the free-text
            # lines to exactly name, title, location, then write 7 columns
            name, title, location = (details + ["", "", ""])[:3]
            writer.writerow([name, title, location, current_email,
                             current_phone, current_investor_types, row_content])

            # Reset the variables for the next record
            details = []
            current_email = "Email:NA"
            current_phone = "Phone:NA"
            current_investor_types = ""
        elif "@" in row_content:
            current_email = row_content
        else:
            details.append(row_content)
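
A likely reason the script in the question writes nothing, assuming the real input.csv looks like the posted sample: every line ends with a trailing comma, so `csv.reader` returns two fields per line, and the question's column-count check only accepts 1, 3, 4 or 7 columns. The sketch below replays that check on a few sample lines; `passes_original_check` is a hypothetical helper that mirrors the condition in the question.

```python
import csv
import io

def passes_original_check(row):
    """Mirrors the question's column-count test (1, 3, 4 or 7 columns)."""
    return len(row) in (1, 3, 4, 7)

# A few lines copied from the sample input
sample = '"Username 001",\n"Partner",\n"user@example.eu",\n"Phone:00-000-000-000",\n'

# Drop rows with only empty fields, as the question's script does
rows = [row for row in csv.reader(io.StringIO(sample)) if any(row)]
kept = [row for row in rows if passes_original_check(row)]
print(len(rows), len(kept))
```

No row survives the check, so the loop never reaches `writer.writerow`. Even if the lines had no trailing comma, the first branch (`row[0] != ""`) would treat every non-empty line as the name, so the title and location would stay empty and nothing would be written either.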
