Python script for generating letters does not read the .csv correctly [duplicate]

ltqd579y  posted on 2023-11-14  in  Python

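This question already has answers here:

Python read csv - BOM embedded into the first key (2 answers)
Closed 8 days ago.
Maybe you can help me with a problem. I run this code to read a .csv file and use a sample .txt letter to generate one .txt file for each row of the .csv file. I name each file after the Institution column of the .csv file.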

import csv
import os

# Define the path to the directory where you want to save the files
directory_path = r'C:\Users'

# Set the current working directory to the specified path
os.chdir(directory_path)

# Define the CSV and sample text file names
csv_file = "sample.csv"
sample_text_file = "sample_template.txt"

# Read the CSV file
with open(csv_file, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    data = list(csv_reader)

# Read the sample text template
with open(sample_text_file, 'r') as template_file:
    template_text = template_file.read()

# Define the placeholders without delimiters
placeholders = ["Position", "Platform", "Institution", "Optional Paragraph"]

# Process each row from the CSV
for row in data:
    # Create a copy of the template text for this row
    modified_text = template_text

    # Replace placeholders with values from the CSV
    for placeholder in placeholders:
        value = row.get(placeholder, '')
        modified_text = modified_text.replace(f'"{placeholder}"', value)

    # Get the name for the output file from "Institution" column
    output_file_name = f"{row['Institution']}.txt"

    # Write the modified text to the output file
    with open(output_file_name, 'w') as output_file:
        output_file.write(modified_text)

    print(f"File '{output_file_name}' has been created.")

print("All files have been generated in the specified directory.")

The .txt template is written for export to LaTeX and looks like this:

\begin{document}

%----------------------------------------------------------------------------------------
%   FIRST PAGE HEADER
%----------------------------------------------------------------------------------------

\vspace{-1em} % Pull the rule closer to the logo

\rule{\linewidth}{1pt} % Horizontal rule

\bigskip\bigskip % Vertical whitespace

%----------------------------------------------------------------------------------------
%   YOUR NAME AND CONTACT INFORMATION
%----------------------------------------------------------------------------------------

\hfill
\begin{tabular}{l @{}}
    \today \bigskip\\ % Date
    NAME \\
        INSTITUTION \\
    A1 \\ % Address
    A2 \\
\end{tabular}

Dear Members of the Recruiting Committee,
\bigskip % Vertical whitespace

%----------------------------------------------------------------------------------------
%   LETTER CONTENT
%----------------------------------------------------------------------------------------

I am writing to apply for the "Position" position that you have advertised on "Platform". 

I believe the skills and competencies I hold would be of great value to the "Institution".

"Optional Paragraph"

Thank you for your time and consideration. 

\bigskip % Vertical whitespace

Sincerely yours,

\vspace{50pt} % Vertical whitespace

\includegraphics[width=0.2\textwidth]{signature.png}

NAME

\end{document}


The sample .csv looks like this:

Position,Platform,Institution,Optional Paragraph
Waiter,Facebook,Company A ,
Coder,Twitter,Company B,I am cool.

My problem is that this gives me the following result for Company A:

I am writing to apply for the  position that you have advertised on Facebook. 

I believe the skills and competencies I hold would be of great value to the Company A .


As you can see, the Position column is not recognized by my code. Do you know what could be causing this?

yhived7q  #1

I believe your CSV file starts with a BOM (byte order mark). You can reproduce the situation like this:

with open("input.csv", "w", encoding="utf-8-sig") as f:
    f.write(
        """Position,Platform,Institution,Optional Paragraph
Waiter,Facebook,Company A ,
Coder,Twitter,Company B,I am cool."""
    )

with open("input.csv", "rb") as f:
    print(f.read(12))

This prints:

b'\xef\xbb\xbfPosition,'
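
The same byte-level check can be wrapped in a small helper. This is only a minimal sketch: `has_utf8_bom` and the temporary file names are hypothetical and do not come from the question, and the helper only detects the UTF-8 BOM (`codecs.BOM_UTF8`), not the UTF-16 or UTF-32 marks.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demonstration with two throwaway files: one written with a BOM, one without.
tmp_dir = tempfile.mkdtemp()
with_bom = os.path.join(tmp_dir, "with_bom.csv")
without_bom = os.path.join(tmp_dir, "without_bom.csv")

with open(with_bom, "w", encoding="utf-8-sig") as f:
    f.write("Position,Platform\n")
with open(without_bom, "w", encoding="utf-8") as f:
    f.write("Position,Platform\n")

print(has_utf8_bom(with_bom))
print(has_utf8_bom(without_bom))
```

Reading in binary mode matters here: in text mode the result would depend on which encoding Python uses to decode the file.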


When DictReader takes that underlying file and reads the first line as the field names, the BOM is read as part of the first column name:

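import csv

# Plain "utf-8" does not strip the BOM, so it ends up in the first field name
with open("input.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)
# Output: ['\ufeffPosition', 'Platform', 'Institution', 'Optional Paragraph']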

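So 'Position' != '\ufeffPosition', which is why row.get(placeholder, '') returns an empty string for that column.
Try the code above and print the field names. If you see \ufeff (or, when the file is decoded with a legacy Windows code page such as cp1252, the characters ï»¿) in front of the first name, open the file with encoding='utf-8-sig':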

with open("input.csv", newline="", encoding="utf-8-sig") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)
# Output: ['Position', 'Platform', 'Institution', 'Optional Paragraph']
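
Applied to the placeholder-replacement loop from the question, the fix might look like the sketch below. It works on in-memory stand-ins for sample.csv and the template so that it runs without touching the disk. The shortened template string, the `fill_template` helper, and the `.strip()` calls are assumptions added for illustration (the sample row contains "Company A " with a trailing space); none of them is part of the original script.

```python
import csv
import io

# In-memory stand-ins for sample.csv and the template (assumed contents).
csv_bytes = (
    "Position,Platform,Institution,Optional Paragraph\n"
    "Waiter,Facebook,Company A ,\n"
    "Coder,Twitter,Company B,I am cool.\n"
).encode("utf-8-sig")  # encoding with utf-8-sig prepends the BOM
template_text = 'I am writing to apply for the "Position" position advertised on "Platform".'

placeholders = ["Position", "Platform", "Institution", "Optional Paragraph"]


def fill_template(template, row):
    """Replace each quoted placeholder with the matching value from the row."""
    result = template
    for placeholder in placeholders:
        # Optional: strip stray whitespace around the CSV value
        value = (row.get(placeholder) or "").strip()
        result = result.replace(f'"{placeholder}"', value)
    return result


# Decoding with utf-8-sig drops the BOM, so the first key is plain "Position".
text = csv_bytes.decode("utf-8-sig")
rows = list(csv.DictReader(io.StringIO(text, newline="")))

letters = {row["Institution"].strip(): fill_template(template_text, row) for row in rows}
for name, letter in letters.items():
    print(f"{name}.txt -> {letter}")
```

In the real script the equivalent change would be to pass encoding="utf-8-sig" (and newline="") to the open() call that reads the CSV. Files without a BOM are still read correctly with that encoding.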
