Python 3: file.read() raises UnicodeDecodeError on Devuan Daedalus (Debian 12 without systemd)

Posted by egdjgwm8 12 months ago in Python

I have a Python script for text substitution that a friend of mine wrote; it runs fine on his system, Ubuntu Focal.
Here is the script:

#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.

Usage:
python replace_text.py /path/to/your/directory

Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""

import os
import sys

def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
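        '\ufb01': 'fi',  # assumed: key char lost in scraping, likely the 'fi' ligature; an empty '' key would insert text between every character
        '\ufb02': 'fl',  # assumed: likely the 'fl' ligature (U+FB02)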
        'ä': 'ā',
        'é': 'ī',
        'ü': 'ū',
        'å': 'ṛ',
        'è': 'ṝ',
        'ì': 'ṅ',
        'ñ': 'ṣ',
        'ï': 'ñ',
        'ö': 'ṭ',
        'ò': 'ḍ',
        'ë': 'ṇ',
        'ç': 'ś',
        'à': 'ṁ',
        'ù': 'ḥ',
        'ÿ': 'ḷ',
        'û': 'ḹ',
        'Ä': 'Ā',
        'É': 'Ī',
        'Ü': 'Ū',
        'Å': 'Ṛ',
        'È': 'Ṝ',
        'Ì': 'Ṅ',
        'Ñ': 'Ṣ',
        'Ï': 'Ñ',
        'Ö': 'Ṭ',
        'Ò': 'Ḍ',
        'Ë': 'Ṇ',
        'Ç': 'Ś',
        'À': 'Ṁ',
        'Ù': 'Ḥ',
        'ß': 'Ḷ',
        '“': '“',
        '”': '”',
        ' ': ' ',
        '‘': '‘',
        '–': '-',
        '’': '’',
        '—': '—',
        '•': '»',
        '…': '...',
    }

    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)

            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)

    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")
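A side note on the substitution step itself, not part of the friend's script: the chained `str.replace` calls only give the intended result because `'ñ'` happens to be listed before `'ï'` (and `'Ñ'` before `'Ï'`). If those entries were swapped, `ï` would first become `ñ` and then `ṣ`. A translation table built with `str.maketrans` and applied with `str.translate` maps every character in a single pass, so dictionary order no longer matters, and `str.maketrans` rejects any key that is not exactly one character (such as an empty string) with a `ValueError` instead of silently inserting text everywhere. The trimmed `SUBSTITUTIONS` subset and the `substitute` helper below are illustrative names, not from the original.

```python
# Sketch: single-pass substitution with str.translate instead of chained str.replace.
# Only a small subset of the script's substitution table is reproduced here.
SUBSTITUTIONS = {
    "ä": "ā",
    "ñ": "ṣ",
    "ï": "ñ",
    "Ñ": "Ṣ",
    "Ï": "Ñ",
    "…": "...",
}

# str.maketrans accepts a dict whose keys are single characters; values may be
# strings of any length (e.g. "..."), or None to delete the character.
TABLE = str.maketrans(SUBSTITUTIONS)


def substitute(text: str) -> str:
    """Apply every substitution in one pass; output characters are never re-mapped."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(substitute("ïäñ…"))  # ñāṣ...
```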

I'm on Devuan Daedalus, which is based on Debian 12 but without systemd. When I run this script on my machine, I get the following error:

~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
    replace_text_in_files(directory_path)
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
    file_content = file.read()
                   ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
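Why 0xAD in particular fails: in UTF-8, bytes 0x80 to 0xBF (bit pattern `10xxxxxx`) can only continue a multi-byte sequence, never start one, so a 0xAD that does not follow a valid lead byte is rejected as an "invalid start byte". The same byte is meaningful in other encodings; in Windows-1252 and Latin-1 it is the soft hyphen (U+00AD), which hints (without proving) that the file is not UTF-8 text. The sketch below reproduces the error with made-up bytes, since the real file is not available; `describe_decode_error` and `is_continuation_byte` are hypothetical helpers.

```python
# Sketch: reproduce the reported error with made-up bytes. Only the offending
# byte value (0xAD) comes from the traceback; the sample text is invented.


def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80 to 0xBF)."""
    return b >> 6 == 0b10


def describe_decode_error(raw: bytes, encoding: str = "utf-8") -> str:
    """Report which byte (and at which offset) makes `raw` undecodable."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return f"byte 0x{raw[err.start]:02x} at position {err.start}: {err.reason}"
    return "decodes cleanly"


if __name__ == "__main__":
    sample = b"plain ASCII text\xad more text"
    print(describe_decode_error(sample))  # byte 0xad at position 16: invalid start byte
    print(is_continuation_byte(0xAD))  # True
    # Decoded as Windows-1252, the same byte is the soft hyphen U+00AD.
    print(b"\xad".decode("cp1252") == "\u00ad")  # True
```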


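He has no idea what's wrong, and I know nothing about Python, so I'm asking the knowledgeable people on this forum for help.
I followed Ofer Sadan's suggestion and opened the file in binary (bytes) mode, but that gave me a different error: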

binary mode doesn't take an encoding argument
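That second error comes from `open()` itself: binary mode (`'rb'`) returns `bytes`, and passing `encoding=` together with it raises a `ValueError` with exactly that message. Dropping the argument makes the call succeed, but the bytes then have to be decoded explicitly before any character substitution, and that decode fails on the same 0xAD byte. A minimal sketch, where `read_bytes_then_decode` and `demo` are hypothetical names:

```python
import os
import tempfile


def read_bytes_then_decode(path: str, encoding: str = "utf-8") -> str:
    """Read a file in binary mode, then decode it as a separate step."""
    # Binary mode returns bytes and must NOT be given encoding=...
    with open(path, "rb") as f:
        raw = f.read()
    # Decoding is explicit here, and it fails just like the text-mode read
    # if the bytes are not valid in the chosen encoding.
    return raw.decode(encoding)


def demo() -> None:
    # Throwaway file: valid UTF-8 ("café") followed by a stray 0xAD byte.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"caf\xc3\xa9 \xad")
    try:
        try:
            open(path, "rb", encoding="utf-8")  # the combination from the question
        except ValueError as err:
            print("ValueError:", err)
        try:
            read_bytes_then_decode(path)
        except UnicodeDecodeError as err:
            print("UnicodeDecodeError:", err.reason)
    finally:
        os.remove(path)


if __name__ == "__main__":
    demo()
```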


Please ask for more information if:
1. the question seems too vague or too broad;
2. I haven't provided enough information.
Thank you,

Answer #1 (dffbzjpn)

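Python is complaining because, while reading one of the files, it decodes the bytes as UTF-8 characters and reaches a byte that is not valid UTF-8. Are you sure that file is actually UTF-8 encoded? https://www.charset.org/utf-8
Reading the file as binary will give you the raw bytes, but you want to replace characters. You would then have to convert the bytes to a string with the UTF-8 codec, and I'd guess you would end up with the same error.
I would be extra careful in your case (make a backup): you may be trying to tamper with an actual binary file. Are you sure the files you are touching are meant to be modified?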
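Following that advice, a hedged sketch of a more defensive version of the loop: it skips any file that fails to decode as UTF-8 instead of crashing, copies each file to a `.bak` backup before overwriting it, and opens files with `newline=""` so existing line endings are preserved. The function name, the `.bak` convention, and the `(changed, skipped)` return value are assumptions, not part of the original script. It expects a substitution dict whose keys are single characters, as `str.maketrans` requires; an empty-string key raises `ValueError`.

```python
import os
import shutil


def replace_text_in_files_safely(directory, substitutions):
    """Apply single-character substitutions to UTF-8 text files under `directory`.

    Files that are not valid UTF-8 (binary files, .doc/.docx, other encodings)
    are skipped rather than crashing the run. Returns (changed, skipped) path lists.
    """
    table = str.maketrans(substitutions)  # keys must be single characters
    changed, skipped = [], []
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith(".bak"):
                continue  # don't re-process backups written by an earlier run
            file_path = os.path.join(root, file_name)
            try:
                # newline="" disables newline translation, so CRLF files stay CRLF
                with open(file_path, "r", encoding="utf-8", newline="") as f:
                    content = f.read()
            except UnicodeDecodeError:
                skipped.append(file_path)  # not UTF-8 text: leave it alone
                continue
            new_content = content.translate(table)
            if new_content == content:
                continue  # nothing to change; leave the file untouched
            shutil.copy2(file_path, file_path + ".bak")  # backup before overwriting
            with open(file_path, "w", encoding="utf-8", newline="") as f:
                f.write(new_content)
            changed.append(file_path)
    return changed, skipped
```

Calling it as `replace_text_in_files_safely(sys.argv[1], substitutions)` would mirror the original entry point; printing `skipped` afterwards shows which files were left untouched.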

Answer #2 (i2loujxw)

I had been running this script on .doc and .docx files, which I guess are not UTF-8 encoded.
Sorry for wasting your time.
Thanks for your contributions, @frederic-laurencin and @eldericmubarmeg.
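That explains the crash: neither Word format is plain text. A .docx is a ZIP archive of XML parts, and a legacy .doc is an OLE2 compound binary file whose first eight bytes are `D0 CF 11 E0 A1 B1 1A E1`, so neither can be read with `open(..., encoding='utf-8')`. A rough sketch of checking a file before treating it as text, where `classify` and its return labels are hypothetical and the signature check is only a heuristic:

```python
import zipfile

# Leading bytes of an OLE2 compound file, the container used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def classify(path: str) -> str:
    """Rough, heuristic guess at whether `path` is safe to edit as UTF-8 text."""
    with open(path, "rb") as f:
        raw = f.read()  # reads the whole file; fine for a sketch, not for huge files
    if raw.startswith(OLE2_MAGIC):
        return "ole2 (e.g. legacy .doc)"
    if zipfile.is_zipfile(path):  # .docx, .xlsx, .odt, ... are ZIP containers
        return "zip container (e.g. .docx)"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "binary or non-UTF-8 text"
    return "utf-8 text"
```

Inside the `os.walk` loop, anything for which `classify` does not return `"utf-8 text"` could simply be skipped.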
