I have a Python script, written by a friend, that performs text substitution. It runs fine on his system, Ubuntu Focal.
Here is the script:
#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.
Usage:
python replace_text.py /path/to/your/directory
Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""
import os
import sys
def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
'': 'fi',
'': 'fl',
'ä': 'ā',
'é': 'ī',
'ü': 'ū',
'å': 'ṛ',
'è': 'ṝ',
'ì': 'ṅ',
'ñ': 'ṣ',
'ï': 'ñ',
'ö': 'ṭ',
'ò': 'ḍ',
'ë': 'ṇ',
'ç': 'ś',
'à': 'ṁ',
'ù': 'ḥ',
'ÿ': 'ḷ',
'û': 'ḹ',
'Ä': 'Ā',
'É': 'Ī',
'Ü': 'Ū',
'Å': 'Ṛ',
'È': 'Ṝ',
'Ì': 'Ṅ',
'Ñ': 'Ṣ',
'Ï': 'Ñ',
'Ö': 'Ṭ',
'Ò': 'Ḍ',
'Ë': 'Ṇ',
'Ç': 'Ś',
'À': 'Ṁ',
'Ù': 'Ḥ',
'ß': 'Ḷ',
'“': '“',
'”': '”',
' ': ' ',
'‘': '‘',
'–': '-',
'’': '’',
'—': '—',
'•': '»',
'…': '...',
    }
    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)
            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)
    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")
I run Devuan Daedalus, which is based on Debian 12 but without systemd. When I run this script on my machine, I get the following error:
~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
replace_text_in_files(directory_path)
File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
file_content = file.read()
^^^^^^^^^^^
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
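He has no idea what is causing this, and I know nothing about Python, so I am asking the knowledgeable people on this forum for help.
I followed Ofer Sadan's suggestion and opened the file as bytes. But that gave me a different error: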
binary mode doesn't take an encoding argument
Please ask for more information if:
1. The question seems too vague or too broad.
2. I have not provided enough information.
Thank you.
2 Answers
#1 (dffbzjpn)
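Python is complaining because, while reading one of the files, it decodes the bytes as UTF-8 and reaches a byte that is not valid UTF-8. Are you sure this file is actually UTF-8 encoded? https://www.charset.org/utf-8
Reading the file as binary will give you the actual bytes, but you want to replace characters. You would then have to convert the bytes to a string with the UTF-8 codec, and I guess you would end up with the same error.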
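To make that concrete, here is a minimal sketch of both failures. The sample bytes and the file name `sample.bin` are made up for illustration; the only detail taken from the question is the offending byte `0xad`.

```python
import os
import tempfile

# Hypothetical sample: byte 0xad can never start a UTF-8 sequence.
sample = b"plain ascii, then a stray byte: \xad"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as handle:
        handle.write(sample)

    # Binary mode takes no encoding argument; passing one raises ValueError.
    try:
        open(path, "rb", encoding="utf-8")
        binary_mode_error = None
    except ValueError as exc:
        binary_mode_error = str(exc)

    # Reading as bytes works, because no decoding happens yet.
    with open(path, "rb") as handle:
        raw = handle.read()

# Decoding the bytes by hand hits the same problem text mode did.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(binary_mode_error)
print(decode_error)
```

So dropping `encoding=` makes the `open` call succeed, but the `UnicodeDecodeError` comes back as soon as the bytes are turned into a string.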
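I would be extra careful in your situation (make a backup): you may be trying to tamper with an actual binary file. Are you sure the files you are touching are meant to be modified?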
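One cautious way to act on this is to skip anything that does not decode as UTF-8 and to keep a copy of every file before overwriting it. The sketch below is an assumption on my part, not the original script: the helper names `read_utf8_or_none` and `replace_in_text_files` and the `.bak` suffix are hypothetical, and a file that happens to decode cleanly is not guaranteed to be a text file.

```python
import shutil
from pathlib import Path
from typing import Optional


def read_utf8_or_none(path: Path) -> Optional[str]:
    """Return the file's text, or None if it is not valid UTF-8."""
    try:
        return path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return None


def replace_in_text_files(directory: str, substitutions: dict) -> list:
    """Apply substitutions to UTF-8 files only; return the paths that were skipped."""
    skipped = []
    # sorted() materialises the listing first, so backups created below
    # are not picked up during the same run.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        text = read_utf8_or_none(path)
        if text is None:
            skipped.append(path)  # probably binary: leave it untouched
            continue
        new_text = text
        for original, replacement in substitutions.items():
            new_text = new_text.replace(original, replacement)
        if new_text != text:
            # Keep a copy next to the file before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_bytes(new_text.encode("utf-8"))
    return skipped
```

Calling `replace_in_text_files("/path/to/your/directory", substitutions)` returns the list of skipped files, which is worth printing so you can see what was left alone.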
#2 (i2loujxw)
I had been running this script on .doc and .docx files, which I guess are not UTF-8 encoded. Sorry for wasting your time.
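For anyone who hits the same thing: a .docx file is a ZIP container and a legacy .doc file is an OLE2 compound file, so neither is plain text. A rough way to spot them before running a text replacement is to look at the first bytes. This is only a sketch; the helper name `looks_like_word_document` is hypothetical, and the ZIP signature matches any ZIP archive, not just .docx.

```python
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container, used by .docx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file, used by legacy .doc


def looks_like_word_document(path) -> bool:
    """Hypothetical helper: True if the file starts with a ZIP or OLE2 signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(ZIP_MAGIC) or header.startswith(OLE2_MAGIC)
```

Files for which this returns `True` should be skipped, or converted to plain text first, rather than read with `encoding='utf-8'`.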
Thank you for your contributions, @frederic-laurencin and @eldericmubarmeg.