I've updated my question: I have an XML file that contains several titles. However, these titles are in French rather than English, like this:
<entity name="MissionTemplate">
<string name="code" required="true" title="Bond Restitution Date"/>
<string name="description" namecolumn="true" title="Description"/>
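I want to translate each title into English. To do that, I have a CSV file of translations that I must use to translate the XML file. This CSV file is not part of the final application; it lives only on my computer and is used solely to look up the correct translations. Here is a sample: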
Table;Champ;Anglais;Français
"HeaderTable8011405";"Code";"Code";"Code"
"HeaderTable8011405";"Activity_Type";"Auction,Mission";"Vente,Mission"
"HeaderTable8011405";"Activity_Type";"Activity Type";"Type activité"
...
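I also have a second CSV translation file that contains the correspondence between English words and French words. It is part of the final application: when the user changes the language, this file is used to translate the application. Here is a sample of that file: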
"key","message","comment","context"
"Code","Code",,
"Auction,Mission","Vente,Mission",,
"Activity Type","Type activité",,
...
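I have to take each French title in the XML file and search for it in the local CSV file (the first translation file). If the French title is there, I copy the English translation and replace the French title in the XML file with it. Finally, I have to add a new line to the second CSV translation file (the one used to switch between English and French), with the English word in the first column and the French word in the second. To summarize, there are three files: an XML file (containing several titles), a local CSV file with the correct translations (not in my application, only used to get the right English translations), and another CSV file with the correspondence between English and French words (used in the application to switch between the two languages).
I can't reasonably do this task by hand, because it is so tedious and time-consuming.
So I tried to write a Python program to do it for me. Here is the code: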
from bs4 import BeautifulSoup
import pandas as pd
import csv

xml_file = input("Give here the name of the XML file which you want translate : ")
translations_file = input("Give here the name of the translation's file : ")
final_translations_file = input("Give here the name of the final translation's file : ")

# Reading the data inside the xml
# file to a variable under the name
# data
with open('MissionTemplate_model.xml', 'r') as f:
    data = f.read()

# Passing the stored data inside
# the beautifulsoup parser, storing
# the returned object
data = BeautifulSoup(data, "xml")

# Title list
title_list = []

# Get each title in the XML file
for element in data.find_all():
    if 'title' in element.attrs:
        title_content = element['title']
        title_list.append(title_content)

# Check if the title is translated
missing_translations = []
translation_list= pd.read_csv(translations_file,delimiter=';')
translation_fr_list = translation_list.Français
translation_en_list = translation_list.Anglais
header = ['key', 'message', 'comment', 'context']
additional_data = []
for each_title in title_list:
    found = False
    lineNb = 0
    for each_translation in translation_fr_list:
        # It's all right
        if each_title == each_translation:
            found = True
            translation_en = translation_en_list[lineNb]
            translation_fr = each_translation
            for element in data.find_all():
                if 'title' in element.attrs and element['title'] == translation_fr:
                    # Set the title in the XML file
                    element['title'] = translation_en
                    # Add couple of translation in the additional data's list
                    additional_data.append([translation_en,translation_fr,None,None])
            continue
        lineNb = lineNb + 1
    # Else
    if found == False:
        missing_translations.append(each_title)

# Load the CSV file using pandas
df = pd.read_csv(final_translations_file)
# Create a DataFrame from additional_data
additional_df = pd.DataFrame(additional_data, columns=header)
# Add the news datas at the end of the existing DataFrame
updated_df = pd.concat([df, additional_df], ignore_index=True)
# Save the updated DataFrame in the CSV file
updated_df = updated_df.applymap(lambda x: ' '.join(x.strip().split()) if isinstance(x, str) else x)
updated_df.to_csv(final_translations_file, index=False, quoting=csv.QUOTE_NONE, escapechar=' ')

# Load the CSV file using pandas
df = pd.read_csv(final_translations_file, sep=',', skipinitialspace=True)

# Function to check if a string begins and ends by double quotes
def has_quotes(s):
    return s.startswith('"') and s.endswith('"')

# Explore the lines of the DataFrame
for index, row in df.iterrows():
    # Check if the two first columns doesn't already have double quotes
    if not has_quotes(row[0]) and not has_quotes(row[1]):
        # Modify this line
        df.iloc[index, :2] = '"' + df.iloc[index, :2] + '"'

# Save the modified DataFrame as new csv file
df.to_csv(final_translations_file, index=False, quoting=csv.QUOTE_NONE, escapechar=' ')

# Open again the file in read/write mode
with open(final_translations_file, "r+") as f:
    # Read the current content of the file
    content = f.read()
    # Find the first line's position (up to the first occurence of \n)
    first_line_end = content.find("\n") + 1
    # Extract the first line and check if it contains double quotes
    first_line = content[:first_line_end]
    if not has_quotes(first_line):
        modified_first_line = '"' + '","'.join(first_line.strip().split(",")) + '"\n'
        # Replace the first line by the modified
        f.seek(0)
        f.write(modified_first_line)
        f.write(content[first_line_end:])

# Replace the content of the XML original XML file by the modified content
modified_xml_file = data.prettify()
with open(xml_file, 'w') as f:
    f.write(modified_xml_file)

# Write the missing translations in a file
with open('missing_translations.txt','w') as f:
    missing_translations.insert(0,xml_file+":\n")  # Add the name of the file at the begining of the missing translations's list
    missing_translations_string = '\n'.join(missing_translations)  # Tranform the list to string
    f.write(missing_translations_string)
This works, but in the second CSV file (the one used inside the application) there are unwanted spaces between words, for example:
"key","message","comment","context"
"Code","Code",,
"Auction, Mission","Vente, Mission",,
"Activity Type","Type activité",,
I tried many solutions to remove the spaces after the file is created, using the split() function, but none of them worked...
Can you help me?
Thank you!
1 Answer
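I haven't read through all of your code to see where extra spaces might or might not be getting inserted.
Instead, I suggest rethinking the structure of your program. Besides dropping Pandas and making the logic/loops simpler, I recommend breaking the program into distinct parts:
1. Create a French→English lookup dict
2. Iterate over the XML and end up with translated and missing lists
3. Write out the translated and missing files
4. Modify the XML, then write it out
Probably the biggest change will be using a dict to hold the translations you already have.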
Starting with this translations.csv file:
Create a dict where each key is the French word and its value is the English word:
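The answer's original listing is not reproduced here, so the following is only a minimal sketch of the lookup-dict step using the standard library's csv module. The sample data is a hypothetical stand-in modelled on the translations file shown in the question, and the name load_translations is my own, not something from the original answer.

```python
import csv
import io

# Hypothetical stand-in for translations.csv, modelled on the sample in the question.
SAMPLE_TRANSLATIONS_CSV = '''Table;Champ;Anglais;Français
"HeaderTable8011405";"Code";"Code";"Code"
"HeaderTable8011405";"Activity_Type";"Auction,Mission";"Vente,Mission"
"HeaderTable8011405";"Activity_Type";"Activity Type";"Type activité"
'''


def load_translations(csv_file):
    """Return a dict mapping each French string to its English translation."""
    reader = csv.DictReader(csv_file, delimiter=";")
    return {row["Français"]: row["Anglais"] for row in reader}


# With a real file you would pass:
#   open("translations.csv", newline="", encoding="utf-8")
translations = load_translations(io.StringIO(SAMPLE_TRANSLATIONS_CSV))
print(translations)
```

Because the csv module strips the surrounding quotes while reading, values such as Vente,Mission come back without quotes and without any added spaces.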
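Now you can loop over the XML's elements once and check each title against the translations dict. I only use BeautifulSoup for broken/messy HTML; for valid XML I prefer the standard library's ElementTree module. I also added my own logic for when an element has no title (skip it and move on to the next), or when the translation doesn't exist (append to missing, then move on to the next).
Starting with this input.xml file:
The missing and translated lists: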
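As a rough sketch of that single pass over the XML (not the answer's original listing), something like the following should work. The XML content, the French titles in it, and the small translations dict are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input.xml content with French titles, for illustration only.
SAMPLE_XML = '''<entity name="MissionTemplate">
    <string name="code" required="true" title="Code"/>
    <string name="activity" title="Type activité"/>
    <string name="description" title="Descriptif"/>
    <string name="untitled"/>
</entity>'''

# Assumed French -> English lookup dict.
translations = {"Code": "Code", "Type activité": "Activity Type"}

# For a file on disk: root = ET.parse("input.xml").getroot()
root = ET.fromstring(SAMPLE_XML)

translated = []  # (english, french) pairs found in the lookup dict
missing = []     # French titles with no known translation

for elem in root.iter():
    french = elem.get("title")
    if french is None:
        # No title attribute: skip and move on to the next element
        continue
    english = translations.get(french)
    if english is None:
        # Title present but not in the lookup dict
        missing.append(french)
        continue
    translated.append((english, french))

print(translated)
print(missing)
```

Each title costs one dict lookup, so there is no need for the nested loop over every row of the translations file.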
Writing them out to their own files looks straightforward:
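One way the write-out step could look is sketched below. The file names, the temporary output directory, and the helper names are hypothetical, and the row layout (first two fields quoted, last two empty) is simply inferred from the application CSV sample in the question.

```python
import tempfile
from pathlib import Path

# Assumed results from the XML pass.
translated = [("Code", "Code"), ("Auction,Mission", "Vente,Mission")]
missing = ["Descriptif"]


def quote(field):
    """Wrap a field in double quotes, doubling any embedded quotes."""
    return '"' + field.replace('"', '""') + '"'


def write_translated(path, pairs):
    """Write key/message rows in the layout of the application's CSV."""
    lines = ['"key","message","comment","context"']
    lines += [f"{quote(en)},{quote(fr)},," for en, fr in pairs]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def write_missing(path, titles):
    """Write one untranslated title per line."""
    Path(path).write_text("\n".join(titles) + "\n", encoding="utf-8")


out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
write_translated(out_dir / "translated.csv", translated)
write_missing(out_dir / "missing.txt", missing)

translated_text = (out_dir / "translated.csv").read_text(encoding="utf-8")
missing_text = (out_dir / "missing.txt").read_text(encoding="utf-8")
print(translated_text)
```

Building each line directly means nothing is escaped with a space character, so a value like Auction,Mission is written exactly as it was read.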
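To change the XML and save it, the ElementTree class I started with makes this simple. I copy-pasted the earlier XML loop, dropped the missing and translated lists, and added the key
elem.set("title", english)
call, which replaces the title attribute with the English translation. This produces the final output.xml file: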
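Here is a minimal sketch of that modify-and-save step, again with made-up XML and a made-up translations dict rather than the answer's original data. It serializes to a string so the result can be inspected; the commented line shows the usual way to write a file instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with French titles, for illustration only.
SAMPLE_XML = '''<entity name="MissionTemplate">
    <string name="code" required="true" title="Code"/>
    <string name="activity" title="Type activité"/>
    <string name="description" title="Descriptif"/>
</entity>'''

# Assumed French -> English lookup dict.
translations = {"Code": "Code", "Type activité": "Activity Type"}

root = ET.fromstring(SAMPLE_XML)

for elem in root.iter():
    french = elem.get("title")
    if french is None:
        continue
    english = translations.get(french)
    if english is None:
        # Unknown title: leave it untouched
        continue
    # Replace the French title with its English translation
    elem.set("title", english)

# To write a file instead:
#   ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
output_xml = ET.tostring(root, encoding="unicode")
print(output_xml)
```

Unlike BeautifulSoup's prettify(), ElementTree leaves the existing whitespace between elements as it was, so the output stays close to the input.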
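If you want to remove elements that have no title or cannot be translated, you can use
root.remove(elem)
to remove that element from its parent (the root element):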
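A sketch of the removal variant, under the assumption that every element of interest is a direct child of the root (remove() only works on direct children). The sample XML and translations dict are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with French titles, for illustration only.
SAMPLE_XML = '''<entity name="MissionTemplate">
    <string name="code" required="true" title="Code"/>
    <string name="untitled"/>
    <string name="activity" title="Type activité"/>
    <string name="description" title="Descriptif"/>
</entity>'''

# Assumed French -> English lookup dict.
translations = {"Code": "Code", "Type activité": "Activity Type"}

root = ET.fromstring(SAMPLE_XML)

# Iterate over a copy of the child list: removing from the live list
# while iterating over it would skip elements.
for elem in list(root):
    french = elem.get("title")
    english = translations.get(french) if french is not None else None
    if english is None:
        # No title, or no translation available: drop the element
        root.remove(elem)
        continue
    elem.set("title", english)

remaining_titles = [child.get("title") for child in root]
print(remaining_titles)
```

For nested documents you would need to track each element's parent yourself, since ElementTree elements do not keep a reference to their parent.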