Replacing diacritics in Python

Posted by aydmsdu9 on 2023-02-06 in Python

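How can I replace the diacritics in an srt file with their plain (normalized) letters?
I have a Romanian srt for a movie I want to play in Jellyfin, but the app does not support special characters such as ĂăÂâÎîȘșȚț, so I want to get rid of them.
I tried unidecode, but the words come out strangely: ț -> th, ș -> o.
I also tried simply substituting the characters one by one, but some of them (such as ș) show up as º, so the following function fails to replace them: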

import re

def strip_accents(s):
    # Replace each Romanian diacritic with its unaccented ASCII letter.
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    for accented, plain in zip(d, n):
        s = re.sub(accented, plain, s)
    return s

rks48beu #1

For subtitles, open the file in Notepad first, choose Save As, set Encoding to UTF-8 in that dialog, and save.
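The same re-save can be scripted. Below is a minimal sketch, assuming the file was originally written in cp1250 (Windows-1250); that encoding is a guess that often fits older Romanian subtitles, and the helper name reencode_to_utf8 is hypothetical. If the guess is wrong, the output will still contain wrong characters.

```python
from pathlib import Path

def reencode_to_utf8(path, source_encoding="cp1250"):
    """Rewrite a text file in place as UTF-8.

    source_encoding is an assumption: cp1250 is common for older Romanian
    subtitles, but the real encoding of a given file may differ.
    """
    raw = Path(path).read_bytes()               # read without any decoding
    text = raw.decode(source_encoding)          # interpret the bytes with the assumed encoding
    Path(path).write_bytes(text.encode("utf-8"))  # bytes I/O keeps line endings untouched
    return text
```

Working on a copy of the file is safer, since the function overwrites its input.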



You can use this code:

def strip_accents(s):
    # Map each Romanian diacritic to its unaccented ASCII letter.
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    # str.maketrans pairs the characters of d and n position by position.
    table = str.maketrans(d, n)
    return s.translate(table)

print(strip_accents('Țară și știință'))  # Tara si stiinta
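A hard-coded character table only covers the letters listed in it. A more general sketch, using only the standard library's unicodedata module, is to decompose each character (NFD) and drop the combining marks. The function name strip_diacritics is hypothetical. This is a different technique from the table above, and it only helps once the text has been decoded correctly; it will not turn a stray º back into ș.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks so that, e.g., 'ș' becomes 's'."""
    # NFD splits a letter such as 'ț' into 't' plus a combining comma below.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", without_marks)

print(strip_diacritics("Țară și știință"))  # Tara si stiinta
```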

vcirk6k6 #2

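So, thanks to a repo hosted on GitHub, I found a way to do what I wanted. I simply replaced the odd characters º and ª with the correct diacritics ș and Ș, and then everything worked the way I wanted. I did not need to strip the diacritics to get the subtitles working.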
https://github.com/pckltr/corecteaza-subtitrari/
Python code:

def fix_accents(s):
    char_dict = { "º": "ș", "ª": "Ș", "ş": "ș", "Ş": "Ș", "ţ": "ț", "Ţ": "Ț", "þ": "ț", "Þ": "Ț", "ã": "ă"  }
    for k,v in char_dict.items():
        s = s.replace(k, v)
    return s
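One likely explanation for the odd characters, offered here as an assumption rather than something confirmed in this thread: the file was saved in cp1250, where byte 0xBA is ş, and then read as Latin-1, where the same byte is º. The sketch below reproduces the garbling and reverses it. The helper name repair_mojibake and both encoding defaults are hypothetical choices for illustration.

```python
def repair_mojibake(text, wrong="latin-1", right="cp1250"):
    """Undo a wrong decode by re-encoding with the wrong codec
    and decoding with the (assumed) right one."""
    return text.encode(wrong).decode(right)

sample = "şi ţară"                                   # cedilla forms, as stored in cp1250
garbled = sample.encode("cp1250").decode("latin-1")  # simulate reading with the wrong codec
print(garbled)                   # ºi þarã
print(repair_mojibake(garbled))  # şi ţară
```

The repaired text uses the cedilla letters ş and ţ, so a mapping like the one above is still needed to get the comma-below forms ș and ț. The encode step raises UnicodeEncodeError if the text contains characters outside Latin-1.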

8iwquhpp #3

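I know this is an old post, but for anyone who wants a script like this that can process several srt files in one run, I wrote a Python script that uses os and pysrt.
First run pip install pysrt in a terminal, then create a Python file and put it in the folder that contains the srt files you want to modify. After that you just run it and it works!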

import pysrt
import os

def change_charset(file_name):
    # open the file with the encoding because sometimes you get an error
    subs = pysrt.open(file_name, encoding='iso-8859-1')

    # dictionary with the characters that need to be changed
    dictionar = {"º": "s", "ª": "S", "ş": "s", "Ş": "S",
                 "ţ": "t", "Ţ": "T", "þ": "t", "Þ": "T", "ã": "a", "Ã": "A",
                 "õ": "o", "Õ": "O", "ç": "c", "Ç": "C", "ñ": "n", "Ñ": "N",
                 "á": "a", "Á": "A", "é": "e", "É": "E", "í": "i", "Í": "I",
                 "ó": "o", "Ó": "O", "ú": "u", "Ú": "U", "ý": "y", "Ý": "Y",
                 "à": "a", "À": "A", "è": "e", "È": "E",
                 "Î": "I", "î": "i", "Â": "A", "â": "a"}

    for sub in subs:  # loop through the subtitles
        for key, value in dictionar.items():  # loop through the dictionary
            # replace the characters with the new ones
            sub.text = sub.text.replace(key, value)
    # save the file with the new encoding
    subs.save(file_name, encoding="utf-8")
    # print the name of the file that was saved
    print(f"Done saving file {file_name}")

def main():
    for items in os.listdir():  # loop through the files in the directory
        if items.endswith(".srt"):  # check if the file is a subtitle file
            change_charset(items)  # call the function to change the charset

if __name__ == "__main__":
    main()

I wrote it this way for ease of use, but you can specify the path yourself and keep the files in a single directory.
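As a rough sketch of the "specify the path yourself" variant, the helper below lists the srt files in a given folder with pathlib, so the script does not have to sit next to the subtitles. The name find_srt_files is hypothetical, and matching the extension case-insensitively is a design choice of this sketch, not something the script above does.

```python
from pathlib import Path

def find_srt_files(folder="."):
    """Return the .srt files directly inside folder (no recursion)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".srt"
    )
```

Each returned path could then be passed to change_charset as str(path) in place of the os.listdir() loop in main().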
