csv: Unable to generate non-Latin characters with Google TTS

kqlmhetl  posted on 2023-09-27  in Go

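I have written a Python script that generates audio using a .csv file as the data source. The script works when generating English/Spanish audio, but I cannot generate words in Telugu.
My .csv file is UTF-8 encoded and my script specifies utf-8, but whenever I try to run the script I get the following error message: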

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

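Could anyone suggest what I might be doing wrong?
I have also tried inserting Unicode escape sequences for the Telugu characters into my script, but that did not work either.
My code is as follows: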

import os

# Set the GOOGLE_APPLICATION_CREDENTIALS environment variable
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\MYJSON.json"

import csv
from google.cloud import texttospeech_v1
from pydub import AudioSegment
import io

# Initialize Google Cloud TTS client
client = texttospeech_v1.TextToSpeechClient()

# Initialize an empty audio segment
final_audio = AudioSegment.empty()

# Read CSV and Generate Audio
with open("C:\\testscript.csv", 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)  # Read the CSV as a regular reader
    next(reader)  # Skip the header row
    
    for row in reader:
        phrase = row[0]  # Access the first column (Column A) for the phrase
        lang = row[1]    # Access the second column (Column B) for the language

        if phrase and lang:
            # Determine the language code based on the language specified in the CSV
            if lang == 'English':
                lang_code = 'en-US'
                voice_name = 'en-US-Wavenet-D'
            elif lang == 'Telugu':
                lang_code = 'te-IN'
                voice_name = 'te-IN-Standard-B'
            else:
                continue  # Skip rows with unrecognized languages

            # Generate and append audio
            synthesis_input = texttospeech_v1.SynthesisInput(text=phrase)
            voice = texttospeech_v1.VoiceSelectionParams(language_code=lang_code, name=voice_name)
            audio_config = texttospeech_v1.AudioConfig(audio_encoding=texttospeech_v1.AudioEncoding.MP3)
            response = client.synthesize_speech(input=synthesis_input, voice=voice, audio_config=audio_config)
            audio = AudioSegment.from_mp3(io.BytesIO(response.audio_content))
            final_audio += audio + AudioSegment.silent(duration=3000)

# Save Final Audio
final_audio.export("C:\\audio2.wav", format="wav")

ruyhziif1#

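The error message you are seeing, **"SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape"**, is a common Python error. It comes from how string literals are interpreted and how backslashes are handled in them: in an ordinary (non-raw) string literal, \U starts an eight-digit Unicode escape, so a Windows path such as C:\Users\... fails to parse. Because it is a SyntaxError, it is raised when the script is compiled, before the CSV file is ever read.
To fix it, escape the backslashes in the file path by writing a double backslash \\ instead of a single backslash \. Use either doubled backslashes or the r prefix, not both, because a raw string already keeps every backslash literal. Here is the modified line:

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\MYJSON.json"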
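
To make the backslash rules concrete, here is a minimal sketch comparing the common ways to spell a Windows path in Python source. The path C:\Users\me\key.json is made up for illustration (the paths in the question appear to have been shortened), and the snippet uses compile() only so that the failing literal can be demonstrated without breaking the example itself.

```python
from pathlib import PureWindowsPath

# Three equivalent spellings of the same (hypothetical) Windows path.
raw_path = r"C:\Users\me\key.json"        # raw string: backslashes stay literal
escaped_path = "C:\\Users\\me\\key.json"  # normal string: each backslash escaped
forward_path = "C:/Users/me/key.json"     # forward slashes, also accepted on Windows

same_text = raw_path == escaped_path
same_path = PureWindowsPath(forward_path) == PureWindowsPath(raw_path)

# Mixing the r prefix with doubled backslashes keeps BOTH backslashes,
# so the resulting string differs from the intended path.
doubled = r"C:\\Users\\me\\key.json"
doubled_differs = doubled != raw_path

# Reproduce the error safely: build source text containing a plain literal
# with a single backslash before "Users" (chr(92) is the backslash character)
# and ask Python to compile it.
bad_source = 'path = "C:' + chr(92) + 'Users' + chr(92) + 'me"'
try:
    compile(bad_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(same_text, same_path, doubled_differs)
print(error_message)
```

The reported "position 2-3" is consistent with a backslash followed by U right after the drive letter, which suggests (as an assumption, since the real paths are not shown) that the original script contained an unescaped path under C:\Users.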
