Google Cloud Speech-to-Text API - convert JSON format to plain text without variables

Posted by b1zrtrql on 2023-11-20 in Go


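I am using the Google Cloud Speech-to-Text API to convert audio files (interviews) to text. It works very well, but I am having trouble working with the JSON output.
Since I only need the transcript result ("Okay, I'm going to read you, the opening question."), I was wondering whether there is a simple way to remove the variables "words", "endTime", "startTime" and "word".
Does anyone know a simple way to do this? Maybe with Python?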

"results": [ {
    "alternatives": [ {
      "transcript": "Okay, I'm going to read you, the opening question.",
      "words": [ {
        "endTime": "1.800s",
        "startTime": "1.300s",
        "word": "Okay,"
      }, {
        "endTime": "2.800s",
        "startTime": "1.800s",
        "word": "I'm"
      }, {
        "endTime": "3s",
        "startTime": "2.800s",
        "word": "going"
      }, {
        "endTime": "3.100s",
        "startTime": "3s",
        "word": "to"
      }, {
        "endTime": "3.300s",
        "startTime": "3.100s",
        "word": "read"
      }, {
        "endTime": "4.300s",
        "startTime": "3.300s",
        "word": "you"
      }, {
        "endTime": "4.400s",
        "startTime": "4.300s",
        "word": "the"
      }, {
        "endTime": "6s",
        "startTime": "4.400s",
        "word": "opening"
      }, {
        "endTime": "6.200s",
        "startTime": "6s",
        "word": "question."

Thanks in advance, Matt
I have not found a solution, because I have very little experience with data formats.

JSON

Source: https://stackoverflow.com/questions/74777241/google-cloud-speech-to-text-api-convert-json-format-in-plain-text-w-o-variable

1 answer


3j86kqsm1#

I noticed that when I use the Speech-to-Text API now, the output looks slightly different:

{
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "Okay, I'm going to read you, the opening question.",
                    "words": [
                        {
                            "endTime": "1.800s",
                            "startTime": "1.300s",
                            "word": "Okay,"
                        },
                        {
                            "endTime": "2.800s",
                            "startTime": "1.800s",
                            "word": "I'm"
                        },
                        {
                            "endTime": "3s",
                            "startTime": "2.800s",
                            "word": "going"
                        },
                        {
                            "endTime": "3.100s",
                            "startTime": "3s",
                            "word": "to"
                        },
                        {
                            "endTime": "3.300s",
                            "startTime": "3.100s",
                            "word": "read"
                        },
                        {
                            "endTime": "4.300s",
                            "startTime": "3.300s",
                            "word": "you"
                        },
                        {
                            "endTime": "4.400s",
                            "startTime": "4.300s",
                            "word": "the"
                        },
                        {
                            "endTime": "6s",
                            "startTime": "4.400s",
                            "word": "opening"
                        },
                        {
                            "endTime": "6.200s",
                            "startTime": "6s",
                            "word": "question."
                        }
                    ]
                }
            ],
            "languageCode": "en-US"
        },
        ...
    ]
}

You can use this Python script to parse the JSON and save the output to a text file:

import json

with open('your-json-file.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

results = data['results']

output = ''

for result in results:
    alternative = result['alternatives'][0]
    # Skip results that carry no transcript.
    if 'transcript' not in alternative:
        continue
    sentence = alternative['transcript']
    # The JSON above uses 'startTime'; fall back to 'startOffset'
    # in case the output uses that name instead.
    words = alternative.get('words') or [{}]
    start = words[0].get('startTime', words[0].get('startOffset', ''))
    output += f"{start}\t{sentence}\n"

with open('output.txt', 'w', encoding='utf-8') as output_file:
    output_file.write(output)

print('Output has been written to output.txt')
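
If the timestamps are not needed at all, as in the question, the same loop can collect only the transcripts. The following is a minimal sketch, assuming the JSON layout shown above. The function name `extract_transcripts` and the inline sample are made up for illustration, and results without a transcript are simply skipped.

```python
import json


def extract_transcripts(data):
    """Return the first alternative's transcript of every result, skipping empty ones."""
    transcripts = []
    for result in data.get('results', []):
        # Fall back to an empty alternative if the list is missing or empty.
        alternatives = result.get('alternatives') or [{}]
        transcript = alternatives[0].get('transcript', '').strip()
        if transcript:
            transcripts.append(transcript)
    return transcripts


# Hypothetical sample in the same shape as the JSON shown above.
sample = json.loads("""
{"results": [
  {"alternatives": [{
     "transcript": "Okay, I'm going to read you, the opening question.",
     "words": [{"startTime": "1.300s", "endTime": "1.800s", "word": "Okay,"}]
  }]},
  {"alternatives": [{}]}
]}
""")

# One transcript per line, with no timing information.
print('\n'.join(extract_transcripts(sample)))
```

To use it on a real file, load the file with `json.load` as in the script above and write the joined string to a `.txt` file.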

Alternatively, if you want to convert the JSON file into a JSON file that contains only the sentences, you can use this Python script:

import json

with open('your-json-file.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

results = data['results']

sentences = []

for result in results:
    transcript = result['alternatives'][0]['transcript']
    sentences.append(transcript)

output_data = json.dumps(sentences, ensure_ascii=False, indent=4)

with open('output.json', 'w', encoding='utf-8') as output_file:
    output_file.write(output_data)

print('Output has been written to output.json')
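
If you would rather keep the original JSON structure and only drop the per-word timing data ("words", "endTime", "startTime", "word"), removing the "words" key from each alternative is enough. This is a rough sketch under the same layout assumption; `strip_word_details` and the sample dictionary are hypothetical names used only for illustration.

```python
import copy
import json


def strip_word_details(data):
    """Return a deep copy of the response with every 'words' list removed."""
    cleaned = copy.deepcopy(data)  # leave the caller's data untouched
    for result in cleaned.get('results', []):
        for alternative in result.get('alternatives', []):
            # pop() with a default does nothing if 'words' is absent.
            alternative.pop('words', None)
    return cleaned


# Hypothetical sample in the same shape as the JSON shown above.
sample = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "Okay, I'm going to read you, the opening question.",
                    "words": [
                        {"endTime": "1.800s", "startTime": "1.300s", "word": "Okay,"}
                    ],
                }
            ],
            "languageCode": "en-US",
        }
    ]
}

print(json.dumps(strip_word_details(sample), ensure_ascii=False, indent=4))
```

The deep copy is a design choice so the loaded data can still be used for the timestamped output; drop it if you do not need the original afterwards.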

