What is the correct way to provide a File object to the OpenAI Whisper API in Node.js?

jrcvhitl asked on 2023-06-29 in Node.js

Suppose I want to download a file from a URL and then call the Whisper API to transcribe it.
I would do it the way the documentation suggests:

const resp = await openai.createTranscription(
  fs.createReadStream("audio.mp3"),
  "whisper-1"
);
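
For context, this assumes an openai client (the v3 SDK) has already been configured, roughly like the sketch below; the environment variable name is just an assumption:

import fs from "fs";
import { Configuration, OpenAIApi } from "openai";

// Minimal v3 SDK setup; OPENAI_API_KEY is an assumed environment variable name.
const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);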

In my implementation:

public static async transcribeFromPublicUrl({ url, format }: { url: string; format: string }) {
    const now = new Date().toISOString();
    const filePath = `${this.tmpdir}/${now}.${format}`;
    try {
      const response = await axios.get<Stream>(url, {
        responseType: 'stream',
      });
      const fileStream = fs.createWriteStream(filePath);
      response.data.pipe(fileStream);

      await new Promise((resolve, reject) => {
        fileStream.on('finish', resolve);
        fileStream.on('error', reject);
      });

      const transcriptionResponse = await this.openai.createTranscription(
        fs.readFileSync(filePath),
        'whisper'
      );
      return { success: true, response: transcriptionResponse };
    } catch (error) {
      console.error('Failed to download the file:', error);
      return { success: false, error: error };
    }
  }

However, this results in the following error:

Argument of type 'Buffer' is not assignable to parameter of type 'File'.
  Type 'Buffer' is missing the following properties from type 'File': lastModified, name, webkitRelativePath, size, and 5 more.ts(2345)

OK, no big deal, let's just convert the Buffer to a File:

...
 const file = new File([fs.readFileSync(filePath)], now, { type: `audio/${format}` });
 const transcriptionResponse = await this.openai.createTranscription(file, 'whisper');
...

While this doesn't throw any TypeScript errors, the JavaScript File API is not available from Node.js.
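(As an aside: whether File exists seems to depend on the Node version. A File class has been exported from node:buffer since roughly Node 18.13 and is a global in Node 20+, so on those versions something like the sketch below could at least construct the object; whether the openai library then accepts it is a separate question.)

import fs from "node:fs";
import { File } from "node:buffer";

// Only works on Node versions that ship File (exported from node:buffer, global in Node 20+).
const file = new File([fs.readFileSync(filePath)], now, { type: `audio/${format}` });
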
Digging a bit further, I found that the openai library expects a File to be passed as the argument:

/**
     *
     * @summary Transcribes audio into the input language.
     * @param {File} file The audio file to transcribe, in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
     * @param {string} model ID of the model to use. Only `whisper-1` is currently available.
     * @param {string} [prompt] An optional text to guide the model's style or continue a previous audio segment. The [prompt](/docs/guides/speech-to-text/prompting) should match the audio language.
     * @param {string} [responseFormat] The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
     * @param {number} [temperature] The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use [log probability](https://en.wikipedia.org/wiki/Log_probability) to automatically increase the temperature until certain thresholds are hit.
     * @param {string} [language] The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) format will improve accuracy and latency.
     * @param {*} [options] Override http request option.
     * @throws {RequiredError}
     * @memberof OpenAIApi
*/
createTranscription(file: File, model: string, prompt?: string, responseFormat?: string, temperature?: number, language?: string, options?: AxiosRequestConfig): Promise<import("axios").AxiosResponse<CreateTranscriptionResponse, any>>;

So, to sum it up: I don't have access to the File API in Node.js, but I'm supposed to hand the openai library a File?

c9x0cxw0 1#

I don't think createTranscription is being called the way the docs example shows. Compare:
createTranscription(fs.readFileSync(filePath), 'whisper');
versus
createTranscription(fs.createReadStream("audio.mp3"), "whisper-1")
Adjusting the call as shown below should fix it (at least it works for me as of today):

const transcriptionResponse = await this.openai.createTranscription(
  fs.readFileSync(filePath),
  'whisper'
);

=>

const transcriptionResponse = await this.openai.createTranscription(
  fs.createReadStream(filePath),
  'whisper-1'
);

For some reason this works, even though fs.createReadStream returns an fs.ReadStream instance rather than a File.
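
If TypeScript still flags the stream because the parameter is typed as File, a common workaround (an addition here, not from the original answer) is a cast that only affects type checking; at runtime the v3 SDK appears to send the stream as multipart form data:

const transcriptionResponse = await this.openai.createTranscription(
  // The cast only silences the File typing; the stream itself is what gets uploaded.
  fs.createReadStream(filePath) as unknown as File,
  'whisper-1'
);
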
Note the corrected model name as well.
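
Putting the fix back into the helper from the question, an untested sketch (same class fields, imports, and error handling as the original) might look like this:

public static async transcribeFromPublicUrl({ url, format }: { url: string; format: string }) {
    const now = new Date().toISOString();
    const filePath = `${this.tmpdir}/${now}.${format}`;
    try {
      // Download the remote audio to a temporary file first.
      const response = await axios.get<Stream>(url, {
        responseType: 'stream',
      });
      const fileStream = fs.createWriteStream(filePath);
      response.data.pipe(fileStream);

      await new Promise((resolve, reject) => {
        fileStream.on('finish', resolve);
        fileStream.on('error', reject);
      });

      // Pass a read stream (not a Buffer) and use the correct model name 'whisper-1'.
      const transcriptionResponse = await this.openai.createTranscription(
        fs.createReadStream(filePath),
        'whisper-1'
      );
      return { success: true, response: transcriptionResponse };
    } catch (error) {
      console.error('Failed to transcribe the file:', error);
      return { success: false, error };
    }
  }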
