How do I end a Google speech-to-text stream gracefully and get the pending text results?

Problem description

I'd like to be able to end a Google speech-to-text stream (created with streamingRecognize) and get the pending SR (speech recognition) results.

In a nutshell, the relevant Node.js code:

// create SR stream
const stream = speechClient.streamingRecognize(request);

// observe data event
const dataPromise = new Promise(resolve => stream.on('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.on('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.on('finish', resolve));

// send the audio
stream.write(audioChunk);

// for testing purposes only, give the SR stream 2 seconds to absorb the audio
await new Promise(resolve => setTimeout(resolve, 2000));

// end the SR stream gracefully, by observing the completion callback
const endPromise = util.promisify(callback => stream.end(callback))();

// a 5 seconds test timeout
const timeoutPromise = new Promise(resolve => setTimeout(resolve, 5000));

// finishPromise wins the race here
await Promise.race([
  dataPromise, errorPromise, finishPromise, endPromise, timeoutPromise]);

// endPromise wins the race here
await Promise.race([
  dataPromise, endPromise, timeoutPromise]);

// timeoutPromise wins the race here
await Promise.race([dataPromise, timeoutPromise]);

// I don't see any data or error events, dataPromise and errorPromise don't get settled

My experience is that the SR stream ends successfully, but I never see any data or error events; neither dataPromise nor errorPromise gets resolved or rejected.

How do I signal the end of the audio, close the SR stream gracefully, and still get the pending SR results?

I need to stick with the streamingRecognize API, because the audio I'm streaming is real-time, even though it may stop abruptly.

To clarify, it works as long as I keep streaming the audio, and I do receive the real-time SR results. However, when I send the final audio chunk and end the stream as above, I don't get the final results I would otherwise expect.

To get the final results, I actually have to keep streaming silence for several more seconds, which likely increases the STT costs. I feel there must be a better way to get them.
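The silence padding mentioned above looks roughly like the following sketch. It assumes LINEAR16 mono audio at 16KHz, as in the snippets here; flushWithSilence is just an illustrative helper name:

// a rough sketch of padding the stream with silence to flush the final results;
// assumes LINEAR16 mono at 16KHz: 1s of silence = 16000 samples * 2 bytes
const oneSecondOfSilence = Buffer.alloc(16000 * 2); // zero-filled PCM is silence

// hypothetical helper: keep writing silence, paced roughly in real time
async function flushWithSilence(stream, seconds) {
  for (let i = 0; i < seconds; i++) {
    stream.write(oneSecondOfSilence);
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}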

Updated: it appears the only proper time to end a streamingRecognize stream is upon a data event where StreamingRecognitionResult.is_final is true. Likewise, it appears we're expected to keep streaming the audio until a data event fires, in order to get any results at all, final or interim.
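In code, that observation boils down to something like this sketch; result.isFinal is the camelCase field the Node.js client exposes on data events:

// sketch: only end the stream once a final recognition result has arrived
stream.on('data', data => {
  const result = data.results[0];
  if (result.isFinal) {
    // only now does it appear safe to signal the end of the audio
    stream.end();
  }
});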

This looks like a bug to me, so I've submitted an issue.

Updated: it now appears to have been confirmed as a bug. Until it's fixed, I'm looking for a potential workaround.

Updated: for future reference, here is the list of the currently and previously tracked issues involving streamingRecognize.

I'd expect this to be a common problem for anyone using streamingRecognize, so it's surprising it has never been reported before. I've also submitted it as a bug to issuetracker.google.com.

Workarounds

Re this: "I'm looking for a potential workaround." Have you considered extending SpeechClient as a base class? I don't have credentials to test it, but you can extend SpeechClient with your own class and then call the internal close() method as needed. The close() method closes the SpeechClient and resolves the outstanding promises.

Alternatively, you could Proxy the SpeechClient() and intercept/respond as required. But since your intent is to close it, the following might be your workaround (a rough sketch of the Proxy alternative follows the snippet below).

const speech = require('@google-cloud/speech');

class ClientProxy extends speech.SpeechClient {
  constructor() {
    super();
  }
  myCustomFunction() {
    this.close();
  }
}

const clientProxy = new ClientProxy();
try {
  clientProxy.myCustomFunction();
} catch (err) {
  console.log("myCustomFunction generated error: ",err);
}
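
For completeness, the Proxy alternative mentioned above could look something like this untested sketch; the get trap just logs before delegating to the real client:

const speech = require('@google-cloud/speech');

// untested sketch: intercept property access on a SpeechClient instance
const client = new speech.SpeechClient();
const proxiedClient = new Proxy(client, {
  get(target, prop, receiver) {
    if (prop === 'close') {
      console.log('close() was requested');
    }
    return Reflect.get(target, prop, receiver);
  }
});

// proxiedClient.close() logs first, then closes the underlying client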

Since it's a bug, I don't know if this suits your case, but I've used this.recognizeStream.end(); a few times in different scenarios and it worked. However, my code was a bit different...

This thread may work for you: https://groups.google.com/g/cloud-speech-discuss/c/lPaTGmEcZQk/m/Kl4fbHK2BQAJ


My bad: not surprisingly, this turned out to be an obscure race condition in my own code.

I've put together a self-contained sample that works (gist). It helped me track down the problem. Hopefully it helps others and my future self:

// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio

import fs from 'fs';
import path from "path";
import url from 'url'; 
import util from "util";
import timers from 'timers/promises';
import speech from '@google-cloud/speech';

export {}

// need a 16-bit, 16KHz raw PCM audio 
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // If you want interim results, set this to true
};

// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();

// Stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);

// log all data
stream.on('data', data => {
  const result = data.results[0];
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});

// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});

// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));

// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));

// we could just pipe it: 
// fs.createReadStream(filename).pipe(stream);
// but we want to simulate the web socket data

// read RAW audio as Buffer
const data = await fs.promises.readFile(filename, null);

// simulate multiple audio chunks
console.log("Writting...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");

console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");

// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")),
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);

console.log("Destroying...");
stream.destroy();
console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");

The output:

Writing...
Done writing.
Before ending...
SR results, final: true, text:  this is a test I'm testing voice recognition This Is the End
After ending.
data
finish
Destroying...
Final timeout...
close
Exiting.

To test it, you need a 16-bit/16KHz raw PCM audio file. An arbitrary WAV file won't work as-is, because it contains a header with metadata.
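
If all you have is a WAV file, one way to get the raw PCM is to strip the header. This sketch assumes the canonical 44-byte header of a plain 16-bit PCM WAV; real-world files may carry extra chunks, so a proper WAV parser would be safer:

import fs from 'fs';

// sketch: drop the canonical 44-byte RIFF/WAVE header to get raw PCM;
// assumes the simplest header layout with no extra chunks
const wav = await fs.promises.readFile('sample.wav');
await fs.promises.writeFile('sample.raw', wav.subarray(44));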