Problem Description
I want to build a live transcription app with Node.js and the Google Speech-to-Text API.
I'm using RecordRTC and socket.io to send audio chunks to the backend server. Right now I record 1 s chunks and the transcription works, but Google doesn't treat the audio as a stream: it sends a response after processing each chunk. That means I get half sentences, and Google can't use context to help itself recognize the speech.
My question is how to get Google to treat my chunks as a continuous stream. Or is there another solution that achieves the same result? (This is live transcription of microphone audio, or very close to live.)
Google has a demo on their site that does exactly what I want, so it should be possible.
My code (mostly from the selfservicekiosk-audio-streaming repo):
`ss` is socket.io-stream.
Server side:
io.on("connect", (socket) => {
  socket.on("create-room", (data, cb) => createRoom(socket, data, cb))
  socket.on("disconnecting", () => exitFromroom(socket))

  // getting the stream; it gets called every 1 s with a blob
  ss(socket).on("stream-speech", async function (stream: any, data: any) {
    const filename = path.basename("stream.wav")
    const writeStream = fs.createWriteStream(filename)
    stream.pipe(writeStream)
    speech.speechStreamToText(stream, async function (transcribeObj: any) {
      socket.emit("transcript", transcribeObj.transcript)
    })
  })
})
async speechStreamToText(stream: any, cb: Function) {
  const sttRequest = {
    config: {
      languageCode: "en-US",
      sampleRateHertz: 16000,
      encoding: "WEBM_OPUS",
      enableAutomaticPunctuation: true,
    },
    singleUtterance: false,
  }
  const stt = new speechToText.SpeechClient()

  // set up the stt stream
  const recognizeStream = stt
    .streamingRecognize(sttRequest)
    .on("data", function (data: any) {
      // this gets called every second and I get transcription chunks
      // which usually make close to no sense
      console.log(data.results[0].alternatives)
    })
    .on("error", (e: any) => {
      console.log(e)
    })
    .on("end", () => {
      // this gets called every second
      console.log("on end")
    })

  stream.pipe(recognizeStream)
  stream.on("end", function () {
    console.log("socket.io stream ended")
  })
}
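Note that `speechStreamToText` above opens a fresh `streamingRecognize` stream for every 1 s chunk, so each chunk is recognized in isolation. The opposite pattern, keeping one long-lived stream per socket and writing each incoming chunk into it, is what makes the audio look continuous to the consumer. A minimal sketch of just that piping pattern, using Node's `PassThrough` (the recognize stream is stubbed out as a plain callback; all names here are illustrative, not from the repo):

```typescript
import { PassThrough } from "stream";

// One long-lived audio stream per socket. Every incoming 1 s chunk is
// appended to it, so whatever consumes it sees a single continuous stream.
const audioStream = new PassThrough();

// Wire up the consumer once per session. In the real app this would be
// audioStream.pipe(stt.streamingRecognize(sttRequest)); here the consumer
// is just a callback so the pattern stands alone.
function attachConsumer(consume: (chunk: Buffer) => void): void {
  audioStream.on("data", consume);
}

// Called for every chunk arriving over socket.io-stream. Instead of
// creating a new recognize stream per chunk, append to the existing one.
function onChunk(chunk: Buffer): void {
  audioStream.write(chunk);
}
```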
Client side:
const sendBinaryStream = (blob: Blob) => {
  const stream = ss.createStream()
  ss(socket).emit("stream-speech", stream, {
    name: "_temp/stream.wav",
    size: blob.size,
  })
  ss.createBlobReadStream(blob).pipe(stream)
}
useEffect(() => {
  let recorder: any
  if (activeChat) {
    navigator.mediaDevices.getUserMedia({ audio: true, video: false }).then((stream) => {
      streamRef.current = stream
      recorder = new RecordRTC(stream, {
        type: "audio",
        mimeType: "audio/webm",
        sampleRate: 44100,
        desiredSampleRate: 16000,
        timeSlice: 1000,
        numberOfAudioChannels: 1,
        recorderType: StereoAudioRecorder,
        ondataavailable(blob: Blob) {
          sendBinaryStream(blob)
        },
      })
      recorder.startRecording()
    })
  }
  return () => {
    recorder?.stopRecording()
    streamRef.current?.getTracks().forEach((track) => track.stop())
  }
}, [])
Any help is appreciated!
Solution
I had the same problem!
It may be that the official Google demo uses node-record-lpcm16 and SoX: https://cloud.google.com/speech-to-text/docs/streaming-recognize?hl=en
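The streaming sample on that docs page keeps a single `streamingRecognize` stream open and pipes the SoX-captured microphone into it for the whole session, which is the "continuous stream" behavior asked about. A rough TypeScript sketch adapted from that sample (assumes `@google-cloud/speech` and `node-record-lpcm16` are installed, SoX is on the PATH, and Google Cloud credentials are configured; not runnable without those):

```typescript
import { SpeechClient } from "@google-cloud/speech";
// node-record-lpcm16 ships without type definitions, hence the require
const recorder = require("node-record-lpcm16");

const client = new SpeechClient();

const request = {
  config: {
    encoding: "LINEAR16" as const,
    sampleRateHertz: 16000,
    languageCode: "en-US",
  },
  interimResults: true, // partial results while the speaker is mid-sentence
};

// One recognize stream for the whole session, not one per chunk
const recognizeStream = client
  .streamingRecognize(request)
  .on("error", console.error)
  .on("data", (data: any) =>
    console.log(data.results[0]?.alternatives[0]?.transcript)
  );

// SoX-backed microphone capture, piped continuously into the stream
recorder
  .record({ sampleRateHertz: 16000, threshold: 0 })
  .stream()
  .on("error", console.error)
  .pipe(recognizeStream);
```

The key difference from the question's code is that the microphone stream and the recognize stream both live as long as the session, so Google sees one utterance stream with full context instead of isolated 1 s files.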