Real-time transcription with Google Speech-to-Text

Problem description

I want to build a real-time transcription app with Node.js and the Google Speech-to-Text API.

I'm using RecordRTC and socket.io to send audio chunks to a backend server. At the moment I record 1 s chunks, and transcription works, but the API doesn't treat the audio as a stream: it sends back a response after processing each chunk. That means I get half-sentences, and Google can't use the surrounding context to help it recognize the speech.

My question is: how do I get Google to treat my chunks as one continuous stream? Or is there another approach that achieves the same result? (This is live transcription of microphone audio, or very close to live.)

Google has a demo on their site that does exactly what I want, so it should be possible.
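The core of the problem is that a new recognize request is opened for every 1 s blob, so each chunk is transcribed in isolation. A minimal sketch of the fix, using Node's `PassThrough` as a stand-in for Google's recognize stream: create the stream once per recording session and write every incoming chunk into that same stream.

```typescript
import { PassThrough } from "stream";

// Stand-in for a long-lived recognize stream: created ONCE per session,
// not once per 1 s chunk.
const sessionStream = new PassThrough();

// Simulate three consecutive 1 s chunks arriving over socket.io-stream,
// all written into the SAME stream.
sessionStream.write("chunk-1 ");
sessionStream.write("chunk-2 ");
sessionStream.end("chunk-3");

// Downstream, the consumer sees one continuous byte stream, which is what
// lets the recognizer keep context across chunk boundaries.
const continuous = (sessionStream.read() as Buffer).toString();
console.log(continuous); // "chunk-1 chunk-2 chunk-3"
```

The same pattern applies server-side: keep one `recognizeStream` per socket and `write()` each blob into it, instead of calling `streamingRecognize` per chunk.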

My code (mostly taken from the selfservicekiosk-audio-streaming repo):

ss is socket.io-stream.

Server side

io.on("connect", (socket) => {
    socket.on("create-room", (data, cb) => createRoom(socket, data, cb))
    socket.on("disconnecting", () => exitFromroom(socket))

    // Receives the stream; this gets called every 1 s with a new blob
    ss(socket).on("stream-speech", async function (stream: any, data: any) {

        const filename = path.basename("stream.wav")
        const writeStream = fs.createWriteStream(filename)

        stream.pipe(writeStream)
        speech.speechStreamToText(
            stream, async function (transcribeObj: any) {
                socket.emit("transcript", transcribeObj.transcript)
            }
        )
    })
})

async speechStreamToText(stream: any, cb: Function) {
    sttRequest = {
        config: {
            languageCode: "en-US",
            sampleRateHertz: 16000,
            encoding: "WEBM_OPUS",
            enableAutomaticPunctuation: true,
        },
        singleUtterance: false,
    }

    const stt = new speechToText.SpeechClient()
    // Set up the STT stream
    const recognizeStream = stt
        .streamingRecognize(sttRequest)
        .on("data", function (data: any) {
            // This gets called every second, and the transcription chunks
            // I get usually make close to no sense
            console.log(data.results[0].alternatives)
        })
        .on("error", (e: any) => {
            console.log(e)
        })
        .on("end", () => {
            // This also gets called every second
            console.log("on end")
        })

    stream.pipe(recognizeStream)
    stream.on("end", function () {
        console.log("socket.io stream ended")
    })
}
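One detail worth calling out: the request's field names must match @google-cloud/speech exactly, including the capital "P" in `enableAutomaticPunctuation`, otherwise the option is silently ignored. A small hypothetical builder (`buildSttRequest` is not part of the original code, and `interimResults` is an extra field the original request doesn't set) that assembles the request:

```typescript
// Hypothetical helper (not in the original code) that builds the
// streamingRecognize request with correctly spelled field names.
interface SttStreamingRequest {
    config: {
        languageCode: string;
        sampleRateHertz: number;
        encoding: string;
        enableAutomaticPunctuation: boolean;
    };
    // false keeps the stream open across pauses instead of
    // ending it after the first utterance
    singleUtterance: boolean;
    // true asks for partial hypotheses while the speaker is still talking
    interimResults: boolean;
}

function buildSttRequest(languageCode: string = "en-US"): SttStreamingRequest {
    return {
        config: {
            languageCode,
            sampleRateHertz: 16000,
            encoding: "WEBM_OPUS",
            enableAutomaticPunctuation: true,
        },
        singleUtterance: false,
        interimResults: true,
    };
}

console.log(buildSttRequest().config.enableAutomaticPunctuation); // true
```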

Client side

const sendBinaryStream = (blob: Blob) => {
    const stream = ss.createStream()
    ss(socket).emit("stream-speech", stream, {
        name: "_temp/stream.wav",
        size: blob.size,
    })
    ss.createBlobReadStream(blob).pipe(stream)
}

useEffect(() => {
    let recorder: any
    if (activeChat) {
        navigator.mediaDevices.getUserMedia({ audio: true, video: false }).then((stream) => {
            streamRef.current = stream
            recorder = new RecordRTC(stream, {
                type: "audio",
                mimeType: "audio/webm",
                sampleRate: 44100,
                desiredSampleRate: 16000,
                timeSlice: 1000,
                numberOfAudioChannels: 1,
                recorderType: StereoAudioRecorder,
                ondataavailable(blob: Blob) {
                    sendBinaryStream(blob)
                },
            })
            recorder.startRecording()
        })
    }
    return () => {
        recorder?.stopRecording()
        streamRef.current?.getTracks().forEach((track) => track.stop())
    }
}, [])

Any help is appreciated!

Solution

I had the same problem!

The official Google demo is probably using node-record-lpcm16 and SoX: https://cloud.google.com/speech-to-text/docs/streaming-recognize?hl=en
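For reference, here is a sketch of the approach from those docs: SoX (driven by node-record-lpcm16) captures the microphone as one continuous LINEAR16 stream and pipes it into a single `streamingRecognize` call, so Google sees a real stream instead of independent 1 s chunks. `startLiveTranscription` and the exact option values are illustrative, not lifted from the demo's source, and the requires are deferred into the function so the file can load without SoX or Google credentials installed.

```typescript
// Streaming config: one continuous LINEAR16 mic stream, with partial
// (interim) results delivered while the speaker is still talking.
const request = {
    config: {
        encoding: "LINEAR16" as const,
        sampleRateHertz: 16000,
        languageCode: "en-US",
    },
    interimResults: true,
};

// Hypothetical entry point; requires are deferred so nothing runs at load time.
function startLiveTranscription() {
    const recorder = require("node-record-lpcm16");
    const speech = require("@google-cloud/speech");

    const client = new speech.SpeechClient();
    const recognizeStream = client
        .streamingRecognize(request)
        .on("error", console.error)
        .on("data", (data: any) =>
            console.log(data.results[0]?.alternatives[0]?.transcript)
        );

    // One continuous mic stream piped into one recognize stream.
    recorder
        .record({ sampleRateHertz: 16000, threshold: 0 })
        .stream()
        .on("error", console.error)
        .pipe(recognizeStream);
}
```

The trade-off versus the RecordRTC setup above is that this runs the capture on the server (or a desktop client with SoX available), not in the browser; for a browser client you would still need a single long-lived stream from the page to the backend.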