How to serve audio files from an S3 bucket directly to Google Speech-to-Text

Question

We are building a voice application using Google's Speech-to-Text API. Our data (audio files) is currently stored in an S3 bucket on AWS. Is there a way to pass an S3 URI directly to Google's Speech-to-Text API?

Judging from their documentation, this currently does not seem to be possible with Google's Speech-to-Text API.


This is not the case for their Vision and NLP APIs.

  1. Do you know why the Speech API has this restriction?
  2. Is there a good workaround for it?

Answer

Currently, Google only accepts audio files from your local machine or from Google Cloud Storage. The documentation gives no rationale for this restriction.

The documentation says, under "Passing audio referenced by a URI": more typically, you will pass a uri parameter within the Speech request's audio field, pointing to an audio file (in binary format, not base64) located on Google Cloud Storage.
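For completeness, here is a minimal sketch of what such a request looks like once a file already sits in Cloud Storage. The bucket and object names are made-up placeholders, and a service account with access to the API is assumed:

```python
# Sketch only: shows the shape of a gs:// based recognition request.
# Bucket/object names are placeholders, not real resources.


def gcs_uri(bucket, blob):
    """Build the gs:// URI form that the request's audio field expects."""
    return "gs://{}/{}".format(bucket, blob)


def transcribe_gcs(bucket, blob):
    """Transcribe an audio file that already lives in Google Cloud Storage."""
    # Imported here so the helper above stays usable without the SDK installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri(bucket, blob))
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    # long_running_recognize handles audio longer than one minute.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=300)
    return [result.alternatives[0].transcript for result in response.results]
```

Something like `transcribe_gcs("my-audio-bucket", "recordings/sample.wav")` would then start an asynchronous recognition job against that object.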

I suggest moving your files to Google Cloud Storage. If you don't want to do that, there is a good workaround: use the Google Cloud Speech-to-Text API in streaming mode. Then you don't need to store anything anywhere; your voice application feeds it input from any microphone. And if you are not sure how to handle microphone input, don't worry.
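If you do decide to move the files, a one-off copy from S3 to Cloud Storage can be scripted with the two vendors' SDKs. A minimal sketch, assuming `boto3` and `google-cloud-storage` are installed and credentials for both clouds are configured; all bucket and key names are placeholders:

```python
import os
import tempfile


def s3_to_gcs(s3_bucket, s3_key, gcs_bucket_name, gcs_blob_name=None):
    """Download one object from S3 and re-upload it to Google Cloud Storage."""
    # Imported inside the function so the sketch loads without the SDKs.
    import boto3
    from google.cloud import storage

    if gcs_blob_name is None:
        gcs_blob_name = s3_key  # keep the same object path by default

    tmp_path = os.path.join(tempfile.gettempdir(), os.path.basename(s3_key))

    # 1. Fetch the object from AWS to a temporary local file.
    boto3.client("s3").download_file(s3_bucket, s3_key, tmp_path)

    # 2. Push the local file to Google Cloud Storage.
    bucket = storage.Client().bucket(gcs_bucket_name)
    bucket.blob(gcs_blob_name).upload_from_filename(tmp_path)

    os.remove(tmp_path)
    return "gs://{}/{}".format(gcs_bucket_name, gcs_blob_name)
```

The returned gs:// URI is exactly what the `uri` field of the speech request expects.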

Google provides sample code that does all of this:

# [START speech_transcribe_streaming_mic]
from __future__ import division

import re
import sys

from google.cloud import speech

import pyaudio
from six.moves import queue

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        self.closed = False

        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)


def listen_print_loop(responses):
    """Iterates through server responses and prints them.
    The responses argument is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see the documentation. Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a line feed at the end of it, to allow
    the next result to overwrite it, until the response is a final one. For the
    final one, print a newline to preserve the finalized transcription.
    """
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        transcript = result.alternatives[0].transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result.
        overwrite_chars = " " * (num_chars_printed - len(transcript))

        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + "\r")
            sys.stdout.flush()

            num_chars_printed = len(transcript)

        else:
            print(transcript + overwrite_chars)

            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r"\b(exit|quit)\b", transcript, re.I):
                print("Exiting..")
                break

            num_chars_printed = 0


def main():
    language_code = "en-US"  # a BCP-47 language tag

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )

    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (
            speech.StreamingRecognizeRequest(audio_content=content)
            for content in audio_generator
        )

        responses = client.streaming_recognize(streaming_config, requests)

        # Now, put the transcription responses to use.
        listen_print_loop(responses)


if __name__ == "__main__":
    main()
# [END speech_transcribe_streaming_mic]

The dependencies are google-cloud-speech and pyaudio (pip install google-cloud-speech pyaudio).

As for AWS S3: you can store the audio files there before, and the transcripts after, talking to the Google Speech-to-Text API. Streaming is very fast, too.
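For example, once listen_print_loop (or a variant that collects transcripts instead of printing them) has produced a final transcript, pushing it back to S3 is a single put_object call. A sketch with made-up bucket and key names, assuming boto3 is installed and AWS credentials are configured:

```python
def save_transcript_to_s3(transcript, bucket, key):
    """Store a finished transcript next to the audio in S3."""
    # Imported here so the sketch stays importable without boto3 installed.
    import boto3

    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=transcript.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
```

A call might look like `save_transcript_to_s3(final_text, "my-app-data", "transcripts/call-001.txt")`, with both names being placeholders.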

And don't forget to include your credentials. You need to authorize first by providing GOOGLE_APPLICATION_CREDENTIALS.
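On a development machine that usually means exporting the path to a service-account key file before starting the app (the path below is a placeholder):

```shell
# Point the Google client libraries at a service-account key file.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```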
