Truncated speech-to-text output from a wav file using SpeechRecognition recognize_google()

Problem description

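I have speech audio files in wav format, each 60 seconds long. However, the transcription is truncated and captures only about 15% of the audio. I have tried this in both a local Jupyter Notebook and Google Colab. According to the documentation, this request is under the API's threshold. What am I doing wrong, or how can I work around this limit?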

import speech_recognition as sr

# select a recognizer session
# recognize_google() : Google Web Speech API
r = sr.Recognizer()

interview = sr.AudioFile('sample.wav')
with interview as source:
    print('Ready...')
    r.pause_threshold = 2
    audio = r.record(source, duration=60)

type(audio)
transcription = r.recognize_google(audio, language='en_CA')
print(transcription)

Solution

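Try the code below. If the output is still the same as before, check whether one of the except branches in the try/except block is being hit, or try changing pause_threshold.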

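import speech_recognition as sr

r = sr.Recognizer()

# Read the whole file; with no duration, record() runs to the end of the audio
with sr.AudioFile("sample.wav") as source:
    print("Ready")
    r.pause_threshold = 0.6
    audio = r.record(source)

try:
    # Send the recorded audio to the Google Web Speech API
    s = r.recognize_google(audio)
    print("Text: " + s)
except sr.UnknownValueError:
    # The API returned no usable transcription
    print("Speech Recognition could not understand audio")
except sr.RequestError as e:
    # The API was unreachable or rejected the request
    print("Error {0}".format(e))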
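If the transcription is still cut short, one workaround worth trying is to send the audio in shorter pieces and join the results. As far as I can tell, pause_threshold is only consulted by listen(), so it is unlikely to change what record() returns. The sketch below is not part of the answer above. It assumes the truncation happens on the service side for long clips, which I have not verified. The helper names (chunk_windows, wav_duration, transcribe_in_chunks) and the 15-second chunk size are made up for illustration; only record() with its offset and duration arguments and recognize_google() come from the library.

```python
import wave


def chunk_windows(total_seconds, chunk_seconds=15.0):
    """Return (offset, duration) pairs that together cover total_seconds."""
    windows = []
    offset = 0.0
    while offset < total_seconds:
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return windows


def wav_duration(path):
    """Length of a wav file in seconds, read with the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def transcribe_in_chunks(path, chunk_seconds=15.0, language="en-US"):
    """Hypothetical helper: transcribe a wav file one chunk at a time."""
    # Imported here so the two helpers above work without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_windows(wav_duration(path), chunk_seconds):
        # Reopen the file for each chunk so offset is measured from the start.
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        try:
            pieces.append(recognizer.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            # Nothing intelligible in this chunk; skip it and keep going.
            continue
    return " ".join(pieces)


if __name__ == "__main__":
    print(chunk_windows(60, 15))
    # With a real file and network access:
    # print(transcribe_in_chunks("sample.wav"))
```

Fixed-length chunks can split a word at a boundary, so the joined text may have small errors near each cut. Shorter or overlapping chunks are an option if that turns out to matter.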