Real-time audio processing with Python sounddevice does not work

Problem description

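I want to record audio from my microphone and transcribe it in (near) real time with a speech-to-text API. The STT API available to me is Vocapia's VoxSigma. It lets me send a .wav file containing recorded speech and, within a few seconds, receive an XML file with the transcript: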

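import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def vocapia(wavfile):
    # send the .wav file to the VoxSigma REST API and save the XML response
    cmd = "curl -ksS -u password REST-URL -F method=vrbs_trans -F " \
          "model=eng -F audiofile=@" + wavfile + " > ../resources/static/XMLs/dynamic_recording.xml"
    os.system(cmd)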
    try:
        # parse xml document to retrieve transcript
        mydoc = minidom.parse("../resources/static/XMLs/dynamic_recording.xml")
        words = mydoc.getElementsByTagName('Word')
        sentence = ""
        for elem in words:
            sentence = sentence + elem.firstChild.data[1:]
        return sentence

    # catch "xml.parsers.expat.ExpatError: no element found: line 1,column 0" 
    # -> empty xml means nothing was transcribed
    except ExpatError:
        print("nothing was transcribed yet. Resuming...")
        return ""
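The parsing step above could also be done in memory, which makes the "empty response" case easy to check on its own. This is only a minimal sketch: `transcript_from_xml` is a hypothetical helper name, the `<Doc>` root in the usage example is made up, and it assumes each `Word` element holds a single text node padded with whitespace (the original code drops one leading character instead of stripping).

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def transcript_from_xml(xml_text):
    """Join the text of all <Word> elements (hypothetical helper).

    Returns "" when the response is empty or not well-formed XML,
    which is the case the ExpatError handler above covers.
    """
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return ""
    words = []
    for elem in doc.getElementsByTagName("Word"):
        node = elem.firstChild
        # Assumption: the word is stored as a plain text node.
        if node is not None and node.nodeType == node.TEXT_NODE:
            words.append(node.data.strip())
    return " ".join(w for w in words if w)
```

For example, `transcript_from_xml("<Doc><Word> hello </Word><Word> world </Word></Doc>")` gives `"hello world"`, and an empty string gives `""`.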

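The problem is that my approach with python-sounddevice writes to the .wav file continuously and seems to save it only once the recording has finished. As a result, I cannot access the recording in real time. If I send the .wav file to Vocapia while recording is still in progress, nothing is transcribed (the .wav file is empty). This is the code that handles the recording: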

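import queue
import sys
import threading
import time

import sounddevice as sd
import soundfile as sf

# Thread Function for parallel STT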
def thread_func(text):
    
    while(True):
        
        time.sleep(5)  # give time for some dialogue to happen
        new_text = text + vocapia("recording.wav")
        print(new_text)

# sound-device file and queue
wav_file = "recording.wav"
q = queue.Queue()

def callback(indata,frames,time,status):
    if status:
        print(status,file=sys.stderr)
    q.put(indata.copy())

try:
    # delete any prior recording (mode='x' below fails if the file exists)
    if os.path.exists(wav_file):
        os.remove(wav_file)

    device_info = sd.query_devices(0,'input')
    # soundfile expects an int,sounddevice provides a float:
    samplerate = int(device_info['default_samplerate'])

    # start thread that handles the stt by sending wav_file to vocapia
    th = threading.Thread(target=thread_func,args=(session_text,))
    th.start()

    with sf.SoundFile(wav_file,mode='x',samplerate=samplerate,channels=1) as file:
        with sd.InputStream(samplerate=samplerate,callback=callback,channels=1):
            print('Started recording. Press Ctrl+C to stop the recording.')
            while True:
                file.write(q.get())

except KeyboardInterrupt:

    print('\nRecording finished.')

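When I run this code, I receive and catch the "xml.parsers.expat.ExpatError: no element found: line 1,column 0" error every five seconds. However, as soon as I stop the recording loop, I can run vocapia(wav_file) and get the full transcript.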

Any ideas on what I could do to get this working?
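The symptoms described above would be consistent with the audio data (and a valid WAV header) only reaching disk when the file is closed. One possible workaround, sketched here and not tested against VoxSigma, is to keep the recorded chunks in memory and periodically write everything collected so far to a separate, fully closed snapshot file that the STT thread uploads instead of the file that is still open. The sketch uses only the standard library and assumes the queue holds raw 16-bit PCM bytes, for instance from a `sd.RawInputStream(dtype='int16', ...)` callback that does `q.put(bytes(indata))`. `drain_to_snapshot` is a hypothetical helper name.

```python
import queue
import wave


def drain_to_snapshot(q, collected, path, samplerate, channels=1, sampwidth=2):
    """Write all audio recorded so far to a complete, closed WAV file.

    q         -- queue.Queue filled with raw PCM byte chunks (assumption)
    collected -- list that accumulates every chunk across calls
    Returns the number of frames written to the snapshot.
    """
    # Move whatever is currently queued into the running list, without blocking.
    while True:
        try:
            collected.append(q.get_nowait())
        except queue.Empty:
            break
    # Closing the file finalizes the header, so the snapshot is readable at once.
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sampwidth)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(samplerate)
        wav.writeframes(b"".join(collected))
    return sum(len(c) for c in collected) // (channels * sampwidth)
```

In the thread function, the `time.sleep(5)` could then be followed by a call such as `drain_to_snapshot(q, chunks, "snapshot.wav", samplerate)` and `vocapia("snapshot.wav")`, where `chunks` and `snapshot.wav` are again illustrative names. Rewriting the whole recording on every pass is simple but grows with the recording length. Calling `file.flush()` on the open `SoundFile` after each write may be a lighter alternative, though I have not verified that it is sufficient here.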


audio-processing python python-sounddevice speech-to-text wav
