How to perform basic text processing while running speech recognition continuously in Python

Problem description

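I am currently using Microsoft Azure to get transcribed text from live speech recognition. I feed the transcribed text into TextRank to extract keywords from the speech stream. However, when I run this code, a lot of speech goes unrecognized while the TextRank code is running. Is there a way to run speech recognition continuously and pass each transcription result to the next step, with the TextRank keyword extraction running at the same time, so that I don't lose any speech and still get the keywords?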

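import azure.cognitiveservices.speech as speechsdk
# Assumption: summa_keyword_extractor is summa's keywords module.
from summa import keywords as summa_keyword_extractor

def from_mic():
    speech_config = speechsdk.SpeechConfig(subscription="", region="")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # print("Speak into your microphone.")
    # Blocks until a single utterance has been recognized.
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)
    return result.text

for i in range(1, 10):
    transcript = from_mic()
    # Nothing is listening to the microphone while this line runs.
    summa_keywords = summa_keyword_extractor.keywords(transcript, ratio=1.0)
    print(summa_keywords)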

Solution

You need to set up two parallel processes that are connected to each other through a task queue.

This is because the keyword extraction depends on the output of the recorder process.
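The queue idea can be tried out without a microphone or an Azure key. The following is only a minimal sketch using the standard library: `recorder`, `extractor`, and `run_pipeline` are hypothetical stand-ins, the "keyword extraction" is just an upper-casing step with an artificial delay, and threads are used instead of processes to keep it short. It shows that a slow consumer does not cause the producer to drop anything, because pending items wait in the queue.

```python
import queue
import threading
import time

STOP = None  # sentinel telling the consumer that no more items are coming


def recorder(phrases, transcripts):
    """Stand-in for the recognizer: emits phrases without waiting for anyone."""
    for phrase in phrases:
        transcripts.put(phrase)
    transcripts.put(STOP)


def extractor(transcripts, results):
    """Stand-in for keyword extraction: deliberately slower than the recorder."""
    while True:
        phrase = transcripts.get()  # blocks until an item is available
        if phrase is STOP:
            break
        time.sleep(0.01)  # simulate slow text processing
        results.append(phrase.upper())


def run_pipeline(phrases):
    """Run producer and consumer side by side and return the processed items."""
    transcripts = queue.Queue()
    results = []
    consumer = threading.Thread(target=extractor, args=(transcripts, results))
    producer = threading.Thread(target=recorder, args=(phrases, transcripts))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results
```

Calling `run_pipeline(["first phrase", "second phrase"])` returns every phrase in its original order, even though the consumer is slower than the producer. The same shape carries over to `multiprocessing.Process` and `multiprocessing.Queue` when the processing step is CPU-heavy.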

Here is one way to try to achieve this (it is obviously not polished and can be improved further):

import time
from multiprocessing import Process, Queue

import azure.cognitiveservices.speech as speechsdk
# Assumption: summa_keyword_extractor is summa's keywords module.
from summa import keywords as summa_keyword_extractor


def recorder_process(recorder_queue, extractor_queue):
    # Create the recognizer inside the child process that uses it.
    speech_config = speechsdk.SpeechConfig(subscription="", region="")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    while True:
        # Block until the parent requests another utterance.
        recorder_queue.get()
        result = speech_recognizer.recognize_once_async().get()
        extractor_queue.put(result.text)


def extractor_process(extractor_queue, results_queue):
    while True:
        # Block until the recorder delivers a transcript.
        transcript = extractor_queue.get()
        summa_keywords = summa_keyword_extractor.keywords(transcript, ratio=1.0)
        results_queue.put({'transcript': transcript, 'keywords': summa_keywords})


if __name__ == "__main__":
    # One queue to request recordings, one to hand transcripts to the
    # extractor, and one to return results to the parent process.
    recorder_queue = Queue()
    extractor_queue = Queue()
    results_queue = Queue()

    # Create two child processes and pass the queues each one needs.
    # daemon=True makes them exit together with the parent.
    recorder = Process(target=recorder_process,
                       args=(recorder_queue, extractor_queue), daemon=True)
    extractor = Process(target=extractor_process,
                        args=(extractor_queue, results_queue), daemon=True)

    recorder.start()
    extractor.start()

    index = 0
    while True:  # runs until interrupted, e.g. with Ctrl+C
        recorder_queue.put(index)
        index += 1
        time.sleep(1)
        # Print whatever results have been produced so far.
        while not results_queue.empty():
            print(results_queue.get())
