How do I perform speaker diarization with IBM Speech to Text?

Problem description

I am trying to perform speaker diarization with IBM Speech to Text. I send my audio file through the API, and the result comes back as JSON like this:

{
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["hello", 0.68, 1.19],
            ["yeah", 1.47, 1.91],
            ["yeah", 1.96, 2.12],
            ["how's", 2.12, 2.59],
            ["Billy", 2.59, 3.17],
            ["good", 4.01, 4.30]
          ],
          "confidence": 0.82,
          "transcript": "hello yeah yeah how's Billy good "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0,
  "speaker_labels": [
    {"from": 0.68, "to": 1.19, "speaker": 2, "confidence": 0.52, "final": false},
    {"from": 1.47, "to": 1.93, "speaker": 1, "confidence": 0.62, ...},
    {"from": 1.96, "to": 2.12, ..., "confidence": 0.51, ...},
    {"from": 2.12, "to": 2.59, ...},
    {"from": 2.59, "to": 3.17, ...},
    {"from": 4.01, "to": 4.30, ..., "confidence": 0.63, "final": true}
  ]
}

But I would like the output in this format:

Speaker 2 - "Hello?"
Speaker 1 - "Yeah?"
Speaker 2 - "Yeah, how's Billy?"
Speaker 1 - "Good."

Is there a way to get the results in this format, or do I have to write my own code? Here is my code:

import json
# speech_to_text is assumed to be an already-authenticated Speech to Text client.
with open('/content/test.mp3', 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file, content_type='audio/mp3',
        word_alternatives_threshold=0.9, speaker_labels=True,
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))

Solution

No verified solution to this problem has been found yet.
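
In the response shown in the question, the words sit under "timestamps" and the speakers under "speaker_labels", so one possible approach is to merge the two yourself. The following is a minimal sketch, not a confirmed answer: `group_by_speaker` is a hypothetical helper, it assumes each speaker label's "from" value equals the start time of exactly one word, and the speaker numbers in `sample` that are missing from the question's JSON are guessed from the desired output. The transcript carries no punctuation or capitalization, so the sketch does not add any.

```python
def group_by_speaker(response):
    """Merge word timestamps and speaker labels into per-speaker lines."""
    # Map each label's start time to its speaker. Rounding guards against
    # tiny float differences (assumption: times have 2 decimal places).
    speaker_at = {
        round(label["from"], 2): label["speaker"]
        for label in response.get("speaker_labels", [])
    }

    turns = []  # list of (speaker, [words]) in spoken order
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, _end in best.get("timestamps", []):
            speaker = speaker_at.get(round(start, 2))  # None if unlabeled
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker keeps talking
            else:
                turns.append((speaker, [word]))  # speaker changed

    lines = []
    for speaker, words in turns:
        text = " ".join(words)
        lines.append(f'Speaker {speaker} - "{text}"')
    return lines


# Sample shaped like the response in the question; speakers for the last
# four labels are assumptions inferred from the desired output.
sample = {
    "results": [{
        "alternatives": [{
            "timestamps": [
                ["hello", 0.68, 1.19], ["yeah", 1.47, 1.91],
                ["yeah", 1.96, 2.12], ["how's", 2.12, 2.59],
                ["Billy", 2.59, 3.17], ["good", 4.01, 4.30],
            ],
        }],
    }],
    "speaker_labels": [
        {"from": 0.68, "to": 1.19, "speaker": 2},
        {"from": 1.47, "to": 1.93, "speaker": 1},
        {"from": 1.96, "to": 2.12, "speaker": 2},
        {"from": 2.12, "to": 2.59, "speaker": 2},
        {"from": 2.59, "to": 3.17, "speaker": 2},
        {"from": 4.01, "to": 4.30, "speaker": 1},
    ],
}

for line in group_by_speaker(sample):
    print(line)
# Speaker 2 - "hello"
# Speaker 1 - "yeah"
# Speaker 2 - "yeah how's Billy"
# Speaker 1 - "good"
```

With the code from the question, this would be called as `group_by_speaker(speech_recognition_results)`. Words that have no matching label are grouped under `Speaker None` rather than dropped, which makes mismatches easy to spot.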
