Google Speech API 会为某些 FLAC 文件返回空结果,而其他人则不会,尽管它们具有相同的编解码器和采样率

问题描述

下面的代码是我用来请求转录的。

import io
from google.cloud import speech_v1p1beta1 as speech
def transcribe_file(speech_file):
    """Transcribe the given audio file."""

    client = speech.SpeechClient()

    encoding = speech.RecognitionConfig.AudioEncoding.FLAC
    if os.path.splitext(speech_file)[1] == ".wav":
        encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
    with io.open(speech_file,"rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,sample_rate_hertz=32000,language_code="ja-JP",max_alternatives=3,enable_word_time_offsets=True,enable_automatic_punctuation=True,enable_word_confidence=True,)

    response = client.recognize(config=config,audio=audio)
    #print(speech_file,"Recognition Done")
    return response

正如我在标题中所写的,响应的结果对于某些文件有空列表,而对于某些文件则没有。 它们具有相同的采样率和编解码器(32000,FLAC)

以下是每种情况之一的 ffprobe -i "AUdioFILE" -show_streams 结果。

左边一个是空的。唯一的区别是文件的持续时间。

如何获得非空结果?

Result of ffprobe

编辑:

Result of ffprobe show stream show format

Something not captured in one screen

遗憾的是,re-mux 不起作用。

我使用了 ffmpeg-git-20210225

ffbrobe 损坏的结果

./ffprobe -show_streams -show_format broken.flac 
ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/  copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 66.100 / 56. 66.100
  libavcodec     58.125.101 / 58.125.101
  libavformat    58. 68.100 / 58. 68.100
  libavdevice    58. 12.100 / 58. 12.100
  libavfilter     7.107.100 /  7.107.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Input #0,flac,from 'broken.flac':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:00.90,start: 0.000000,bitrate: 342 kb/s
  Stream #0:0: Audio: flac,32000 Hz,mono,s16
[STREAM]
index=0
codec_name=flac
codec_long_name=FLAC (Free Lossless Audio Codec)
profile=unkNown
codec_type=audio
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
sample_fmt=s16
sample_rate=32000
channels=1
channel_layout=mono
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/32000
start_pts=0
start_time=0.000000
duration_ts=28672
duration=0.896000
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=16
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
disPOSITION:default=0
disPOSITION:dub=0
disPOSITION:original=0
disPOSITION:comment=0
disPOSITION:lyrics=0
disPOSITION:karaoke=0
disPOSITION:forced=0
disPOSITION:hearing_impaired=0
disPOSITION:visual_impaired=0
disPOSITION:clean_effects=0
disPOSITION:attached_pic=0
disPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=broken.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0.000000
duration=0.896000
size=38362
bit_rate=342517
probe_score=100
TAG:encoder=Lavf58.45.100
[/FORMAT]

未损坏的 ffprobe 结果

./ffprobe -show_streams -show_format non_broken.flac 
ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/  copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 66.100 / 56. 66.100
  libavcodec     58.125.101 / 58.125.101
  libavformat    58. 68.100 / 58. 68.100
  libavdevice    58. 12.100 / 58. 12.100
  libavfilter     7.107.100 /  7.107.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Input #0,from 'non_broken.flac':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:00.86,bitrate: 358 kb/s
  Stream #0:0: Audio: flac,s16
[STREAM]
index=0
codec_name=flac
codec_long_name=FLAC (Free Lossless Audio Codec)
profile=unkNown
codec_type=audio
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
sample_fmt=s16
sample_rate=32000
channels=1
channel_layout=mono
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/32000
start_pts=0
start_time=0.000000
duration_ts=27648
duration=0.864000
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=16
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
disPOSITION:default=0
disPOSITION:dub=0
disPOSITION:original=0
disPOSITION:comment=0
disPOSITION:lyrics=0
disPOSITION:karaoke=0
disPOSITION:forced=0
disPOSITION:hearing_impaired=0
disPOSITION:visual_impaired=0
disPOSITION:clean_effects=0
disPOSITION:attached_pic=0
disPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=non_broken.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0.000000
duration=0.864000
size=38701
bit_rate=358342
probe_score=100
TAG:encoder=Lavf58.45.100
[/FORMAT]

ffmpeg -f lavfi -i sine=d=0.864:r=32000 output.flac

的结果
ffmpeg version 3.4.8-0ubuntu0.2 copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  WARNING: library configuration mismatch
  avcodec     configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0,lavfi,from 'sine=d=0.864:r=32000':
  Duration: N/A,bitrate: 512 kb/s
    Stream #0:0: Audio: pcm_s16le,s16,512 kb/s
File 'output.flac' already exists. Overwrite ? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> flac (native))
Press [q] to stop,[?] for help
Output #0,to 'output.flac':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0: Audio: flac,128 kb/s
    Metadata:
      encoder         : Lavc57.107.100 flac
[Parsed_sine_0 @ 0x55c317ddda00] EOF timestamp not reliable
size=      16kB time=00:00:00.86 bitrate= 154.0kbits/s speed= 205x    
video:0kB audio:8kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 99.364586%

解决方法

暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!

如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。

小编邮箱:dio#foxmail.com (将#修改为@)

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...