What is the optimal sampling rate for the Google Speech API? Would any Google employees or experts care to comment?

Question

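So far I have tested one very small audio file at 16 kHz and at 48 kHz. I would like to run a larger test, but as you know, that costs money.

The 48 kHz sampling rate gave better results. However, the documentation says 16 kHz is best.

So I am a bit confused.

Here are the 16 kHz and 48 kHz FLAC files I used to test the Google Speech-to-Text API: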

16 kHz: https://drive.google.com/file/d/1MbiW3t86W68ZqENtDqD4XdNmEV7QZbZA/view?usp=sharing

48 kHz: https://drive.google.com/file/d/1aLN1ptMJBwuYc6FdAk6CxcK1Ex4jI3vh/view?usp=sharing

Here are the resulting transcripts:

16 kHz

Hello,dear students.

 Welcome to the lecture 1 of introduction to programming course.

 In this course,you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important cause of your Carriage. Why is that because in this course you will you will learn how to do

 Programming haftar called how to compose a software. So this is your most important lesson among all of the courses you are going to take because this lesson will teach you how to program.

 okay,so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.

48 kHz

Hello,you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important course of your Carriage. Why is that because in this course you will you will learn how to do

 Programming how to code how to compose a software. So this is your most important lesson.

 Among all of the courses you are going to take because these lesson will teach you how to program.

 okay,so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.

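The original sampling rate of the video is 48 kHz.

So could any expert or Google employee comment on this?

These are the ffmpeg options I used to produce the 16 kHz and 48 kHz FLAC files: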

-af aformat=s16:16000:mono
-af aformat=s16:48000:mono
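
For anyone repeating the comparison, here is a minimal sketch of how the two filter options above could fit into complete ffmpeg invocations. The file names `lecture.mp4`, `lecture_16k.flac`, and `lecture_48k.flac` are hypothetical, and the rest of the command line (`-i`, the output path) is an assumption, since the question only shows the `-af` part. The sketch builds the argument lists without running ffmpeg.

```python
from typing import List


def build_flac_command(src: str, dst: str, sample_rate_hz: int) -> List[str]:
    """Build an ffmpeg argument list that writes 16-bit mono audio at the given rate.

    The aformat filter string matches the one in the question:
    sample format s16, the requested sample rate, mono channel layout.
    """
    audio_filter = f"aformat=s16:{sample_rate_hz}:mono"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


# Hypothetical file names, used only for illustration.
cmd_16k = build_flac_command("lecture.mp4", "lecture_16k.flac", 16000)
cmd_48k = build_flac_command("lecture.mp4", "lecture_48k.flac", 48000)

for cmd in (cmd_16k, cmd_48k):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.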

Answer

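16 kHz is only the recommended minimum sampling rate for Speech-to-Text transcription, not an upper limit. The documentation states: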

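"We recommend a sample rate of at least 16 kHz in the audio files that you use for transcription with Speech-to-Text. Sample rates found in audio files are typically 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz. Because intelligibility is greatly affected by the frequency range, especially in the higher frequencies, a sample rate of less than 16 kHz results in an audio file that has little or no information above 8 kHz. This can prevent Speech-to-Text from correctly transcribing spoken audio. Speech intelligibility requires information throughout the 2 kHz to 4 kHz range, although the harmonics (multiples) of those frequencies in the higher range are also important for preserving speech intelligibility. Therefore, keeping the sample rate to a minimum of 16 kHz is a good practice."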
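
The 8 kHz figure in that passage follows from the Nyquist limit: sampled audio can only represent frequencies up to half its sampling rate. The short sketch below is an illustration of that arithmetic, not something taken from the documentation, and the 8000 Hz entry is added only for contrast with the rates the passage lists.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Return the highest frequency (in Hz) representable at the given sampling rate."""
    return sample_rate_hz / 2


# 8000 Hz is included for contrast; the other rates are those named in the quoted passage.
for rate in (8000, 16000, 32000, 44100, 48000):
    print(f"{rate} Hz sampling rate -> content up to {nyquist_hz(rate)} Hz")
```

By this arithmetic, a 16 kHz file keeps content up to 8 kHz and a 48 kHz file keeps content up to 24 kHz, which is consistent with 16 kHz being a floor and not a ceiling.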
