Problem description
Here is a Python program that gets all the captions from a YouTube link:
from pytube import YouTube
yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()
for caption in captions:
    print(caption)
The output of the above program is:
<Caption lang="Arabic" code="ar">
<Caption lang="Chinese (China)" code="zh-CN">
<Caption lang="English" code="en">
<Caption lang="English (auto-generated)" code="a.en">
<Caption lang="French" code="fr">
<Caption lang="German" code="de">
<Caption lang="Hungarian" code="hu">
<Caption lang="Italian" code="it">
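But I only want to get the language and code from the above output, as key/value pairs in a dictionary:
{"Arabic" : "ar","Chinese" : "zh-CN","English" : "en","French" : "fr","German" : "de","Hungarian" : "hu","Italian" : "it"}
Thanks in advance.
Solution
It's simple: each Caption object has a name attribute and a code attribute, so you can map one to the other in a dictionary: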
from pytube import YouTube
yt = YouTube('https://youtu.be/5MgBikgcWnY')
captions = yt.captions.all()
captions_dict = {}
for caption in captions:
    # Map the caption name to the caption code
    captions_dict[caption.name] = caption.code
If you want a one-liner, use a dictionary comprehension:
captions_dict = {caption.name: caption.code for caption in captions}
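To check the idea without a network connection or pytube installed, here is a minimal sketch that uses a hypothetical FakeCaption namedtuple as a stand-in for pytube's Caption objects. It models only the two attributes the answer reads (name and code) and confirms that the loop and the one-liner build the same dictionary.

```python
from collections import namedtuple

# Hypothetical stand-in for pytube's Caption objects: only the two
# attributes the answer reads (name and code) are modelled here.
FakeCaption = namedtuple("FakeCaption", ["name", "code"])

captions = [
    FakeCaption("Arabic", "ar"),
    FakeCaption("Chinese (China)", "zh-CN"),
    FakeCaption("English", "en"),
    FakeCaption("English (auto-generated)", "a.en"),
    FakeCaption("French", "fr"),
]

# Loop version: map each caption name to its code
captions_dict = {}
for caption in captions:
    captions_dict[caption.name] = caption.code

# One-liner version using a dictionary comprehension
captions_dict_oneliner = {caption.name: caption.code for caption in captions}

print(captions_dict == captions_dict_oneliner)  # True
print(captions_dict)
```

With real pytube objects, the list of FakeCaption values would be replaced by the captions from yt.captions.all().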
Output:
{'Arabic': 'ar','Bangla': 'bn','Burmese': 'my','Chinese (China)': 'zh-CN','Chinese (Taiwan)': 'zh-TW','Croatian': 'hr','English': 'en','English (auto-generated)': 'a.en','French': 'fr','German': 'de','Hebrew': 'iw','Hungarian': 'hu','Italian': 'it','Japanese': 'ja','Persian': 'fa','Polish': 'pl','Portuguese (Brazil)': 'pt-BR','Russian': 'ru','Serbian': 'sr','Slovak': 'sk','Spanish': 'es','Spanish (Spain)': 'es-ES','Thai': 'th','Turkish': 'tr','Ukrainian': 'uk','Vietnamese': 'vi'}
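The output above still contains entries such as "English (auto-generated)" and region-qualified names like "Chinese (China)", which the question's desired dictionary leaves out. A minimal sketch for getting closer to that format, under stated assumptions: auto-generated tracks have a code starting with "a." (as "a.en" does above), a region suffix is written as " (Region)" after the language name, and when two tracks share a base name (e.g. "Chinese (China)" and "Chinese (Taiwan)") the first one is kept. The helper simple_caption_dict and the FakeCaption stand-in are hypothetical, not part of pytube.

```python
from collections import namedtuple

# Hypothetical stand-in for pytube's Caption objects (name and code only).
FakeCaption = namedtuple("FakeCaption", ["name", "code"])


def simple_caption_dict(captions):
    """Map base language name -> code, skipping auto-generated tracks.

    Assumptions: auto-generated tracks have a code starting with "a.",
    and a region suffix appears as " (Region)" after the language name.
    If two tracks share a base name, the first one wins.
    """
    result = {}
    for caption in captions:
        if caption.code.startswith("a."):
            continue  # skip auto-generated captions such as "a.en"
        # "Chinese (China)" -> "Chinese"; names without a suffix are unchanged
        base_name = caption.name.split(" (", 1)[0]
        result.setdefault(base_name, caption.code)  # keep the first match
    return result


captions = [
    FakeCaption("Arabic", "ar"),
    FakeCaption("Chinese (China)", "zh-CN"),
    FakeCaption("Chinese (Taiwan)", "zh-TW"),
    FakeCaption("English", "en"),
    FakeCaption("English (auto-generated)", "a.en"),
    FakeCaption("French", "fr"),
]

print(simple_caption_dict(captions))
# {'Arabic': 'ar', 'Chinese': 'zh-CN', 'English': 'en', 'French': 'fr'}
```

Dropping the region suffix loses information (only zh-CN survives for "Chinese"), so keeping the full names as in the answer may be preferable unless the shorter keys are really needed.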