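Problem description
To be clearer: I want to retrieve how many episodes of a series an actor appeared in (with dates), as shown on IMDb.
I'm using the Doctor Who page as an example.
In this case, I want to find out that Matt Smith appeared in 46 episodes from 2010 to 2020.
IMDbPY handles this perfectly for a single role, through currentRole and its notes attribute: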
from imdb import IMDb
ia = IMDb()
movie = ia.get_movie('0436992') # id for Doctor Who
cast = movie['cast']
print("Actor name :",cast[0]['name'])
print("Role :",cast[0].currentRole)
print("Notes :",cast[0].notes)
Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes,2010-2020)
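(Oddly, the episode count is wrong: the website says 46 episodes, and if you click on it, it shows 54. But that is not my point.)
However, other actors played several roles in this series, and in that case currentRole returns a list. I changed the code to handle that properly: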
from imdb import IMDb

ia = IMDb()
movie = ia.get_movie('0436992')
cast = movie['cast']
for i in range(2):
    print("Actor name :",cast[i]['name'])
    if isinstance(cast[i].currentRole,list):
        print("Roles :")
        for role in cast[i].currentRole:
            print(" - ",role," (Note :" + role.notes + ")")
    else:
        print("Role :",cast[i].currentRole)
    print("Notes :",cast[i].notes)
    print("")
But the result is:
Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes,2010-2020)
Actor name : David Tennant
Roles :
- The Doctor (Note :)
- ... (Note :)
Notes :
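I cannot retrieve the information I want here, and all the notes are empty. While debugging, I tried digging through the Person and Character objects from imdbpy, but I could not find what I need.
It seems to happen only for actors who play several roles. Is there a way to retrieve this with imdbpy itself, rather than with an external parser?
Any ideas are appreciated.
Solution
I ran into the same problem. Unfortunately, I could not solve it with IMDbPY either; I think it is buggy. Instead, I wrote my own parser with bs4: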
import requests
from bs4 import BeautifulSoup

# parse the page with bs4
page = requests.get('https://www.imdb.com/title/tt0436992/fullcredits')
soup = BeautifulSoup(page.text,'lxml')

# find the cast table
table = soup.find('table',{"class": "cast_list"})
cast = []

# iterate over it
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    cast_member = {}
    for column in columns:
        # name column
        if column_marker == 1:
            cast_member['name'] = column.get_text().strip()
        # combined role and episodes/years column
        elif column_marker == 3:
            links = column.find_all('a')
            role_element = column.find('a',{'class': None})
            if role_element:
                cast_member['role'] = role_element.get_text().strip()
            episodes_and_years_element = column.find('a',{'class': 'toggle-episodes'})
            if episodes_and_years_element:
                episodes_and_years = episodes_and_years_element.get_text().strip().split(',')
                cast_member['episodes'] = episodes_and_years[0]
                if len(episodes_and_years) > 1:
                    cast_member['years'] = episodes_and_years[1]
        column_marker += 1
    if len(cast_member):
        cast.append(cast_member)

print(cast[:5])
This is definitely not the most elegant solution, but I believe it does what you need.
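If you want numbers rather than raw strings, the episodes/years text could be post-processed along these lines. This is only a minimal sketch: it assumes the text looks like "58 episodes, 2010-2020" (optionally wrapped in parentheses, as in the notes printed in the question), and the names parse_episode_text and EpisodeInfo are my own, not part of IMDbPY or bs4.

```python
import re
from typing import Optional, TypedDict


class EpisodeInfo(TypedDict):
    """Hypothetical container for the parsed values (not an IMDbPY type)."""
    episodes: int
    first_year: Optional[int]
    last_year: Optional[int]


# Assumed text shape: "<N> episode(s)[, <year>[-<year>]]".
_EPISODES_RE = re.compile(
    r"(\d+)\s+episodes?"            # episode count
    r"(?:\s*,\s*(\d{4})"            # optional first year
    r"(?:\s*[-\u2013]\s*(\d{4}))?"  # optional last year (hyphen or en dash)
    r")?"
)


def parse_episode_text(text: str) -> Optional[EpisodeInfo]:
    """Return the episode count and year range found in text, or None."""
    match = _EPISODES_RE.search(text)
    if match is None:
        return None
    count, first, last = match.groups()
    first_year = int(first) if first else None
    # A single year such as "1 episode, 2005" is both first and last year.
    last_year = int(last) if last else first_year
    return {
        "episodes": int(count),
        "first_year": first_year,
        "last_year": last_year,
    }
```

For example, parse_episode_text("(58 episodes,2010-2020)") gives {'episodes': 58, 'first_year': 2010, 'last_year': 2020}, and a string without an episode count gives None. With the parser above you would probably pass it the full text of the toggle-episodes link before splitting on the comma.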