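Problem description
I am trying to extract a few pieces of data from this page (https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7). I managed to get some of the data I want, but I still cannot extract the rest. Any ideas would be very helpful.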
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request
req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

val = soup.find('span', class_='u-label u-label--value u-label--secondary text-dark rounded mr-1').text
transfee = soup.find('span', id='ContentPlaceHolder1_spanTxFee').text
fromaddr = soup.find('span', id='spanFromAdd').text
# This lookup returns the router name, not the transferred tokens
token = soup.find('span', class_='hash-tag text-truncate hash-tag-custom-from tooltip-address').text

print("From: \t\t ", fromaddr)
print("Value: \t\t ", val)
print("Transaction Fee: ", transfee)
print("Tokens: \t ", token)
Current output:
From: 0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value: 0.679753633258727619 BNB
Transaction Fee: 0.00059691 BNB ($0.18)
Tokens: PancakeSwap: Router v2
Desired output:
From: 0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value: 0.679753633258727619 BNB
Transaction Fee: 0.00059691 BNB ($0.18)
# -- the part I can't get to work
Tokens: Wrapped BNB (WBNB) -> https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
FaraCrystal (Fara) -> https://bscscan.com/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
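Solution
I used the CSS selector method: it first finds the div tags and then the ul tag inside them. select returns a list of tags, from which we pick index 1, the one that contains the data.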
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

# select() returns a list; index 1 is the ul that holds the token transfers
main_data = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")[1]
for item in main_data.find_all("li"):
    link = item.find_all("a")[-1]  # the last link in each row is the token
    print(link.get_text())
    # href already starts with "/token/", so resolve it against the site root
    print(urljoin("https://bscscan.com", link['href']))
Output:
Wrapped BNB (WBNB)
https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
FaraCrystal (FARA)
https://bscscan.com/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
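The href attributes on these token links are site-relative (they already begin with /token/), so the full URL is best built by resolving the href against the site root instead of prepending a path by hand. A minimal sketch using only the standard library; the helper name absolute_link is made up for illustration:

```python
from urllib.parse import urljoin

BASE_URL = "https://bscscan.com"  # site root, taken from the page URL in the question


def absolute_link(href: str, base: str = BASE_URL) -> str:
    """Resolve an href (relative or absolute) against the site root.

    absolute_link is a hypothetical helper, not part of BeautifulSoup.
    """
    return urljoin(base, href)


# A site-relative href, as found on the token links
print(absolute_link("/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c"))
```

urljoin leaves an href that is already absolute unchanged, so the same helper works for both kinds of link.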
Alternatively, use the find_all method, which also returns a list of matching ul tags:
main_data = soup.find_all("ul", class_="list-unstyled mb-0")
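The two lookups are not guaranteed to return the same list, so the index that holds the data may differ between them. The sketch below compares them on a small, made-up HTML fragment (it is not copied from bscscan.com, and the helper token_links is hypothetical): select only matches ul tags nested under div.row > div.col-md-9, while find_all matches every ul whose class attribute is exactly "list-unstyled mb-0".

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; the real page layout may differ.
SAMPLE_HTML = """
<nav><ul class="list-unstyled mb-0"><li><a href="/home">Home</a></li></ul></nav>
<div class="row"><div class="col-md-9">
  <ul class="list-unstyled mb-0"><li><a href="/address/0x1">other list</a></li></ul>
</div></div>
<div class="row"><div class="col-md-9">
  <ul class="list-unstyled mb-0">
    <li><a href="/address/0xa">sender</a> <a href="/token/0xaaa">Token A (AAA)</a></li>
    <li><a href="/address/0xb">receiver</a> <a href="/token/0xbbb">Token B (BBB)</a></li>
  </ul>
</div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: only ul tags that sit under div.row > div.col-md-9
by_select = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")
# find_all: every ul with that exact class string, wherever it appears
by_find_all = soup.find_all("ul", class_="list-unstyled mb-0")


def token_links(ul):
    """Return (text, href) for the last link in each li of the given ul."""
    rows = []
    for li in ul.find_all("li"):
        link = li.find_all("a")[-1]
        rows.append((link.get_text(), link["href"]))
    return rows


print(len(by_select), len(by_find_all))
print(token_links(by_select[1]))
```

In this fragment the token list sits at index 1 of the select result but at index 2 of the find_all result, because find_all also picks up the ul inside nav. On the real page it is worth printing each candidate and checking which one holds the token links before hard-coding an index.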