Python PhantomJS can't get all of the HTML

Problem description

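As far as I know, there are several ways to scrape a web page:

1- Using plain requests and bs4

When the page source contains scripts:

2- Using Selenium and bs4

3- Using PhantomJS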

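I am trying to scrape https://zenitbet.com/en/line/football here.

I know the first method won't work, but PhantomJS doesn't work either. On this page I need the previousIsLockedValue tag, but I have no idea how to get it. Can anyone help?

My code: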
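One way to see why the first method fails is to compare the HTML a plain request returns with the HTML the browser holds after its scripts have run. The sketch below uses only the standard library to count tags in two strings. Both strings are made-up stand-ins, not the real markup of the site, and `TagCounter` and `count_tags` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times one tag name opens in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Return the number of opening `tag` elements found in `html`."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what a plain HTTP request might return (an empty shell)...
raw_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# ...and what the browser might hold once the script has filled the page in.
rendered_html = '<html><body><div id="app"><table><tr><td>1.85</td></tr></table></div></body></html>'

print(count_tags(raw_html, "table"))
print(count_tags(rendered_html, "table"))
```

If the count for the downloaded source is zero while the browser clearly shows the table, the content is probably being built by JavaScript, and a tool that actually runs the scripts is needed.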

<table>

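In the output, the main part of the page (the betting boxes) is missing.

Solution

I managed to render the page with Selenium and then simply read the HTML with pandas to get the tables.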

Code:

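from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'https://zenitbet.com/en/line/football'

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path='C:/chromedriver_win32/chromedriver.exe'))
try:
    # Let Chrome load the page and run its JavaScript
    driver.get(url)
    # Parse every <table> in the rendered HTML into a DataFrame
    dfs = pd.read_html(StringIO(driver.page_source))
finally:
    # Close the browser even if parsing fails
    driver.quit()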
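On a script-heavy page the table may not be there yet when `driver.get` returns, in which case `pd.read_html` finds nothing. The sketch below is one possible way to handle that, assuming the table shows up within a few seconds: poll the page source until a marker string appears. `wait_for_html` and its parameters are hypothetical names, not part of Selenium. Selenium ships its own `WebDriverWait` for this purpose, and the plain loop is used here only to keep the example dependency-free.

```python
import time


def wait_for_html(get_html, marker="<table", timeout=10.0, interval=0.5):
    """Poll get_html() until marker appears in the result or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            # The marker is present, so hand the HTML back to the caller
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} did not appear within {timeout} seconds")
        # Give the page's scripts a little more time before checking again
        time.sleep(interval)
```

With a live driver, the call would look like `html = wait_for_html(lambda: driver.page_source)`, and the returned string can then be wrapped in `StringIO` and passed to `pd.read_html`.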

beautifulsoup phantomjs python selenium web-scraping
