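Extracting data from a table by field name with XPath, Python

Problem description

I want to extract data from this page: https://mbasic.facebook.com/kristina.layus. It has a two-row table, "Places Lived":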

Current city --- Moscow,Russia
Home town    --- Saint Petersburg,Russia

I can extract the data with a full XPath (the extracted value is "Moscow,Russia"):

/html/body/div/div/div[2]/div/div[1]/div[4]/div/div/div[1]/div/table/tbody/tr/td[2]/div/a

But I would like to extract the data by the field name shown in the table. I tried:

//div[@id='living']//div[@title='Current City']//a/text()

but I get this error:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id='living']//div[@title='Current City']//a/text()"}
  (Session info: chrome=84.0.4147.89)

My code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class FacebookParser:
    LOGIN_URL = 'https://www.facebook.com/login.php'

    def __init__(self, login, password):
        chrome_options = webdriver.ChromeOptions()
        # 2 = block browser notification prompts
        prefs = {"profile.default_content_setting_values.notifications": 2}
        chrome_options.add_experimental_option("prefs", prefs)

        self.driver = webdriver.Chrome(options=chrome_options)
        self.wait = WebDriverWait(self.driver, 10)
        self.login(login, password)

    def login(self, login, password):
        self.driver.get(self.LOGIN_URL)

        # wait for the login page to load
        self.wait.until(EC.visibility_of_element_located((By.ID, "email")))

        self.driver.find_element(By.ID, 'email').send_keys(login)
        self.driver.find_element(By.ID, 'pass').send_keys(password)
        self.driver.find_element(By.ID, 'loginbutton').click()

    def get_user_by_id(self, id):
        self.driver.get(BASIC_URL + 'profile.php?id=' + str(id))

    def get_user_by_url(self, url):
        self.driver.get(url)

    def find_element_by_xpath_safe(self, path):
        # return None instead of raising when nothing matches
        try:
            return self.driver.find_element(By.XPATH, path)
        except NoSuchElementException:
            return None

    def get_first_name(self):
        res = self.find_element_by_xpath_safe('//span/div/span/strong')
        if res:
            vec = res.text.split()
            if len(vec) > 0:
                return vec[0]
            else:
                print("Can't split {}".format(res.text))
        return ""

    def get_second_name(self):
        res = self.find_element_by_xpath_safe('//span/div/span/strong')
        if res:
            vec = res.text.split()
            if len(vec) > 1:
                return vec[1]
            else:
                print("Can't split {}".format(res.text))
        return ""

    def get_curr_city(self):
        res = self.find_element_by_xpath_safe('/html/body/div/div/div[2]/div/div[1]/div[4]/div/div/div[1]/div/table/tbody/tr/td[2]/div/a')
        if res:
            return res.text
        return ""

    def get_home_town(self):
        res = self.find_element_by_xpath_safe('/html/body/div/div/div[2]/div/div[1]/div[4]/div/div/div[2]/div/table/tbody/tr/td[2]/div/a')
        if res:
            return res.text
        return ""


#####################################
LOGIN = '----.com'
PASSWORD = '----'
BASIC_URL = 'https://mbasic.facebook.com/'
#####################################

parser = FacebookParser(login=LOGIN, password=PASSWORD)
parser.driver.get("https://mbasic.facebook.com/kristina.layus")
print(parser.get_curr_city())

Solution

To print the text Moscow, Russia you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following XPath-based Locator Strategy:

  • Printing Moscow, Russia:

    driver.get('https://mbasic.facebook.com/kristina.layus')
    print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"//span[text()='Current City']//following::td//a"))).text)
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
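
The same locator can be generalized so that any row of the table is looked up by its label. The following is only a sketch under assumptions: the helper names field_xpath and get_field are hypothetical, and it assumes every row uses the same span/td/a layout as the "Current City" row, which has not been verified for other rows.

```python
def field_xpath(label):
    """Build an XPath for the link that follows the span whose text is `label`."""
    # The label is embedded in single quotes, so a quote inside it would break the XPath.
    if "'" in label:
        raise ValueError("label must not contain a single quote")
    return "//span[text()='{}']//following::td//a".format(label)


def get_field(driver, label, timeout=20):
    """Return the visible text of the field named `label`, or "" if it never appears."""
    # Selenium is imported lazily so field_xpath() can be used on its own.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, field_xpath(label)))
        )
    except TimeoutException:
        # The element did not become visible within `timeout` seconds.
        return ""
    return element.text
```

With a logged-in driver this could be called as get_field(driver, 'Current City'). The exact label text of the second row on the page is an assumption to check against the page source before using it.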
    


Try adding the following code between logging in (loginbutton.click()) and opening the target page:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# wd is the WebDriver instance; DELAY is the timeout in seconds
WebDriverWait(wd, DELAY).until(EC.presence_of_element_located((By.ID, "mount_0_0")))

This code waits until the login process has completed before the target page is opened.

Also check your XPath expression: when inspecting the page source, the target div element can be found, but there is no div with the attribute id="living", so the //div[@id='living'] step of the expression cannot match anything.
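
To see why one non-matching ancestor step makes the whole expression fail, the idea can be tried offline on a small document. This is a sketch only: the markup below is hypothetical and much simpler than the real page, and it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (hence the leading "." in the paths).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE = """
<html><body>
  <div id="root">
    <div title="Current City">
      <table><tr>
        <td><span>Current City</span></td>
        <td><div><a href="#">Moscow, Russia</a></div></td>
      </tr></table>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)


def first_match_text(tree, path):
    """Return the text of the first element matching `path`, or None if nothing matches."""
    node = tree.find(path)
    return node.text if node is not None else None


# Requires an ancestor div with id="living", which the sample does not contain.
with_id = first_match_text(root, ".//div[@id='living']//div[@title='Current City']//a")

# Same expression without the non-existent ancestor step.
without_id = first_match_text(root, ".//div[@title='Current City']//a")

print(with_id)
print(without_id)
```

In this sample the first lookup finds nothing and the second returns the link text, which mirrors the situation where Selenium reports that no element matches the expression.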
