Playwright Python: cannot fetch an API page using cookies

Problem description

I have two Python scripts:

Login: opens the site, logs in through the login form, and stores the cookies in a JSON file for later use.

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        # Wait for the login form, then fill it in and submit it
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaa")
        page.type('#signInFormPage input[name="password"]', "aa")
        page.click('#userNamePasswordSignInButton')
        # Give the login request time to finish before reading the cookies
        page.wait_for_timeout(3000)
        cookies = context.cookies()
        page.wait_for_timeout(10000)
        # Save the cookies for the second script; the with block closes the file
        with open('./cookies.json', 'w') as f:
            json.dump(cookies, f)
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        # Clean up whether or not the login succeeded
        page.close()
        context.close()
        browser.close()

This script works fine. The second script loads the stored cookies from the file and prints the page source of another page on the same site:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False, slow_mo=50)
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    # Load the cookies saved by the login script into this context
    with open('./cookies.json') as cookie_file:
        cookies = json.load(cookie_file)
    context.add_cookies(cookies)
    try:
        page.goto('https://www.url.us/Product/10aaa')
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        context.close()
        browser.close()

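This script also works fine.

The problem is that this site has APIs that return the information I want to extract, and that information is not available in the page source visible to front-end users. When I put the API URL into the second script in place of the page URL, I get an empty JSON page. These API requests use a token value, but since I am using cookies to get the page source, I do not have the token. I use these scripts because this is the only way to get past the Cloudflare protection the site has. Is there any way to combine, for example, the requests module with the playwright module? Or is there any other suggestion for this situation: how can I get the JSON page using cookies?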

Updated code using a persistent context

Script 1:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # launch_persistent_context returns a BrowserContext, not a Browser
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    page.goto('https://www.url.us/signin')
    try:
        # Wait for the login form, then fill it in and submit it
        page.wait_for_selector('#signInFormPage input[name="userName"]', state='visible')
        page.type('#signInFormPage input[name="userName"]', "aaaaa")
        page.type('#signInFormPage input[name="password"]', "aaaa")
        page.click('#userNamePasswordSignInButton')
        page.wait_for_timeout(3000)
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        # Close the context so the profile directory is released for script 2
        context.close()

Script 2:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Reuse the profile directory that script 1 logged in with
    context = p.chromium.launch_persistent_context(r'C:\Users\test\Downloads\pyyy', headless=False)
    page = context.new_page()
    try:
        page.goto('https://www.url.us/Product/aaa')
        page.wait_for_timeout(6000)
        print(page.content())
    except Exception as e:
        print("Error in playwright script.", e)
    finally:
        page.close()
        context.close()

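Solution

I would launch a persistent context instead of saving and loading cookies. The persistent context keeps its state (cookies, local storage, and so on) in the user_data_dir you provide, so a later run that points at the same directory can reuse the session.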
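For the JSON endpoint itself, one option that may be worth trying is the request API that Playwright attaches to each browser context, since requests sent through `context.request` share the context's cookies. The sketch below is only an illustration under assumptions: `USER_DATA_DIR` and `API_URL` are hypothetical placeholders (the real API URL is not given in the question), `fetch_json` and `main` are helper names made up here, and it assumes a Playwright version that provides `BrowserContext.request`. If the API also requires a token header, cookies alone may not be enough.

```python
import json

# Hypothetical placeholders: substitute the real profile directory and API URL.
USER_DATA_DIR = "./playwright-profile"
API_URL = "https://www.url.us/api/placeholder"


def fetch_json(context, url):
    """Send a GET through the context's request API and return the parsed JSON.

    Requests made via context.request use the same cookies as the pages
    opened in that context.
    """
    response = context.request.get(url)
    if not response.ok:
        raise RuntimeError(f"API request failed with status {response.status}")
    return response.json()


def main():
    # Imported here so fetch_json can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(USER_DATA_DIR, headless=False)
        try:
            page = context.new_page()
            # Visit a normal page first so any protection checks run in the browser.
            page.goto("https://www.url.us/Product/aaa")
            data = fetch_json(context, API_URL)
            print(json.dumps(data, indent=2))
        finally:
            context.close()
```

Calling `main()` after script 1 has logged in with the same profile directory would print the API response if the session cookies are sufficient. If the endpoint still returns an empty body, the token is probably sent as a request header, and in that case listening to the page's own API traffic with `page.on("response", handler)` might be a better fit than issuing a separate request.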