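Container image-based Lambda does not work with Selenium WebDriver

Problem description

I would like to know why my project runs locally but does not work in my AWS account as a container image-based Lambda function.

These are the commands I run locally: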

 docker build -t lambda-repository .
 docker run -p 9001:8080 lambda-repository:latest

In another terminal:

curl -XPOST "http://localhost:9001/2015-03-31/functions/function/invocations" -d '{}'

Result:

"Google"%  

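The local invocation can also be scripted, which helps when the same event has to be replayed many times while debugging. The following is a minimal sketch, assuming the container is listening on host port 9001 as in the `docker run` command of this question; the names `LOCAL_ENDPOINT`, `build_invocation` and `invoke` are hypothetical helpers, not part of any AWS library.

```python
import json
import urllib.request

# Endpoint copied from the curl command in the question; 9001 is the host
# port mapped by `docker run -p 9001:8080`.
LOCAL_ENDPOINT = "http://localhost:9001/2015-03-31/functions/function/invocations"


def build_invocation(event, endpoint=LOCAL_ENDPOINT):
    """Build (but do not send) the POST request that invokes the local container."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, method="POST")


def invoke(event, endpoint=LOCAL_ENDPOINT, timeout=60):
    """Send the event to the running container and return the decoded JSON response."""
    request = build_invocation(event, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `print(invoke({}))` sends the same empty event as the curl command. It only works against a local container and does not reproduce the environment of the deployed function.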
What happens in my AWS account:

Error

START RequestId: 2a8b9cf9-f1f0-48cb-8a5d-eb86c2dddcc0 Version: $LATEST
[ERROR] WebDriverException: Message: unknown error: Chrome failed to start: crashed.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Traceback (most recent call last):
  File "/var/task/app.py", line 14, in handler
    driver = webdriver.Chrome(options=options)
  File "/var/lang/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/var/lang/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/var/lang/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/var/lang/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/var/lang/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
END RequestId: 2a8b9cf9-f1f0-48cb-8a5d-eb86c2dddcc0
REPORT RequestId: 2a8b9cf9-f1f0-48cb-8a5d-eb86c2dddcc0  Duration: 2736.59 ms    Billed Duration: 3752 ms    Memory Size: 2048 MB    Max Memory Used: 168 MB Init Duration: 1014.78 ms   

Dockerfile:

FROM public.ecr.aws/lambda/python:3.8

RUN pwd
WORKDIR /tmp
RUN yum -y update
RUN yum -y install wget unzip
RUN yum -y install GConf2 libX11
RUN yum -y install curl

COPY download-chromedriver.sh ./
RUN /bin/bash -c "source ./download-chromedriver.sh"
RUN unzip chromedriver_linux64.zip
RUN mv chromedriver /usr/bin/chromedriver
RUN chromedriver -version

RUN curl https://intoli.com/install-google-chrome.sh | bash
RUN mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
RUN google-chrome -version && which google-chrome

WORKDIR /var/task
COPY app.py ./
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

CMD ["app.handler"]

download-chromedriver.sh:

#!/usr/bin/env bash

CHROME_DRIVER_VERSION=`curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE`
wget -N https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip

app.py:

from selenium.webdriver.chrome.options import Options
from selenium import webdriver


def handler(event, context):
    # Configure headless Chrome for a container environment
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-gpu')
    options.add_argument("window-size=1280,1024")

    # Load the page and return its title
    url = 'http://www.google.com'
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    text = driver.title
    return text

requirements.txt:

selenium==3.141.0
urllib3==1.26.2

Solution

This is not a solution.

I used this approach with a zip-packaged Lambda, but it may give you some more hints.

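When you run chromedriver --verseion (I think it should be chromedriver --version) in the Dockerfile, it runs locally at build time, not inside Lambda.

So you should call os.system('<chromedriver_path>/chromedriver --version') inside lambda_function.handler, using the same path that you pass as executable_path.

You also need to set options.binary_location (I am not sure whether you need it too in your case):

    import os

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options


    def handler(event, context):
        # Print the chromedriver version from inside the Lambda runtime
        os.system('/usr/bin/chromedriver --version')

        options = Options()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--disable-gpu')
        options.add_argument("window-size=1280,1024")
        # Point Selenium at the Chrome binary explicitly
        options.binary_location = "/usr/bin/google-chrome"

        url = 'http://www.google.com'
        driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", options=options)
        driver.get(url)
        text = driver.title
        return text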
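The version check suggested above can be made a little more informative by capturing the output and exit status instead of relying on os.system. The following is a minimal sketch using only the standard library; `probe_binary` is a hypothetical helper name, and the paths in the usage note are the ones from the Dockerfile in the question.

```python
import subprocess


def probe_binary(path, flag="--version", timeout=10):
    """Run `path flag` and return a dict describing what happened."""
    try:
        proc = subprocess.run(
            [path, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so error text is not lost
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing file, missing execute permission, or a hung process
        return {"path": path, "ok": False, "output": repr(exc)}
    return {
        "path": path,
        "ok": proc.returncode == 0,
        "output": proc.stdout.decode("utf-8", errors="replace").strip(),
    }
```

Calling `print(probe_binary("/usr/bin/chromedriver"))` and `print(probe_binary("/usr/bin/google-chrome"))` at the top of the handler writes the results to the function's log, so it is visible whether each binary exists and starts inside Lambda. This only diagnoses the problem; it does not fix the crash by itself.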

Once you manage to launch Chrome 88 with Python 3.8, I hope you will share how you got it working.
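One difference between the local container and the deployed function that may be worth ruling out: in Lambda the filesystem is read-only except for /tmp. The sketch below assumes, without confirmation from this thread, that Chrome crashes because it cannot write its profile or cache. `build_chrome_args` is a hypothetical helper; the added switches (--single-process, --no-zygote, --user-data-dir, --disk-cache-dir) are existing Chrome command-line switches, but whether they resolve this particular crash is untested here.

```python
def build_chrome_args(tmp_root="/tmp"):
    """Return Chrome switches that keep all writable state under tmp_root."""
    return [
        # Switches already used in the question
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        "--window-size=1280,1024",
        # Assumption: fewer helper processes are easier to run in Lambda
        "--single-process",
        "--no-zygote",
        # Assumption: profile and cache must live in the writable /tmp
        f"--user-data-dir={tmp_root}/chrome-user-data",
        f"--disk-cache-dir={tmp_root}/chrome-cache",
    ]
```

In the handler these would be applied with `for arg in build_chrome_args(): options.add_argument(arg)` before creating the driver.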