How do I write a Dockerfile for a Ruby Capybara scraper?

Problem description

I am trying to write a Dockerfile that runs a Ruby Capybara scraper in a Docker container. The code below works on my host OS, but it fails inside the container.

Dockerfile

FROM ruby:2.6.6

RUN apt-get update -y && \
apt-get install -y xvfb

RUN wget https://ftp.mozilla.org/pub/firefox/releases/80.0.1/linux-x86_64/en-US/firefox-80.0.1.tar.bz2
RUN tar -xjf firefox-80.0.1.tar.bz2
RUN mv firefox /opt/firefox80
RUN ln -s /opt/firefox80/firefox /usr/bin/firefox
RUN ls /opt/firefox80

RUN wget -N https://github.com/mozilla/geckodriver/releases/download/v0.27.0/geckodriver-v0.27.0-linux64.tar.gz
RUN tar -xvzf geckodriver-v0.27.0-linux64.tar.gz
RUN chmod +x geckodriver
RUN mv -f geckodriver /usr/local/share/geckodriver
RUN ln -s /usr/local/share/geckodriver /usr/local/bin/geckodriver
RUN ln -s /usr/local/share/geckodriver /usr/bin/geckodriver
RUN mkdir capybara
WORKDIR /capybara/
COPY . /capybara

RUN bundle install

main.rb

require 'capybara'
require 'capybara/dsl'
require 'selenium-webdriver'

include Capybara::DSL

Capybara.register_driver :selenium_headless_firefox do |app|
  browser_options = ::Selenium::WebDriver::Firefox::Options.new()
  browser_options.args << '--headless'

  Capybara::Selenium::Driver.new(
    app,browser: :firefox,options: browser_options
  )
end

target = "https://maps.google.com/?cid=13666314335012854449"

session = Capybara::Session.new(:selenium_headless_firefox)
session.visit(target)

Gemfile

source 'https://rubygems.org'

gem 'selenium-webdriver'
gem 'capybara','~>3.30'
gem 'geckodriver-helper'

Error message on Docker

/usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok': invalid argument: can't kill an exited process (Selenium::WebDriver::Error::UnknownError)
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/remote/bridge.rb:102:in `create_session'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/firefox/marionette/driver.rb:44:in `initialize'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/firefox/driver.rb:33:in `new'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/firefox/driver.rb:33:in `new'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver/common/driver.rb:54:in `for'
        from /usr/local/bundle/gems/selenium-webdriver-3.142.7/lib/selenium/webdriver.rb:88:in `for'
        from /usr/local/bundle/gems/capybara-3.33.0/lib/capybara/selenium/driver.rb:52:in `browser'
        from /usr/local/bundle/gems/capybara-3.33.0/lib/capybara/selenium/driver.rb:71:in `visit'
        from /usr/local/bundle/gems/capybara-3.33.0/lib/capybara/session.rb:278:in `visit'

This is what I get when I run main.rb in the Docker container. Any help from the developer community would be appreciated.

I ran main.rb with docker run [docker_image] ruby main.rb.

Solution

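The problem is not Capybara but Firefox: the tar.bz2 archive you downloaded does not include Firefox's dependencies, so the browser crashes on startup. The simplest fix is to install Firefox through apt. Assuming all the files are in the same directory, the Dockerfile should look like this: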

FROM ruby:2.6.6

WORKDIR /app

COPY . .

RUN apt-get update -y && \
    apt-get install -y xvfb firefox-esr && \
    wget -N https://github.com/mozilla/geckodriver/releases/download/v0.27.0/geckodriver-v0.27.0-linux64.tar.gz && \
    tar -xvzf geckodriver-v0.27.0-linux64.tar.gz && \
    chmod +x geckodriver && \
    mv -f geckodriver /usr/local/share/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/local/bin/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/bin/geckodriver && \
    bundle install && \
    apt-get clean && \
    rm geckodriver-v0.27.0-linux64.tar.gz && \
    rm -rf /var/lib/apt/lists/*

CMD [ "ruby","/app/main.rb" ]

Then you can run:

docker build -t capybara:latest . # Build image
docker run -it --rm --env DISPLAY=$DISPLAY --volume="$HOME/.Xauthority:/root/.Xauthority:rw" --net=host capybara:latest firefox # Verify Firefox works
docker run -it --rm capybara:latest # Run your script

Note: the second command only works on Linux. Running a dockerized Linux GUI application on Windows is harder and needs some additional setup.
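If you would rather not forward an X display just to check the browser, a small Ruby preflight script is one alternative. The sketch below is not part of the original answer: the helper name binary_starts? is made up for illustration, and it assumes that a Firefox missing its shared libraries exits with a non-zero status when asked for its version, which I have not verified on this exact image.

```ruby
require 'open3'

# Run `<command> --version` without a shell and report whether it started.
# Returns [ok, output]; ok is false when the binary is missing, not
# executable, or exits with a non-zero status.
# (binary_starts? is a hypothetical helper name, not a library API.)
def binary_starts?(command)
  output, status = Open3.capture2e(command, '--version')
  [status.success?, output.strip]
rescue Errno::ENOENT, Errno::EACCES => e
  [false, e.message]
end

# Check the two binaries that selenium-webdriver needs for Firefox.
%w[firefox geckodriver].each do |command|
  ok, output = binary_starts?(command)
  first_line = output.lines.first.to_s.strip
  puts "#{command}: #{ok ? 'OK' : 'FAILED'} (#{first_line})"
end
```

Saved next to main.rb under a name of your choice (for example preflight.rb, a hypothetical filename), it could be run with docker run --rm capybara:latest ruby preflight.rb before running the scraper itself.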

Edit:

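Nothing gets installed "on Docker". Docker is not an operating system; it is an application containerization framework. It can run various operating systems inside a container (or no OS at all, see base image). That means the way to install something in a Docker image (or in a container, which is not recommended) depends on what the image already contains.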

In this case your base image, ruby:2.6.6, is based on the Debian Buster image (see its Dockerfile), so you install a browser the same way you would on a regular desktop or server system.
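If you are unsure which distribution an image is built on, you can ask the container itself. The following is a minimal sketch, assuming the image provides the conventional /etc/os-release file (Debian-based images normally do); parse_os_release is a hypothetical helper written for this example.

```ruby
# Parse the KEY=value lines of an os-release file into a Hash,
# skipping blank lines and comments and stripping surrounding quotes.
# (parse_os_release is a hypothetical helper, not a library API.)
def parse_os_release(text)
  text.each_line.with_object({}) do |line, info|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    next if value.nil?

    info[key] = value.delete_prefix('"').delete_suffix('"')
  end
end

path = '/etc/os-release'
if File.readable?(path)
  info = parse_os_release(File.read(path))
  puts "Base OS: #{info['ID']} #{info['VERSION_ID']}"
else
  puts "#{path} not found; cannot identify the base OS"
end
```

The ID value tells you which package manager applies: a debian or ubuntu ID means apt, as used in the Dockerfiles here.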

Debian Buster does not ship with Chrome because Chrome is not open source. To install its open-source equivalent, Chromium, modify the Dockerfile as follows:

FROM ruby:2.6.6

WORKDIR /app

COPY . .

RUN apt-get update -y && \
    apt-get install -y xvfb chromium && \
    wget -N https://github.com/mozilla/geckodriver/releases/download/v0.27.0/geckodriver-v0.27.0-linux64.tar.gz && \
    tar -xvzf geckodriver-v0.27.0-linux64.tar.gz && \
    chmod +x geckodriver && \
    mv -f geckodriver /usr/local/share/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/local/bin/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/bin/geckodriver && \
    bundle install && \
    apt-get clean && \
    rm geckodriver-v0.27.0-linux64.tar.gz && \
    rm -rf /var/lib/apt/lists/*

CMD [ "ruby","/app/main.rb" ] 

If you really need Chrome, follow the official documentation (note that the archive file needs to be deleted after installation). That said, the Dockerfile for Chrome is:

FROM ruby:2.6.6

WORKDIR /app

COPY . .

RUN apt-get update -y && \
    apt-get install -y xvfb && \
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt install -y ./google-chrome-stable_current_amd64.deb && \
    wget -N https://github.com/mozilla/geckodriver/releases/download/v0.27.0/geckodriver-v0.27.0-linux64.tar.gz && \
    tar -xvzf geckodriver-v0.27.0-linux64.tar.gz && \
    chmod +x geckodriver && \
    mv -f geckodriver /usr/local/share/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/local/bin/geckodriver && \
    ln -s /usr/local/share/geckodriver /usr/bin/geckodriver && \
    bundle install && \
    apt-get clean && \
    rm google-chrome-stable_current_amd64.deb && \
    rm geckodriver-v0.27.0-linux64.tar.gz && \
    rm -rf /var/lib/apt/lists/*

CMD [ "ruby","/app/main.rb" ]
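Note that the main.rb from the question registers a Firefox driver, so switching the image to Chromium or Chrome also means registering a Chrome driver in the script. The sketch below is an assumption-laden illustration, not part of the original answer. It assumes a chromedriver binary matching the browser is on PATH, which the Dockerfiles above do not install (on Debian the chromium-driver package should provide one). The driver name :selenium_headless_chrome and the flag list are my own choices; --no-sandbox is included because Chrome normally refuses to start as root inside a container without it.

```ruby
# Flags commonly passed to Chrome/Chromium inside a container.
# This list is an assumption for illustration, not from the original answer.
CHROME_ARGS = ['--headless', '--no-sandbox', '--disable-dev-shm-usage'].freeze

begin
  require 'capybara'
  require 'selenium-webdriver'

  # Register a headless Chrome driver alongside (or instead of) the
  # Firefox one; the block only runs when a session first uses the driver.
  Capybara.register_driver :selenium_headless_chrome do |app|
    options = ::Selenium::WebDriver::Chrome::Options.new
    CHROME_ARGS.each { |arg| options.add_argument(arg) }
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
  end
rescue LoadError
  # Lets the snippet be read and run outside the image, where the gems
  # from the Gemfile may not be installed.
  warn 'capybara / selenium-webdriver not installed; skipping driver registration'
end
```

With this in place, main.rb would create its session with Capybara::Session.new(:selenium_headless_chrome) instead of the Firefox driver name.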