How to use Scrapy with Python and Tor through Privoxy in Docker Compose

I am trying to run Scrapy with Python, Tor, and Privoxy. I am using the scraper from khpeek/privoxy-tor-scraper at https://github.com/khpeek/privoxy-tor-scraper. This is my directory structure:

- docker-compose.yml
- privoxy
   - config
   - Dockerfile
- scraper
   - Dockerfile
   - newnym.py
   - requirements.txt
- tor
   - Dockerfile

I am trying to run the following docker-compose.yml:

version: '3'

services:
  privoxy:
    build: ./privoxy
    ports:
      - "8118:8118"
    links:
      - tor

  tor:
    build:
      context: ./tor
      args:
        password: "1234"
    ports:
      - "9050:9050"
      - "9051:9051"

  scraper:
    build: ./scraper
    links:
      - tor
      - privoxy

where the tor Dockerfile is:

FROM alpine:3.7
EXPOSE 9050 9051
ARG password
RUN apk --update add tor
RUN echo "ControlPort 9051" >> /etc/tor/torrc
RUN echo "CookieAuthentication 1" >> /etc/tor/torrc
RUN echo "HashedControlPassword $(tor --quiet --hash-password $password)" >> /etc/tor/torrc
CMD ["tor"]

The privoxy Dockerfile is:

FROM alpine:latest
EXPOSE 8118
RUN apk --update add privoxy
COPY config /etc/privoxy/
#CMD ["privoxy","--no-daemon"]
CMD ["privoxy","--no-daemon","/etc/privoxy/config"]

where config consists of two lines:

listen-address 0.0.0.0:8118
forward-socks5 / tor:9050 .

The scraper Dockerfile is:

FROM python:3.6-alpine
ADD . /scraper
WORKDIR /scraper
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD ["python","newnym.py"]

where requirements.txt contains the single line requests. Finally, the program newnym.py is designed simply to test whether changing the IP address with Tor works:

from time import sleep,time

import requests as req
import telnetlib

def get_ip():
    IPECHO_ENDPOINT = 'http://ipecho.net/plain'
    HTTP_PROXY = 'http://privoxy:8118'
    return req.get(IPECHO_ENDPOINT,proxies={'http': HTTP_PROXY}).text

def request_ip_change():
    #tn = telnetlib.Telnet('privoxy',8118)
    tn = telnetlib.Telnet('tor',9051)
    # telnetlib reads and writes bytes in Python 3, so every literal is a bytes object
    tn.read_until(b"Escape character is '^]'.",2)
    tn.write(b'AUTHENTICATE ""\r\n')
    tn.read_until(b"250 OK",2)
    tn.write(b"signal NEWNYM\r\n")
    tn.read_until(b"250 OK",2)

if __name__ == '__main__':
    dts = []
    #isOpen('tor',9051)
    #isOpen('privoxy',8118)
    try:
        while True:
            ip = get_ip()
            t0 = time()
            request_ip_change()
            while True:
                new_ip = get_ip()
                if new_ip == ip:
                    sleep(1)
                else:
                    break
            dt = time() - t0
            dts.append(dt)
            print("{} -> {} in ~{}s".format(ip,new_ip,int(dt)))
    except KeyboardInterrupt:
        print("Stopping...")
        # Guard against ZeroDivisionError when interrupted before the first measurement
        if dts:
            print("Average: {}".format(sum(dts) / len(dts)))
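
A side note on the inner polling loop in newnym.py: if the IP address never changes, it spins forever. A minimal sketch of a bounded version is given here. The name wait_for_change and its parameters are hypothetical and not part of khpeek's repository; get_value stands in for get_ip.

```python
from time import monotonic, sleep


def wait_for_change(get_value, old_value, timeout=60.0, interval=1.0):
    """Poll get_value() until it returns something other than old_value.

    Returns the new value, or None if nothing changed within `timeout` seconds.
    """
    deadline = monotonic() + timeout
    while True:
        value = get_value()
        if value != old_value:
            return value
        if monotonic() >= deadline:
            return None
        sleep(interval)
```

In the main loop this could replace the inner while True block, for example new_ip = wait_for_change(get_ip, ip), with a None result treated as "no new circuit within the timeout".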

docker-compose build completes successfully, but when I run docker-compose up, I get the following error message:

scraper_1_651fd6690a2d | Traceback (most recent call last):
scraper_1_651fd6690a2d |   File "newnym.py",line 45,in <module>
scraper_1_651fd6690a2d |     request_ip_change()
scraper_1_651fd6690a2d |   File "newnym.py",line 27,in request_ip_change
scraper_1_651fd6690a2d |     tn = telnetlib.Telnet('tor',9051)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py",line 218,in __init__
scraper_1_651fd6690a2d |     self.open(host,port,timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py",line 234,in open
scraper_1_651fd6690a2d |     self.sock = socket.create_connection((host,port),timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py",line 724,in create_connection
scraper_1_651fd6690a2d |     raise err
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py",line 713,in create_connection
scraper_1_651fd6690a2d |     sock.connect(sa)
scraper_1_651fd6690a2d | ConnectionRefusedError: [Errno 111] Connection refused
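
To narrow this down, it may help to check from inside the scraper container whether each port accepts TCP connections at all. The sketch below is a guess at what the commented-out isOpen calls in newnym.py might look like. The names is_open and wait_for_port, and the timeout and retry values, are assumptions introduced here, not code from the repository.

```python
import socket
import time


def is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and name-resolution failures.
        return False


def wait_for_port(host, port, attempts=10, delay=1.0):
    """Poll host:port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling is_open('tor', 9050), is_open('tor', 9051) and is_open('privoxy', 8118) before the main loop would show whether only the control port is refusing connections. wait_for_port would additionally rule out the case where the scraper simply starts before Tor is ready.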
