Why does my requests.Session start a new HTTP connection when it is used in a multiprocessing.Process?

Problem description

Take this Python 2.7 script, which uses the multiprocessing module:

# Local test
import urllib2
import shlex
import requests
import json
import threading
import os
import logging
from multiprocessing import Process,Queue
from threading import current_thread

sessions = {}
logging.basicConfig(filename='/tmp/python.log',level=logging.DEBUG)


def worker(session,queue):
    logging.exception('parent process: ' + str(os.getppid()) + ',process id: ' + str(os.getpid()) + ' -- ' + str(session.verify))
    url = 'http://127.0.0.1:8487/test'
    response = session.get(url,verify=True,timeout=5).json()
    queue.put(response)
    return response

def doWork():
    global sessions
    try:
        thread = threading.current_thread()
        if not id(thread) in sessions:
            sessions[id(thread)] = requests.Session()
            session = sessions[id(thread)]
            session.verify = 'new session - ' + current_thread().name
        else:
            session = sessions[id(thread)]
            session.verify = 'reuse session - ' + current_thread().name
        queue = Queue()
        p = Process(target=worker,args=(session,queue))
        p.start()
        p.join()
        return queue.get()
    except Exception as e:
        logging.exception(e)
        return "error"

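Please don't worry about the "session registry". It is needed in the larger environment, but it has no effect on what I am doing here. What I want to show is that I am actually reusing the same session object in the forked process. I run the script like this: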

python
Python 2.7.5 (default, Aug  7 2019, 00:51:29)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.doWork()
{u'name': 3}
>>> test.doWork()
{u'name': 3}
>>> test.doWork()
{u'name': 3}
>>>

My python.log shows the following:

ERROR:root:parent process: 10092,process id: 10240 -- new session - MainThread
None
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
DEBUG:urllib3.connectionpool:"GET /test HTTP/1.1" 200 10
ERROR:root:parent process: 10092,process id: 10253 -- reuse session - MainThread
None
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
DEBUG:urllib3.connectionpool:"GET /test HTTP/1.1" 200 10
ERROR:root:parent process: 10092,process id: 10261 -- reuse session - MainThread
None
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
DEBUG:urllib3.connectionpool:"GET /test HTTP/1.1" 200 10

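Why does it start a new HTTP connection even though the session is the same session object? If I change the code to call worker directly, without multiprocessing, it works as expected and the connection is reused.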
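To see what is going on, it helps to look at what the child process actually receives. The sketch below is written for Python 3 (multiprocessing.get_context was added in 3.4, so it will not run on the 2.7 interpreter above) and assumes a POSIX system with the fork start method, as on the Linux machine in the question. FakeSession is a hypothetical stand-in for requests.Session that only records whether it holds a pooled connection; it is not part of requests. Each child works on its own copy of the object, so the "connection" it opens disappears when the child exits and the parent's object never sees it.

```python
import multiprocessing as mp


class FakeSession(object):
    """Hypothetical stand-in for requests.Session (not the real class).

    It only tracks whether a pooled connection exists, mimicking how a
    real session reuses a kept-alive connection on later requests.
    """

    def __init__(self):
        self.open_connections = 0

    def get(self, url):
        # No pooled connection yet: "open" one and report it as new.
        if self.open_connections == 0:
            self.open_connections += 1
            return "new connection"
        return "reused connection"


def worker(session, url, queue):
    # Runs in the child: `session` is the child's private copy of the
    # parent's object (inherited through fork), not a shared reference.
    queue.put(session.get(url))


def get_in_child(session, url):
    ctx = mp.get_context("fork")  # POSIX only, mirrors the Linux setup
    queue = ctx.Queue()
    p = ctx.Process(target=worker, args=(session, url, queue))
    p.start()
    result = queue.get()  # read before join() so the child can flush
    p.join()
    return result


if __name__ == "__main__":
    url = "http://127.0.0.1:8487/test"
    s = FakeSession()
    print([get_in_child(s, url) for _ in range(3)])
    # ['new connection', 'new connection', 'new connection']
    print(s.open_connections)  # 0: the parent's object was never touched
    print([s.get(url) for _ in range(3)])
    # ['new connection', 'reused connection', 'reused connection']
```

The same reasoning explains why calling worker directly works: there is only one object, so the connection opened by the first request is still in its pool for the next one.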

For reference, I am using a mock HTTP server (mock-server.com), which reports the following:

2020-09-25 08:34:55 5.11.1 INFO 1080 returning response:

  {
    "body" : "{\"name\":3}",
    "delay" : {
      "timeUnit" : "MILLISECONDS",
      "value" : 30
    },
    "connectionoptions" : {
      "closeSocket" : false
    }
  }

 for request:

  {
    "method" : "GET",
    "path" : "/test",
    "headers" : {
      "Host" : [ "127.0.0.1:8487" ],
      "Connection" : [ "keep-alive" ],
      "Accept-Encoding" : [ "gzip,deflate" ],
      "Accept" : [ "*/*" ],
      "User-Agent" : [ "python-requests/2.6.0 cpython/2.7.5 Linux/3.10.0-1062.18.1.el7.x86_64" ],
      "content-length" : [ "0" ]
    },
    "keepAlive" : true,
    "secure" : false
  }

 for action:

  {
    "body" : "{\"name\":3}",
    "connectionoptions" : {
      "closeSocket" : false
    }
  }

The server replies with keep-alive:

curl -v 127.0.0.1:8487/test
* About to connect() to 127.0.0.1 port 8487 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8487 (#0)
> GET /test HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8487
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< content-length: 10
<
* Connection #0 to host 127.0.0.1 left intact
{"name":3}

Solution

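requests.Session() has a built-in connection pool, managed through its HTTPAdapter (by default the adapter caches pools for up to 10 hosts, with up to 10 connections per pool). You are probably ending up with a freshly initialized copy of that internal pool each time. First, for your use case you probably don't need to wrap the session in another layer of pooling, because the session already has a built-in pool. Alternatively, limit the adapter to a single cached connection pool to see whether that helps: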

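import requests
from requests.adapters import HTTPAdapter
session = requests.Session()
# Cache a connection pool for at most one host (the default is 10)
adapter = HTTPAdapter(pool_connections=1)
session.mount("http://", adapter)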
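If the work really has to run in another process, one option (not part of the answer above) is to create the session inside a long-lived child and send it work over a queue, so the connection pool lives as long as that process does. The sketch below uses the same hypothetical FakeSession stand-in and the same assumptions (Python 3, POSIX fork start method); in real code the marked line would create a requests.Session(). The names session_worker and fetch_all are illustrative only.

```python
import multiprocessing as mp


class FakeSession(object):
    """Hypothetical stand-in for requests.Session (tracks one pooled connection)."""

    def __init__(self):
        self.open_connections = 0

    def get(self, url):
        if self.open_connections == 0:
            self.open_connections += 1
            return "new connection"
        return "reused connection"


def session_worker(tasks, results):
    # The session is created once, inside the child, and outlives each task.
    session = FakeSession()  # real code: requests.Session()
    for url in iter(tasks.get, None):  # None is the shutdown sentinel
        results.put(session.get(url))


def fetch_all(urls):
    ctx = mp.get_context("fork")  # POSIX only, as in the question
    tasks, results = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=session_worker, args=(tasks, results))
    p.start()
    for url in urls:
        tasks.put(url)
    out = [results.get() for _ in urls]  # one worker, so order is preserved
    tasks.put(None)
    p.join()
    return out


if __name__ == "__main__":
    print(fetch_all(["http://127.0.0.1:8487/test"] * 3))
    # ['new connection', 'reused connection', 'reused connection']
```

With several such workers, each process would still open its own connection once, but would reuse it for every task it handles afterwards.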