When using WEB_CONCURRENCY > 1 in Starlette

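Problem description

I am trying to build an API that serves a PyTorch model. However, as soon as I raise WEB_CONCURRENCY above 1, it creates far more threads than expected and slows down considerably, even when I send only a single request.

Example code: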

api.sh

export WEB_CONCURRENCY=2

python api.py

api.py

import os

import uvicorn
from starlette.applications import Starlette
from starlette.responses import UJSONResponse
from starlette.middleware.gzip import GZipMiddleware
from mymodel import Model


model = Model()
app = Starlette(debug=False)
app.add_middleware(GZipMiddleware, minimum_size=1000)


@app.route('/process', methods=['GET', 'POST', 'HEAD'])
async def add_styles(request):
    if request.method == 'GET':
        params = request.query_params
    elif request.method == 'POST':
        params = await request.json()
    elif request.method == 'HEAD':
        # response_header is not defined in this snippet; it is presumably
        # a dict of headers set up elsewhere in the real application.
        return UJSONResponse([], headers=response_header)

    print('===Request body===')
    print(params)

    # Heavily simplified. The real call does a lot more, including file
    # reading/writing and spawning processes with `popen` that do further
    # processing, but I don't think that should be an issue here.
    model_output = model(params.get('data', []))

    return model_output


if __name__ == '__main__':
    uvicorn.run('api:app',host='0.0.0.0',port=int(os.environ.get('PORT',8080)))

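With WEB_CONCURRENCY=1 in api.sh, nvidia-smi shows only one python process, and the model uses 1.2GB of VRAM. A request takes about 0.7s.

With WEB_CONCURRENCY=2 in api.sh, nvidia-smi shows up to 8 python processes, which together use more than about 8GB of VRAM. If you are lucky and do not hit an out-of-memory error, a single request can take up to 3 seconds.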
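To tie the entries shown by nvidia-smi back to the server, it can help to log which process is doing what. The following is only a sketch: `describe_process` is a hypothetical helper, not part of Starlette or Uvicorn, and it uses the standard library alone. Calling it at import time of api.py and again inside the request handler should show whether the extra processes are workers, their parent, or children spawned with `popen`.

```python
import os
import threading


def describe_process(label):
    """Return a one-line summary of the current process for logging.

    threading.active_count() only counts Python-level threads; native
    threads started by libraries such as PyTorch are not included.
    """
    return (
        f"[{label}] pid={os.getpid()} parent={os.getppid()} "
        f"threads={threading.active_count()} "
        f"WEB_CONCURRENCY={os.environ.get('WEB_CONCURRENCY', 'unset')}"
    )


if __name__ == "__main__":
    # Example: print once at startup; the same call can go in a handler.
    print(describe_process("startup"))
```

Comparing the printed pid values with the PIDs listed by nvidia-smi indicates which of these processes hold GPU memory.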

I am using Python 3.8.

Why doesn't PyTorch use the expected 2.4GB of VRAM with WEB_CONCURRENCY=2? And why does it slow down so much?
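One factor that may be worth ruling out: api.py builds `Model()` at import time, so every process that imports the module gets its own copy, possibly including the launcher started by `python api.py`. Whether this explains the numbers above has not been verified. The sketch below defers construction until first use; the `Model` class is a stand-in for `mymodel.Model`, and `get_model` is a hypothetical helper.

```python
import functools
import os


class Model:
    """Stand-in for mymodel.Model; records which process built it."""

    def __init__(self):
        self.pid = os.getpid()

    def __call__(self, data):
        return {"pid": self.pid, "n": len(data)}


@functools.lru_cache(maxsize=1)
def get_model():
    """Build the model on first call and return the cached instance after.

    Nothing is constructed at import time, so a process that only imports
    the module (and never handles a request) does not load the model.
    """
    return Model()
```

In the handler, `model(params.get('data', []))` would become `get_model()(params.get('data', []))`, so only processes that actually serve a request pay for the model.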

Solution

If anyone else gets stuck on this problem, use Gunicorn. It uses separate threads/processes, so there are no internal conflicts.

So instead of running python api.py, run:

gunicorn -w 2 api:app -k uvicorn.workers.UvicornWorker
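The same settings can also live in a Gunicorn configuration file, which is plain Python. This is a sketch under assumptions: the file name gunicorn.conf.py and the fallback values (port 8080, 2 workers) are chosen here to mirror api.py and api.sh, not taken from the answer.

```python
# gunicorn.conf.py
import os

# Listen on the same host and port that api.py used with uvicorn.run.
bind = f"0.0.0.0:{int(os.environ.get('PORT', 8080))}"

# Number of worker processes; falls back to 2 when WEB_CONCURRENCY is unset.
workers = int(os.environ.get("WEB_CONCURRENCY", 2))

# Run each worker with Uvicorn's ASGI worker class, as in the command above.
worker_class = "uvicorn.workers.UvicornWorker"
```

It would then be started with gunicorn -c gunicorn.conf.py api:app.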

python pytorch starlette uvicorn
