Nginx takes too long to respond

Problem description

I have an Nginx instance that mainly acts as a reverse proxy for several upstream services. It exposes a simple endpoint for health checks:

    location /ping { return 200 '{"ping":"successful"}'; }

The problem is that this ping takes far too long to respond:

    $ cat /proc/loadavg; date ; httpstat localhost/ping?foo=bar
    2.93 1.98 1.94 8/433 16725
    Thu Jul 15 15:25:08 UTC 2021
    Connected to 127.0.0.1:80 from 127.0.0.1:42946

    HTTP/1.1 200 OK
    Date: Thu, 15 Jul 2021 15:26:24 GMT
    X-Request-ID: b8d276b0b3828113cfee3bf2daa01293

      DNS Lookup   TCP Connection   Server Processing   Content Transfer
    [     4ms    |       0ms      |      76032ms      |        0ms       ]
                 |                |                   |                  |
        namelookup:4ms            |                   |                  |
                            connect:4ms               |                  |
                                          starttransfer:76036ms          |
                                                                     total:76036ms

The output above shows that the load average at the time of the request was low (a 1-minute load average of 2.93 is fine for an 8-core server).
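To make the load claim concrete, here is a minimal Python sketch that splits the `/proc/loadavg` line shown above and divides the 1-minute figure by the core count. The core count of 8 comes from the question; the variable names are only illustrative.

```python
# The /proc/loadavg line copied from the output above.
loadavg_line = "2.93 1.98 1.94 8/433 16725"
cores = 8  # as stated in the question

fields = loadavg_line.split()
# First three fields: 1-, 5- and 15-minute load averages.
load_1m, load_5m, load_15m = (float(x) for x in fields[:3])
# Fourth field: currently runnable scheduling entities / total entities.
runnable, total = (int(x) for x in fields[3].split("/"))

# Load per core; values well below 1.0 suggest the CPUs are not saturated.
per_core_1m = load_1m / cores
print(round(per_core_1m, 2))
```

A per-core load well under 1.0 is consistent with the reading that CPU saturation is not the obvious cause here.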

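curl/httpstat issued the request at 15:25:08 and received the response at 15:26:24. The connection was established quickly and the request was sent, but the server then took 76 seconds to respond.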
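As a quick sanity check on those numbers, the following Python sketch compares the wall-clock gap between the `date` output and the `Date` response header with the total reported by httpstat. It assumes the client and server clocks agree, which is reasonable here because both are on localhost.

```python
from datetime import datetime, timezone

# Timestamps copied from the output above: `date` ran just before the request,
# and the Date header was stamped when the response was generated.
request_started = datetime(2021, 7, 15, 15, 25, 8, tzinfo=timezone.utc)
response_date = datetime(2021, 7, 15, 15, 26, 24, tzinfo=timezone.utc)

elapsed_s = (response_date - request_started).total_seconds()
httpstat_total_s = 76036 / 1000  # "total:76036ms" reported by httpstat

print(elapsed_s)
print(httpstat_total_s)
```

The two figures agree to within a second, so the 76 seconds is a real wait and not a reporting glitch in httpstat.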

If I look at the access log entry for this ping, I see "req_time":"0.000" (this is the $request_time variable).

  {"t":"2021-07-15T15:26:24+00:00","id":"b8d276b0b3828113cfee3bf2daa01293","cid":"18581172","pid":"13631","host":"localhost","req":"GET /ping?foo=bar HTTP/1.1","scheme":"","status":"200","req_time":"0.000","body_sent":"21","bytes_sent":"373","content_length":"","request_length":"85","stats":"","upstream":{"status":"","sent":"","received":"","addr":"","conn_time":"","resp_time":""},"client":{"id":"#","agent":"curl/7.58.0","addr":",127.0.0.1:42946"},"limit_status":{"conn":"","req":""}}
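The mismatch can be spelled out with a small Python sketch that parses an abbreviated copy of the log entry above (only the fields used here) and compares it with what the client saw. According to the nginx documentation, `$request_time` is measured from the moment the first bytes are read from the client, so any wait before nginx starts reading the request would not be included in it; whether that is what happened here is an assumption, not something the log proves.

```python
import json

# Abbreviated copy of the access log entry above (only the fields used here).
log_line = (
    '{"t":"2021-07-15T15:26:24+00:00",'
    '"id":"b8d276b0b3828113cfee3bf2daa01293",'
    '"req":"GET /ping?foo=bar HTTP/1.1",'
    '"status":"200","req_time":"0.000"}'
)
entry = json.loads(log_line)

# Values observed on the client side by httpstat.
client_request_id = "b8d276b0b3828113cfee3bf2daa01293"  # X-Request-ID header
client_total_s = 76036 / 1000                            # total:76036ms

# Confirm the log entry and the client output describe the same request.
same_request = entry["id"] == client_request_id

# Time the client waited that nginx did not count as request processing.
unaccounted_s = client_total_s - float(entry["req_time"])

print(same_request)
print(round(unaccounted_s, 3))
```

The request IDs match, and essentially the whole 76 seconds falls outside what nginx logged as request time.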

Here is the access log format, in case anyone wonders what the remaining values are:

  log_format main escape=json '{"t":"$time_iso8601","id":"$ring_request_id","cid":"$connection","pid":"$pid","host":"$http_host","req":"$request","scheme":"$http_x_forwarded_proto","status":"$status","req_time":"$request_time","body_sent":"$body_bytes_sent","bytes_sent":"$bytes_sent","content_length":"$content_length","request_length":"$request_length","stats":"$location_tag","upstream":{"status":"$upstream_status","sent":"$upstream_bytes_sent","received":"$upstream_bytes_received","addr":"$upstream_addr","conn_time":"$upstream_connect_time","resp_time":"$upstream_response_time"},"client":{"id":"$http_x_auth_appid$http_x_ringdevicetype#$remote_user$http_x_auth_userid","agent":"$http_user_agent","addr":"$http_x_forwarded_for,$remote_addr:$remote_port"},"limit_status":{"conn":"$limit_conn_status","req":"$limit_req_status"}}';

My question is: if the request took only 0 seconds to process and respond to, where did Nginx spend those 76 seconds?

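It is worth mentioning that, at the same time, the server was also timing out a large number of connections to upstreams: we see many "upstream timed out (110: Connection timed out) while reading response header from upstream" and "upstream server temporarily disabled while reading response header from upstream" errors.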

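So the two are surely related, but I can't see why upstream timeouts would make /ping wait 76 seconds to be attended to and answered when CPU usage and load are both low/acceptable.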

Any ideas?

nginx nginx-reverse-proxy openresty timeout