How can I get the number of file descriptors in use / open connections in the context of the Java AWS SDK?

Problem description

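I am currently seeing `SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time` (see the full error log) from the Lambda SDK 2.0 (with the Netty client) in a service where multiple nodes poll SQS messages from N queues and try to invoke a Lambda at a very high (unthrottled) rate.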

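I tried applying back-pressure based on each node's CPU usage. That did not really help: consuming SQS messages at a high rate still creates a large number of network connections on each host while CPU usage stays low, so the same error occurs.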

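Increasing the connection acquire timeout did not help either (it even made things worse), because the backlog of pending connection acquisitions keeps building up while new Lambda invocation requests arrive. The same applies to increasing the maximum number of connections (my max connections value is currently 120000).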

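I am therefore building an SQS back-pressure mechanism that stops a node from polling more messages based on the number of network connections open on that node.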
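As a rough illustration of such a mechanism, the sketch below gates work on the number of in-flight asynchronous calls, used here as a proxy for open connections (an assumption: each in-flight Lambda invocation is taken to hold roughly one pooled connection). `InFlightGate`, `trySubmit`, and `availableCapacity` are hypothetical names, not part of the AWS SDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps the number of concurrent async calls on one node. */
final class InFlightGate {
    private final Semaphore permits;
    private final int maxInFlight;

    InFlightGate(int maxInFlight) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Free slots right now; a poller could request at most this many messages. */
    int availableCapacity() {
        return permits.availablePermits();
    }

    /** Calls currently running through the gate. */
    int inFlight() {
        return maxInFlight - permits.availablePermits();
    }

    /**
     * Starts the call only if a slot is free. Returns null when the gate is
     * full, which the caller should treat as "do not poll more messages yet".
     */
    <T> CompletableFuture<T> trySubmit(Supplier<CompletableFuture<T>> call) {
        if (!permits.tryAcquire()) {
            return null;
        }
        try {
            // Release the slot when the call finishes, successfully or not.
            return call.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the call never started, so give the slot back
            throw e;
        }
    }
}
```

A polling loop could check `availableCapacity()` before each SQS receive and skip the receive (or request fewer messages) when it is zero. The limit would be set well below the client's max connections so that requests do not queue up waiting for a connection.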

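The questions are:

  1. How can I get the number of open connections on the host (other than with the solutions below)?
  2. Is there a Java library/framework that can be used without writing custom code for the options mentioned below?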

Solutions considered

  1. Derive it from the LeasedConcurrency metric emitted as part of the SDK metrics (obtained via CloudWatchMetricPublisher)
  2. Derive it from the JMX FileDescriptorUse metric
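For the second option, a minimal sketch of reading the file descriptor count from inside the JVM is shown below. It assumes a HotSpot-style JDK on a Unix-like host, where the platform `OperatingSystemMXBean` also implements `com.sun.management.UnixOperatingSystemMXBean`. `FileDescriptorProbe`, `openAndMax`, `shouldPausePolling`, and the 0.8 threshold are hypothetical and only for illustration. Note that the count covers every descriptor of the process (files, pipes, sockets), so it is an upper bound on open connections rather than an exact figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: reads the process's file descriptor usage via JMX. */
final class FileDescriptorProbe {

    /** Returns {open, max}, or {-1, -1} when the platform does not expose them. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    /** True when descriptor usage is at or above the given fraction of the limit. */
    static boolean shouldPausePolling(long open, long max, double threshold) {
        return open >= 0 && max > 0 && (double) open / max >= threshold;
    }

    public static void main(String[] args) {
        long[] fd = openAndMax();
        System.out.println("open=" + fd[0] + " max=" + fd[1]
                + " pause=" + shouldPausePolling(fd[0], fd[1], 0.8));
    }
}
```

Because the probe reads the MXBean in-process, no remote JMX connection is needed; a poller could call `shouldPausePolling` before each SQS receive.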

Full error log

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98) ~[AwsJavaSdk-Core-2.0.jar:?]

PS

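Links to any relevant networking/OS/back-pressure resources, including low-level details (for example, why CPU usage stays low while the host has a large number of connections to handle), would be much appreciated.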
