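Problem description
Connections from my application in a Kubernetes cluster to MongoDB (Atlas) work most of the time. Occasionally, however, the connection drops, and I don't know why. My logs show two different errors that appear to be related: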
Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=xxxxx-tedsb.gcp.mongodb.net:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection timed out}}, {address=xxxxx-shard-00-01-tedsb.gcp.mongodb.net:27017, type=REPLICA_SET_SECONDARY, TagSet{[Tag{name='nodeType', value='ELECTABLE'}, Tag{name='provider', value='GCP'}, Tag{name='region', value='EUROPE_NORTH_1'}]}, roundTripTime=34.3 ms, state=CONNECTED}, {address=xxxxx.gcp.mongodb.net:27017, caused by {java.net.ConnectException: Connection timed out}}]
and
com.mongodb.MongoSocketOpenException: Exception opening socket
    at com.mongodb.connection.TlsChannelStreamFactoryFactory$TlsChannelStream.lambda$openAsync$0(TlsChannelStreamFactoryFactory.java:246)
    at com.mongodb.connection.TlsChannelStreamFactoryFactory$SelectorMonitor.lambda$start$0(TlsChannelStreamFactoryFactory.java:141)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Connection timed out
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at com.mongodb.connection.TlsChannelStreamFactoryFactory$TlsChannelStream.lambda$openAsync$0(TlsChannelStreamFactoryFactory.java:218)
    ... 2 more
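I have not hit the connection limit, and the Pod itself is not under pressure (memory, CPU, disk space). The same goes for the JVM. I am using the official driver ("org.mongodb.scala" %% "mongo-scala-driver" % "4.1.0") and MongoDB 4.2.8.
The configuration I use is very basic: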
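import com.mongodb.ConnectionString
import org.bson.UuidRepresentation
import org.mongodb.scala.{MongoClient, MongoClientSettings}

// Registry is the application's own codec registry, defined elsewhere.
private val uri = configuration.get[String]("mongodb.uri")
private val clientSettings = MongoClientSettings
  .builder()
  .uuidRepresentation(UuidRepresentation.STANDARD)
  .applyConnectionString(new ConnectionString(uri))
  .codecRegistry(Registry)
  .build()
val client = MongoClient(clientSettings)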
I found this [issue](https://jira.mongodb.org/browse/JAVA-3274), which was fixed in mongo-scala-driver 4.1.0. I hoped that would solve my problem, but it did not.
Do you know what I could check next?
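Solution
Evidently, your Pod has connectivity problems with two of the three nodes of the replica set. Because the only visible node is currently a secondary, the driver cannot satisfy the default readPreference=primary, so the operation times out once the server selection timeout (30000 ms in your log) expires.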
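To illustrate why a single reachable secondary is not enough, here is a deliberately simplified model of server selection. It is not the driver's implementation; the names `ServerView`, `selectWritable`, and `exampleView`, as well as the placeholder addresses, are made up for this sketch. It only mirrors the cluster state from the log: two nodes of unknown type and one secondary.

```scala
object ServerSelectionSketch {
  // Simplified stand-ins for the server types that appear in the log.
  sealed trait ServerType
  case object Primary extends ServerType
  case object Secondary extends ServerType
  case object Unknown extends ServerType

  // Hypothetical, minimal view of one replica set member as seen by the client.
  final case class ServerView(address: String, serverType: ServerType)

  // An operation that needs the primary can only be routed to a member
  // the client currently sees as PRIMARY.
  def selectWritable(view: Seq[ServerView]): Option[ServerView] =
    view.find(_.serverType == Primary)

  // Cluster state modeled on the log: two unreachable nodes, one secondary.
  // The addresses are placeholders, not real hosts.
  val exampleView: Seq[ServerView] = Seq(
    ServerView("node-a:27017", Unknown),
    ServerView("node-b:27017", Secondary),
    ServerView("node-c:27017", Unknown)
  )

  def main(args: Array[String]): Unit =
    println(selectWritable(exampleView)) // prints "None"
}
```

In this model, selection keeps returning nothing for as long as the primary is among the unreachable nodes, which corresponds to the driver waiting and finally reporting the timeout.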
I would check your firewall configuration and make sure your Pod can reach all three nodes, not just one. Also make sure the Pod's IP address is whitelisted in Atlas.
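One way to run that check from inside the Pod is a plain TCP connection test against each node. The following is a minimal sketch, assuming you pass the replica set host names as command-line arguments and that the nodes listen on port 27017 as in your log. `NodeReachability` and `canConnect` are names chosen for this example. It only verifies that a TCP connection can be opened; it does not test TLS or authentication.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object NodeReachability {
  // Returns true if a TCP connection to host:port can be opened within timeoutMs.
  // Any failure (timeout, refused connection, unresolvable host) yields false.
  def canConnect(host: String, port: Int, timeoutMs: Int): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Each argument is expected to be one replica set host name.
    args.foreach { host =>
      val ok = canConnect(host, 27017, 5000)
      println(s"$host:27017 reachable = $ok")
    }
  }
}
```

If a node intermittently reports `false` from the Pod while the others report `true`, that points to a network or firewall rule affecting only some of the nodes rather than a driver problem.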