DSpace JSPUI deployment on Google Kubernetes Engine fails after GKE upgrade

Problem description

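We have deployed DSpace 6.3 on Google Kubernetes Engine (GKE), and the deployment had been running well. However, when we upgraded GKE from v1.12.7-gke.24 to 1.14.10-gke.50, the container suddenly started failing. The k8s version change is the only difference between the working k8s nodes and the failing ones. The Docker container built locally works fine. The other DSpace modules (e.g. Solr) run in separate containers and work fine; only the JSPUI module fails.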

DSpace branch: dspace-6_x, tag: dspace-6.3

Docker image: tomcat:8-alpine

Deployed via a GitLab CI/CD pipeline

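The failure is caused by the Spring loader falling over while eagerly loading the various DSpace factory service singleton beans. Because the web app cannot initialize, loading the site returns a 404 error.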

Error message in /usr/local/tomcat/logs/localhost.YYYY-MM-dd.log:

28-Oct-2020 23:47:18.668 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart 
Exception sending context initialized event to listener instance of class 
[org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
        java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service 
Manager: failure starting up spring service manager: Error creating bean with name 
'org.dspace.app.sherpa.submit.SHERPASubmitService' defined in URL 
[jar:file:/dspace/webapps/jspui/WEB-INF/lib/dspace-api-6.3.jar!/spring/spring-dspace-addon-sherpa-services.xml]: 
Cannot resolve reference to bean 'org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService' while setting 
bean property 'configuration'; nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 'org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService' defined in 
file [/dspace/config/spring/api/sherpa.xml]: Cannot create inner bean 
'org.dspace.app.sherpa.submit.MetadataValueISSNExtractor#1b511285' of type 
[org.dspace.app.sherpa.submit.MetadataValueISSNExtractor] while setting bean property 
'issnItemExtractors' with key [0]; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 
'org.dspace.app.sherpa.submit.MetadataValueISSNExtractor#1b511285': Injection of autowired dependencies 
failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire 
field: public org.dspace.content.service.ItemService 
org.dspace.app.sherpa.submit.MetadataValueISSNExtractor.itemService; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 
'org.dspace.content.ItemServiceImpl#0': Injection of autowired dependencies failed; nested exception is
 org.springframework.beans.factory.BeanCreationException: Could not autowire field: protected 
org.dspace.handle.service.HandleService org.dspace.content.DSpaceObjectServiceImpl.handleService; ...
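The log is cut off at "...", so the innermost cause, which usually names the actual failure, is not visible here. When the exception object itself is available (for example in a test harness that boots the same Spring configuration), walking the getCause() chain to its end exposes it. Below is a minimal plain-Java sketch of that idea; the class name RootCause is hypothetical. Spring's own NestedRuntimeException (a superclass of BeanCreationException) already provides getRootCause() and getMostSpecificCause() for the same purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper: walks a nested exception chain such as the one in the log above. */
public final class RootCause {
    private RootCause() {}

    /** Returns every throwable in the chain, outermost first; stops early if a cause cycle is found. */
    public static List<Throwable> chain(Throwable top) {
        List<Throwable> result = new ArrayList<>();
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable t = top; t != null && seen.add(t); t = t.getCause()) {
            result.add(t);
        }
        return result;
    }

    /** Returns the innermost throwable, which usually names the real failure. */
    public static Throwable find(Throwable top) {
        Objects.requireNonNull(top, "top");
        List<Throwable> all = chain(top);
        return all.get(all.size() - 1);
    }

    public static void main(String[] args) {
        // Simulated three-level chain; the messages are made up for illustration.
        Throwable root = new IllegalArgumentException("bad property value");
        Throwable middle = new IllegalStateException("Could not autowire field", root);
        Throwable top = new RuntimeException("Failure during filter init", middle);
        for (Throwable t : chain(top)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
        System.out.println("Root cause: " + find(top));
    }
}
```

In a real deployment the same loop could be placed in a catch block around context startup and its output logged, so that the innermost message is never lost to log truncation.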

The "failure starting up spring service manager" error message is thrown in

org.dspace.servicemanager.DSpaceServiceManager (\dspace-services\src\main\java\org\dspace\servicemanager\DSpaceServiceManager.java, line 215)

in the catch block at line 212, which handles a failure of the call to

org.dspace.servicemanager.spring.SpringServiceManager.startup() (\dspace-services\src\main\java\org\dspace\servicemanager\spring\SpringServiceManager.java, line 177)

which uses the Spring framework to eagerly load the factory beans.
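Why a failure deep in the dependency graph (here, the HandleService autowired into DSpaceObjectServiceImpl) surfaces as an error on SHERPASubmitService can be illustrated with a toy model of eager singleton creation: the container creates each bean's dependencies first and wraps any failure in a new exception, so the outermost message names the bean it started with and the real cause sits at the end of the chain. This is a hypothetical sketch, not Spring's implementation; the class ToyContainer and its methods are invented for illustration, and it does no cycle detection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical toy container, NOT Spring: eagerly creates every registered singleton,
 * creating dependencies first and wrapping any failure in a new exception whose
 * message mimics Spring's "Error creating bean ... nested exception is ..." style.
 * No cycle detection: circular dependencies would overflow the stack.
 */
public final class ToyContainer {
    // LinkedHashMap keeps registration order, so startup order is deterministic.
    private final Map<String, List<String>> deps = new LinkedHashMap<>();
    private final Map<String, Supplier<Object>> factories = new LinkedHashMap<>();
    private final Map<String, Object> singletons = new LinkedHashMap<>();

    public void register(String name, List<String> dependsOn, Supplier<Object> factory) {
        deps.put(name, dependsOn);
        factories.put(name, factory);
    }

    public Object getBean(String name) {
        if (!factories.containsKey(name)) {
            throw new IllegalArgumentException("No bean named '" + name + "'");
        }
        Object existing = singletons.get(name);
        if (existing != null) {
            return existing; // singleton already created
        }
        try {
            for (String dependency : deps.get(name)) {
                getBean(dependency); // create dependencies before the bean itself
            }
            Object bean = factories.get(name).get();
            singletons.put(name, bean);
            return bean;
        } catch (RuntimeException e) {
            // Wrap, keeping the original as the cause: this is what builds the long nested chain.
            throw new IllegalStateException("Error creating bean with name '" + name
                    + "'; nested exception is " + e, e);
        }
    }

    /** Creates all singletons up front, as a container does at application startup. */
    public void preInstantiateSingletons() {
        for (String name : factories.keySet()) {
            getBean(name);
        }
    }

    public static void main(String[] args) {
        ToyContainer c = new ToyContainer();
        // Names loosely follow the log above; the failing factory stands in for the unknown root cause.
        c.register("sherpaSubmitService", Arrays.asList("itemService"), Object::new);
        c.register("itemService", Arrays.asList("handleService"), Object::new);
        c.register("handleService", Collections.<String>emptyList(),
                () -> { throw new IllegalArgumentException("bad property value"); });
        try {
            c.preInstantiateSingletons();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The printed message starts with sherpaSubmitService and ends with the IllegalArgumentException from handleService, which mirrors why the top of the real log says little about where the problem actually is.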

Our first thought was that the new k8s version might need more memory, so we increased the Tomcat memory from 1.5GB to 4GB. This did not fix the problem.

We have gone through the release notes of the intermediate GKE versions between the two, but found nothing that helped.

We tried other Tomcat Docker images, to no avail, so we don't think the problem is OS-related.

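Remote debugging could not attach to Tomcat fast enough to catch the exception. We tried Google Cloud Debugger for Java, but Alpine Linux is missing some of the libraries it requires. In any case, I doubt we would have found anything more useful than the logged error message.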

If anyone has any ideas, we would be very grateful.

Our production k8s configuration YAML file:

ingress:
  hosts:
    - our.url.uts.edu.au

database:
  secret: our_password
  name: our_db_name
  host: "our.db.instance.url"
  port: "5432"

dspace:
  env:
    - name: DSPACE_HOSTNAME
      value: our.url.uts.edu.au
    - name: SOLR_PORT
      value: "8080"
      # Include colon if port is specified
    - name: DSPACE_PORT
      value: ""
    - name: MAX_DB_CONNECTIONS
      value: "50"
    - name: "MAX_IDLE_DB_CONNECTIONS"
      value: "30"
    - name: INITIAL_DB_CONNECTIONS
      value: "20"
    - name: S3_ASSETSTORE_SUBFOLDER
      value: "our_folder"
    - name: S3_CONNECTION_TTL
      value: "120000"
    - name: S3_MAX_CONNECTIONS
      value: "50"
    - name: REST_EVENT_WEBHOOK_URL
      value: http://our.rest.service.url/dspace/v2/webhook
    - name: UTSLIB_FRAMEWORK_DSPACE_TOKEN
      value: OUR_TOKEN
    - name: CATALINA_OPTS
      value: "-Xms1512m -Xmx1512m"

  resources:
    requests:
      memory: "1640Mi"
      cpu: 100m
    limits:
      memory: "1896Mi"
      cpu: "450m"

solr:
  pvc:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    size: 35Gi

  env:
    - name: CATALINA_OPTS
      value: "-Xms3904m -Xmx3904m -XX:+UseG1GC"

  resources:
    requests:
      memory: "4032Mi"
      cpu: 50m
    limits:
      memory: "4096Mi"
      cpu: "800m"

cron:
  env:
    - name: SOLR_PORT
      value: "8080"
    - name: MAX_DB_CONNECTIONS
      value: "3"
    - name: MAX_IDLE_DB_CONNECTIONS
      value: "1"
    - name: INITIAL_DB_CONNECTIONS
      value: "0"
    - name: S3_ASSETSTORE_SUBFOLDER
      value: "our_folder"
    - name: S3_CONNECTION_TTL
      value: "120000"
    - name: S3_MAX_CONNECTIONS
      value: "50"
    - name: JAVA_OPTS
      value: "-Xms32m -Xmx384m"
    - name: REST_EVENT_WEBHOOK_URL
      value: http://our.rest.service.url/dspace/v2/webhook
    - name: UTSLIB_FRAMEWORK_DSPACE_TOKEN
      value: OUR_TOKEN
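The resource settings above leave the JVM heap (-Xmx) only a little below each container's memory limit. Everything outside the heap (metaspace, thread stacks, code cache, direct buffers) has to fit into the difference, or the container risks being OOM-killed by Kubernetes. A small sketch of that arithmetic follows, assuming the JVM's k/m/g size suffixes are binary units like Kubernetes' Mi/Gi; the class and method names are hypothetical, and only Mi/Gi quantities are handled.

```java
import java.util.Locale;

/**
 * Hypothetical helper: how much memory a container leaves outside the JVM heap.
 * Assumes JVM size suffixes k/m/g are binary (KiB/MiB/GiB), as are Kubernetes Mi/Gi.
 */
public final class HeapHeadroom {
    private HeapHeadroom() {}

    /** Parses "-Xmx1512m", "1512m", "2g" or "512k" into whole MiB (KiB rounded down). */
    public static long xmxToMiB(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("-xmx")) {
            s = s.substring(4);
        }
        long n = Long.parseLong(s.substring(0, s.length() - 1));
        switch (s.charAt(s.length() - 1)) {
            case 'k': return n / 1024;
            case 'm': return n;
            case 'g': return n * 1024;
            default: throw new IllegalArgumentException("Expected a k/m/g suffix: " + value);
        }
    }

    /** Parses Kubernetes quantities such as "1896Mi" or "4Gi" into MiB; other suffixes are rejected. */
    public static long k8sToMiB(String quantity) {
        String s = quantity.trim();
        if (s.endsWith("Mi")) {
            return Long.parseLong(s.substring(0, s.length() - 2));
        }
        if (s.endsWith("Gi")) {
            return Long.parseLong(s.substring(0, s.length() - 2)) * 1024;
        }
        throw new IllegalArgumentException("Only Mi/Gi quantities are handled: " + quantity);
    }

    /** Container limit minus maximum heap, in MiB: what is left for non-heap JVM memory. */
    public static long headroomMiB(String xmx, String limit) {
        return k8sToMiB(limit) - xmxToMiB(xmx);
    }

    public static void main(String[] args) {
        // Values copied from the dspace and solr sections of the YAML above.
        System.out.println("dspace: " + headroomMiB("-Xmx1512m", "1896Mi") + " MiB headroom");
        System.out.println("solr:   " + headroomMiB("-Xmx3904m", "4096Mi") + " MiB headroom");
    }
}
```

For the values above this gives 384 MiB of headroom for the DSpace container (1896Mi minus 1512m) and 192 MiB for Solr (4096Mi minus 3904m). Raising -Xmx without raising the container limit accordingly shrinks or removes that margin.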

Our Dockerfile is split into a build stage and a runtime stage.

Dockerfile.build:

FROM maven:3-jdk-8

# Modules that should be excluded from dependency resolution
ARG EXCLUDE_MODULES=!dspace-rdf,!dspace-sword,!dspace-xmlui,!dspace-xmlui-mirage2

ENV DSPACE_VERSION=6.3 \
    DSPACE_SHA1=e60db8dee2726933fcc7b7949c16757a510a79c5

ENV ANT_VERSION=1.10.8
ENV ANT_HOME=/opt/ant-$ANT_VERSION
ENV PATH=$ANT_HOME/bin:$PATH \
    ANT_SHA1=20658b765bed8a7c3d18daa71a108e15d1937da2

WORKDIR /dspace-src

# Download DSpace source and install Ant
RUN curl -fSL "https://github.com/DSpace/DSpace/releases/download/dspace-${DSPACE_VERSION}/dspace-${DSPACE_VERSION}-src-release.tar.gz" -o dspace.tar.gz && \
    echo "${DSPACE_SHA1} *dspace.tar.gz" | sha1sum -c - && \
    tar -xz -f dspace.tar.gz --strip-components=1 && \
    rm -f dspace.tar.gz && \
    curl -fSL "https://archive.apache.org/dist/ant/binaries/apache-ant-${ANT_VERSION}-bin.tar.gz" -o ant.tar.gz && \
    echo "${ANT_SHA1} *ant.tar.gz" | sha1sum -c - && \
    mkdir ${ANT_HOME} && \
    tar -xz -f ant.tar.gz -C ${ANT_HOME} --strip-components=1 && \
    rm -rf ant.tar.gz

# copy in custom artifacts
COPY ./src/artifacts/ ./artifacts

# copy in pom.xml files
COPY ./src/dspace/pom.xml                          ./dspace/
COPY ./src/dspace/modules/pom.xml                  ./dspace/modules/
COPY ./src/dspace/modules/jspui/pom.xml            ./dspace/modules/jspui/
COPY ./src/dspace/modules/utslib-copyright/pom.xml ./dspace/modules/utslib-copyright/
COPY ./src/dspace/modules/utslib-taglib/pom.xml    ./dspace/modules/utslib-taglib/

# Install custom artifacts and prime the Maven repository 
RUN mvn clean install --batch-mode --fail-never -f ./artifacts/jris-master && \
    mvn install -P ${EXCLUDE_MODULES} --batch-mode --fail-never -T 5

Dockerfile.runtime:

ARG BUILD_IMAGE=our.git.url/dspace/build:latest

FROM ${BUILD_IMAGE} as build

# copy in our source changes
COPY ./src/dspace ./dspace

# We don't use these modules, but they'll be built anyway if not excluded
ARG EXCLUDE_MODULES=!dspace-rdf,!dspace-sword

# Unzip the MaxMind GeoLite database (IP location stuff for Solr).
# (MaxMind changed their privacy policy so you now have to log in to download,
# which makes it fail for the standard DSpace installation)
# Build DSpace with our source changes and move it to the installation directory
# Build only our customisations (skip building the specified modules)
# Could multithread the Maven build, but there are dependency resolution problems
RUN tar -zxf ./dspace/config/GeoLite2-City_20191224.tar.gz --strip-components=1 -C ./dspace/config && \
    rm ./dspace/config/GeoLite2-City_20191224.tar.gz && \
    mvn package --batch-mode -P ${EXCLUDE_MODULES} -f ./dspace/pom.xml && \
    cd ./dspace/target/dspace-installer && \
    ant copy_webapps install_code

FROM tomcat:8-alpine
#FROM tomcat:8-jre8

ARG DSPACE_INSTALL_DIR=/dspace

ENV DSPACE_HOME=${DSPACE_INSTALL_DIR}

# copy built source into this image
COPY --from=build ${DSPACE_INSTALL_DIR} ${DSPACE_INSTALL_DIR}

# copy in our config overrides
# (These are not used in compilation, but are applied at runtime)
COPY ./src/local.cfg ${DSPACE_INSTALL_DIR}/config/

# Symlink all webapps and create temp upload directory
RUN ln -s ${DSPACE_INSTALL_DIR}/webapps/* ./webapps/

Solution

After enabling the most verbose logging levels in DSpace and Tomcat, we were able to get more information about the source of the Spring error.

The problem turned out to be in one of our custom factory classes, in the code that set one of its properties.

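The offending property is writable, with a valid getter and setter, and both the getter and setter use type long. I removed the code that set the property and simply left it at its default value. The deployment now works.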
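The checks described above (writable, with a getter and setter of type long) can be confirmed with the standard java.beans.Introspector, which applies the JavaBeans naming conventions that Spring's property binding is built on. The real factory class and property names are not given, so ExampleFactory and timeout below are hypothetical stand-ins, as is the helper class PropertyCheck.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

/** Hypothetical helper: reports how the JavaBeans Introspector sees one property of a class. */
public final class PropertyCheck {
    private PropertyCheck() {}

    /** Hypothetical stand-in for the custom factory class (real names are not given). */
    public static class ExampleFactory {
        private long timeout = 30_000L; // hypothetical default, kept when the setter is never called

        public long getTimeout() { return timeout; }

        public void setTimeout(long timeout) { this.timeout = timeout; }
    }

    /** Returns the property's type and whether it has a getter (readable) and a setter (writable). */
    public static String describe(Class<?> type, String property) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getName().equals(property)) {
                    return property + ": type=" + pd.getPropertyType()
                            + ", readable=" + (pd.getReadMethod() != null)
                            + ", writable=" + (pd.getWriteMethod() != null);
                }
            }
            return property + ": not found";
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Introspection failed for " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleFactory.class, "timeout"));
    }
}
```

Running main prints `timeout: type=long, readable=true, writable=true`. Pointing describe at the real factory class is one way to confirm the getter/setter pair is seen exactly as intended before looking elsewhere for the cause.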

How a simple k8s version upgrade could cause this error is beyond us. The same code runs perfectly well in pods on the previous GKE version.