Problem description
I deployed a Greenplum cluster on Kubernetes.
Everything appears to be up and running:
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
greenplum-operator-588d8fcfd8-nmgjp   1/1     Running   0          40m
svclb-greenplum-krdtd                 1/1     Running   0          39m
svclb-greenplum-k28bv                 1/1     Running   0          39m
svclb-greenplum-25n7b                 1/1     Running   0          39m
segment-a-0                           1/1     Running   0          39m
master-0                              1/1     Running   0          39m
Nevertheless, something seems to be wrong, because the cluster status is Pending:
$ kubectl describe greenplumclusters.greenplum.pivotal.io my-greenplum
Name:         my-greenplum
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  greenplum.pivotal.io/v1
Kind:         GreenplumCluster
Metadata:
  Creation Timestamp:  2020-09-23T08:31:04Z
  Finalizers:
    stopcluster.greenplumcluster.pivotal.io
  Generation:  2
  Managed Fields:
    API Version:  greenplum.pivotal.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:masterAndStandby:
          .:
          f:antiAffinity:
          f:cpu:
          f:hostBasedAuthentication:
          f:memory:
          f:standby:
          f:storage:
          f:storageClassName:
          f:workerSelector:
        f:segments:
          .:
          f:antiAffinity:
          f:cpu:
          f:memory:
          f:mirrors:
          f:primarySegmentCount:
          f:storage:
          f:storageClassName:
          f:workerSelector:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2020-09-23T08:31:04Z
    API Version:  greenplum.pivotal.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:status:
        .:
        f:instanceImage:
        f:operatorVersion:
        f:phase:
    Manager:         greenplum-operator
    Operation:       Update
    Time:            2020-09-23T08:31:11Z
  Resource Version:  590
  Self Link:         /apis/greenplum.pivotal.io/v1/namespaces/default/greenplumclusters/my-greenplum
  UID:               72ed72a8-4dd9-48fb-8a48-de2229d88a24
Spec:
  Master And Standby:
    Anti Affinity:              no
    cpu:                        0.5
    Host Based Authentication:  # host all gpadmin 0.0.0.0/0 trust
    Memory:                     800Mi
    Standby:                    no
    Storage:                    1G
    Storage Class Name:         local-path
    Worker Selector:
  Segments:
    Anti Affinity:          no
    cpu:                    0.5
    Memory:                 800Mi
    Mirrors:                no
    Primary Segment Count:  1
    Storage:                2G
    Storage Class Name:     local-path
    Worker Selector:
Status:
  Instance Image:    registry.localhost:5000/greenplum-for-kubernetes:v2.2.0
  Operator Version:  registry.localhost:5000/greenplum-operator:v2.2.0
  Phase:             Pending
Events:              <none>
As you can see:
Phase: Pending
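To keep an eye on just this field while debugging, the phase can be read directly instead of scanning the whole describe output. This is a minimal sketch, assuming the resource name my-greenplum from the output above. The function name cluster_phase is made up for illustration, and the .status.phase path is inferred from the Status section that kubectl describe prints.

```shell
# Print only the phase of a GreenplumCluster resource.
# Assumption: .status.phase corresponds to the "Status: Phase:" field
# shown in the describe output.
cluster_phase() {
    # $1: resource name (defaults to the cluster from this question)
    kubectl get greenplumclusters.greenplum.pivotal.io "${1:-my-greenplum}" \
        -o jsonpath='{.status.phase}'
}
```

For the cluster above, calling cluster_phase would be expected to print Pending.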
I checked the operator logs:
{"level":"DEBUG","ts":"2020-09-23T09:12:18.494Z","logger":"podexec","msg":"master-0 is not active master","namespace":"default","error":"command terminated with exit code 2"}
{"level":"DEBUG","ts":"2020-09-23T09:12:18.497Z","msg":"master-1 is not active master","error":"pods \"master-1\" not found"}
{"level":"DEBUG","logger":"controllers.GreenplumCluster","msg":"current active master","greenplumcluster":"default/my-greenplum","activeMaster":""}
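I don't quite understand what these mean.
It looks like the operator is searching for two masters, master-0 and master-1. As you can see, I deployed only one master and one segment.
The Greenplum cluster manifest is: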
apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    hostBasedAuthentication: |
      # host all gpadmin 0.0.0.0/0 trust
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: local-path
    storage: 1G
    workerSelector: {}
  segments:
    primarySegmentCount: 1
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: local-path
    storage: 2G
    workerSelector: {}
The master is logging this:
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 4/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured. skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
In short:
Timeout expired connecting to template1
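With this much output, it can help to drop the INFO lines and look only at the warnings. The following is a minimal sketch, assuming the "-[LEVEL]:-" marker format seen in the gpstart and gpinitsystem lines quoted here. The helper name gp_warnings is hypothetical, and any level other than WARN and WARNING is an assumption, since only those two appear in this output.

```shell
# Keep only lines whose level is WARN, WARNING, ERROR, FATAL or CRITICAL
# from Greenplum utility output read on stdin.
# Assumption: the level appears as "-[LEVEL]:-", as in the quoted log lines.
gp_warnings() {
    grep -E -e '-\[(WARN|WARNING|ERROR|FATAL|CRITICAL)]:-'
}
```

A possible use is `kubectl logs master-0 | gp_warnings`, which passes the pod log through the filter.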
Full master-0 log:
*******************************
Initializing Greenplum for Kubernetes Cluster
*******************************
*******************************
Generating gpinitsystem_config
*******************************
{"level":"INFO","ts":"2020-09-23T11:28:58.394Z","logger":"startGreenplumContainer","msg":"initializing Greenplum Cluster"}
Sub Domain for the cluster is: agent.greenplum-1.svc.cluster.local
*******************************
Running gpinitsystem
*******************************
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters,please wait...
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale has not been set in,will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Locale set to en_US.utf8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ARRAY_NAME variable not set,will provide default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-Master hostname master-0.agent.greenplum-1.svc.cluster.local does not match hostname output
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking to see if master-0.agent.greenplum-1.svc.cluster.local can be resolved on this host
Warning: Permanently added the RSA host key for IP address '10.42.2.5' to the list of known hosts.
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Can resolve master-0.agent.greenplum-1.svc.cluster.local to this host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No DATABASE_NAME set,will exit following template1 updates
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-CHECK_POINT_SEGMENTS variable not set,will set to default value
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[WARN]:-ENCODING variable not set,will set to default UTF-8
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-MASTER_MAX_CONNECT not set,will set to default value 250
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Detected a single host GPDB array build,reducing value of BATCH_DEFAULT from 60 to 4
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking configuration parameters,Completed
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking Master host
20200923:11:28:58:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts,please wait...
Warning: Permanently added the RSA host key for IP address '10.42.1.5' to the list of known hosts.
{"level":"DEBUG","ts":"2020-09-23T11:28:59.038Z","logger":"DNS resolver","msg":"resolved DNS entry","host":"segment-a-0"}
{"level":"INFO","logger":"keyscanner","msg":"starting keyscan","host":"segment-a-0"}
20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Checking new segment hosts,Completed
{"level":"INFO","ts":"2020-09-23T11:28:59.064Z","msg":"keyscan successful","host":"segment-a-0"}
20200923:11:28:59:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Building the Master instance database,please wait...
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Found more than 1 instance of shared_preload_libraries in /greenplum/data-1/postgresql.conf,will append
20200923:11:29:02:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Starting the Master in admin mode
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Spawning parallel processes batch [1],please wait...
.
20200923:11:29:03:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Waiting for parallel processes batch [1],please wait...
......
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Parallel process exit status
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as completed = 1
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as killed = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Total processes marked as Failed = 0
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:------------------------------------------------
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Deleting distributed backout files
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Removing back out file
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-No errors generated from parallel processes
20200923:11:29:09:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -m -d /greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Master segment instance directory=/greenplum/data-1
20200923:11:29:09:001357 gpstop:master-0:gpadmin-[INFO]:-Stopping master segment and waiting for user connections to finish ...
server shutting down
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20200923:11:29:10:001357 gpstop:master-0:gpadmin-[INFO]:-Terminating processes for segment /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /greenplum/data-1
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Gathering information and validating the environment...
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.10.1 build commit:efba04ce26ebb29b535a255a5e95d1f5ebfde94e'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20200923:11:29:10:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance in admin mode
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Setting new master era
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Master Started...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Shutting down master
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Commencing parallel segment instance startup,please wait...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Process results...
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Successful segment starts = 1
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Failed segment starts = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Successfully started 1 of 1 segment instances
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-----------------------------------------------------
20200923:11:29:11:001380 gpstart:master-0:gpadmin-[INFO]:-Starting Master instance master-0 directory /greenplum/data-1
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Command pg_ctl reports Master master-0 instance active
20200923:11:29:12:001380 gpstart:master-0:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
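20200923:11:29:27:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 1/4
20200923:11:29:42:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 2/4
20200923:11:29:57:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 3/4
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Timeout expired connecting to template1,attempt 4/4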
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[WARNING]:-Failed to connect to template1
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-No standby master configured. skipping...
20200923:11:30:12:001380 gpstart:master-0:gpadmin-[INFO]:-Check status of database with gpstate utility
20200923:11:30:12:000095 gpinitsystem:master-0:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
Any ideas?
Solution
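I have been deploying Greenplum on Kubernetes recently.
In my case the problem was the permissions on the cgroup directory. When I looked at the files under /greenplum/data1/pg_log/ inside the pod, I found errors such as being unable to access the directory "/sys/fs/cgroup/memory/gpdb/". This happened because the pod used a hostPath.
My advice is to check the errors printed in the files under /greenplum/data1/pg_log/.
The pod's logs do not tell the whole story.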
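One way to follow this advice without opening a shell in the pod is to grep the server logs through kubectl exec. This is only a sketch under stated assumptions: the pod name master-0 and the data directory /greenplum/data-1 are taken from the gpstart output in the question (the path is written /greenplum/data1 in this answer, so adjust it to whatever exists in your pod), and the function name find_cgroup_errors is made up for illustration.

```shell
# Search the database server logs inside the master pod for cgroup errors.
# Assumptions: pod name and data directory come from the gpstart output in
# the question; override POD and DATA_DIR to match your cluster.
POD="${POD:-master-0}"
DATA_DIR="${DATA_DIR:-/greenplum/data-1}"

find_cgroup_errors() {
    # Run grep inside the pod and show the last 20 matching lines.
    kubectl exec "$POD" -- sh -c "grep -i cgroup $DATA_DIR/pg_log/* | tail -n 20"
}
```

Running find_cgroup_errors after sourcing this snippet would list the most recent cgroup-related messages, if any are present in pg_log.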
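By the way, I ended up using v0.8.0. I tried v2.3.0 first, but the master was killed shortly after it became ready, possibly by Docker. The log showed messages like 'received fast shutdown request. ic-proxy-server: received signal 15'.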