一些映射器反复报告 UgiMetrics

问题描述

我运行了一个 Hive 作业,它在 map: 34%,reduce 0% 处卡住了 1 个多小时。

根据日志,这个应用有 60 个映射器,其中 19 个在 1 分钟内成功,而其他 41 个映射器卡住了。

我在所有 60 个映射器中找到了一个例外:

2021-06-28 11:30:17,172 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.remoteexception(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error

这是一个例外,但 19 个映射器似乎忽略了它并最终在 1 分钟内完成。虽然其他 41 个映射器也报告了此异常,但在此之后,他们 每10秒重复报告以下语句,直到被手动杀死:

2021-06-28 11:30:15,560 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Initializing operator MAP[0]
2021-06-28 11:30:16,177 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
<MAP>Id =0
  <Children>null
  <\Children>
  <Parent><\Parent>
<\MAP>
2021-06-28 11:30:16,191 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanoperator: Initializing operator TS[0]
2021-06-28 11:30:16,191 INFO [main] org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing operator FIL[20]
2021-06-28 11:30:16,832 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[2]
2021-06-28 11:30:16,833 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<content:string,pt_d:string>
2021-06-28 11:30:16,850 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator: Initializing operator LVF[3]
2021-06-28 11:30:16,850 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[4]
2021-06-28 11:30:16,850 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string>
2021-06-28 11:30:16,850 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[5]
2021-06-28 11:30:16,851 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string>
2021-06-28 11:30:16,852 INFO [main] org.apache.hadoop.hive.ql.exec.UDTFOperator: Initializing operator UDTF[6]
2021-06-28 11:30:16,852 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewJoinoperator: Initializing operator LVJ[7]
2021-06-28 11:30:16,852 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator: Initializing operator LVF[8]
2021-06-28 11:30:16,852 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[9]
2021-06-28 11:30:16,852 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col1:string,_col2:string,_col3:string,_col4:string>
2021-06-28 11:30:16,853 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[10]
2021-06-28 11:30:16,864 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col1:string,868 INFO [main] org.apache.hadoop.hive.ql.exec.UDTFOperator: Initializing operator UDTF[11]
2021-06-28 11:30:16,870 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewJoinoperator: Initializing operator LVJ[12]
2021-06-28 11:30:16,870 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator: Initializing operator LVF[13]
2021-06-28 11:30:16,870 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[14]
2021-06-28 11:30:16,870 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col4:string,_col5:string>
2021-06-28 11:30:16,870 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[15]
2021-06-28 11:30:16,871 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col4:string,871 INFO [main] org.apache.hadoop.hive.ql.exec.UDTFOperator: Initializing operator UDTF[16]
2021-06-28 11:30:16,871 INFO [main] org.apache.hadoop.hive.ql.exec.LateralViewJoinoperator: Initializing operator LVJ[17]
2021-06-28 11:30:16,871 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[18]
2021-06-28 11:30:16,_col6:string,_col7:string,_col8:string>
2021-06-28 11:30:16,871 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing operator FS[19]
2021-06-28 11:30:16,871 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead,use mapreduce.task.attempt.id
2021-06-28 11:30:16,878 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Using serializer : class org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe[[[B@681adc8f]:[gamename,content,eventtime,msgid,sendtime,eventid]:[string,string,string]] and formatter : org.apache.hadoop.hive.ql.io.HiveIgnoreKeytextoutputFormat@3506d826
2021-06-28 11:30:16,878 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.healthChecker.script.timeout is deprecated. Instead,use mapreduce.tasktracker.healthchecker.script.timeout
2021-06-28 11:30:16,886 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://hz-cluster8/tmp/hive/public/3fceb1fd-b42b-4b61-9657-650266f8a9a7/hive_2021-06-28_11-29-28_305_1555474801287271013-51796/-mr-10005/47eb8974-dd5f-4107-b277-55edfacefdd7/map.xml
2021-06-28 11:30:17,032 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 1
2021-06-28 11:30:17,159 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://hz-cluster8/hive-staging/.hive-staging_hive_2021-06-28_11-29-28_305_1555474801287271013-51363/_tmp.-ext-10003/000007_0
2021-06-28 11:30:17,159 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://hz-cluster8/hive-staging/.hive-staging_hive_2021-06-28_11-29-28_305_1555474801287271013-51363/_task_tmp.-ext-10003/_tmp.000007_0
2021-06-28 11:30:17,159 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://hz-cluster8/hive-staging/.hive-staging_hive_2021-06-28_11-29-28_305_1555474801287271013-51363/_tmp.-ext-10003/000007_0
2021-06-28 11:30:17,172 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.remoteexception(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2021-06-28 11:30:17,281 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.remoteexception(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2021-06-28 11:30:17,831 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[19]: records written - 1
2021-06-28 11:30:17,905 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[19]: records written - 10
2021-06-28 11:30:18,242 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 10
2021-06-28 11:30:18,411 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[19]: records written - 100
2021-06-28 11:30:19,886 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851019884 ugi.UgiMetrics: Context=ugi,Hostname=hadoop4079.jd.163.org,LoginSuccessNumOps=0,LoginSuccessAvgTime=0.0,LoginFailureNumOps=0,LoginFailureAvgTime=0.0,GetGroupsNumOps=0,GetGroupsAvgTime=0.0,renewalFailuresTotal=0,renewalFailures=01624851019884 jvm.JvmMetrics: Context=jvm,ProcessName=jvm_1624331736154_694811_m_100055558127628,SessionId=,MemNonHeapUsedM=48.702858,MemNonHeapCommittedM=49.460938,MemNonHeapMaxM=-1.0,MemHeapUsedM=107.52921,MemHeapCommittedM=2025.25,MemHeapMaxM=2025.25,MemmaxM=2025.25,GcCountcopy=1,GcTimeMilliscopy=30,GcCountMarkSweepCompact=2,GcTimeMillisMarkSweepCompact=335,GcCount=3,GcTimeMillis=365,ThreadsNew=0,ThreadsRunnable=4,ThreadsBlocked=0,ThreadsWaiting=4,ThreadsTimedWaiting=9,ThreadsTerminated=0,LogFatal=0,LogError=0,LogWarn=5,LogInfo=691624851019885 metricssystem.MetricsSystem: Context=metricssystem,NumActiveSources=2,NumAllSources=2,NumActiveSinks=1,NumAllSinks=0,Sink_fileNumOps=0,Sink_fileAvgTime=0.0,Sink_fileDropped=0,Sink_fileQsize=0,SnapshotNumOps=2,SnapshotAvgTime=0.5,PublishNumOps=0,PublishAvgTime=0.0,DroppedPubAll=0
2021-06-28 11:30:29,885 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851029883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851029883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=48.78595,MemNonHeapCommittedM=49.585938,MemHeapUsedM=128.7804,LogInfo=701624851029884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=1,Sink_fileAvgTime=2.0,SnapshotNumOps=5,SnapshotAvgTime=0.3333333333333333,PublishNumOps=1,DroppedPubAll=0
2021-06-28 11:30:39,885 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851039884 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851039884 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=48.784737,LogInfo=711624851039884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=2,SnapshotNumOps=8,PublishNumOps=2,DroppedPubAll=0
2021-06-28 11:30:49,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851049883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851049884 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=48.80197,MemNonHeapCommittedM=49.835938,MemHeapUsedM=132.63693,LogInfo=721624851049884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=3,Sink_fileAvgTime=1.0,SnapshotNumOps=11,SnapshotAvgTime=0.0,PublishNumOps=3,DroppedPubAll=0
2021-06-28 11:30:59,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851059883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851059883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=48.59202,LogInfo=731624851059884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=4,SnapshotNumOps=14,PublishNumOps=4,DroppedPubAll=0
2021-06-28 11:31:09,883 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851069883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851069883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=48.63655,MemHeapUsedM=132.80693,LogInfo=741624851069883 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=5,SnapshotNumOps=17,PublishNumOps=5,DroppedPubAll=0
2021-06-28 11:31:16,214 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[19]: records written - 1000
2021-06-28 11:31:19,886 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851079885 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851079885 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.81852,MemNonHeapCommittedM=50.585938,MemHeapUsedM=68.14816,GcCountcopy=2,GcTimeMilliscopy=90,GcCount=4,GcTimeMillis=425,ThreadsRunnable=5,ThreadsTimedWaiting=8,LogInfo=761624851079885 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=6,SnapshotNumOps=20,PublishNumOps=6,DroppedPubAll=0
2021-06-28 11:31:29,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851089884 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851089884 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.844147,MemNonHeapCommittedM=50.648438,MemHeapUsedM=70.10012,LogInfo=771624851089884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=7,SnapshotNumOps=23,PublishNumOps=7,DroppedPubAll=0
2021-06-28 11:31:39,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851099883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851099883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.855232,LogInfo=781624851099884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=8,SnapshotNumOps=26,PublishNumOps=8,DroppedPubAll=0
2021-06-28 11:31:49,883 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851109883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851109883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.822273,MemHeapUsedM=72.0617,LogInfo=791624851109883 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=9,SnapshotNumOps=29,PublishNumOps=9,DroppedPubAll=0
2021-06-28 11:31:59,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851119883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851119883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.8097,MemHeapUsedM=72.45921,LogInfo=801624851119884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=10,SnapshotNumOps=32,PublishNumOps=10,DroppedPubAll=0
2021-06-28 11:32:09,883 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851129883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851129883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.73227,LogInfo=811624851129883 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=11,SnapshotNumOps=35,PublishNumOps=11,DroppedPubAll=0
2021-06-28 11:32:19,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851139883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851139883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.535156,MemHeapUsedM=94.546326,LogInfo=821624851139884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=12,SnapshotNumOps=38,PublishNumOps=12,945 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 100
2021-06-28 11:32:29,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851149883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851149883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.664505,MemNonHeapCommittedM=50.710938,MemHeapUsedM=181.03868,LogInfo=841624851149883 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=13,SnapshotNumOps=41,PublishNumOps=13,DroppedPubAll=0
2021-06-28 11:32:39,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851159883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851159884 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.727936,MemNonHeapCommittedM=50.960938,LogInfo=851624851159884 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=14,SnapshotNumOps=44,PublishNumOps=14,DroppedPubAll=0
2021-06-28 11:32:49,884 INFO [file] org.apache.hadoop.metrics2.sink.Slf4jSink: 1624851169883 ugi.UgiMetrics: Context=ugi,renewalFailures=01624851169883 jvm.JvmMetrics: Context=jvm,MemNonHeapUsedM=49.739174,MemHeapUsedM=184.40704,LogInfo=861624851169883 metricssystem.MetricsSystem: Context=metricssystem,Sink_fileNumOps=15,SnapshotNumOps=47,PublishNumOps=15,DroppedPubAll=0

...

为什么会发生这种情况,为什么有些映射器会忽略此异常? 以下是应用程序主日志的一部分。它还包含大量重复的语句,有什么问题,他们在等什么?

解决方法

暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!

如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。

小编邮箱:dio#foxmail.com (将#修改为@)