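Today our monitoring kept raising alerts about abnormal wait events. When I checked the database, almost all of them were gc buffer busy acquire waits. I had never dealt with this wait event before, so I took the time to look into it today.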
Reference: Oracle MOS (My Oracle Support)
1. Brief Definition
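This wait event applies only to RAC environments. It is similar to the "buffer busy" waits seen in non-RAC environments.
In 11.1 and earlier releases, this type of wait was classified as "gc buffer busy".
Starting with Oracle 11.2, the "gc buffer busy" wait is split into two new wait events: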
- gc buffer busy acquire
- gc buffer busy release
gc buffer busy acquire: session 1 tries to access a buffer held by a remote instance, but session 2 on the same (local) instance has already requested that same buffer and the request has not yet completed. Session 1 then waits on gc buffer busy acquire.
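gc buffer busy release: before local session 1 asks for a buffer, session 2 on a remote instance has already requested the same buffer from the local instance and that request has not yet completed. Local session 1 then waits on gc buffer busy release.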
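To make the difference between the two events concrete, here is a deliberately simplified model that only encodes the two definitions above. It is a hypothetical sketch, not how Cache Fusion or the GCS actually decide which event to post. The function name and its three parameters (`waiter_inst`, `pending_requester_inst`, `buffer_holder_inst`) are illustrative assumptions.

```python
from typing import Optional


def classify_gc_buffer_busy(waiter_inst: int,
                            pending_requester_inst: int,
                            buffer_holder_inst: int) -> Optional[str]:
    """Hypothetical, simplified classification of a 'gc buffer busy' wait.

    waiter_inst            -- instance of the session that now wants the buffer
    pending_requester_inst -- instance of the session whose earlier request for
                              the same buffer has not completed yet
    buffer_holder_inst     -- instance that currently holds the buffer
    """
    if pending_requester_inst == waiter_inst and buffer_holder_inst != waiter_inst:
        # A session on our own instance is already fetching the buffer from a
        # remote instance, so we queue behind it.
        return "gc buffer busy acquire"
    if pending_requester_inst != waiter_inst and buffer_holder_inst == waiter_inst:
        # A remote session has already asked for our local copy of the buffer,
        # so we wait until that transfer is finished.
        return "gc buffer busy release"
    # Anything else falls outside the two cases described in the text.
    return None


if __name__ == "__main__":
    print(classify_gc_buffer_busy(1, 1, 2))  # gc buffer busy acquire
    print(classify_gc_buffer_busy(1, 2, 1))  # gc buffer busy release
```

The model deliberately ignores block modes, the master instance and the LMS processes. It keeps only the "who asked first, and where is the buffer" distinction that separates acquire from release.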
2. Common Causes
- High contention on particular HOT blocks of the objects
- Other waits such as "gc block busy" and "enq: TX - row lock contention"
- High network latency or a problem with the network
- Busy server or active paging/swapping due to low free memory
Individual waits (waits as seen in GV$SESSION_WAIT):
P1: File #
P2: Block #
P3: Mode requested / mode held / block class
SECONDS_IN_WAIT: amount of time waited for the current event
file#: the file# of the file that Oracle is trying to read from.
block#: the starting block number in the file from which Oracle starts reading the blocks.
blocks: the number of blocks that Oracle is trying to read from file#, starting at block#.
Inst_id: instance number.

To determine the root blocker for sessions waiting on the gc wait events, use the following options:
1. A system state dump at cluster level
2. oratop, which displays waiters/blockers
3. v$wait_chains, which can be used to find the root blocker for sessions that are blocked (Troubleshooting Database Contention With V$Wait_Chains, Doc ID 1428210.1)
4. v$hang_info, v$hang_session_info, etc.
5. Oracle Hang Manager (Doc ID 1534591.1)
Using the above information, we can find the sessions waiting for specific gc events, together with their final blockers, at the instance level.
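Finding a "root" or "final" blocker amounts to following waiter → blocker links until you reach a session that is not itself waiting on anyone. The sketch below shows that idea on a plain dictionary of made-up `(inst_id, sid)` pairs. It illustrates the concept behind v$wait_chains and is not its actual implementation. `find_root_blocker` is a hypothetical helper name.

```python
from typing import Dict, Optional, Tuple

Session = Tuple[int, int]  # (inst_id, sid)


def find_root_blocker(blockers: Dict[Session, Session],
                      waiter: Session) -> Optional[Session]:
    """Follow waiter -> blocker edges to the final blocker.

    Returns None if `waiter` is not blocked at all, or if the chain loops
    back on itself (a deadlock-style cycle has no single root blocker).
    """
    if waiter not in blockers:
        return None
    seen = {waiter}
    current = waiter
    while current in blockers:
        current = blockers[current]
        if current in seen:
            return None  # cycle detected
        seen.add(current)
    # `current` is blocking someone but is not waiting on anyone itself.
    return current


if __name__ == "__main__":
    # Invented example: two sessions on instance 1 wait behind (2, 55),
    # which in turn waits behind (1, 12).
    edges = {(1, 101): (2, 55), (1, 102): (2, 55), (2, 55): (1, 12)}
    print(find_root_blocker(edges, (1, 101)))  # (1, 12)
```

Tracking visited sessions keeps the walk from looping forever when the blocking graph contains a cycle.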
If the wait for a buffer takes a long time, identify which segment is suffering from contention with the following query (P3 encodes mode requested / mode held / block class):
    SELECT inst_id, sid, event, wait_class, p1, p2, p3, seconds_in_wait
    FROM gv$session_wait
    WHERE event LIKE 'gc buffer%';
From the output above, take the file# from P1 and the block# from P2, then use the following query to get the related object:
    SELECT segment_name
    FROM dba_extents
    WHERE file_id = &file
      AND &block BETWEEN block_id AND block_id + blocks - 1
      AND ROWNUM = 1;
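The dba_extents query works because each extent covers the contiguous block range `[block_id, block_id + blocks - 1]` within one file. The sketch below applies the same predicate to an in-memory list. The extent data and segment names are invented for illustration.

```python
from typing import Iterable, NamedTuple, Optional


class Extent(NamedTuple):
    segment_name: str
    file_id: int
    block_id: int  # first block of the extent
    blocks: int    # number of blocks in the extent


def segment_for_block(extents: Iterable[Extent], file_id: int, block: int) -> Optional[str]:
    """Return the segment whose extent contains (file_id, block), or None."""
    for ext in extents:
        # Same predicate as: file_id = &file AND &block BETWEEN block_id AND block_id + blocks - 1
        if ext.file_id == file_id and ext.block_id <= block <= ext.block_id + ext.blocks - 1:
            return ext.segment_name  # first match, like ROWNUM = 1
    return None


if __name__ == "__main__":
    # Hypothetical extents: ORDERS covers blocks 128-135, ORDERS_PK covers 136-143 in file 7.
    extents = [Extent("ORDERS", 7, 128, 8), Extent("ORDERS_PK", 7, 136, 8)]
    print(segment_for_block(extents, 7, 135))  # ORDERS
    print(segment_for_block(extents, 7, 136))  # ORDERS_PK
```

The "- 1" matters. Without it, the last block checked would belong to the next extent, which is why block 136 maps to ORDERS_PK and not to ORDERS.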
3. Troubleshooting
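1) High contention on particular HOT blocks
This is caused by excessive index block splits due to heavy concurrent inserts, or by right-growing indexes whose keys are generated from a sequence.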
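It is frequently accompanied by buffer busy waits. If the problem persists, use the System Wide waits (waits as seen in V$SYSTEM_EVENT) to look for hot blocks, or get the problem segments from the "Segments by Global Cache Buffer Busy" section of an AWR report covering the problem period.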
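While the problem is still happening, one simple manual approach (an assumption here, not a step from the MOS note) is to sample the P1/P2 values of gc buffer busy waiters repeatedly and count which (file#, block#) pairs come up most often. A block that dominates the samples is a likely hot block. The samples below are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Block = Tuple[int, int]  # (file#, block#) taken from P1 and P2


def hottest_blocks(samples: Iterable[Block], top: int = 3) -> List[Tuple[Block, int]]:
    """Return the `top` most frequently sampled blocks with their sample counts."""
    return Counter(samples).most_common(top)


if __name__ == "__main__":
    # Invented samples: block (7, 143) appears far more often than the others.
    samples = [(7, 143), (7, 143), (4, 20), (7, 143), (9, 1), (4, 20), (7, 143)]
    print(hottest_blocks(samples, top=2))  # [((7, 143), 4), ((4, 20), 2)]
```

The winning (file#, block#) pair can then be mapped to a segment with the dba_extents query from the previous section.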
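2) gc block busy, enq: TX - row lock contention, and other waits may be slowing down the blocking session or the LMS processes.
If other waits could be slowing down the holder of the block, resolving them is the top priority, because gc buffer busy acquire/release may just be a side effect of those waits.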
3) High network latency or a network problem
Run "ping -s 10000 <HAIP or private IP address used by the database>" and perform the network checks described in How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1).
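For root cause analysis (RCA) of past issues, check OSWatcher for the ping latency.
The AWR report contains Interconnect Ping Latency Stats, which are also useful for checking network latency.
The netstat output in OSWatcher and the Nic & Protocol section of the CHM output provide information about network health. For RCA of past issues, check the CHM or OSWatcher output.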
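When comparing ping runs over the interconnect, it helps to reduce the raw output to a count, an average and a maximum. The sketch below assumes Linux iputils-style lines containing `time=<ms> ms`. Other ping implementations or locales may print something different. The sample output, including the IP address, is invented.

```python
import re
from statistics import mean
from typing import Dict, List

# Assumed iputils format, e.g. "10008 bytes from 192.168.10.2: icmp_seq=1 ttl=64 time=0.412 ms"
RTT_RE = re.compile(r"time=([\d.]+) ms")


def ping_rtts(output: str) -> List[float]:
    """Extract round-trip times (ms) from ping output."""
    return [float(value) for value in RTT_RE.findall(output)]


def summarize(output: str) -> Dict[str, float]:
    """Count, average and maximum RTT; raises if no samples were found."""
    rtts = ping_rtts(output)
    if not rtts:
        raise ValueError("no RTT samples found in ping output")
    return {"count": len(rtts), "avg_ms": mean(rtts), "max_ms": max(rtts)}


if __name__ == "__main__":
    sample = (
        "10008 bytes from 192.168.10.2: icmp_seq=1 ttl=64 time=0.412 ms\n"
        "10008 bytes from 192.168.10.2: icmp_seq=2 ttl=64 time=0.388 ms\n"
        "10008 bytes from 192.168.10.2: icmp_seq=3 ttl=64 time=1.950 ms\n"
    )
    print(summarize(sample))
```

The maximum is reported alongside the average because a single large spike (1.950 ms here) can matter more for gc waits than a small average.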
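5) Inefficient SQL statements
Inefficient SQL causes unnecessary buffers to be requested, which increases the chance of buffer busy waits. The top SQL can be found in the AWR report. The fix is to tune the SQL to reduce buffer accesses. This is similar to buffer busy waits in a single-instance database.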
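Finding candidates for tuning mostly comes down to ranking statements by logical reads (buffer gets), the same idea as the "SQL ordered by Gets" section of an AWR report. It also helps to look at gets per execution. The sketch below uses invented statistics and placeholder statement IDs (`sql_a` and so on) rather than real sql_id values.

```python
from typing import List, NamedTuple, Tuple


class SqlStat(NamedTuple):
    sql_id: str       # placeholder IDs in the example below
    buffer_gets: int  # logical reads over the snapshot interval
    executions: int


def top_sql_by_gets(stats: List[SqlStat], top: int = 5) -> List[Tuple[str, int, float]]:
    """Rank statements by total buffer gets and report gets per execution."""
    ranked = sorted(stats, key=lambda s: s.buffer_gets, reverse=True)[:top]
    # max(..., 1) avoids dividing by zero for statements with no completed executions
    return [(s.sql_id, s.buffer_gets, s.buffer_gets / max(s.executions, 1)) for s in ranked]


if __name__ == "__main__":
    stats = [
        SqlStat("sql_a", 1_200_000, 400),
        SqlStat("sql_b", 9_000_000, 30),
        SqlStat("sql_c", 50_000, 10_000),
    ]
    for row in top_sql_by_gets(stats, top=2):
        print(row)
```

A statement with a very high gets-per-execution figure (sql_b here) usually points to a missing index or a poor plan. A statement with low gets per execution but a huge execution count points more to how often the application calls it.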
On whether a SELECT can cause gc buffer busy acquire:
6) The same data being accessed across nodes (cross-instance access)
7) Oracle bugs
4. Possible Solutions
For high contention and hot blocks:
The solution is to reorganize the index to avoid the contention or hot spots, using one of the following options:
I. Global hash partition the index:
    CREATE INDEX hgidx ON tab (c1,c2,c3)
      GLOBAL PARTITION BY HASH (c1,c2)
      (PARTITION p1 TABLESPACE tbs_1,
       PARTITION p2 TABLESPACE tbs_2,
       PARTITION p3 TABLESPACE tbs_3,
       PARTITION p4 TABLESPACE tbs_4);
II. Recreate the index as a reverse key index (not suitable for large tables; may require the buffer cache to be increased accordingly).
III. If the index key is generated from a sequence, increase the cache size of the sequence and make the sequence NOORDER if the application supports it.
Refer to: http://docs.oracle.com/database/121/vldbG/GUID-BF3F38E1-62BB-4EE3-86C1-A2EF8A258B1F.htm#vldbG1089
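The three options work by stopping consecutive sequence values from all landing in the same rightmost index leaf block. The toy simulation below shows that effect. It is not Oracle's real algorithm. CRC32 stands in for Oracle's internal hash function, reversing an integer's bytes is a simplification of a reverse key index, and `KEYS_PER_LEAF = 100` is an assumed leaf capacity.

```python
import zlib
from collections import Counter
from typing import Dict, List, Set

NUM_PARTITIONS = 4   # same number of partitions as the example DDL above
KEYS_PER_LEAF = 100  # assumed leaf-block capacity in this toy model


def hash_partition(key: int, n: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (CRC32); Oracle uses its own internal hash function.
    return zlib.crc32(str(key).encode()) % n


def reverse_key(key: int) -> bytes:
    # Simplified reverse key: reverse the key's bytes so the fastest-changing
    # low-order byte becomes the leading, sort-determining byte.
    return key.to_bytes(8, "big")[::-1]


def leaves_per_instance(keys_by_inst: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    # Which toy leaf blocks (key // KEYS_PER_LEAF) each instance's inserts touch.
    return {inst: {k // KEYS_PER_LEAF for k in keys} for inst, keys in keys_by_inst.items()}


if __name__ == "__main__":
    keys = list(range(1001, 1101))  # 100 consecutive sequence values

    # I. Hash partitioning spreads consecutive keys over several partitions.
    print(Counter(hash_partition(k) for k in keys))

    # II. Reverse key: neighbouring values no longer share a leading byte.
    print(len({reverse_key(k)[0] for k in keys}))

    # III. Interleaved values (small cache / ORDER) vs per-instance cached ranges.
    interleaved = {1: keys[0::2], 2: keys[1::2]}
    cached = {1: list(range(1001, 1051)), 2: list(range(2001, 2051))}
    print(leaves_per_instance(interleaved))  # both instances hit leaf 10
    print(leaves_per_instance(cached))       # instances use disjoint leaves
```

In the cached case, each instance inserts into its own part of the index, so the two instances stop competing for one leaf block. The trade-off is that key values are no longer globally ordered by insert time.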
For enq: TX - row lock contention:
Mode 4: related to ITL waits. Find the segments with high ITL waits from the AWR report or with the following SQL:
    SELECT owner, object_name, object_type, value
    FROM v$segment_statistics
    WHERE statistic_name = 'ITL waits'
      AND value > 0
    ORDER BY value DESC;
Increase INITRANS for the segments with high ITL waits.
Mode 6: primarily an application issue. The application developers need to investigate the SQL statements involved.
The following notes may help with further investigation:
- Note:102925.1 - Tracing sessions: waiting on an enqueue
- Note:179582.1 - How to Find TX Enqueue Contention in RAC or OPS
- Note:1020008.6 - SCRIPT: FULLY DECODED LOCKING
- Note:62354.1 - TX Transaction locks - Example wait scenarios
- Note:224305.1 - Autonomous Transaction can cause locking
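To tell mode 4 from mode 6 for a waiting session, you can decode the wait's P1 value. For enqueue waits, P1 packs the two-letter lock type into its high bytes and the requested mode into its low 16 bits. This is the same decoding commonly done in SQL with `BITAND(p1, 65535)`. The sketch below does it in Python. The hint strings restate the guidance above, and `MODE_HINTS` and `decode_enqueue_p1` are illustrative names.

```python
from typing import Tuple

MODE_HINTS = {
    4: "mode 4: related to ITL waits per the note above; check 'ITL waits' and INITRANS",
    6: "mode 6: primarily an application issue; review the SQL involved",
}


def decode_enqueue_p1(p1: int) -> Tuple[str, int]:
    """Split an enqueue wait's P1 into (lock type, requested mode)."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)  # e.g. 'TX'
    mode = p1 & 0xFFFF                                       # same as BITAND(p1, 65535)
    return name, mode


if __name__ == "__main__":
    for p1 in (1415053318, 1415053316):  # 0x54580006 and 0x54580004
        name, mode = decode_enqueue_p1(p1)
        print(name, mode, MODE_HINTS.get(mode, "other mode"))
```

With these two example values, the first decodes to TX in mode 6 and the second to TX in mode 4.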