RAC Wait Event: gc buffer busy acquire

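Monitoring kept reporting abnormal wait events today, and a look at the database showed that almost all of them were gc buffer busy acquire waits. I had never dealt with this wait event before, so I took the opportunity to read up on it.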

Reference: Oracle MOS (My Oracle Support)

I. Brief Definition

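This wait event applies only to RAC environments and is analogous to the "buffer busy waits" event in a non-RAC environment.

It occurs when a session is waiting to access a block that another session is currently using and holding, and the block cannot be shared. Multiple sessions may queue up for the same block.

In 11.1 and earlier, waits of this type were all reported as "gc buffer busy".

Starting with Oracle 11.2, "gc buffer busy" was split into two new wait events: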

gc buffer busy acquire
gc buffer busy release

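gc buffer busy acquire: session 1 requests a buffer held by a remote instance, but another session 2 on the same instance as session 1 has already requested that buffer and its request has not yet completed. Session 1 then waits on gc buffer busy acquire.

gc buffer busy release: session 1 on the local instance wants a buffer, but a session 2 on a remote instance has already requested that same buffer from the local instance and its request has not yet completed. Session 1 on the local instance then waits on gc buffer busy release.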
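To see which of the two events sessions are waiting on right now, one option is to count the waiters per instance. This is a minimal sketch against GV$SESSION; the column selection and ordering are my own choice, not something taken from the MOS note.

```sql
-- Count sessions currently waiting on each gc buffer busy event, per instance.
SELECT inst_id,
       event,
       COUNT(*) AS waiting_sessions
  FROM gv$session
 WHERE event IN ('gc buffer busy acquire', 'gc buffer busy release')
   AND state = 'WAITING'
 GROUP BY inst_id, event
 ORDER BY inst_id, event;
```

Going by the definitions above, many acquire waiters on an instance suggest local sessions queuing behind a request to a remote instance, while release waiters suggest that the instance itself holds a buffer a remote instance has asked for.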

II. Common Causes

High contention in particular HOT blocks of the objects
Other waits like "gc block busy" and "enq: TX - row lock contention"
High network latency or a problem with network
Busy server or active paging/swapping due to low free memory

Individual waits (for waits seen in GV$SESSION_WAIT)

P1                 File #
P2                 Block #
P3                 Mode requested/mode held/block class
SECONDS_IN_WAIT    Amount of time waited for the current event
file#              This is the file# of the file that Oracle is trying to read from.
block#             This is the starting block number in the file from where Oracle starts reading the blocks.
blocks             This parameter specifies the number of blocks that Oracle is trying to read from the file# starting at block#
Inst_id            instance number

To determine the root blocker for sessions waiting on the gc wait events, use the options below:
1. System state dump at cluster level
2. oratop displays waiters/blockers
3. v$wait_chains can be used to find the root blocker for sessions that are blocked; see Troubleshooting Database Contention With V$Wait_Chains (Doc ID 1428210.1)
4. Using v$hang_info, v$hang_session_info, etc.
5. Oracle Hang Manager (Doc ID 1534591.1)
Using the above information, we can find the sessions waiting for specific gc events with their final blockers at instance level.
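As a rough illustration of the v$wait_chains option, the query below lists each session in a wait chain together with its immediate blocker. It is only a sketch: the column list is my own selection, so check it against the view definition in your release, and see Doc ID 1428210.1 for the scripts Oracle itself provides.

```sql
-- One row per session in a wait chain, with its immediate blocker.
-- Rows where blocker_is_valid = 'FALSE' have no blocker of their own,
-- so they are the candidates for the root blocker of the chain.
SELECT chain_id,
       instance,
       sid,
       sess_serial#,
       blocker_is_valid,
       blocker_instance,
       blocker_sid,
       blocker_sess_serial#,
       wait_event_text,
       in_wait_secs,
       num_waiters
  FROM v$wait_chains
 ORDER BY chain_id, num_waiters DESC;
```

Sorting by num_waiters within a chain tends to put the session that blocks the most others near the top.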

System-wide waits (for waits seen in V$SYSTEM_EVENT)

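If a lot of time is spent waiting for buffers, determine which segment is suffering the contention as follows:
SELECT inst_id,
       sid,
       event,
       wait_class,
       p1,  -- file#
       p2,  -- block#
       p3,  -- mode requested / mode held / block class
       seconds_in_wait
  FROM gv$session_wait
 WHERE event LIKE 'gc buffer%';
Using the P1 (file#) and P2 (block#) values from the output above, the following query returns the corresponding object: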
SELECT segment_name
  FROM dba_extents
 WHERE file_id = &file
   AND &block BETWEEN block_id AND block_id + blocks - 1
   AND ROWNUM = 1;
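To judge whether the time spent on these waits is actually significant before drilling into segments, the system-wide totals can be read from GV$SYSTEM_EVENT. A minimal sketch, assuming only the standard columns of that view; the derived column names are my own.

```sql
-- Cumulative totals since instance startup for the gc buffer busy events.
SELECT inst_id,
       event,
       total_waits,
       ROUND(time_waited_micro / 1000000, 1) AS time_waited_secs,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
  FROM gv$system_event
 WHERE event LIKE 'gc buffer busy%'
 ORDER BY time_waited_micro DESC;
```

Because the figures are cumulative since startup, comparing two snapshots taken a few minutes apart says more about the current problem than a single run.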

III. Troubleshooting

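1) High contention on specific hot blocks

This is caused by heavy concurrent inserts that lead to excessive index block splits, or by a right-growing index whose keys are generated from a sequence.

It is frequently accompanied by "buffer busy waits". If the problem persists, follow the steps under "System-wide waits (for waits seen in V$SYSTEM_EVENT)" above to locate the hot block, or take the problem segment from the "Segments by Global Cache Buffer Busy" section of an AWR report covering the problem period.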
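When no AWR report is at hand, a similar ranking can be approximated from the segment statistics views. The sketch below assumes the 'gc buffer busy' statistic in GV$SEGMENT_STATISTICS; the values are cumulative since instance startup, so they are not limited to the problem period.

```sql
-- Top 10 segments by accumulated global cache buffer busy waits.
SELECT *
  FROM (SELECT inst_id,
               owner,
               object_name,
               object_type,
               value AS gc_buffer_busy
          FROM gv$segment_statistics
         WHERE statistic_name = 'gc buffer busy'
           AND value > 0
         ORDER BY value DESC)
 WHERE ROWNUM <= 10;
```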

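2) gc block busy, enq: TX - row lock contention, and other waits can slow down the blocking session or the LMS processes.

If other waits are slowing down the holder of the block, resolving those waits is the priority, because gc buffer busy acquire/release may merely be a side effect of them.

Check the "Top 10 Foreground Events by Total Wait Time" section of the AWR report to see whether other waits are significantly affecting database performance.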
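For a problem that is still in progress, a rough equivalent of that AWR section can be taken from Active Session History. This is a sketch only: the 30-minute window is an arbitrary choice of mine, and querying ASH requires the Diagnostics Pack license.

```sql
-- Foreground activity over the last 30 minutes, grouped by wait event.
-- A NULL event in ASH means the session was on CPU.
SELECT inst_id,
       NVL(event, 'ON CPU') AS event,
       COUNT(*) AS samples
  FROM gv$active_session_history
 WHERE sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
   AND session_type = 'FOREGROUND'
 GROUP BY inst_id, NVL(event, 'ON CPU')
 ORDER BY samples DESC;
```

If events such as enq: TX - row lock contention rank above the gc buffer busy events, they are the better place to start.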

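3) High network latency or network problems

Run "ping -s 10000 <HAIP or private IP address used by the database>" and perform the network checks described in How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1).

For root cause analysis (RCA) of a problem that occurred in the past, check OSWatcher for the ping latency.

The AWR report contains "Interconnect Ping Latency Stats", which is also useful for checking network latency.

The netstat output in OSWatcher and the Nic&Protocol section of the CHM output can provide information about the health of the network.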
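The address to ping and a first hint of packet loss can both be read from inside the database. The queries below are a sketch using GV$CLUSTER_INTERCONNECTS and GV$SYSSTAT; reading a rising "gc blocks lost" as a sign of interconnect trouble is a common rule of thumb, not something stated in the text above.

```sql
-- Interfaces and IP addresses each instance uses for the interconnect.
SELECT inst_id, name, ip_address, is_public, source
  FROM gv$cluster_interconnects
 ORDER BY inst_id, name;

-- Cumulative count of lost or corrupt global cache blocks per instance.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc blocks lost', 'gc blocks corrupt')
 ORDER BY inst_id, name;
```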

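4) Busy server or active paging/swapping (due to low free memory)

Check the vmstat output or the CHM output to see whether the server is busy or paging/swapping heavily.

For RCA of a problem that occurred in the past, check the CHM or OSWatcher output.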
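If shell access to the servers is not readily available, a coarse view of load and paging can be taken from GV$OSSTAT. This is only a sketch: which statistics are populated depends on the platform, and the values are cumulative, so vmstat, CHM and OSWatcher remain the primary sources.

```sql
-- OS load and cumulative bytes paged in/out, per instance.
SELECT inst_id, stat_name, value
  FROM gv$osstat
 WHERE stat_name IN ('LOAD', 'NUM_CPUS', 'VM_IN_BYTES', 'VM_OUT_BYTES')
 ORDER BY inst_id, stat_name;
```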

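5) Inefficient SQL statements

Inefficient SQL statements request buffers they do not need, which increases the chance of buffer busy waits. The top SQL can be found in the AWR report. The fix is to tune the SQL so that it touches fewer buffers. This is similar to "buffer busy waits" in a single-instance database.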
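Outside of AWR, one way to shortlist candidate statements is to rank the cursors in the shared pool by their accumulated cluster wait time. A minimal sketch using GV$SQL; the cutoff of 10 rows is arbitrary.

```sql
-- Top 10 cursors by cluster wait time (stored in microseconds).
SELECT *
  FROM (SELECT inst_id,
               sql_id,
               executions,
               buffer_gets,
               ROUND(cluster_wait_time / 1000000, 1) AS cluster_wait_secs
          FROM gv$sql
         WHERE cluster_wait_time > 0
         ORDER BY cluster_wait_time DESC)
 WHERE ROWNUM <= 10;
```

Statements that combine a high cluster wait time with a high buffer_gets per execution are natural tuning candidates.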

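On whether a SELECT can cause gc buffer busy acquire:

A query normally requests a buffer in shared mode. If the buffer is not in the buffer cache, however, it has to be read in from disk, and that read is done in exclusive mode. If many other sessions query the same buffer at that moment (requesting it in shared mode), they wait on the buffer; in that case the buffer cache may be too small.
If the block requested by the query has been modified, the query needs a CR (consistent read) copy of it. Building the CR copy requires the corresponding undo block; if the undo block is not in the buffer cache, it has to be read in from disk, and if many queries need that CR block they all end up in buffer busy waits.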

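6) Data accessed across nodes

In a RAC database, the same data is requested on different database instances.

If the application allows it, have different application functions/modules access their data on different database instances, so that the same data is not accessed from several instances at once. This reduces buffer contention and avoids gc waits.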
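Whether the modules are actually separated can be checked by looking at how sessions are spread across instances per service. The query below is a sketch that assumes each application module connects through its own database service, which the text above does not state.

```sql
-- Session count per service and instance; a service that shows up on
-- several instances may be accessing the same data from more than one node.
SELECT service_name,
       inst_id,
       COUNT(*) AS sessions
  FROM gv$session
 WHERE type = 'USER'
 GROUP BY service_name, inst_id
 ORDER BY service_name, inst_id;
```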

7）Oracle Bug

IV. Possible Solutions

For high contention and hot blocks:

The solution is to reorganize the index so as to avoid the contention or hot spots, using one of the options below:
I. Global Hash partition the index
CREATE INDEX hgidx ON tab (c1,c2,c3) GLOBAL
     PARTITION BY HASH (c1,c2)
     (PARTITION p1  TABLESPACE tbs_1,
      PARTITION p2  TABLESPACE tbs_2,
      PARTITION p3  TABLESPACE tbs_3,
      PARTITION p4  TABLESPACE tbs_4);
II. Recreate the index as a reverse key index (not suitable for large tables; may require the buffer cache to be increased accordingly)
III. If the index key is generated from a sequence, increase the cache size of the sequence and make the sequence 'no order' if the application supports it.
Refer to the doc link: http://docs.oracle.com/database/121/vldbG/GUID-BF3F38E1-62BB-4EE3-86C1-A2EF8A258B1F.htm#vldbG1089
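Options II and III could look like the statements below. The object names order_pk_idx and order_id_seq and the cache size of 1000 are hypothetical values chosen for illustration.

```sql
-- Option II: rebuild an existing index as a reverse key index.
ALTER INDEX order_pk_idx REBUILD REVERSE;

-- Option III: larger sequence cache and no cross-instance ordering.
ALTER SEQUENCE order_id_seq CACHE 1000 NOORDER;
```

Keep in mind that a reverse key index can no longer serve range scans on the key, and that NOORDER means values are not guaranteed to be handed out in request order across instances, so both changes need the application's agreement.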

For enq: TX - row lock contention:

Mode 4 - Related to ITL waits
Find the segments with high ITL waits from the AWR report or with the following SQL:
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
  FROM V$SEGMENT_STATISTICS
 WHERE STATISTIC_NAME = 'ITL waits'
   AND VALUE > 0
 ORDER BY VALUE DESC;
Increase the INITRANS value of the segments that show high ITL waits.
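Raising INITRANS might look like the sketch below. The names app_owner.orders and app_owner.orders_pk and the value 10 are hypothetical; pick the value based on the expected number of concurrent transactions per block.

```sql
-- Applies only to blocks formatted after the change.
ALTER TABLE app_owner.orders INITRANS 10;

-- Rebuild so that existing blocks pick up the new setting.
-- A plain MOVE leaves the table's indexes UNUSABLE until they are rebuilt.
ALTER TABLE app_owner.orders MOVE INITRANS 10;
ALTER INDEX app_owner.orders_pk REBUILD INITRANS 10;
```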
Mode 6 - Primarily due to an application issue:
This is an application problem, and the application developers need to investigate the SQL statements involved. The following notes may help with further investigation:
Note:102925.1 - Tracing sessions: waiting on an enqueue
Note:179582.1 - How to Find TX Enqueue Contention in RAC or OPS
Note:1020008.6 - SCRIPT: FULLY DECODED LOCKING
Note:62354.1 - TX Transaction locks - Example wait scenarios
Note:224305.1 - Autonomous Transaction can cause locking

