The PostgreSQL buffer descriptor

Original article by freas_1990. Please credit the source when reposting: http://www.jb51.cc/article/p-zzldlbjr-yu.html

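In the Oracle world, the concepts of buffer and buffer handle, once hotly debated on itpub, have gradually cooled off. Even at its peak, the discussion never got beyond the official documentation and some guesswork.

Oracle is now gradually being phased out at internet companies (chiefly Alibaba), and open-source technology is gaining ground ever more visibly.

To commemorate that old excitement, let's look at what a buffer descriptor is from the open-source PostgreSQL side.

This is an analysis based on the source code: no documentation excerpts, no pedantry, just the source itself.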

/*
 *  struct sbufdesc -- shared buffer cache metadata for a single
 *		       shared buffer descriptor.
 *
 *	We keep the name of the database and relation in which this
 *	buffer appears in order to avoid a catalog lookup on cache
 *	flush if we don't have the reldesc in the cache.  It is also
 *	possible that the relation to which this buffer belongs is
 *	not visible to all backends at the time that it gets flushed.
 *	Dbname,relname,dbid,and relid are enough to determine where
 *	to put the buffer,for all storage managers.
 */

struct sbufdesc {
    Buffer		freeNext;	/* link for freelist chain */
    Buffer		freePrev;
    SHMEM_OFFSET	data;		/* pointer to data in buf pool */

    /* tag and id must be together for table lookup to work */
    BufferTag		tag;		/* file/block identifier */
    int			buf_id;		/* maps global desc to local desc */

    BufFlags		flags;    	/* described below */
    int16		bufsmgr;	/* storage manager id for buffer */
    unsigned		refcount;	/* # of times buffer is pinned */

    char sb_dbname[NAMEDATALEN+1];	/* name of db in which buf belongs */
    char sb_relname[NAMEDATALEN+1];	/* name of reln */
#ifdef HAS_TEST_AND_SET
    /* can afford a dedicated lock if test-and-set locks are available */
    slock_t	io_in_progress_lock;
#endif /* HAS_TEST_AND_SET */

    /*
     * I padded this structure to a power of 2 (128 bytes on a MIPS) because
     * BufferDescriptorGetBuffer is called a billion times and it does an
     * C pointer subtraction (i.e.,"x - y" -> array index of x relative
     * to y,which is calculated using division by struct size).  Integer
     * ".div" hits you for 35 cycles,as opposed to a 1-cycle "sra" ...
     * this hack cut 10% off of the time to create the Wisconsin database!
     * It eats up more shared memory,of course,but we're (allegedly)
     * going to make some of these types bigger soon anyway... -pma 1/2/93
     */

/* NO spinlock */

#if defined(PORTNAME_ultrix4)
    char		sb_pad[60];	/* no slock_t */
#endif /* mips */

/* HAS_TEST_AND_SET -- platform dependent size */

#if defined(PORTNAME_aix)
    char		sb_pad[44];	/* typedef unsigned int slock_t; */
#endif /* aix */
#if defined(PORTNAME_alpha)
    char		sb_pad[40];	/* typedef msemaphore slock_t; */
#endif /* alpha */
#if defined(PORTNAME_hpux)
    char		sb_pad[44];	/* typedef struct { int sem[4]; } slock_t; */
#endif /* hpux */
#if defined(PORTNAME_irix5)
    char		sb_pad[44];	/* typedef abilock_t slock_t; */
#endif /* irix5 */
#if defined(PORTNAME_next)
    char		sb_pad[56];	/* typedef struct mutex slock_t; */
#endif /* next */

/* HAS_TEST_AND_SET -- default 1 byte spinlock */

#if defined(PORTNAME_BSD44_derived) || \
    defined(PORTNAME_bsdi) || \
    defined(PORTNAME_bsdi_2_1) || \
    defined(PORTNAME_i386_solaris) || \
    defined(PORTNAME_linux) || \
    defined(PORTNAME_sparc) || \
    defined(PORTNAME_sparc_solaris)
    char		sb_pad[56];	/* has slock_t */
#endif /* 1 byte slock_t */
};


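The first two fields, freeNext and freePrev, are ordinary doubly linked list links (here they chain descriptors on the freelist), so I won't elaborate on them.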
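For readers who want a refresher anyway, here is a minimal sketch of a doubly linked freelist whose links are array indexes rather than pointers, which is what a Buffer-typed link suggests. The names FreeLink, FREE_HEAD, INVALID_LINK and the demo_ functions are hypothetical, and the circular layout with a sentinel slot is an assumption made for the sketch, not something taken from the struct above.

```c
#include <assert.h>

#define NBUFFERS     4          /* hypothetical pool size */
#define FREE_HEAD    NBUFFERS   /* hypothetical sentinel slot after the last descriptor */
#define INVALID_LINK (-1)       /* hypothetical marker: "not on the freelist" */

/* Simplified stand-in for struct sbufdesc: only the freelist links. */
typedef struct {
    int freeNext;
    int freePrev;
} FreeLink;

FreeLink links[NBUFFERS + 1];

/* Chain every descriptor into one circular list through the sentinel. */
void demo_freelist_init(void)
{
    for (int i = 0; i <= NBUFFERS; i++) {
        links[i].freeNext = (i + 1) % (NBUFFERS + 1);
        links[i].freePrev = (i + NBUFFERS) % (NBUFFERS + 1);
    }
}

/* Unlink one descriptor, e.g. because it is about to be used. */
void demo_freelist_remove(int id)
{
    assert(links[id].freeNext != INVALID_LINK);
    links[links[id].freePrev].freeNext = links[id].freeNext;
    links[links[id].freeNext].freePrev = links[id].freePrev;
    links[id].freeNext = links[id].freePrev = INVALID_LINK;
}

/* Put a descriptor back at the tail of the freelist. */
void demo_freelist_append(int id)
{
    int tail = links[FREE_HEAD].freePrev;
    links[id].freePrev = tail;
    links[id].freeNext = FREE_HEAD;
    links[tail].freeNext = id;
    links[FREE_HEAD].freePrev = id;
}

/* Take the descriptor at the head, or INVALID_LINK if the list is empty. */
int demo_freelist_pop(void)
{
    int id = links[FREE_HEAD].freeNext;
    if (id == FREE_HEAD)
        return INVALID_LINK;
    demo_freelist_remove(id);
    return id;
}
```

Index links stay meaningful in every process that maps the same shared memory, whereas raw pointers would only be valid if all processes mapped the segment at the same address.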

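SHMEM_OFFSET data; records where the current buffer's data lives. Because the buffer resides in shared memory, this value is really an offset of type “unsigned long” rather than a raw pointer (see "PostgreSQL shared memory: slices").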
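The idea of storing an offset instead of a pointer can be sketched as follows. This is an illustration only: demo_segment, demo_make_ptr and demo_make_offset are hypothetical names, and the base address here is an ordinary static array standing in for a real shared memory segment.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long SHMEM_OFFSET;   /* the type described in the text */

/* Hypothetical stand-in for the start of the shared memory segment. */
char demo_segment[1024];

/* Turn an offset stored in shared memory into a pointer valid in this process. */
void *demo_make_ptr(char *base, SHMEM_OFFSET off)
{
    return base + off;
}

/* Turn a process-local pointer back into an offset that can be shared. */
SHMEM_OFFSET demo_make_offset(char *base, void *ptr)
{
    return (SHMEM_OFFSET)((char *)ptr - base);
}
```

Each process adds its own base address to the stored offset, so the same descriptor works even when the segment is mapped at different addresses in different processes.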

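The refcount field is an old idea, one that also came up in the Redis source code analysis series: it is the number of times this buffer (or memory object) is currently referenced. The buffer pin in PostgreSQL (and in Oracle) goes back to this idea.

When refcount is non-zero, the buffer is in use and must stay pinned, that is, it must not be evicted or reused.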
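A minimal sketch of pinning by reference count, assuming a single process and ignoring the locking a real shared buffer pool needs. DemoBuf and the demo_ functions are hypothetical names, not PostgreSQL's API.

```c
#include <assert.h>

/* Simplified stand-in for struct sbufdesc: only the pin count. */
typedef struct {
    unsigned refcount;   /* number of times the buffer is pinned */
} DemoBuf;

/* Pin: one more user holds the buffer. */
void demo_pin(DemoBuf *buf)
{
    buf->refcount++;
}

/* Unpin: a user is done; unpinning an unpinned buffer is a bug. */
void demo_unpin(DemoBuf *buf)
{
    assert(buf->refcount > 0);
    buf->refcount--;
}

/* Only a buffer with no pins may be chosen for replacement. */
int demo_can_evict(const DemoBuf *buf)
{
    return buf->refcount == 0;
}
```

The count has to reach zero before the buffer becomes a replacement candidate again, so two independent users pinning the same buffer must both unpin it.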

/* HAS_TEST_AND_SET -- default 1 byte spinlock */

#if defined(PORTNAME_BSD44_derived) || \
    defined(PORTNAME_bsdi) || \
    defined(PORTNAME_bsdi_2_1) || \
    defined(PORTNAME_i386_solaris) || \
    defined(PORTNAME_linux) || \
    defined(PORTNAME_sparc) || \
    defined(PORTNAME_sparc_solaris)
    char		sb_pad[56];	/* has slock_t */
#endif /* 1 byte slock_t */

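On these platforms slock_t, the type of the io_in_progress_lock field, is a 1-byte spinlock. The sb_pad[56] array is not the lock itself: it is padding that rounds the structure up to a power-of-two size, as the long comment inside the struct explains.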
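The reason for the padding, as described in the struct comment, is that turning a descriptor pointer into an array index divides by the struct size, and a power-of-two size lets that division become a shift. The sketch below illustrates this with a hypothetical 128-byte struct; the 72-byte payload is an assumption chosen so that 56 bytes of padding reach 128, not the real field layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: 72 bytes of "real" fields plus 56 bytes of padding. */
typedef struct {
    char payload[72];
    char pad[56];
} DemoDesc128;

_Static_assert(sizeof(DemoDesc128) == 128, "descriptor should be 128 bytes");
_Static_assert((sizeof(DemoDesc128) & (sizeof(DemoDesc128) - 1)) == 0,
               "descriptor size should be a power of two");

/* What the C source says: pointer subtraction, i.e. byte distance / sizeof. */
ptrdiff_t demo_index_by_subtraction(const DemoDesc128 *base, const DemoDesc128 *d)
{
    return d - base;
}

/* What a compiler can emit for a 128-byte struct: byte distance >> 7. */
ptrdiff_t demo_index_by_shift(const DemoDesc128 *base, const DemoDesc128 *d)
{
    return ((const char *)d - (const char *)base) >> 7;
}
```

Both functions return the same index; the point of the padding is that the compiler is free to pick the cheaper shift form on its own once the size is a power of two.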

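As an aside: Oracle 10g introduced a mutex mechanism, which has been compared with the earlier latch mechanism. In my humble opinion, passing judgment on mutexes versus latches without having read the source code, based only on official promotional material, is just idle talk.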
