PostgreSQL Source Code Study 41: Crash Recovery (3) - WAL Sources

I. What are the WAL sources?

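       The previous post obtained the recovery start point, i.e. the position from which WAL replay begins, so we can now start reading and applying WAL. Before doing so, one question remains: where does the WAL come from?

       As mentioned in the first post, the Startup process serves three purposes: crash recovery, WAL apply on a standby, and PITR. Each purpose uses different WAL sources.

       PostgreSQL maintains a dedicated state machine that tracks the source of the WAL to be applied, so that WAL can be fetched from different sources at different times. The corresponding code (in xlog.c) is: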

/*
 * Codes indicating where we got a WAL file from during recovery, or where
 * to attempt to get one.
 */
typedef enum
{
    XLOG_FROM_ANY = 0,          /* request to read WAL from any source */
    XLOG_FROM_ARCHIVE,          /* restored using restore_command (fetched from the archive) */
    XLOG_FROM_PG_WAL,           /* existing file in pg_wal (read from the pg_wal directory) */
    XLOG_FROM_STREAM            /* streamed from primary (a standby fetches from the primary) */
} XLogSource;

/* human-readable names for XLogSources, for debugging output */
static const char *const xlogSourceNames[] = {"any", "archive", "pg_wal", "stream"};
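
As a small illustration of how the enum and the name table fit together (the enum value is used directly as an index into xlogSourceNames), here is a standalone sketch. The helper xlog_source_name and its bounds check are assumptions made for this example; they are not part of xlog.c.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the excerpt above so that the sketch compiles on its own. */
typedef enum
{
    XLOG_FROM_ANY = 0,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

static const char *const xlogSourceNames[] = {"any", "archive", "pg_wal", "stream"};

/*
 * Hypothetical helper (not in PostgreSQL): map a source to its
 * human-readable name, returning "unknown" for out-of-range values.
 */
static const char *
xlog_source_name(XLogSource source)
{
    size_t      n = sizeof(xlogSourceNames) / sizeof(xlogSourceNames[0]);

    /* The table must have exactly one entry per enum value. */
    assert(n == (size_t) XLOG_FROM_STREAM + 1);

    if ((int) source < 0 || (size_t) source >= n)
        return "unknown";
    return xlogSourceNames[source];
}
```

The point of the sketch is only that the order of the strings must match the order of the enum members; adding a new source means touching both.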

II. The WaitForWALToBecomeAvailable function

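       PostgreSQL picks the initial WAL source from the current state, and switches to another source when reading from the current one fails or when the state changes. This logic lives in the function WaitForWALToBecomeAvailable (in xlog.c).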
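
The initial choice can be sketched as a pure function. This is a simplified model, not the PostgreSQL code: choose_initial_source is a hypothetical name, and its parameters stand in for the globals InArchiveRecovery, StandbyMode and currentSource that the real function reads.

```c
#include <stdbool.h>

typedef enum
{
    XLOG_FROM_ANY = 0,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

/*
 * Hypothetical model of the initial source selection.
 *   - plain crash recovery: always read from pg_wal
 *   - archive recovery (PITR or standby): start from the archive when no
 *     source has been chosen yet, or when streaming was the last source
 *     but we are not (or no longer) in standby mode
 *   - otherwise: keep the source we already have
 */
static XLogSource
choose_initial_source(bool in_archive_recovery, bool standby_mode,
                      XLogSource current)
{
    if (!in_archive_recovery)
        return XLOG_FROM_PG_WAL;

    if (current == XLOG_FROM_ANY ||
        (!standby_mode && current == XLOG_FROM_STREAM))
        return XLOG_FROM_ARCHIVE;

    return current;
}
```

For example, choose_initial_source(false, false, XLOG_FROM_ANY) yields XLOG_FROM_PG_WAL, while a standby that is already streaming keeps XLOG_FROM_STREAM.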

1. Main parameters

  • fetching_ckpt: if true, we are fetching a checkpoint record and should prepare to read WAL starting from RedoStartLSN after this.
  • RecPtr: does not necessarily point to the beginning of the record we are interested in; it might also point to a page or segment header.
  • tliRecPtr: when RecPtr points to a page or segment header, tliRecPtr is the position of the WAL record we are interested in. It is used to decide which timeline to stream the requested WAL from.

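2. Return value

  • Not in standby mode and the record is not immediately available: the function returns false.
  • In standby mode: the function waits until the record becomes available.
  • When the requested record becomes available, the function opens the file containing it and returns true.
  • When the end of standby mode is triggered (i.e. the user promotes the standby to primary) and no more WAL is available, the function returns false.

The function, with annotations:
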
/*
 * Open the WAL segment containing WAL location 'RecPtr'.
 */
static bool
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                            bool fetching_ckpt, XLogRecPtr tliRecPtr)
{
    static TimestampTz last_fail_time = 0;
    TimestampTz now;
    bool        streaming_reply_sent = false;

    /* First, initialize currentSource */

    /* Not in archive recovery (i.e. plain crash recovery) */
    if (!InArchiveRecovery)
        /* read WAL directly from the pg_wal directory */
        currentSource = XLOG_FROM_PG_WAL;
    /* In archive recovery (i.e. PITR or a standby) */
    else if (currentSource == XLOG_FROM_ANY ||
             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
    {
        /* start with the archive as the WAL source */
        lastSourceFailed = false;
        currentSource = XLOG_FROM_ARCHIVE;
    }

    for (;;)
    {
        XLogSource  oldSource = currentSource;
        bool        startWalReceiver = false;

        /*
         * On each iteration, if the last read attempt failed, consider
         * switching to another WAL source.
         */
        if (lastSourceFailed)
        {
            /* dispatch on the current source */
            switch (currentSource)
            {
                case XLOG_FROM_ARCHIVE:
                case XLOG_FROM_PG_WAL:

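                    /*
                     * Check whether the trigger file exists. Note that this
                     * check happens only after a failure, so when you create
                     * the trigger file, PostgreSQL still replays as much WAL
                     * as it can from the archive and pg_wal before failover.
                     * On a standby with the trigger file present: shut down
                     * the WalReceiver, let promotion proceed, and return
                     * false (case 4 of the return values described earlier).
                     */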
                    if (StandbyMode && CheckForStandbyTrigger())
                    {
                        ShutdownWalRcv();
                        return false;
                    }

                    /*
                     * Not in standby mode, and we've now tried the archive
                     * and pg_wal. The record is not immediately available,
                     * so return false (case 1 of the return values
                     * described earlier).
                     */
                    if (!StandbyMode)
                        return false;

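                    /*
                     * Move to XLOG_FROM_STREAM state, and set to start a
                     * walreceiver if necessary. Reaching this point means we
                     * are a standby and no trigger file was found: switch
                     * the source to XLOG_FROM_STREAM, flag that a
                     * WalReceiver should be started, and leave the switch
                     * statement.
                     */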
                    currentSource = XLOG_FROM_STREAM;
                    startWalReceiver = true;
                    break;

                /* The failure happened while the source was XLOG_FROM_STREAM */
                case XLOG_FROM_STREAM:

                    /* This can only happen on a standby */
                    Assert(StandbyMode);

                    /*
                     * Before leaving the XLOG_FROM_STREAM state, make sure
                     * the WalReceiver is shut down, so that it cannot
                     * overwrite WAL restored from the archive.
                     */
                    if (WalRcvStreaming())
                        ShutdownWalRcv();

                    /*
                     * Before we sleep, re-scan for possible new timelines if
                     * we were requested to recover to the latest timeline.
                     * If a new timeline is found, restart from the
                     * XLOG_FROM_ARCHIVE state and leave the switch statement.
                     */
                    if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
                    {
                        if (rescanLatestTimeLine())
                        {
                            currentSource = XLOG_FROM_ARCHIVE;
                            break;
                        }
                    }

                    /*
                     * XLOG_FROM_STREAM is the last state in our state
                     * machine, so we've exhausted all the options for
                     * obtaining the requested WAL. We're going to loop back
                     * and retry from the archive, but if it hasn't been long
                     * since last attempt, sleep wal_retrieve_retry_interval
                     * milliseconds to avoid busy-waiting.
                     */
                    now = GetCurrentTimestamp();
                    if (!TimestampDifferenceExceeds(last_fail_time, now,
                                                    wal_retrieve_retry_interval))
                    {
                        long        wait_time;

                        wait_time = wal_retrieve_retry_interval -
                            TimestampDifferenceMilliseconds(last_fail_time, now);

                        (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
                                         WL_LATCH_SET | WL_TIMEOUT |
                                         WL_EXIT_ON_PM_DEATH,
                                         wait_time,
                                         WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
                        ResetLatch(&XLogCtl->recoveryWakeupLatch);
                        now = GetCurrentTimestamp();

                        /* Handle interrupt signals of startup process */
                        HandleStartupProcInterrupts();
                    }
                    /*
                     * Record the failure time, restart from the
                     * XLOG_FROM_ARCHIVE state, and leave the switch statement.
                     */
                    last_fail_time = now;
                    currentSource = XLOG_FROM_ARCHIVE;
                    break;

                default:
                    elog(ERROR, "unexpected WAL source %d", currentSource);
            }
        }

        /* No failure, and the current source is pg_wal */
        else if (currentSource == XLOG_FROM_PG_WAL)
        {
            /*
             * We just successfully read a file in pg_wal. We prefer files in
             * the archive over ones in pg_wal, so try the next file again
             * from the archive first. This preference applies only in
             * archive recovery, i.e. PITR or standby mode.
             */
            if (InArchiveRecovery)
                currentSource = XLOG_FROM_ARCHIVE;
        }

        /* If the source changed, emit a debug message recording the switch */
        if (currentSource != oldSource)
            elog(DEBUG2, "switched WAL source from %s to %s after %s",
                 xlogSourceNames[oldSource], xlogSourceNames[currentSource],
                 lastSourceFailed ? "failure" : "success");

        /*
         * We've now handled possible failure. Try to read from the chosen
         * source. (The code that performs the actual read is omitted here.)
         */
…
}
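
To make the failure path easier to follow, here is a minimal standalone model of it. It is a sketch under simplifying assumptions, not the PostgreSQL implementation: the names FailureStep, next_source_after_failure and remaining_retry_wait_ms are hypothetical, promote_requested stands in for the result of CheckForStandbyTrigger(), and timeline re-scanning, WalReceiver handling and latch waiting are left out.

```c
#include <stdbool.h>

typedef enum
{
    XLOG_FROM_ANY = 0,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
} XLogSource;

typedef enum
{
    STEP_GIVE_UP,               /* the real function returns false here */
    STEP_RETRY                  /* loop again with *source updated */
} FailureStep;

/*
 * Hypothetical model of the lastSourceFailed branch: decide what to do
 * after a read from *source has failed.
 */
static FailureStep
next_source_after_failure(XLogSource *source, bool standby_mode,
                          bool promote_requested)
{
    switch (*source)
    {
        case XLOG_FROM_ARCHIVE:
        case XLOG_FROM_PG_WAL:
            /* promotion requested, or not a standby: nothing more to try */
            if (!standby_mode || promote_requested)
                return STEP_GIVE_UP;
            /* standby without promotion: fall back to streaming */
            *source = XLOG_FROM_STREAM;
            return STEP_RETRY;

        case XLOG_FROM_STREAM:
            /* last state of the machine: start over from the archive */
            *source = XLOG_FROM_ARCHIVE;
            return STEP_RETRY;

        default:
            /* the real code raises an error for any other value */
            return STEP_GIVE_UP;
    }
}

/*
 * Hypothetical helper: how long to sleep before retrying, given the time
 * elapsed since the last failure and the retry interval (both in ms).
 */
static long
remaining_retry_wait_ms(long elapsed_ms, long retry_interval_ms)
{
    if (elapsed_ms >= retry_interval_ms)
        return 0;               /* enough time has passed, retry at once */
    return retry_interval_ms - elapsed_ms;
}
```

In this model a standby that keeps failing cycles from the archive or pg_wal to streaming and back to the archive, pausing so that retries are at least one interval apart; for instance, with a 5000 ms interval and 2000 ms elapsed since the last failure, the remaining wait is 3000 ms.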

References

《PostgreSQL技术内幕:事务处理深度探索》, Chapter 4
