How to implement pagination over a multikey index in Tarantool?

Question

I have a multikey index on a field that stores an array of integers. How can I use it to implement cursor-based pagination?

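The problem is that if a tuple has several values in the array field, the tuple is selected #tuple[array_field_idx] times. Yesterday I implemented a "distinct" select (using ffi to get the tuple pointer address), but it does not seem to be usable for pagination.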

Do you have any ideas on how this could be implemented in Tarantool?

Answer

To implement pagination over a multikey index, you need to know a little about Tarantool internals.

I will write code that works for Tarantool 2.3+ (but it may break in the future). If you upgrade Tarantool, test the code carefully and update the FFI definitions.

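So, let's start. First, you should know that the Tarantool BTree stores data in a special structure, memtx_tree_data, which contains a pointer to the tuple and a hint. The hint is a special number: for a simple tree index it speeds up comparisons between tuples, and for a multikey index it is the position of the indexed element in the array.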

The first step is to learn how to extract the hint together with the tuple. This can be done with some FFI code and a tree iterator.

local ffi = require('ffi')

ffi.cdef([[

typedef struct index_def;

typedef struct index;

typedef struct memtx_tree;

typedef struct mempool;

typedef uint64_t hint_t;

enum iterator_type {
    /* ITER_EQ must be the first member for request_create  */
    ITER_EQ               =  0,/* key == x ASC order                  */
    ITER_REQ              =  1,/* key == x DESC order                 */
    ITER_ALL              =  2,/* all tuples                          */
    ITER_LT               =  3,/* key <  x                            */
    ITER_LE               =  4,/* key <= x                            */
    ITER_GE               =  5,/* key >= x                            */
    ITER_GT               =  6,/* key >  x                            */
    ITER_BITS_ALL_SET     =  7,/* all bits from x are set in key      */
    ITER_BITS_ANY_SET     =  8,/* at least one x's bit is set         */
    ITER_BITS_ALL_NOT_SET =  9,/* all bits are not set                */
    ITER_OVERLAPS         = 10,/* key overlaps x                      */
    ITER_NEIGHBOR         = 11,/* tuples in distance ascending order from specified point */
    iterator_type_MAX
};


typedef struct iterator {
    /**
     * Iterate to the next tuple.
     * The tuple is returned in @ret (NULL if EOF).
     * Returns 0 on success,-1 on error.
     */
    int (*next)(struct iterator *it,struct tuple **ret);
    /** Destroy the iterator. */
    void (*free)(struct iterator *);
    /** Space cache version at the time of the last index lookup. */
    uint32_t space_cache_version;
    /** ID of the space the iterator is for. */
    uint32_t space_id;
    /** ID of the index the iterator is for. */
    uint32_t index_id;
    /**
     * Pointer to the index the iterator is for.
     * Guaranteed to be valid only if the schema
     * state has not changed since the last lookup.
     */
    struct index *index;
};


struct memtx_tree_key_data {
    /** Sequence of msgpacked search fields. */
    const char *key;
    /** Number of msgpacked search fields. */
    uint32_t part_count;
    /** Comparison hint,see tuple_hint(). */
    hint_t hint;
};

struct memtx_tree_data {
    /* Tuple that this node is represents. */
    struct tuple *tuple;
    /** Comparison hint,see key_hint(). */
    hint_t hint;
};

typedef int16_t bps_tree_pos_t;
typedef uint32_t bps_tree_block_id_t;

typedef uint32_t matras_id_t;

struct matras_view {
    /* root extent of the view */
    void *root;
    /* block count in the view */
    matras_id_t block_count;
    /* all views are linked into doubly linked list */
    struct matras_view *prev_view,*next_view;
};

struct memtx_tree_iterator {
    /* ID of a block,containing element. -1 for an invalid iterator */
    bps_tree_block_id_t block_id;
    /* Position of an element in the block. Could be -1 for last in block*/
    bps_tree_pos_t pos;
    /* Version of matras memory for MVCC */
    struct matras_view view;
};

typedef struct tree_iterator {
    struct iterator base;
    struct memtx_tree_iterator tree_iterator;
    enum iterator_type type;
    struct memtx_tree_key_data key_data;
    struct memtx_tree_data current;
    /** Memory pool the iterator was allocated from. */
    struct mempool *pool;
};

]])


local function get_tree_comparison_hint(box_iterator_state)
    if box_iterator_state == nil then
        return nil
    end

    local casted = ffi.cast("struct tree_iterator*",box_iterator_state)
    --
    -- IMPORTANT: the hint is zero-based (like arrays in C),
    -- while Lua arrays are one-based.
    --
    return casted.current.hint
end

return {
    get_tree_comparison_hint = get_tree_comparison_hint,
}

Then consider the following example (it assumes the module above is saved as common/box_iterator.lua):

local box_iterator = require('common.box_iterator')

box.cfg{}

local space = box.schema.create_space('dict', {
    format = {
        {name = 'id', type = 'number'},
        {name = 'bundles', type = 'array'},
    },
    if_not_exists = true,
})

space:create_index('pk', {
    unique = true,
    parts = {
        {field = 1, type = 'number'},
    },
    if_not_exists = true,
})

space:create_index('multikey', {
    unique = false,
    parts = {
        {field = 2, type = 'string', path = '[*]'},
        -- Note: I intentionally add primary index parts here
        {field = 1, type = 'number'},
    },
    if_not_exists = true,
})

space:replace({1, {'a', 'b', 'c', 'd'}})
space:replace({2, {'b', 'c'}})
space:replace({3, {'a', 'd'}})
space:replace({4, {'c', 'd'}})

for iter_state, tuple in space.index.multikey:pairs({'a'}, {iterator = 'GE'}) do
    -- The hint is zero-based, so add 1 to get the Lua array position.
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end

os.exit()

The output is:

# Tuple                         Hint    Indexed element
[1, ['a', 'b', 'c', 'd']]       1ULL    a
[3, ['a', 'd']]                 1ULL    a
[1, ['a', 'b', 'c', 'd']]       2ULL    b
[2, ['b', 'c']]                 1ULL    b
[1, ['a', 'b', 'c', 'd']]       3ULL    c
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d

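As you can see, the order is strictly determined. The order in which Tarantool returns tuples is determined by (a) the indexed value, tuple[path_to_array][hint + 1], and (b) the primary key. The second condition is common to all non-unique secondary indexes in Tarantool: internally, Tarantool merges the primary key into every non-unique index. All you need to do is specify it explicitly in your schema.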
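The ordering rule can be reproduced without Tarantool. The following is a minimal pure-Lua sketch, not Tarantool's actual implementation: `build_entries` is a hypothetical helper introduced here that expands every tuple into one entry per array element, keeps the zero-based array position as the hint, and sorts the entries by (indexed element, primary key). It assumes an array does not contain the same element twice.

```lua
-- Pure-Lua model of a multikey index; illustrative only, not Tarantool code.
local function build_entries(tuples)
    local entries = {}
    for _, tuple in ipairs(tuples) do
        local id, bundles = tuple[1], tuple[2]
        for pos, element in ipairs(bundles) do
            -- The hint is zero-based, like the value read through FFI.
            entries[#entries + 1] = {key = element, id = id, hint = pos - 1}
        end
    end
    -- Index order: indexed element first, primary key as the tie-breaker.
    table.sort(entries, function(a, b)
        if a.key ~= b.key then
            return a.key < b.key
        end
        return a.id < b.id
    end)
    return entries
end

local entries = build_entries({
    {1, {'a', 'b', 'c', 'd'}},
    {2, {'b', 'c'}},
    {3, {'a', 'd'}},
    {4, {'c', 'd'}},
})

for _, e in ipairs(entries) do
    -- Primary key, one-based position, indexed element.
    print(e.id, e.hint + 1, e.key)
end
```

The printed sequence of (primary key, position, element) follows the same order as the output of the Tarantool example.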

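The next item is the cursor. A cursor lets you continue iteration from the place where you previously stopped. For a unique index, the cursor is the fields of that index; for a non-unique index, it is the fields of that index merged with the primary key (see the key_def.merge function for details; it currently does not support multikey indexes, but it is useful if you need to understand how index parts merging works). The combination merge(secondary_index_parts, primary_index_parts) is always a unique value, so it lets you continue iteration from a strictly determined place.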
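To make the cursor idea concrete, here is a hedged pure-Lua sketch of what a `GT` lookup on the merged key means. `is_after` and `resume` are hypothetical helpers introduced for illustration, not Tarantool APIs. The comparison is lexicographic over the pair (indexed element, primary key). A real tree index seeks to the position instead of scanning linearly as this model does.

```lua
-- Returns true when an entry sorts strictly after the cursor {key, id}.
local function is_after(entry, cursor)
    if entry.key ~= cursor[1] then
        return entry.key > cursor[1]
    end
    return entry.id > cursor[2]
end

-- Keeps only the entries that a GT lookup from the cursor would visit.
local function resume(sorted_entries, cursor)
    local rest = {}
    for _, entry in ipairs(sorted_entries) do
        if is_after(entry, cursor) then
            rest[#rest + 1] = entry
        end
    end
    return rest
end

-- Entries already in index order: (indexed element, primary key).
local entries = {
    {key = 'b', id = 2},
    {key = 'c', id = 1},
    {key = 'c', id = 2},
    {key = 'c', id = 4},
    {key = 'd', id = 1},
}

local rest = resume(entries, {'c', 1})
for _, entry in ipairs(rest) do
    print(entry.key, entry.id)
end
```

Because the pair is unique, exactly one entry equals the cursor, so nothing is visited twice and nothing is skipped.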

Let's return to my example. Say I stopped at the line [1, ['a', 'b', 'c', 'd']] 3ULL c. My cursor is {'c', 1}.

Now I can continue from that point:

-- "GE" is changed to "GT" to skip the already scanned tuple: [1, ['a', 'b', 'c', 'd']]
for iter_state, tuple in space.index.multikey:pairs({'c', 1}, {iterator = 'GT'}) do
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end
--[[
Result:
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d
--]]

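Compare this with the previous snippet: the scan continues from the required value, without rescanning redundant entries and without losing anything.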
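The two loops above can be folded into one paging helper. This is a sketch under assumptions, not a tested production implementation: `fetch_page` and its `get_hint` parameter are names introduced here for illustration (`get_hint` would be `get_tree_comparison_hint` from the FFI module above), the array is assumed to live in field 2 and the primary key in field 1 as in the example schema, and the loop relies on `index:pairs()` yielding the iterator state as the first loop variable, as the snippets above do.

```lua
-- Fetches up to `limit` index entries that follow `cursor`.
-- Returns the rows and the cursor to pass to the next call.
-- A nil cursor means "start from the beginning".
local function fetch_page(index, get_hint, cursor, limit)
    local rows, next_cursor = {}, cursor
    if limit <= 0 then
        return rows, next_cursor
    end

    local key, opts
    if cursor == nil then
        key, opts = {}, {iterator = 'ALL'}
    else
        -- GT skips the entry the previous page stopped at.
        key, opts = cursor, {iterator = 'GT'}
    end

    for iter_state, tuple in index:pairs(key, opts) do
        -- The hint is zero-based, Lua arrays are one-based.
        local position = tonumber(get_hint(iter_state)) + 1
        local element = tuple[2][position]
        rows[#rows + 1] = {tuple = tuple, element = element}
        -- Cursor = indexed element merged with the primary key.
        next_cursor = {element, tuple[1]}
        if #rows >= limit then
            break
        end
    end
    return rows, next_cursor
end
```

With the example space, a call might look like `fetch_page(space.index.multikey, box_iterator.get_tree_comparison_hint, nil, 3)`, and the returned cursor would be passed back in to get the next page. An empty `rows` table signals that the index is exhausted.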

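This approach is neither very clear nor very comfortable: you have to extract some magic values from Tarantool internals and store them somewhere. But we use it in our project because we have no alternative yet :)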
