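Problem description
I have a multikey index on a field that stores an array of integers. How can I use it to implement cursor-based pagination?
In fact, if a tuple has more than one value in the array field, that tuple is selected #tuple[array_field_idx] times.
Yesterday I implemented a "distinct" select (using FFI to get the tuple pointer address), but it seems it can't be used with pagination.
Do you have any ideas how this can be implemented in Tarantool?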
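Solution
To implement pagination over a multikey index you need to know a little about Tarantool internals.
I will write code that works with Tarantool 2.3+ (but it may break in future versions). If you upgrade Tarantool, test the code carefully and update the FFI definitions.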
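One inexpensive safeguard for the upgrade warning above is to check the running version before loading the FFI definitions. The sketch below is only an assumption about how such a guard might look: parse_version and ffi_layout_supported are hypothetical helper names, and the fallback version string is a made-up sample so the snippet also runs outside Tarantool. Passing the check does not prove that the struct layouts still match; it only rejects versions older than 2.3.

```lua
-- Parse "major.minor" out of a version string such as "2.3.1-0-g...".
local function parse_version(version)
    local major, minor = string.match(version, '^(%d+)%.(%d+)')
    return tonumber(major), tonumber(minor)
end

-- True when the version is at least 2.3, the lowest version the
-- FFI definitions in this answer were written for.
local function ffi_layout_supported(version)
    local major, minor = parse_version(version)
    if major == nil then
        return false
    end
    return major > 2 or (major == 2 and minor >= 3)
end

-- _TARANTOOL is the global version string inside Tarantool.
-- The fallback value is a made-up sample for plain Lua interpreters.
local version = rawget(_G, '_TARANTOOL') or '2.3.1-0-gexample'
print(version, ffi_layout_supported(version))
```

A caller could refuse to load the FFI module when ffi_layout_supported returns false, and still re-test the definitions by hand after every upgrade.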
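So, let's start. First, you should know that the Tarantool BTree stores data in a special structure, memtx_tree_data, which contains a pointer to the tuple and a hint. The hint is a special number: for a plain tree index it speeds up comparisons between tuples, and for a multikey index it is the position of the indexed element in the array.
Next, we need a way to get the hint together with the tuple. This can be done with a piece of FFI code and the tree iterator.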
local ffi = require('ffi')

ffi.cdef([[
    struct index_def;
    struct index;
    struct memtx_tree;
    struct mempool;

    typedef uint64_t hint_t;

    enum iterator_type {
        /* ITER_EQ must be the first member for request_create */
        ITER_EQ               =  0, /* key == x ASC order */
        ITER_REQ              =  1, /* key == x DESC order */
        ITER_ALL              =  2, /* all tuples */
        ITER_LT               =  3, /* key < x */
        ITER_LE               =  4, /* key <= x */
        ITER_GE               =  5, /* key >= x */
        ITER_GT               =  6, /* key > x */
        ITER_BITS_ALL_SET     =  7, /* all bits from x are set in key */
        ITER_BITS_ANY_SET     =  8, /* at least one x's bit is set */
        ITER_BITS_ALL_NOT_SET =  9, /* all bits are not set */
        ITER_OVERLAPS         = 10, /* key overlaps x */
        ITER_NEIGHBOR         = 11, /* tuples in distance ascending order from specified point */
        iterator_type_MAX
    };

    struct iterator {
        /**
         * Iterate to the next tuple.
         * The tuple is returned in @ret (NULL if EOF).
         * Returns 0 on success, -1 on error.
         */
        int (*next)(struct iterator *it, struct tuple **ret);
        /** Destroy the iterator. */
        void (*free)(struct iterator *);
        /** Space cache version at the time of the last index lookup. */
        uint32_t space_cache_version;
        /** ID of the space the iterator is for. */
        uint32_t space_id;
        /** ID of the index the iterator is for. */
        uint32_t index_id;
        /**
         * Pointer to the index the iterator is for.
         * Guaranteed to be valid only if the schema
         * state has not changed since the last lookup.
         */
        struct index *index;
    };

    struct memtx_tree_key_data {
        /** Sequence of msgpacked search fields. */
        const char *key;
        /** Number of msgpacked search fields. */
        uint32_t part_count;
        /** Comparison hint, see tuple_hint(). */
        hint_t hint;
    };

    struct memtx_tree_data {
        /* Tuple that this node represents. */
        struct tuple *tuple;
        /** Comparison hint, see key_hint(). */
        hint_t hint;
    };

    typedef int16_t bps_tree_pos_t;
    typedef uint32_t bps_tree_block_id_t;
    typedef uint32_t matras_id_t;

    struct matras_view {
        /* root extent of the view */
        void *root;
        /* block count in the view */
        matras_id_t block_count;
        /* all views are linked into doubly linked list */
        struct matras_view *prev_view, *next_view;
    };

    struct memtx_tree_iterator {
        /* ID of a block, containing element. -1 for an invalid iterator */
        bps_tree_block_id_t block_id;
        /* Position of an element in the block. Could be -1 for last in block */
        bps_tree_pos_t pos;
        /* Version of matras memory for MVCC */
        struct matras_view view;
    };

    struct tree_iterator {
        struct iterator base;
        struct memtx_tree_iterator tree_iterator;
        enum iterator_type type;
        struct memtx_tree_key_data key_data;
        struct memtx_tree_data current;
        /** Memory pool the iterator was allocated from. */
        struct mempool *pool;
    };
]])

local function get_tree_comparison_hint(box_iterator_state)
    if box_iterator_state == nil then
        return nil
    end

    local casted = ffi.cast("struct tree_iterator*", box_iterator_state)
    --
    -- IMPORTANT: the hint is zero-based (like arrays in C),
    -- while Lua arrays are one-based.
    --
    return casted.current.hint
end

return {
    get_tree_comparison_hint = get_tree_comparison_hint,
}
Then consider the following example (it loads the module above as common.box_iterator):
local box_iterator = require('common.box_iterator')

box.cfg{}

local space = box.schema.create_space('dict', {
    format = {
        {name = 'id', type = 'number'},
        {name = 'bundles', type = 'array'},
    },
    if_not_exists = true,
})

space:create_index('pk', {
    unique = true,
    parts = {
        {field = 1, type = 'number'},
    },
})

space:create_index('multikey', {
    unique = false,
    parts = {
        {field = 2, type = 'string', path = '[*]'},
        -- Note: I intentionally add primary index parts here
        {field = 1, type = 'number'},
    },
})

space:replace({1, {'a', 'b', 'c', 'd'}})
space:replace({2, {'b', 'c'}})
space:replace({3, {'a', 'd'}})
space:replace({4, {'c', 'd'}})

for iter_state, tuple in space.index.multikey:pairs({'a'}, {iterator = 'GE'}) do
    -- Convert the zero-based hint into a one-based Lua array position.
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end

os.exit()
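The output is:
# Tuple                         Hint    Indexed element
[1, ['a', 'b', 'c', 'd']]       1ULL    a
[3, ['a', 'd']]                 1ULL    a
[1, ['a', 'b', 'c', 'd']]       2ULL    b
[2, ['b', 'c']]                 1ULL    b
[1, ['a', 'b', 'c', 'd']]       3ULL    c
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d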
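As you can see, the order is strictly determined. The order in which Tarantool returns tuples is defined by (a) the indexed value, tuple[path_to_array][hint + 1], and (b) the primary key. The second condition is common to all non-unique secondary indexes in Tarantool: internally, Tarantool merges the primary key into every non-unique index. All you need to do is specify it explicitly in your schema.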
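The ordering rule can be reproduced with a small pure-Lua model. This is only an illustration of the rule, not how Tarantool stores the index: build_entries is a hypothetical helper that expands each tuple into one entry per array element and sorts the entries by indexed value, then by primary key.

```lua
-- Same data as in the example above: {id, bundles}.
local tuples = {
    {1, {'a', 'b', 'c', 'd'}},
    {2, {'b', 'c'}},
    {3, {'a', 'd'}},
    {4, {'c', 'd'}},
}

-- Expand every tuple into one entry per array element and sort the
-- entries the way the multikey index orders them.
local function build_entries(rows)
    local entries = {}
    for _, row in ipairs(rows) do
        for position, value in ipairs(row[2]) do
            entries[#entries + 1] = {
                value = value,       -- indexed array element
                id = row[1],         -- primary key
                position = position, -- one-based position, i.e. hint + 1
            }
        end
    end
    table.sort(entries, function(a, b)
        if a.value ~= b.value then
            return a.value < b.value
        end
        return a.id < b.id
    end)
    return entries
end

local entries = build_entries(tuples)
for _, entry in ipairs(entries) do
    print(entry.id, entry.position, entry.value)
end
```

The printed rows follow the same sequence of (id, position, value) as the output above, which is why the pair (indexed value, primary key) is enough to identify a place in the index.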
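The next term is the cursor. A cursor lets you continue iteration from the place where you previously stopped. For a unique index, the cursor is the fields of that index; for a non-unique index, it is the fields of that index with the primary key merged in (for details see the key_def.merge function; it does not currently support multikey indexes, but it is useful if you want to understand how index parts are merged).
The combination merge(secondary_index_parts, primary_index_parts) is always a unique value, so it lets you continue iteration from a strictly determined place.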
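To make the idea concrete, here is a pure-Lua sketch of resuming from a cursor. It does not use Tarantool at all: the entries table is a hand-written copy of the index order from the example, and key_less and next_page are hypothetical helper names. The strict "greater than" comparison plays the role of the GT iterator.

```lua
-- Index entries in index order: {indexed value, primary key}.
local entries = {
    {'a', 1}, {'a', 3}, {'b', 1}, {'b', 2}, {'c', 1},
    {'c', 2}, {'c', 4}, {'d', 1}, {'d', 3}, {'d', 4},
}

-- Lexicographic comparison of two composite keys.
local function key_less(a, b)
    if a[1] ~= b[1] then
        return a[1] < b[1]
    end
    return a[2] < b[2]
end

-- Return up to `limit` entries strictly after `cursor`, plus the new
-- cursor (the last returned entry, or nil when the page is empty).
local function next_page(cursor, limit)
    local page = {}
    for _, entry in ipairs(entries) do
        if cursor == nil or key_less(cursor, entry) then
            page[#page + 1] = entry
            if #page == limit then
                break
            end
        end
    end
    return page, page[#page]
end

local page1, cursor1 = next_page(nil, 5)
local page2, cursor2 = next_page(cursor1, 5)
print(cursor1[1], cursor1[2], #page2)
```

Because every composite key is unique, no entry is skipped or returned twice between pages. The linear scan is only for illustration; a real index seeks to the cursor position directly.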
Let's return to my example. Say I stopped at the row [1, ['a', 'b', 'c', 'd']]  3ULL  c. My cursor is {'c', 1}.
From this point I can continue:
-- "GE" is changed to "GT" to skip the already scanned tuple: [1, ['a', 'b', 'c', 'd']]
for iter_state, tuple in space.index.multikey:pairs({'c', 1}, {iterator = 'GT'}) do
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end
--[[
Result:
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d
--]]
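You can compare this with the previous snippet and see that the scan continues from the required value, without scanning redundant values and without losing anything.
This approach is not very clean and not very comfortable. You need to extract some magic values from Tarantool internals and store them somewhere. But we use this approach in our project because we don't have an alternative yet :)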
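The pieces above can be combined into a single page-fetching helper. This is a sketch under assumptions rather than a tested implementation: fetch_page is a hypothetical name, index is expected to behave like space.index.multikey from the example, and get_hint like get_tree_comparison_hint. It also assumes the schema of the example (array in field 2, primary key in field 1).

```lua
-- Hypothetical helper: fetch one page from the multikey index.
-- `cursor` is nil for the first page, otherwise the {value, id}
-- pair returned by the previous call.
local function fetch_page(index, get_hint, cursor, limit)
    -- Start from the beginning without a cursor; otherwise resume
    -- strictly after the cursor.
    local opts = {iterator = cursor and 'GT' or 'GE'}
    local page, next_cursor = {}, cursor
    for iter_state, tuple in index:pairs(cursor or {}, opts) do
        -- The hint is zero-based; Lua arrays are one-based.
        local position = tonumber(get_hint(iter_state)) + 1
        local value = tuple[2][position]
        page[#page + 1] = {tuple = tuple, value = value}
        -- Indexed value plus primary key identifies this index entry.
        next_cursor = {value, tuple[1]}
        if #page >= limit then
            break
        end
    end
    return page, next_cursor
end

-- Usage inside Tarantool (assumes the space and module from the example):
-- local page, cursor = fetch_page(space.index.multikey,
--     box_iterator.get_tree_comparison_hint, nil, 5)
-- local page2, cursor2 = fetch_page(space.index.multikey,
--     box_iterator.get_tree_comparison_hint, cursor, 5)
```

A page shorter than limit signals that the index is exhausted. The hint is read inside the loop body, right after the iterator yields, as in the snippets above.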