Efficient Redis SCAN of multiple key patterns

Problem description

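I'm trying to power some multi-selection query and filter operations with SCAN operations on my data, and I'm not sure whether I'm heading in the right direction.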

I am using AWS ElastiCache (Redis 5.0.6).

Key design: <recipe id>:<recipe name>:<recipe type>:<country of origin>

Example:

13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain

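If I want to power queries with complex multi-selection filters (for example, return all keys matching five recipe types from two different countries), which the SCAN glob-style match pattern cannot handle, what is the common approach in a production scenario?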

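Assume that I compute all possible patterns by taking the Cartesian product of the alternatives for each filtered field:

[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]
*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal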
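
The pattern expansion above can be sketched in plain Java as follows. This is only an illustration: the class and method names (PatternExpander, expand) are hypothetical and not part of the original code, and the leading "*" is assumed to stand for the unfiltered id field.

```java
import java.util.ArrayList;
import java.util.List;

final class PatternExpander {
    // Expands per-field alternatives into one SCAN glob pattern per combination.
    // The leading "*" stands for the id field, which is not filtered here.
    // Note: an empty list of alternatives yields no patterns at all.
    static List<String> expand(List<List<String>> fieldAlternatives) {
        List<String> patterns = new ArrayList<>();
        patterns.add("*");
        for (List<String> alternatives : fieldAlternatives) {
            List<String> next = new ArrayList<>();
            for (String prefix : patterns) {
                for (String value : alternatives) {
                    next.add(prefix + ":" + value);
                }
            }
            patterns = next;
        }
        return patterns;
    }
}
```

The number of patterns is the product of the list sizes (2 x 3 x 1 = 6 in the example), so the number of full SCAN passes in approach 1 grows multiplicatively with each additional selection.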

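What mechanism should I use to implement this kind of pattern matching in Redis?

1. Run multiple SCANs, one per scannable pattern, sequentially and merge the results?
2. A Lua script that applies richer pattern matching for each pattern while scanning keys, and gets all matching keys in a single SCAN?
3. An index built on top of sorted sets that supports fast lookups of keys matching single fields, resolving alternation within the same field with ZUNIONSTORE and the intersection of different fields with ZINTERSTORE?

<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN

4. An index built on top of sorted sets that supports fast lookups of keys matching all dimension combinations, thereby avoiding unions and intersections but using more storage and enlarging the index keyspace footprint?

<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN

5. Leverage RediSearch? (While not possible for my use case, see Tug Grall's answer, which appears to be a very nice solution.)
6. Other?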

I have implemented 1), and the performance is awful.

private static HashSet<String> redisScan(Jedis jedis,String pattern,int scanLimitSize) {

    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);

    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();

    do {
        scanResult = jedis.scan(nextCursor,params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));

    return allMatchedKeys;

}

private static HashSet<String> redisMultiScan(Jedis jedis,ArrayList<String> patternList,int scanLimitSize) {

    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis,pattern,scanLimitSize));

    return mergedHashSet;
}

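For 2), I created a Lua script to run the SCAN on the server side. The performance is not great, but it is much faster than 1), even though Lua patterns do not support alternation, so I have to loop over the pattern list for each key to validate it: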

local function MatchAny( str,pats )
    for pat in string.gmatch(pats,'([^|]+)') do
        local w = string.match( str,pat )
        if w then return w end
    end
end

-- ARGV[1]: Scan Count
-- ARGV[2]: Scan Match Glob-Pattern
-- ARGV[3]: Patterns

local cur = 0
local rep = {}
local tmp

repeat
  tmp = redis.call("SCAN",cur,"MATCH",ARGV[2],"count",ARGV[1])
  cur = tonumber(tmp[1])
  if tmp[2] then
    for k,v in pairs(tmp[2]) do
      local fi = MatchAny(v,ARGV[3])
      if (fi) then
        rep[#rep+1] = v
      end
    end
  end
until cur == 0
return rep

Called like this:

private static ArrayList<String> redisLuaMultiScan(Jedis jedis,String luaSha,List<String> KEYS,List<String> ARGV) {
    Object response = jedis.evalsha(luaSha,KEYS,ARGV);
    if(response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}
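
The script splits ARGV[3] on '|' and tries each piece as a Lua pattern, but the question does not show how that argument is built. The following is one possible way to build it, as a minimal sketch: the names LuaPatternArg, escape and build are hypothetical, and it assumes that field values never contain '|' and that the id field should match anything up to the first ':'.

```java
import java.util.ArrayList;
import java.util.List;

final class LuaPatternArg {
    // Characters with a special meaning in Lua patterns.
    private static final String LUA_MAGIC = "^$()%.[]*+-?";

    // Prefixes every Lua magic character with '%' so the value matches literally.
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (LUA_MAGIC.indexOf(c) >= 0) sb.append('%');
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds one anchored Lua pattern per field combination and joins them
    // with '|', the separator MatchAny splits on. "[^:]*" skips the id field.
    static String build(List<List<String>> combinations) {
        List<String> patterns = new ArrayList<>();
        for (List<String> fields : combinations) {
            List<String> escaped = new ArrayList<>();
            for (String field : fields) escaped.add(escape(field));
            patterns.add("^[^:]*:" + String.join(":", escaped) + "$");
        }
        return String.join("|", patterns);
    }
}
```

Escaping matters because characters such as '-' and '.' are magic in Lua patterns, so an unescaped value like "Hot-Dog" would not match the literal key text.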

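For 3), I implemented and maintain an up-to-date secondary index for each of the 3 fields using sorted sets, and implemented queries with alternation patterns on a single field and multi-field matching patterns like this: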

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis,ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT,true,true);
        pipeline.zunionstore(keyName,subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT,true,true);
    pipeline.zinterstore(keyName,unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName,0,-1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}
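
To make the query semantics explicit, here is a plain-Java sketch of what the union and intersection steps compute, with in-memory sets standing in for the Redis sorted sets. It is an illustration under that assumption, not the code that ran in the tests, and the names IndexQuerySketch and query are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IndexQuerySketch {
    // Each inner list holds the member sets of the indexes selected for one field.
    // Alternatives within a field are unioned (the ZUNIONSTORE step), and the
    // per-field unions are then intersected (the ZINTERSTORE step).
    static Set<String> query(List<List<Set<String>>> perFieldIndexes) {
        Set<String> result = null;
        for (List<Set<String>> alternatives : perFieldIndexes) {
            if (alternatives.isEmpty()) continue; // no filter on this field
            Set<String> union = new HashSet<>();
            for (Set<String> members : alternatives) union.addAll(members);
            if (result == null) {
                result = union;
            } else {
                result.retainAll(union);
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

For instance, selecting the types Soup and Dessert together with the country Spain returns only the keys that appear in the Spain index and in at least one of the two type indexes.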

The results of my stress test cases clearly favor 3) in terms of request latency.

Solution

I would vote for option 3, but I would probably start using RediSearch.

Have you looked at RediSearch? This module lets you create secondary indexes and run complex queries and full-text search.

This may simplify your development.

I invite you to take a look at the project and the Getting Started guide.

Once it is installed, you will be able to achieve this with the following commands:


HSET recipe:13434 name "Guacamole" type "Dip" country "Mexico" 

HSET recipe:34244 name "Gazpacho" type "Soup" country "Spain"

HSET recipe:42344 name "Paella"  type "Dish" country "Spain"

HSET recipe:23444 name "Hot Dog"  type "StreetFood" country "USA"

HSET recipe:78687  name "Custard Pie"  type  "Dessert" country "Portugal"

HSET recipe:75453  name "Churritos" type "Dessert" country "Spain"

FT.CREATE idx:recipe ON HASH PREFIX 1 recipe: SCHEMA name TEXT SORTABLE type TAG SORTABLE country TAG SORTABLE

FT.SEARCH idx:recipe "@type:{Dessert}"

FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name

FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe

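I am not explaining all the commands in detail here, since you can find the explanations in the tutorial, but here are the basics:

Use a hash to store the recipes
Create a RediSearch index and index the fields you want to query
Run queries, for example:
- To get all the Spanish desserts: FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
- To count the number of recipes by type: FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe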
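
The multi-selection case from the question maps onto TAG fields, which accept several alternatives inside the braces separated by "|", while clauses on different fields are intersected. A minimal sketch of a query-string builder follows. The names TagQueryBuilder and build are hypothetical, and the sketch assumes tag values contain no spaces or punctuation that would need escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TagQueryBuilder {
    // Builds a RediSearch query string: values of the same field are OR-ed
    // inside {...}, and the clauses for different fields are AND-ed.
    // Fields with no selected values are skipped; no filters at all gives "*".
    static String build(Map<String, List<String>> filters) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : filters.entrySet()) {
            if (entry.getValue().isEmpty()) continue;
            clauses.add("@" + entry.getKey() + ":{"
                    + String.join(" | ", entry.getValue()) + "}");
        }
        return clauses.isEmpty() ? "*" : String.join(" ", clauses);
    }
}
```

With type mapped to [Soup, Dish, Dessert] and country mapped to [Spain, Portugal], the resulting string can be passed as the query argument of FT.SEARCH idx:recipe.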

I ended up using a simple strategy to update each secondary index for each field when the key is created:

protected static void setKeyAndUpdateIndexes(Jedis jedis,String key,String value,int idxDimSize) {
    String[] key_arr = key.split(":");
    Pipeline pipeline = jedis.pipelined();

    pipeline.set(key,value);
    for (int y = 0; y < key_arr.length; y++)
        pipeline.zadd(
                "idx:" +
                    StringUtils.repeat(":",y) +
                    key_arr[y] +
                    StringUtils.repeat(":",idxDimSize-y),java.time.Instant.now().getEpochSecond(),key);

    pipeline.sync();
}
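
To see which index keys the method above writes to, the naming logic can be isolated in plain Java. This sketch mirrors the loop but swaps StringUtils.repeat for String.repeat (Java 11+). The names IndexKeyNames and forKey are hypothetical, and an idxDimSize of 3 is assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexKeyNames {
    // Returns the sorted-set names that one data key is added to: field y is
    // padded with y colons on the left and (idxDimSize - y) on the right.
    // Math.max keeps the count non-negative, as String.repeat rejects negatives.
    static List<String> forKey(String key, int idxDimSize) {
        String[] fields = key.split(":");
        List<String> names = new ArrayList<>();
        for (int y = 0; y < fields.length; y++) {
            names.add("idx:" + ":".repeat(y) + fields[y]
                    + ":".repeat(Math.max(0, idxDimSize - y)));
        }
        return names;
    }
}
```

For the key 13434:Guacamole:Dip:Mexico and idxDimSize 3, this yields idx:13434:::, idx::Guacamole::, idx:::Dip: and idx::::Mexico. Because the loop starts at field 0, the id field gets an index entry as well.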

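The search strategy for finding keys that match multiple patterns, including alternation patterns and multi-field filters, is as follows: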

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis,ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT,true,true);
        pipeline.zunionstore(keyName,subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT,true,true);
    pipeline.zinterstore(keyName,unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName,0,-1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}

amazon-elasticache indexing jedis lua redis