How to validate data with Elasticsearch at index time

Problem description

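I need to prevent certain fields whose values are "null" (the string null, not a JSON null) or "" (an empty string) from being indexed in Elasticsearch. In other words, I should be able to fetch documents whose _source does not contain these fields.

Is there a mapping setting that does this at index time, such as a custom analyzer on the field?

P.S. I am using Elasticsearch 7.6.1.

I tried the settings from the following answer, which did not work: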

{
  "settings": {
    "number_of_shards": "5",
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "char_filter": [
            {
              "type": "mapping",
              "mappings": [
                "null =>",
                "\"\" =>"
              ]
            }
          ],
          "filter": [
            "uppercase"
          ],
          "type": "custom"
        }
      }
    },
    "number_of_replicas": "1"
  }
}

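Error in the response: "only value lists are allowed in serialized settings"

I also tried the settings below, but still did not get the expected result: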

{
  "settings": {
    "number_of_shards": "5",
    "analysis": {
      "char_filter": {
        "my_filter": {
          "type": "mapping",
          "mappings": [
            "null =>",
            "\"\" =>"
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "char_filter": [
            "my_filter"
          ]
        }
      }
    },
    "number_of_replicas": "1"
  }
}

Request: GET index_name/_analyze

{"normalizer":"my_normalizer","text":"null"}

Response:

{
  "tokens": [
    {
      "token": "",
      "start_offset": 4,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}

Expected response:

{
"tokens": []
}

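Solution

You can do this with a mapping char filter in an analyzer definition instead of a normalizer. A normalizer treats the whole input as a single keyword and always emits one token, even an empty one, whereas an analyzer's tokenizer emits no tokens once the char filter has removed all the text. Note that analysis only changes the terms that get indexed; it does not change what is stored in _source. The Analyze API request below is a working example.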
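To make the difference between the two behaviours concrete, here is a toy Python model. It is only a sketch and not Elasticsearch's implementation: the function names are hypothetical, the char filter is approximated with plain string replacement, and the standard tokenizer is approximated with a `\w+` regular expression.

```python
import re

# Toy stand-in for the mapping char filter: each key is replaced by its value.
# Assumption: plain sequential replacement is close enough for illustration.
MAPPINGS = {"null": "", '""': ""}


def apply_char_filter(text):
    """Apply every mapping to the raw text before tokenization."""
    for source, replacement in MAPPINGS.items():
        text = text.replace(source, replacement)
    return text


def normalizer_tokens(text):
    """Model of a normalizer: the whole input is one token, even if empty."""
    return [apply_char_filter(text)]


def analyzer_tokens(text):
    """Model of an analyzer: a word tokenizer runs after the char filter."""
    # Rough approximation of a word tokenizer: runs of word characters.
    return re.findall(r"\w+", apply_char_filter(text))
```

In this model, `normalizer_tokens("null")` still yields one empty token, which mirrors the response shown in the question, while `analyzer_tokens("null")` yields an empty list.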

Analyze API request

{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "null =>",
        "\"\" =>"
      ]
    }
  ],
  "text": "null"
}

Note: set "text" to either "null" or "" here.

This returns the tokens:

{
    "tokens": []
}
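If you want to run the same check from a script, the sketch below builds the request body and posts it to the `_analyze` endpoint using only the Python standard library. The base URL `http://localhost:9200` and the function names are assumptions for illustration; adjust them for your cluster and add authentication if it needs it.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node.
ES_URL = "http://localhost:9200"


def build_analyze_body(text):
    """Build an _analyze request body with an inline mapping char filter."""
    return {
        "tokenizer": "standard",
        "char_filter": [
            {
                "type": "mapping",
                # Strip the string null and a literal pair of double quotes.
                "mappings": ["null =>", '"" =>'],
            }
        ],
        "text": text,
    }


def analyze(text, base_url=ES_URL):
    """POST the body to the _analyze endpoint and return the token list."""
    request = urllib.request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tokens"]
```

Calling `analyze("null")` against a running node sends the same request as the example above, so you can compare the returned list with the response shown there.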