How to prevent certain fields from being indexed in Elasticsearch

Problem description

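I need to prevent fields whose values are things like "null" (null as a string) and "" (the empty string) from being indexed in Elasticsearch. In other words, I should still get the rest of the document's fields in _source, just not the fields that hold such values. I am using the following normalizer: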

{
  "analysis": {
    "normalizer": {
      "my_normalizer": {
        "type": "custom",
        "filter": [
          "uppercase"
        ]
      }
    }
  }
}

Is any setting needed in the above, or in the field mapping?

P.S.: I am using Elasticsearch 7.6.1.

Solution

You can look at Elasticsearch ingest pipelines. They are applied before the document is indexed (and analyzed).

Specifically, you can add an ingest pipeline that removes the fields in question when their values meet the listed conditions. Something like this:

PUT _ingest/pipeline/remove_invalid_value
{
  "description": "my pipeline that removes empty string and null strings",
  "processors": [
    {
      "remove": {
        "field": "field1",
        "ignore_missing": true,
        "if": "ctx.field1 == \"null\" || ctx.field1 == \"\""
      }
    },
    {
      "remove": {
        "field": "field2",
        "if": "ctx.field2 == \"null\" || ctx.field2 == \"\""
      }
    },
    {
      "remove": {
        "field": "field3",
        "if": "ctx.field3 == \"null\" || ctx.field3 == \"\""
      }
    }
  ]
}
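If many fields need the same treatment, the processor list can be generated instead of typed by hand. Below is a minimal Python sketch, assuming plain top-level field names (for a dotted name like a.b, the generated condition would likely throw when a is missing). `build_remove_pipeline` is a hypothetical helper, not part of Elasticsearch or its clients, and unlike the pipeline above it sets ignore_missing on every processor:

```python
import json


def build_remove_pipeline(fields, description="removes empty string and null strings"):
    """Return a body for PUT _ingest/pipeline/<name> with one conditional
    remove processor per field (hypothetical helper, not an Elasticsearch API)."""
    processors = []
    for field in fields:
        processors.append({
            "remove": {
                "field": field,
                # Harmless here: the "if" below is already false when the field is absent.
                "ignore_missing": True,
                # Painless condition: drop the field only for the string "null" or "".
                "if": f'ctx.{field} == "null" || ctx.{field} == ""',
            }
        })
    return {"description": description, "processors": processors}


body = build_remove_pipeline(["field1", "field2", "field3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of PUT _ingest/pipeline/remove_invalid_value; json.dumps takes care of escaping the inner quotes.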

Then you can either name the pipeline in the index request, or set it as default_pipeline or final_pipeline in the index settings. You can also put this setting in an index template.
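The three ways of attaching the pipeline differ only in where the pipeline name goes. The Python sketch below only builds the HTTP method, path, and body for each option and sends nothing. The index name my-index, the template name my-template, and the helper functions are placeholders. It assumes the legacy _template API, since composable index templates (_index_template) only arrived in a later 7.x release:

```python
import json

PIPELINE = "remove_invalid_value"


def index_request(index, doc_id, doc, pipeline=PIPELINE):
    # Option 1: per request, via the `pipeline` query parameter of the index API.
    return ("PUT", f"/{index}/_doc/{doc_id}?pipeline={pipeline}", doc)


def pipeline_settings(pipeline=PIPELINE, final=False):
    # index.default_pipeline / index.final_pipeline are index-level settings.
    key = "index.final_pipeline" if final else "index.default_pipeline"
    return {key: pipeline}


def settings_request(index, pipeline=PIPELINE, final=False):
    # Option 2: per existing index, via the update index settings API.
    return ("PUT", f"/{index}/_settings", pipeline_settings(pipeline, final))


def template_request(name, patterns, pipeline=PIPELINE, final=False):
    # Option 3: for indices created later, via a legacy index template.
    body = {"index_patterns": patterns, "settings": pipeline_settings(pipeline, final)}
    return ("PUT", f"/_template/{name}", body)


for method, path, body in (
    index_request("my-index", "1", {"field1": "null", "field2": "ok"}),
    settings_request("my-index"),
    template_request("my-template", ["my-index*"]),
):
    print(method, path)
    print(json.dumps(body, indent=2))
```

As a design note: index.default_pipeline is used only when a request does not name its own pipeline, whereas index.final_pipeline runs after the request or default pipeline, so it still applies when a client passes a different pipeline.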

(Script) loop approach

If you don't want to write a long list of remove processors, you can try a script processor instead, like this:

PUT _ingest/pipeline/remove_invalid_fields
{
  "description": "remove fields",
  "processors": [
    {
      "script": {
        "source": """
          // Drop each listed field whose value is the string "null" or an empty string.
          for (x in params.to_delete_on_condition) {
            if (ctx[x] == "null" || ctx[x] == "") {
              ctx.remove(x);
            }
          }
        """,
        "params": {
          "to_delete_on_condition": [
            "field1",
            "field2",
            "field3"
          ]
        }
      }
    }
  ]
}

The script iterates over the list and removes each field whose value matches the condition.
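To check the logic of the script outside Elasticsearch, here is a Python sketch that models ctx as a plain dict. `remove_invalid_fields` is a hypothetical name mirroring the pipeline id, and the dict only stands in for the Painless ctx map:

```python
def remove_invalid_fields(ctx, to_delete_on_condition):
    """Mimic the Painless loop: for each listed field, drop it from the
    document if its value is the string "null" or the empty string."""
    for x in to_delete_on_condition:
        # ctx.get(x) is None when the field is absent, so missing fields are
        # skipped, much like ctx[x] evaluating to null in Painless.
        if ctx.get(x) == "null" or ctx.get(x) == "":
            del ctx[x]
    return ctx


doc = {"field1": "null", "field2": "", "field3": "keep me", "other": "null"}
print(remove_invalid_fields(doc, ["field1", "field2", "field3"]))
# {'field3': 'keep me', 'other': 'null'}
```

Only the listed fields are checked, so "other" survives even though it holds "null", and a real JSON null (None here) is not matched by either comparison.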

Accessing nested fields in a script is not as simple as many answers suggest, but it should be doable. The idea is that nested.field should be accessed as {{1}}.
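Assuming the document stores object fields as nested maps (which is how an ingest pipeline's ctx represents JSON objects), a dotted name such as nested.field has to be resolved one level at a time rather than looked up as a single key. The Python sketch below models that walk; `remove_if_invalid` is a hypothetical helper, and a Painless version would need the same per-level lookup and the same check for missing parents:

```python
def remove_if_invalid(ctx, dotted_path):
    """Walk ctx along a dotted path (e.g. "nested.field") and delete the
    leaf if its value is the string "null" or the empty string."""
    *parents, leaf = dotted_path.split(".")
    node = ctx
    for key in parents:
        node = node.get(key)
        # Stop quietly if an intermediate object is missing or is not an object.
        if not isinstance(node, dict):
            return ctx
    if node.get(leaf) in ("null", ""):
        del node[leaf]
    return ctx


doc = {"nested": {"field": "", "keep": "x"}, "top": "null"}
remove_if_invalid(doc, "nested.field")
remove_if_invalid(doc, "top")
remove_if_invalid(doc, "missing.field")
print(doc)
# {'nested': {'keep': 'x'}}
```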
