Problem description
I created the following ingest pipeline to split a field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing","processors": [
{
"split": {
"field": "foo","separator": "|"
}
}
]
},"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
But it splits the field into characters instead:
{
"docs" : [
{
"doc" : {
"_index" : "_index","_type" : "_doc","_id" : "_id","_source" : {
"foo" : [
"a","p","p","l","e","|","t","i","m","e"
]
}
}
}
]
}
If I replace the separator with a comma, the same pipeline splits the field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing","processors": [
{
"split": {
"field": "foo","separator": ","
}
}
]
},"docs": [
{
"_source": {
"foo": "apple,time"
}
}
]
}
Then the output is:
{
"docs" : [
{
"doc" : {
"_index" : "_index","_source" : {
"foo" : [
"apple","time"
]
}
}
}
]
}
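How can I split the field into words when the separator is "|"? My next question is how to apply this ingest pipeline to an existing index. I tried this solution, but it did not work for me.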
Edit
Here is the whole pipeline, which splits the document's field into two parts and assigns them to two columns:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""","processors": [
{
"split": {
"field": "dv_m","separator": "|","target_field": "dv_m_splited"
}
},{
"set": {
"field": "dv_metric_prod","value": "{{dv_m_splited.1}}","override": false
}
},{
"set": {
"field": "dv_metric_section","value": "{{dv_m_splited.2}}","override": false
}
}
]
},"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
It produces this response:
{
"docs" : [
{
"doc" : {
"_index" : "_index","_source" : {
"dv_metric_prod" : "m","dv_m_splited" : [
"a","m","a","z","e","_","i","n","c","|","U","n","d","e","r","s","t","a","n","d","i","n","g"
],"dv_metric_section" : "a","dv_m" : "amaze_inc|Understanding"
},"_ingest" : {
"timestamp" : "2021-08-02T08:33:58.2234143Z"
}
}
}
]
}
If I set "separator": "\\|", I get this error:
{
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "general_script_exception","reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239"
}
],"type" : "general_script_exception","reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239","caused_by" : {
"type" : "mustache_exception","reason" : "Failed to get value for dv_m_splited.2 @[query-template:1]","caused_by" : {
"type" : "mustache_exception","reason" : "2 @[query-template:1]","caused_by" : {
"type" : "index_out_of_bounds_exception","reason" : "2"
}
}
}
}
}
]
}
Solution
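The solution is fairly simple: just escape the separator.
The separator field of the split processor is a regular expression, so special characters such as | must be escaped. An unescaped | is an alternation between two empty patterns, so it matches at every position and the string is split into single characters.
You also need to escape it twice: the regex needs \| to match a literal pipe, and because the pattern is written inside a JSON string, the backslash itself must be escaped as well, which gives "\\|".
So your code is only missing the double escaping: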
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing","processors": [
{
"split": {
"field": "foo","separator": "\\|"
}
}
]
},"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
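The index_out_of_bounds_exception shown in the edit looks like a separate issue. Once the separator is escaped, the split yields only two elements, and list positions in the {{...}} templates are zero-based, so dv_m_splited.2 no longer exists. A minimal sketch of the full pipeline with the indices shifted to 0 and 1 (field names taken from the question, the description shortened) might look like this:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "Split dv_m on | into two fields","processors": [
{
"split": {
"field": "dv_m","separator": "\\|","target_field": "dv_m_splited"
}
},{
"set": {
"field": "dv_metric_prod","value": "{{dv_m_splited.0}}","override": false
}
},{
"set": {
"field": "dv_metric_section","value": "{{dv_m_splited.1}}","override": false
}
}
]
},"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
With this change, dv_metric_prod should come out as "amaze_inc" and dv_metric_section as "Understanding".
As for applying the pipeline to an existing index, one option is to store the pipeline under an ID and then run an update by query that uses it. In this sketch, split_dv_m and my-index are hypothetical names, not taken from the question:
PUT _ingest/pipeline/split_dv_m
{
"description": "Split dv_m on | into two fields","processors": [
{
"split": {
"field": "dv_m","separator": "\\|","target_field": "dv_m_splited"
}
},{
"set": {
"field": "dv_metric_prod","value": "{{dv_m_splited.0}}","override": false
}
},{
"set": {
"field": "dv_metric_section","value": "{{dv_m_splited.1}}","override": false
}
}
]
}
POST my-index/_update_by_query?pipeline=split_dv_m
Because both set processors use "override": false, documents that already have dv_metric_prod or dv_metric_section would keep their existing values.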