Problem description
When I try to read JSON content from a field, I get:
WARNING: document 1,attribute assorted: JSON error: Syntax error,unexpected TOK_IDENT,expecting $end near 'a:foo'
Details follow.
This is the (super-simplified) CSV file I'm trying to read:
1,hello world,document number one,a:foo
22,hello again,document number two,foo:bar
23,hello Now,This is some stuff,foo:{bar:baz}
24,hello cow,more test stuff and things,{foo:bar}
55,hello suess,Box and sox and goats and moats,[a]
56,hello raven,nevermore said the thing,foo:bar
When I run the indexer, this is what I get:
../bin/indexer --config /home/ec2-user/sphinx/etc/sphinx.conf --all --rotate
Sphinx 3.3.1 (commit b72d67b)
copyright (c) 2001-2020,Andrew Aksyonoff
copyright (c) 2008-2016,Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/home/ec2-user/sphinx/etc/sphinx.conf'...
indexing index 'csvtest'...
WARNING: document 1,expecting $end near 'a:foo'
WARNING: document 22,expecting $end near 'foo:bar'
WARNING: document 23,expecting $end near 'foo:{bar:baz}'
WARNING: document 24,unexpected '}',expecting '[' near '}'
WARNING: document 55,unexpected ']',expecting '[' near ']'
WARNING: document 56,expecting $end near 'foo:bar'
collected 6 docs,0.0 MB
sorted 0.0 Mhits,100.0% done
total 6 docs,0.1 Kb
total 0.0 sec,17.7 Kb/sec,1709 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=14393).
Here is the entire config file:
source csvsrc
{
type = csvpipe
csvpipe_delimiter = ,
csvpipe_command = cat /home/ec2-user/sphinx/etc/example.csv
csvpipe_field_string = t
csvpipe_attr_string = c
csvpipe_attr_json = assorted
}
index csvtest
{
source = csvsrc
path = /var/data/test7
morphology = stem_en
rt_field = t
rt_field = c
rt_field = assorted
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = /var/log/searchd.log
query_log = /var/log/query.log
pid_file = /var/log/searchd.pid
binlog_path = /var/data
}
If I log in and run a query, it's clear that the JSON was not actually indexed (as expected, given the warnings):
select * from csvtest;
+------+-------------+----------------------------------+----------+
| id | t | c | assorted |
+------+-------------+----------------------------------+----------+
| 1 | hello world | document number one | NULL |
| 22 | hello again | document number two | NULL |
| 23 | hello Now | This is some stuff | NULL |
| 24 | hello cow | more test stuff and things | NULL |
| 55 | hello suess | Box and sox and goats and moats | NULL |
| 56 | hello raven | nevermore said the thing | NULL |
+------+-------------+----------------------------------+----------+
6 rows in set (0.00 sec)
I've tried a few things, but I'm just fumbling in the dark. Some of the things I tried:
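- Alternative formats for the JSON. Based on some experience with other JSON inputs, I tried {foo:bar}, {[foo:bar]}, and [{foo,bar}], in case it wanted a top-level array or dictionary. These actually produce a slightly different error:
WARNING: document 24,expecting '[' near ']'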
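- I tried adding a trailing comma, thinking that might be the $end token the parser was looking for. This produces an actual error, which prevents the index from being built:
ERROR: index 'csvtest': source 'csvsrc': not all columns found (found=5,total=4,line=1).
That makes sense to me.
2a) I tried adding a whole extra column after the JSON, so that I could have the trailing comma without getting the error that blocks index generation. This did build the index, but it did not supply the $end token the JSON parser was looking for.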
I'm completely stumped.
Solution
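So a:foo is not a valid JSON value, AFAIK. It looks like it's meant to be an object? If so, it needs {...} around it.
But even {foo:bar} is invalid. At the very least the value should be quoted: {foo:"bar"}. And really the key should be quoted too: {"foo":"bar"}
JavaScript objects technically allow unquoted key names, but JSON requires them to be quoted.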
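To see which of these candidates are strict JSON, here is a minimal sketch using Python's standard json module. It is only a stand-in for checking the syntax before indexing; Sphinx uses its own JSON parser, so its error messages (and possibly the exact set of inputs it accepts) will differ. The helper name is_valid_json is made up for this example.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as strict JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Values taken from the question and from the answer above.
candidates = ['a:foo', '{foo:bar}', '[a]', '{foo:"bar"}', '{"foo":"bar"}']
results = {c: is_valid_json(c) for c in candidates}

for value, ok in results.items():
    print(f"{value!r}: {'valid' if ok else 'invalid'}")
```

Only the last candidate, with both the key and the value in double quotes, passes this check.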
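...but also keep in mind that this is CSV. Quotes are normally used to quote fields (for example, when a column contains a comma), so the quotes inside the JSON need to be doubled! It gets a bit messy...
24,hello cow,more test stuff and things,"{""foo"":""bar""}"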
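Rather than doubling the quotes by hand, you can let a CSV library do it. The following is a rough sketch with Python's csv and json modules, assuming the file is generated by a script you control. The function name make_row and the column order (id, t, c, assorted) are assumptions based on the config in the question.

```python
import csv
import io
import json


def make_row(doc_id, title, content, attrs):
    """Build one CSV line whose last column is a JSON document (hypothetical helper)."""
    buf = io.StringIO()
    # csv.writer quotes any field containing a quote or the delimiter,
    # and doubles embedded quotes by default (doublequote=True).
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([doc_id, title, content,
                     json.dumps(attrs, separators=(",", ":"))])
    return buf.getvalue()


line = make_row(24, "hello cow", "more test stuff and things", {"foo": "bar"})
print(line, end="")
```

This prints the same line as the hand-written example above, and the quoting stays correct when the JSON contains commas, for instance with more than one key.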