Problem description
I have a JSON file, file.json,
encoded in KOI8-R.
Boost JSON only works with UTF-8, so I convert the file from KOI8-R to UTF-8:
boost::property_tree::ptree tree;
std::locale loc = boost::locale::generator().generate("ru_RU.UTF-8");
std::ifstream ifs("file.json",std::ios::binary);
ifs.imbue(loc);
boost::property_tree::read_json(ifs,tree);
However, the file cannot be read. What am I doing wrong?
Update:
{
"соплодие": "лысеющий","обсчитавший": "перегнавший","кариозный": "отдёргивающийся","суверенен": "носившийся","рецидивизм": "поляризуются"
}
I saved this JSON in KOI8-R encoding.
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
int main() {
boost::property_tree::ptree pt;
boost::property_tree::read_json("test.txt",pt);
}
Compiling and running it gives the following error:
terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::json_parser::json_parser_error>'
what(): test.txt(2): invalid code sequence
Aborted (core dumped)
Then I used Boost Locale:
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/locale/generator.hpp>
#include <boost/locale/encoding.hpp>
#include <fstream>
int main() {
std::locale loc = boost::locale::generator().generate("ru_RU.utf8");
std::ifstream ifs("test.txt",std::ios::binary);
ifs.imbue(loc);
boost::property_tree::ptree pt;
boost::property_tree::read_json(ifs,pt);
}
Compiling (g++ main.cpp -lboost_locale) and running it gives the following error:
terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::json_parser::json_parser_error>'
what(): <unspecified file>(2): invalid code sequence
Aborted (core dumped)
Solution
The JSON specification (RFC 8259) requires UTF-8:
8.1. Character Encoding
JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].
It makes sense that general-purpose libraries support only UTF-8. For more context, see: JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?
How to do it anyway
Use a library, perhaps libiconv or libicu. Boost Locale supports the latter.
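Using Boost Locale/ICU
This requires that your library was built with ICU support, and possibly (?) that the required locale is available, which it most likely already is on your system.
It also assumes that the source code is UTF-8 encoded, which is also likely.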
#include <boost/locale.hpp>
#include <boost/locale/conversion.hpp>
#include <boost/json.hpp>
#include <boost/json/src.hpp>
#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
namespace json = boost::json;
int main() {
std::string koi8r = [] {
std::ifstream ifs("input.txt",std::ios::binary);
// istreambuf_iterator reads raw bytes; istream_iterator would skip whitespace
return std::string(std::istreambuf_iterator<char>(ifs),{});
}();
json::value doc =
json::parse(boost::locale::conv::to_utf<char>(koi8r,"KOI8-R"));
std::cout << "Serialized back: " << doc << "\n";
std::cout << "Extracting a single key: " << doc.as_object()["соплодие"] << "\n";
}
I made up some random JSON:
{
"соплодие": "лысеющий","обсчитавший": "перегнавший","кариозный": "отдёргивающийся","суверенен": "носившийся","рецидивизм": "поляризуются"
}
and saved it in KOI8-R as "input.txt":
00000000: 7b0a 2020 2020 22d3 cfd0 cccf c4c9 c522 {. "........"
00000010: 3a20 22cc d9d3 c5c0 ddc9 ca22 2c0a 2020 : "........",.
00000020: 2020 22cf c2d3 dec9 d4c1 d7db c9ca 223a "...........":
00000030: 2022 d0c5 d2c5 c7ce c1d7 dbc9 ca22 2c0a "...........",.
00000040: 2020 2020 22cb c1d2 c9cf dace d9ca 223a ".........":
00000050: 2022 cfd4 c4a3 d2c7 c9d7 c1c0 ddc9 cad3 "..............
00000060: d122 2c0a 2020 2020 22d3 d5d7 c5d2 c5ce .",. ".......
00000070: c5ce 223a 2022 cecf d3c9 d7db c9ca d3d1 ..": "..........
00000080: 222c 0a20 2020 2022 d2c5 c3c9 c4c9 d7c9 ",. "........
00000090: dacd 223a 2022 d0cf ccd1 d2c9 dad5 c0d4 ..": "..........
000000a0: d3d1 220a 7d0a ..".}.
Running the program now prints:
Serialized back: {"соплодие":"лысеющий","обсчитавший":"перегнавший","кариозный":"отдёргивающийся","суверенен":"носившийся","рецидивизм":"поляризуются"}
Extracting a single key: "лысеющий"