In ASCIIFoldingFilter

Problem description

I have been using the ASCII folding filter to handle diacritics, not only for documents in Elasticsearch but also for various other strings.

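import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
// Strings is assumed here to be Guava's; any null-or-empty check works.
import com.google.common.base.Strings;

public static String normalizeText(String text, boolean shouldTrim, boolean shouldLowerCase) {
    if (Strings.isNullOrEmpty(text)) {
        return text;
    }
    if (shouldTrim) {
        text = text.trim();
    }
    if (shouldLowerCase) {
        text = text.toLowerCase();
    }
    char[] charArray = text.toCharArray();

    // Once a character is normalized it could become more than 1 character. Official document says the output
    // length should be of size >= length * 4.
    char[] out = new char[charArray.length * 4 + 1];
    // Static signature: foldToASCII(input, inputPos, output, outputPos, length); returns the output length.
    int outLength = ASCIIFoldingFilter.foldToASCII(charArray, 0, out, 0, charArray.length);
    // Copy only the filled prefix of the oversized buffer.
    return String.copyValueOf(out, 0, outLength);
}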

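However, according to the official documentation, this method carries the note "This API is for internal purposes only and might change in incompatible ways in the next release." The alternative is the non-static foldToASCII(char[] input, int length) method (which internally calls the same static method), but using it requires setting up an ASCII folding filter, a token filter, a token stream, and an analyzer (which means choosing an analyzer type, and I might have to create a custom one). I could not find an example of a developer doing the latter. I tried writing my own solution, but the non-static foldToASCII does not return the exact output; instead, a series of unwanted characters is appended at the end. How do other developers handle this?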
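The trailing characters described above are consistent with reading a whole working buffer instead of only its filled prefix. That is an assumption, since the failing code is not shown. A minimal sketch of the pitfall, using only the JDK: expandLigatures is a hypothetical stand-in for a folding routine and is not part of Lucene.

```java
final class BufferPrefixDemo {

    // Hypothetical stand-in for a folding routine: expands 'ae' ligatures
    // into the caller-supplied buffer and returns how many chars were written.
    static int expandLigatures(char[] in, int length, char[] out) {
        int pos = 0;
        for (int i = 0; i < length; i++) {
            char c = in[i];
            if (c == '\u00e6') {          // LATIN SMALL LETTER AE
                out[pos++] = 'a';
                out[pos++] = 'e';
            } else {
                out[pos++] = c;
            }
        }
        return pos;
    }

    // Correct: build the String from the filled prefix only.
    static String foldSafely(String text) {
        char[] in = text.toCharArray();
        char[] out = new char[in.length * 4];   // oversized on purpose
        int written = expandLigatures(in, in.length, out);
        return new String(out, 0, written);
    }

    // Pitfall: converting the whole buffer keeps the unused tail (NUL chars here).
    static String foldWithTrailingJunk(String text) {
        char[] in = text.toCharArray();
        char[] out = new char[in.length * 4];
        expandLigatures(in, in.length, out);
        return new String(out);
    }
}
```

If the same applies to a reused buffer, the tail would hold leftovers from earlier calls rather than NUL characters, which would look like random appended text.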

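Edit: I have also seen some open-source projects using the static foldToASCII, so another question is whether using the non-static foldToASCII is really worth it.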
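For strings that never go through an analyzer, one way to sidestep the internal-API question is to skip Lucene and use the JDK's java.text.Normalizer. Note that this is a different technique from ASCIIFoldingFilter, not a drop-in replacement: it decomposes characters (NFD) and drops the combining marks, so it covers accents but leaves letters without a canonical decomposition (such as ß or ø) untouched. A minimal sketch, with DiacriticStripper and stripDiacritics as illustrative names:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class DiacriticStripper {

    // Matches Unicode combining marks (general category M).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String stripDiacritics(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // NFD splits e.g. U+00E9 into 'e' followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Removing the marks leaves the base letters.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("caf\u00e9"));        // input is "café"
        System.out.println(stripDiacritics("stra\u00dfe"));      // input is "straße"
    }
}
```

Whether this is enough depends on the data. If the folded strings must match what Elasticsearch produces at index time, the outputs of the two approaches can differ for characters like ß, so the Lucene filter remains the safer choice in that case.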
