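Converting accented characters to English with Java

Problem description

I have a requirement to search text containing accented characters, which may be entered by users in Iceland and Japan. The code I wrote works for some accented characters, but not all. Examples below: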

À - returns a. Correct.
Â - returns a. Correct.
Ð - returns Ð. This is breaking. It should return e.
Õ - returns Õ. This is breaking. It should return o.

Here is my code:

String accentConvertStr = StringUtils.stripAccents(myKey);

I also tried this:

byte[] b = key.getBytes("Cp1252");
System.out.println("" + new String(b,StandardCharsets.UTF_8));

Please advise.

Solution

I would say it is working as expected. In fact, the following is essentially the code underlying StringUtils.stripAccents.

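import java.text.Normalizer;

String[] chars = new String[]{"À", "Â", "Ð", "Õ"};

for (String c : chars) {
  // NFD splits a letter into its base letter plus combining marks, where such a decomposition exists
  String normalized = Normalizer.normalize(c, Normalizer.Form.NFD);
  // Remove the combining marks, leaving only the base letter
  System.out.println(normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
}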

This prints A, A, Ð and O, one per line.
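To see why Ð is left untouched, it can help to check which characters actually change under NFD. The following is a minimal sketch using only the JDK's java.text.Normalizer; the class name DecompositionCheck and its method are hypothetical, introduced just for illustration. Letters such as À and Õ decompose into a base letter plus a combining mark, while Ð, Đ and ø have no canonical decomposition, so there is no mark to strip.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class DecompositionCheck {
    // Returns true if NFD normalization changes the input, i.e. the text
    // contains at least one character with a canonical decomposition.
    static boolean decomposes(String s) {
        return !Normalizer.normalize(s, Normalizer.Form.NFD).equals(s);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00C0", // A with grave
            "\u00D5", // O with tilde
            "\u00D0", // capital eth
            "\u0110", // capital D with stroke
            "\u00F8"  // small o with stroke
        };
        for (String s : samples) {
            System.out.println(s + " decomposes: " + decomposes(s));
        }
    }
}
```

Only the characters that report true here can be handled by the normalize-and-strip approach.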

If you read the answer at https://stackoverflow.com/a/5697575/9671280, you will find:

Be aware that that will not remove what you might think of as “accent” marks from all characters! There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.
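If the real goal is accent-insensitive searching rather than producing a stripped string, comparing at primary collation strength is one option. This is a rough sketch using the JDK's java.text.Collator; the class name PrimaryStrengthMatch is hypothetical, and the choice of Locale.ENGLISH is an assumption. Whether letters such as Đ or ø compare equal to D or o depends on the collation rules shipped with the runtime and has not been verified here.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class PrimaryStrengthMatch {
    // Compares two strings while ignoring differences below primary
    // strength, which for this collator means accents and letter case.
    static boolean sameBaseLetters(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // locale is an assumption
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "\u00C0" is A with grave
        System.out.println(sameBaseLetters("\u00C0", "a"));
        System.out.println(sameBaseLetters("a", "b"));
    }
}
```

This keeps the stored text unchanged and only relaxes the comparison, which may suit a search feature better than rewriting the keys.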

If you still want to use StringUtils.stripAccents, you can handle the remaining characters separately.

Alternatively, try https://github.com/xuender/unidecode; it seems to fit your case.
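Handling them separately could look roughly like the sketch below: strip combining marks first, then map the letters that have no decomposition by hand. The class name AccentFallback and the mapping table are hypothetical and deliberately incomplete, and the replacement letters are a judgment call (the question expected e for Ð, while D is used here). Map.of requires Java 9 or later.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper class, for illustration only. The table below is an
// assumption and covers just a few letters; extend it as needed.
class AccentFallback {
    // Letters with no canonical decomposition, mapped by hand.
    private static final Map<Character, String> EXTRA = Map.of(
        '\u00D0', "D",  // capital eth
        '\u00F0', "d",  // small eth
        '\u0110', "D",  // capital D with stroke
        '\u0111', "d",  // small d with stroke
        '\u00D8', "O",  // capital O with stroke
        '\u00F8', "o"   // small o with stroke
    );

    static String strip(String input) {
        // Step 1: decompose and drop combining diacritical marks.
        String noMarks = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: replace the remaining special letters from the table.
        StringBuilder out = new StringBuilder(noMarks.length());
        for (int i = 0; i < noMarks.length(); i++) {
            char c = noMarks.charAt(i);
            String mapped = EXTRA.get(c);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling AccentFallback.strip(myKey) in place of the original stripAccents call, followed by toLowerCase if lowercase keys are wanted, would cover the four characters from the question.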

 String normalized = Unidecode.decode(input);
