How do I decode a partially escaped string in Python that mixes Unicode characters and escaped Unicode?

Problem description

Given the following string:

str = "\\u20ac €"

how can I decode it to € €?

Using str.encode("utf-8").decode("unicode-escape") returns € â\x82¬ instead.

(To clarify: I am looking for a general solution that decodes any combination of Unicode characters and escaped characters.)
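The mangled second half has a specific cause: the unicode-escape codec expands backslash escapes but decodes every other byte as Latin-1. The real € is encoded by UTF-8 as the three bytes 0xE2 0x82 0xAC, and reading those as Latin-1 yields three separate characters, â, \x82 and ¬. A small illustration (the variable names are only for this example):

```python
s = "\\u20ac €"

# The literal escape stays ASCII; the real € becomes three UTF-8 bytes.
raw = s.encode("utf-8")
print(raw)

# unicode-escape expands \u20ac but reads the remaining bytes as Latin-1,
# which splits the UTF-8 encoding of € into three characters.
naive = raw.decode("unicode-escape")
print(repr(naive))

# The same three characters appear when €'s UTF-8 bytes are decoded as Latin-1.
print(repr("€".encode("utf-8").decode("latin-1")))
```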

Solution

If the string always has this format, use .split:

string = "\\u20ac €"
escaped_unicode,non_escaped_unicode = string.split()
output = '{} {}'.format(escaped_unicode.encode("utf-8").decode("unicode-escape"),non_escaped_unicode)
print(output)
# € €

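Otherwise, more creativity is needed. I think the most general solution is still to split the string into words and then use a regex to decide whether each word needs unicode-escape decoding (assuming the input is sane enough not to mix Unicode and escaped Unicode within the same "word"):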

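import re

string = "ac ab \\u20ac cdef €"
# Matches words made up entirely of ASCII characters, which is where escapes like \u20ac live
regex = re.compile(r'[\u0000-\u007F]+')
output = []
for word in string.split():
    # fullmatch (not search), so a word containing non-ASCII characters such as "café"
    # is not truncated to its ASCII prefix
    if regex.fullmatch(word):
        try:
            output.append(word.encode("utf-8").decode("unicode-escape"))
        except UnicodeDecodeError:
            # the word contained a backslash sequence (e.g. a malformed \u) that
            # decode("unicode-escape") could not handle, so keep it as is
            output.append(word)
    else:
        # non-ASCII word: already real Unicode, keep it unchanged
        output.append(word)
print(' '.join(output))
# ac ab € cdef €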

A simple and fast solution is to use re.sub to match \u followed by exactly four hexadecimal digits, and convert those digits to a Unicode code point.
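A minimal sketch of this approach, assuming that only \uXXXX escapes with exactly four hex digits occur in the input; the function name decode_partial_escapes is hypothetical. Anything that does not match the pattern, including real characters such as € or é, is left untouched. Characters outside the Basic Multilingual Plane written as UTF-16 surrogate-pair escapes (for example \ud83d\ude00) would come out as two lone surrogates and would need extra handling.

```python
import re

# A literal backslash, the letter u, and exactly four hex digits, e.g. "\u20ac".
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_partial_escapes(text: str) -> str:
    """Hypothetical helper: replace each \\uXXXX escape with its character."""
    # int(..., 16) parses the four hex digits; chr() turns the code point into a character.
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_partial_escapes("\\u20ac €"))
```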


Output:

€ €
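A different technique, not used in the answers here, relies on the fact that the unicode-escape codec reads non-escape bytes as Latin-1: encode the text to Latin-1 with the backslashreplace error handler, so every character outside Latin-1 (such as €) is itself rewritten as a \uXXXX or \UXXXXXXXX escape, then decode the whole byte string with unicode-escape. This is a sketch with a hypothetical function name, decode_mixed. The trade-off is that every backslash sequence in the input gets interpreted, so text such as a Windows path containing \n would be altered, and a malformed escape raises UnicodeDecodeError.

```python
def decode_mixed(text: str) -> str:
    """Hypothetical helper: decode literal escapes while keeping real characters."""
    # Characters outside Latin-1 (e.g. "€") become escapes such as \u20ac;
    # ASCII and other Latin-1 characters (e.g. "é") are encoded as single bytes.
    escaped = text.encode("latin-1", "backslashreplace")
    # unicode-escape expands every escape (original and newly created) and reads
    # the remaining bytes as Latin-1, which restores the unchanged characters.
    return escaped.decode("unicode-escape")


print(decode_mixed("\\u20ac €"))
print(decode_mixed("ac ab \\u20ac cdef € café"))
```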