Preface
This article explains how to convert between full-width and half-width characters in Python 3. It is shared here for reference and study.
1. Background
Use case: replacing full-width characters with their half-width equivalents in student answer data.
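2. How full-width and half-width characters work
Full-width: Double Byte Character, abbreviated DBC.
Half-width: Single Byte Character, abbreviated SBC.
On Windows (under the GB2312/GBK encoding), Chinese characters and full-width characters each occupy two bytes, and both bytes come from the upper half of the ASCII chart (codes 128-255).
The first byte of a full-width character is always 163, and the second byte is the code of the corresponding half-width character plus 128. The space is the exception, so the full-width and half-width spaces have to be handled separately.
For Chinese characters the first byte is greater than 163; for example, '阿' is 176 162. When a Chinese character is detected, it is left unconverted.
For example, half-width A is 65, so full-width A is 163 (first byte) followed by 193 (second byte, 128 + 65).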
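The byte values quoted above can be checked directly in Python 3. The following is a minimal sketch that assumes the encoding in question is GB2312 (available in Python as the `gb2312` codec); the helper name `gb2312_bytes` is made up for this illustration and is not part of the original article.

```python
# Inspect the GB2312 bytes behind the examples in the text.
# Assumption: "Windows" in the text refers to the GB2312/GBK code page.

def gb2312_bytes(ch):
    """Return the GB2312 encoding of one character as a list of byte values."""
    return list(ch.encode("gb2312"))

print(gb2312_bytes("A"))        # half-width A  -> [65]
print(gb2312_bytes("\uff21"))   # full-width A  -> [163, 193]
print(gb2312_bytes("阿"))       # Chinese char  -> [176, 162]
print(gb2312_bytes("\u3000"))   # full-width space -> [161, 161]
```

The last line shows why the space needs special handling: its first byte is 161 rather than 163, so the "add 128" rule does not apply to it.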
Example of full-width and half-width text (the file test.txt contains both kinds of characters):
```
F:\test>type test.txt
123456
123456
abcdefg
abcdefg
中国你好
```
3. Implementation in Python 3
```python
# -*- coding:utf-8 -*-
# i@mail.chenpeng.info
'''
Full-width: Double Byte Character, abbreviated DBC
Half-width: Single Byte Character, abbreviated SBC
'''


def DBC2SBC(ustring):
    '''Convert full-width characters to half-width.'''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x3000:
            # The full-width space maps directly to the ASCII space.
            rstring += chr(0x0020)
            continue
        # Other full-width forms sit 0xFEE0 above their ASCII counterparts.
        inside_code -= 0xfee0
        if not (0x0021 <= inside_code <= 0x7e):
            # Not a full-width form (for example Chinese text): keep it as is.
            rstring += uchar
            continue
        rstring += chr(inside_code)
    return rstring


def SBC2DBC(ustring):
    '''Convert half-width characters to full-width.'''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x0020:
            # The ASCII space maps directly to the full-width space.
            inside_code = 0x3000
        else:
            if not (0x0021 <= inside_code <= 0x7e):
                # Not printable ASCII: keep it as is.
                rstring += uchar
                continue
            inside_code += 0xfee0
        rstring += chr(inside_code)
    return rstring


s = '''
array('0' => '0','1' => '1','2' => '2','3' => '3','4' => '4',
'5' => '5','6' => '6','7' => '7','8' => '8','9' => '9',
'A' => 'A','B' => 'B','C' => 'C','D' => 'D','E' => 'E','F' => 'F','G' => 'G',
'H' => 'H','I' => 'I','J' => 'J','K' => 'K','L' => 'L','M' => 'M','N' => 'N',
'O' => 'O','P' => 'P','Q' => 'Q','R' => 'R','S' => 'S','T' => 'T','U' => 'U',
'V' => 'V','W' => 'W','X' => 'X','Y' => 'Y','Z' => 'Z',
'a' => 'a','b' => 'b','c' => 'c','d' => 'd','e' => 'e','f' => 'f','g' => 'g',
'h' => 'h','i' => 'i','j' => 'j','k' => 'k','l' => 'l','m' => 'm','n' => 'n',
'o' => 'o','p' => 'p','q' => 'q','r' => 'r','s' => 's','t' => 't','u' => 'u',
'v' => 'v','w' => 'w','x' => 'x','y' => 'y','z' => 'z',
'(' => '(',')' => ')','〔' => '[','〕' => ']','【' => '[','】' => ']',
'〖' => '[','〗' => ']','“' => '[','”' => ']','‘' => '[','’' => ']',
'{' => '{','}' => '}','《' => '<','》' => '>','%' => '%','+' => '+',
'―' => '-','-' => '-','~' => '-',':' => ':','。' => '.','、' => ',',
',' => '.','、' => '.',';' => ',','?' => '?','!' => '!','…' => '-',
'‖' => '|','”' => '"','’' => '`','|' => '|','〃' => '"',' ' => ' ');
'''

# Full-width to half-width
print(DBC2SBC(s))

# Half-width to full-width
print(SBC2DBC(s))

s = '''中文测试'''

# Full-width to half-width
print(DBC2SBC(s))

# Half-width to full-width
print(SBC2DBC(s))
```
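The same mapping can also be expressed as a lookup table and applied with `str.translate`, which avoids building the result one character at a time. This is only a sketch of an alternative, not the method used in the article: the names `OFFSET`, `FULL_TO_HALF`, `HALF_TO_FULL`, `to_half_width` and `to_full_width` are hypothetical, and it relies on the fact that in Unicode the full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII 0x21..0x7E, with the full-width space U+3000 handled separately.

```python
# Table-driven variant of the full-width / half-width conversion.
# All names below are illustrative and do not come from the original article.

OFFSET = 0xFEE0  # distance between U+0021..U+007E and U+FF01..U+FF5E

# Map each full-width code point to its half-width counterpart.
FULL_TO_HALF = {code + OFFSET: code for code in range(0x21, 0x7F)}
FULL_TO_HALF[0x3000] = 0x20  # full-width space -> ASCII space

# Invert the table for the opposite direction.
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def to_half_width(text):
    """Replace full-width characters with half-width ones; leave the rest alone."""
    return text.translate(FULL_TO_HALF)


def to_full_width(text):
    """Replace printable ASCII and the space with full-width characters."""
    return text.translate(HALF_TO_FULL)


# Full-width "ABC", a full-width space, full-width "123", then Chinese text.
print(to_half_width("\uff21\uff22\uff23\u3000\uff11\uff12\uff13 中文"))  # ABC 123 中文
```

Characters that are not in the table, such as Chinese text, pass through unchanged, which matches the rule that Chinese characters are not converted.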
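4. Summary
That covers everything in this article. I hope it is useful for your study or work. If you have questions, feel free to leave a comment. Thank you for supporting 编程小技巧.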
5. References
http://thinkerou.com/2015-06/covert-dbc-sbc/