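Reading a file that shows up as a byte representation in Python as UTF-8 characters

Problem description

I have a .txt file generated by a built-in tool on Windows, and I need to parse it in a Python script (on a Linux machine, if that is relevant).

I open the file like this: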

with open(path,'r') as spec_file:

I even tried the io library:

with io.open(detail, mode="r", encoding="utf-8") as spec_file:

When the file is opened in (for example) Sublime Text, it displays correctly, and when I iterate over the file line by line:

for line in spec_file:

and print each line (print(line)), I also get the correct representation:

**********************************************************************************
* This diagnostic information may be used by an IT administrator to troubleshoot *
* the installed Trusted Platform Module (TPM). Please zip the folder and attach  *
* it to issues filed through Feedback Hub or with an IT admin.                   *
**********************************************************************************

However, when I print with print(repr(line)), I only get the byte representation of the characters:

'*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00T\x00h\x00i\x00s\x00 \x00d\x00i\x00a\x00g\x00n\x00o\x00s\x00t\x00i\x00c\x00 \x00i\x00n\x00f\x00o\x00r\x00m\x00a\x00t\x00i\x00o\x00n\x00 \x00m\x00a\x00y\x00 \x00b\x00e\x00 \x00u\x00s\x00e\x00d\x00 \x00b\x00y\x00 \x00a\x00n\x00 \x00I\x00T\x00 \x00a\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00 \x00t\x00o\x00 \x00t\x00r\x00o\x00u\x00b\x00l\x00e\x00s\x00h\x00o\x00o\x00t\x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00t\x00h\x00e\x00 \x00i\x00n\x00s\x00t\x00a\x00l\x00l\x00e\x00d\x00 \x00T\x00r\x00u\x00s\x00t\x00e\x00d\x00 \x00P\x00l\x00a\x00t\x00f\x00o\x00r\x00m\x00 \x00M\x00o\x00d\x00u\x00l\x00e\x00 \x00(\x00T\x00P\x00M\x00)\x00.\x00 \x00P\x00l\x00e\x00a\x00s\x00e\x00 \x00z\x00i\x00p\x00 \x00t\x00h\x00e\x00 \x00f\x00o\x00l\x00d\x00e\x00r\x00 \x00a\x00n\x00d\x00 \x00a\x00t\x00t\x00a\x00c\x00h\x00 \x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00i\x00t\x00 \x00t\x00o\x00 \x00i\x00s\x00s\x00u\x00e\x00s\x00 \x00f\x00i\x00l\x00e\x00d\x00 \x00t\x00h\x00r\x00o\x00u\x00g\x00h\x00 \x00F\x00e\x00e\x00d\x00b\x00a\x00c\x00k\x00 \x00H\x00u\x00b\x00 \x00o\x00r\x00 \x00w\x00i\x00t\x00h\x00 \x00a\x00n\x00 \x00I\x00T\x00 \x00a\x00d\x00m\x00i\x00n\x00.\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\n'

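This makes it impossible to search the file and process it as a string, so I need to convert it to a UTF-8 string somehow. Any ideas?

Solution

Your file is encoded as UTF-16 LE (because it comes from Windows; see this question for more information), so you need to set that as the encoding: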

with open(path, 'r', encoding='utf-16-le') as spec_file:

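LE stands for Little Endian. This matters because the plain "utf-16" codec checks for a byte order mark, which Windows does not output here (again, because Windows), so you need to state the byte order explicitly.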
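
To see why the null bytes appear and why naming the byte order removes them, here is a minimal sketch. The sample text and the temporary file are made up for illustration and stand in for the real Windows-generated file; only the encoding names come from the answer above.

```python
import os
import tempfile

# Hypothetical sample standing in for one line of the real file.
sample_text = "* TPM *\n"

# UTF-16 LE stores each ASCII character as the character byte followed by
# a zero byte, and writes no byte order mark.
raw = sample_text.encode("utf-16-le")

# Write the bytes to a temporary file so both reads go through open().
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(raw)
    sample_path = tmp.name

try:
    # Wrong codec: every zero byte decodes to a '\x00' character.
    with open(sample_path, "r", encoding="utf-8") as spec_file:
        wrong = spec_file.read()

    # Explicit byte order: the text decodes cleanly.
    with open(sample_path, "r", encoding="utf-16-le") as spec_file:
        right = spec_file.read()
finally:
    os.remove(sample_path)

print(repr(wrong))
print(repr(right))
```

Once the file is decoded this way, each line is an ordinary Python str, so the usual searching and string methods work on it.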
