Error using ' '.join when parsing txt for named entity recognition with the Google NLP API

I am using the input_helper_v2.py script provided by Google, and I ran into trouble while trying to build a dataset for named entity recognition with the Google NLP API.

The problem occurs in the function _DownloadGcsFile, which raises the following error:

gsutil_cp_cmd = ' '.join(['gsutil','cp',gcs_file,local_filename])
TypeError: sequence item 2: expected str instance, bytes found

I tried using b' '.join(['gsutil',local_filename]) instead, but it produces a similar error.

While searching for information, I noticed that this may be because the script was written for Python 2.7.

I would be very grateful for any help, as I am a complete beginner. Thanks a lot.

Well, this means that gcs_file is of type bytes, so you need to convert it to a string (str) before joining. For example:

gsutil_cp_cmd = ' '.join(['gsutil','cp',gcs_file.decode('utf-8'),local_filename])
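In Python 3, str.join accepts only str items and bytes.join accepts only bytes-like items, which is why both the original line and the b' ' variant fail when the list mixes the two types. If it is not certain which type each argument will have, a small helper can normalize them first. The following is only a sketch under assumptions: to_text and build_gsutil_cp_cmd are hypothetical names that are not part of input_helper_v2.py, and the bucket path is made up for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_gsutil_cp_cmd(gcs_file, local_filename):
    # Normalize every part to str so that ' '.join never sees a bytes item.
    parts = ["gsutil", "cp", to_text(gcs_file), to_text(local_filename)]
    return " ".join(parts)


# Hypothetical bucket path and local file name, for illustration only.
print(build_gsutil_cp_cmd(b"gs://example-bucket/data.txt", "data.txt"))
# prints: gsutil cp gs://example-bucket/data.txt data.txt
```

The helper leaves str values untouched, so it works whether gcs_file arrives as bytes or as str.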