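I am using IBM Cloud Object Storage and want to read a PDF file from a bucket and store its text content as a string

Problem description

I am using ibm_boto3, as described in the IBM COS documentation. I defined the resource as follows: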

import ibm_boto3
from ibm_botocore.client import Config, ClientError

cos = ibm_boto3.resource(
    "s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=SERVICE_INSTANCE_ID,
    ibm_auth_endpoint=COS_AUTH_ENDPOINT,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT,
)

This is the code I use to fetch the contents of the PDF file:

def get_item(bucket_name, item_name):
    print("Retrieving item from bucket: {0},key: {1}".format(bucket_name, item_name))
    try:
        file = cos.Object(bucket_name, item_name).get()
        file_content = file["Body"].read()  # returns the object's data as bytes
        # print("\nFILE:-------------------------\n", file)  # shows the object's metadata
        return file_content
    except ClientError as be:
        print("CLIENT ERROR: {0}\n".format(be))
    except Exception as e:
        print("Unable to retrieve file contents: {0}\n".format(e))

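The Body of the response is an ibm_botocore.response.StreamingBody object. I cannot convert the bytes I read from it into a string. I tried decoding them as utf-8 and as base64, but neither worked. When I try to decode as utf-8, I get the following error:

Unable to retrieve file contents: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte

I also cannot figure out what kind of encoding IBM COS uses.

Thanks in advance.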
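
A possible explanation, offered as a sketch rather than a confirmed answer: object storage returns the object's bytes exactly as they were uploaded, so there is most likely no COS-specific encoding to find. A PDF is a binary format, which is why neither utf-8 nor base64 decoding produces text. The error is consistent with a common PDF header: "%PDF-1.x", a line break, then a "%" comment holding bytes above 0x7F, which places a non-UTF-8 byte at position 11. Getting the text requires a PDF parser. The example below assumes the third-party pypdf package (pip install pypdf), which the question does not mention; looks_like_pdf and pdf_bytes_to_text are hypothetical helper names, and sample_header is an illustrative header, not the actual file.

```python
import io


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF magic header."""
    return data.startswith(b"%PDF-")


def pdf_bytes_to_text(data: bytes) -> str:
    """Extract text from in-memory PDF bytes.

    Assumes the third-party pypdf package is installed. It is imported
    lazily so that the header check above works without it.
    """
    from pypdf import PdfReader

    # PdfReader accepts a file-like object, so the bytes are wrapped in BytesIO.
    reader = PdfReader(io.BytesIO(data))
    # extract_text() can yield nothing for pages that have no text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Illustrative header: decoding it as UTF-8 fails in the same way as in the question.
sample_header = b"%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n"
error_position = None
try:
    sample_header.decode("utf-8")
except UnicodeDecodeError as err:
    error_position = err.start  # index of the first undecodable byte
    print(err)
```

With the get_item function from the question, usage would be along the lines of text = pdf_bytes_to_text(get_item(bucket_name, item_name)), after checking that get_item did not return None. Scanned PDFs that contain only images have no text layer, so they would need OCR instead.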


ibm-cloud ibm-cloud-storage python-3.x