pandas pd.read_csv(s3_path) fails with "TypeError: 'coroutine' object is not subscriptable"

Problem description

I am running a Spark application on an Amazon EMR cluster, and since a few days ago I get the following error whenever I try to read a file from S3 with pandas. I added a bootstrap action to install pandas, fsspec, and s3fs.

Code:

import pandas as pd
df = pd.read_csv(s3_path)

Error log:

Traceback (most recent call last):
  File "spark.py", line 84, in <module>
    df = pd.read_csv('s3://<bucketname>/<filename>.csv')
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parsers.py", line 435, in _read
    filepath_or_buffer, encoding, compression
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/common.py", line 222, in get_filepath_or_buffer
    filepath_or_buffer, mode=mode or "rb", **(storage_options or {})
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 133, in open
    out = self.__enter__()
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 101, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/usr/local/lib/python3.7/site-packages/fsspec/spec.py", line 844, in open
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 394, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 1276, in __init__
    cache_type=cache_type)
  File "/usr/local/lib/python3.7/site-packages/fsspec/spec.py", line 1134, in __init__
    self.details = fs.info(path)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 719, in info
    return sync(self.loop, self._info, path, bucket, key, kwargs, version_id)
  File "/usr/local/lib/python3.7/site-packages/fsspec/asyn.py", line 51, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/fsspec/asyn.py", line 35, in f
    result[0] = await future
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 660, in _info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 214, in _call_s3
    raise translate_boto_error(err)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 207, in _call_s3
    return await method(**additional_kwargs)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/client.py", line 121, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/client.py", line 140, in _make_request
    return await self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/endpoint.py", line 90, in _send_request
    exception):
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/endpoint.py", line 199, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/hooks.py", line 29, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/utils.py", line 1225, in redirect_from_error
    new_region = self.get_bucket_region(bucket, response)
  File "/usr/local/lib/python3.7/site-packages/botocore/utils.py", line 1283, in get_bucket_region
    headers = response['ResponseMetadata']['HTTPHeaders']
TypeError: 'coroutine' object is not subscriptable
sys:1: RuntimeWarning: coroutine 'AioBaseClient._make_api_call' was never awaited

s3fs is probably at fault, since it seems to be the only package that received an update recently, but I can't find anything related to this in the pandas changelog.

Workaround

The Dask/s3fs team has confirmed this is a bug. This GitHub issue suggests that aiobotocore fails to obtain the region_name of the S3 bucket.
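The failure mode in the traceback can be reproduced in miniature: aiobotocore turns _make_api_call into an async def, but botocore's synchronous redirect handler subscripts its return value as if it were a response dict. Calling an async def without awaiting it yields a coroutine object, and subscripting that coroutine raises exactly the TypeError above, along with the "never awaited" warning. A minimal sketch, using a hypothetical stand-in for the API call:

```python
async def make_api_call():
    # Hypothetical stand-in for aiobotocore's async AioBaseClient._make_api_call
    return {"ResponseMetadata": {"HTTPHeaders": {}}}

coro = make_api_call()  # calling an async def returns a coroutine, not the dict
try:
    coro["ResponseMetadata"]  # what botocore's sync get_bucket_region effectively did
except TypeError as err:
    print(err)  # 'coroutine' object is not subscriptable
coro.close()  # suppress the "coroutine ... was never awaited" RuntimeWarning
```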

If you run into the same problem, consider downgrading s3fs to 0.4.2, or try setting the environment variable AWS_DEFAULT_REGION as a workaround.
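The downgrade is just pip install "s3fs==0.4.2" in the bootstrap action. The environment-variable route can be sketched in Python; us-east-1 below is a placeholder for your bucket's actual region, and the variable must be set before s3fs builds its client (i.e. before the first read_csv):

```python
import os

# Pinning the region up front means aiobotocore never needs the
# bucket-location redirect path where the un-awaited coroutine is subscripted.
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # placeholder: use your bucket's region

# ...then import pandas and call pd.read_csv("s3://...") as usual
```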

Edit: the latest aiobotocore release (1.1.1) fixes this. If you run into the same problem, upgrade aiobotocore and s3fs.
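Whether the fixed release actually landed in a given environment (e.g. after the EMR bootstrap action ran pip install --upgrade aiobotocore s3fs) can be checked with a small version probe. This is a sketch; the helper name and the three-part version parsing are assumptions:

```python
import importlib.util

def has_fixed_aiobotocore(min_version=(1, 1, 1)):
    """Return True if aiobotocore is installed at >= min_version, else False."""
    if importlib.util.find_spec("aiobotocore") is None:
        return False  # not installed at all
    import aiobotocore
    # Compare the first three numeric components of the version string
    parts = tuple(int(p) for p in aiobotocore.__version__.split(".")[:3])
    return parts >= min_version
```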
