Problem Description
I am trying to split a dataframe as follows:
from io import StringIO
import pandas as pd
data = """
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(f'df_{key}.csv', index=False)
This exports each group of the dataframe to the local machine. Is there a way to do this and upload these multiple split CSVs to S3 (something like boto3's put_object)?
Solution
You can also use s3fs, which must be installed first. Installation can be done with pip, for example:
pip install s3fs
A verified example based on your code:
from io import StringIO
import pandas as pd
import s3fs
# I did not use my default aws profile,
# so I had to provide a key and secret. If you use
# the default aws profile, providing `key`
# and `secret` should not be required.
fs = s3fs.S3FileSystem(
    anon=False, key='<access_key>', secret='<secret_key>')
data = """
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(
        fs.open(f's3://<bucket-name>/df_{key[0]}-M{key[1]}.csv', 'w'),
        index=False)
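Note that when grouping by two columns, each groupby key is a `(C, B)` tuple, which is why the filename above uses `key[0]` and `key[1]` rather than `key` itself. A quick local check of the generated names (no S3 involved, using a subset of the data):

```python
from io import StringIO
import pandas as pd

data = """
A,B,C
87jg,28,3012
kj87,27,3011
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

# Grouping by two columns yields tuple keys, e.g. (3010, 52);
# indexing the tuple gives clean per-group filenames.
names = [f'df_{key[0]}-M{key[1]}.csv' for key, _ in df.groupby(['C', 'B'])]
```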
The code uploads the files correctly:
An equivalent approach uses boto3's upload_file on the locally written CSVs:
from io import StringIO
import pandas as pd
import boto3

data = """
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')
client = boto3.client('s3')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(f'df_{key}.csv', index=False)
    client.upload_file(f'df_{key}.csv', 'my-another-test-bucket-2',
                       f'df_{key[0]}-M{key[1]}.csv')
S3 bucket
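Since the question mentions boto3's put_object, each group can also be serialized to an in-memory buffer and uploaded directly, with no local files at all. A minimal sketch (the bucket name is a placeholder, and the boto3 calls are commented out so the snippet runs without AWS credentials):

```python
from io import StringIO
import pandas as pd

data = """
A,B,C
87jg,28,3012
kj87,27,3011
2yh8,54,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

# Serialize each group to an in-memory CSV string; nothing touches disk.
uploads = {}
for key, group in df.groupby(['C', 'B']):
    buf = StringIO()
    group.to_csv(buf, index=False)
    uploads[f'df_{key[0]}-M{key[1]}.csv'] = buf.getvalue()

# Requires boto3 and AWS credentials:
# import boto3
# client = boto3.client('s3')
# for name, body in uploads.items():
#     client.put_object(Bucket='<bucket-name>', Key=name, Body=body)
```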