Upload a DataFrame to S3 in Python

Question

I am trying to split a DataFrame as follows:

from io import StringIO
import pandas as pd

data = """
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data),sep=',')

for key,group in df.groupby(['C','B']):
    group.to_csv(f'df_{key}.csv',index=False)

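This exports the grouped results of the DataFrame to my local machine. Is there a way to do this and upload the resulting split CSV files to S3 (something like boto3's put_object)?

Answer

You can also use s3fs, which has to be installed first. It can be installed with pip, for example: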

pip install s3fs

A verified example based on your code:

import os

from io import StringIO
import pandas as pd
import s3fs

# I did not use my default aws profile
# so had to provide key and secret. If you use
# the default aws profile,providing `key`
# and `secret` should not be required
fs = s3fs.S3FileSystem(
        anon=False,key='<access_key>',secret='<secret_key>')

data = """ 
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data),sep=',')

for key,group in df.groupby(['C','B']):
    # the context manager closes the handle, which flushes the upload to S3
    with fs.open(f's3://<bucket-name>/df_{key[0]}-M{key[1]}.csv','w') as f:
        group.to_csv(f,index=False)

The code uploads the files correctly.
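As a possible shortcut, pandas can write straight to an `s3://` URL once s3fs is installed, so the explicit `fs.open` call is optional. The sketch below assumes pandas 1.2 or newer (for the `storage_options` argument). The helper names `group_url` and `write_groups_to_s3` are made up for illustration, and the bucket name and credentials are placeholders.

```python
import pandas as pd


def group_url(bucket, key):
    """Build the target URL for one (C, B) group key (hypothetical helper)."""
    c_value, b_value = key
    return f"s3://{bucket}/df_{c_value}-M{b_value}.csv"


def write_groups_to_s3(df, bucket, storage_options=None):
    """Write one CSV per (C, B) group directly to S3.

    pandas hands the s3:// URL to s3fs behind the scenes. Leave
    storage_options as None to use the default AWS profile, or pass
    e.g. {"key": "<access_key>", "secret": "<secret_key>"}.
    """
    for key, group in df.groupby(["C", "B"]):
        group.to_csv(
            group_url(bucket, key),
            index=False,
            storage_options=storage_options,
        )
```

Calling `write_groups_to_s3(df, "<bucket-name>")` would then produce object names of the same shape as the example above.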

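Alternatively, using boto3, write each group to a local file and then upload it:

from io import StringIO
import pandas as pd
import boto3


data = """
A,B,C
87jg,28,3012
h372,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data),sep=',')

client = boto3.client('s3')
for key,group in df.groupby(['C','B']):
    group.to_csv(f'df_{key}.csv',index=False)
    client.upload_file(f'df_{key}.csv','my-another-test-bucket-2',f'df_{key[0]}-M{key[1]}.csv')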

[Image: S3 bucket]
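Since the question mentions `put_object`, here is a minimal sketch of that route. It serializes each group into an in-memory buffer, so no local files are created. The helper names `group_to_csv_objects` and `upload_groups` are hypothetical, and `client` is assumed to be a `boto3.client('s3')` with working credentials.

```python
from io import StringIO

import pandas as pd


def group_to_csv_objects(df, by, prefix="df"):
    """Yield (object_key, csv_text) pairs, one per group, without touching disk."""
    for key, group in df.groupby(by):
        buffer = StringIO()
        group.to_csv(buffer, index=False)
        # key is a tuple when grouping by a list of columns
        name = "-".join(str(part) for part in key)
        yield f"{prefix}_{name}.csv", buffer.getvalue()


def upload_groups(client, bucket, df, by):
    """Upload each group as its own CSV object with put_object."""
    for object_key, csv_text in group_to_csv_objects(df, by):
        client.put_object(
            Bucket=bucket,
            Key=object_key,
            Body=csv_text.encode("utf-8"),
        )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   upload_groups(boto3.client("s3"), "<bucket-name>", df, ["C", "B"])
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before pointing them at a real bucket.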