Problem description
I have written a Python script that uploads files to an S3 bucket. I need the script to run periodically from inside a Docker container.
#!/usr/local/bin/python3
import boto3
from botocore.errorfactory import ClientError
import os
import glob
import json
import time

s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'

uploaded = None
max_mod_time = '0'
file_list = glob.glob('/data/*.json')
file_mod_time = None

# get the mod time for all files in the data directory
# ('%S' is zero-padded seconds; lowercase '%s' is platform-dependent)
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]
timestamp_sorted_file_info = sorted(file_info, key=lambda f: f['mod_time'])

# load the timestamp of the newest file uploaded so far, if one was saved
if os.path.exists('max_mod_time.json'):
    with open('max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# TODO: fix strange behavior in Docker container
# upload the files to s3
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']
    # file_mod_time = '2020-09-19 13:28:53'  # for debugging
    file_name = os.path.basename(file['file'])
    uploaded = False

    # only upload files newer than the last recorded mod time
    if file_mod_time > max_mod_time:
        with open(os.path.join('/data/', file_name), 'rb') as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

        # error check - https://stackoverflow.com/a/38376288/7582937
        try:
            s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
        except ClientError as error:
            # Not found
            if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                raise error

        uploaded = True

# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
object_to_write = json.dumps({'max_mod_time': file_mod_time})
if uploaded:
    if object_to_write:
        open('max_mod_time.json', 'w').write(str(object_to_write))
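One detail worth noting: the script compares the formatted timestamps as plain strings, which only works because zero-padded '%Y-%m-%d %H:%M:%S' strings sort lexicographically in the same order as the times they represent. This relies on uppercase '%S' (zero-padded seconds); lowercase '%s' is platform-dependent and would not give a consistent ordering. A minimal sketch of the property the comparison depends on:

import time

# zero-padded timestamp strings sort lexicographically in
# chronological order, so plain string comparison is safe here
earlier = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(0))         # '1970-01-01 00:00:00'
later = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(1600522133))  # '2020-09-19 13:28:53'
assert earlier < later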
I am running crond inside the python:3.7-alpine container. My Dockerfile is below:
FROM python:3.7-alpine

WORKDIR /scripts

RUN pip install boto3

ENV AWS_ACCESS_KEY_ID=############
ENV AWS_SECRET_ACCESS_KEY=###################

# install the crontab and the upload script
COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root

COPY s3_upload.py /scripts/s3_upload.py
RUN chmod a+x /scripts/s3_upload.py

# run crond in the foreground so the container stays up
ENTRYPOINT crond -f
The script should run periodically and upload all new files to the S3 bucket. Here is my crontab file:
5-10/1 * * * * /bin/pwd; /scripts/s3_upload
I am using a docker-compose.yml to build and bring up the container and to sync a host directory into a directory inside the container:
version: '3.8'

services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload
    build:
      context: ./s3-data-upload/
    volumes:
      - ./data/features:/data
After running docker-compose build and docker-compose up, this is the output I get:
Creating highspeed_s3-data-transfer_1 ... done
Attaching to highspeed_s3-data-transfer_1
It just hangs there. I have tested the script manually by attaching to the container, creating files, and running the upload script by hand. It works as expected when run manually.

Something seems to be wrong with the crond configuration/setup, but I can't see anything that could be causing the problem.

How can I fix this? Any suggestions are welcome.

Thanks.
Solution
After a while I was able to solve this by setting the timing in the crontab correctly, as follows:
4/10 * * * * /bin/pwd; /scripts/s3_upload
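For reference, here is my reading of the two schedules under common cron semantics, where a minute field of the form N/step is treated as N-59/step: the original 5-10/1 only fires during minutes 5 through 10 of each hour, while 4/10 fires every ten minutes starting at minute 4. A small Python sketch (my own illustration) enumerating the minutes each field matches:

# minutes matched by the old and new crontab minute fields,
# assuming 'N/step' is interpreted as 'N-59/step'
old_minutes = list(range(5, 11))      # '5-10/1' -> minutes 5..10 of each hour
new_minutes = list(range(4, 60, 10))  # '4/10'   -> every 10 minutes from minute 4
print(old_minutes)  # [5, 6, 7, 8, 9, 10]
print(new_minutes)  # [4, 14, 24, 34, 44, 54]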