问题描述
我有一个EC2实例,在其中运行实现以下步骤的python守护程序:
- 发送 API请求,以启动数据的点击流;
- 对于收到的每条记录,以JSON格式,对数据应用一些转换,并保存在字符串中,并在收到大量记录后,将其转储为CSV strong>本地格式;
- 运行连接到RDS Postgresql db的子进程,并执行copY命令以从先前本地转储的CSV中加载数据;
- 将当前CSV加载到S3存储桶,然后从EC2实例中删除。
所有步骤都运行良好;问题是,守护程序的单个实例不足以保持 near 的实时速度,因为几个小时后,从API获得记录的时间到从请求并转储到数据库中太大。
鉴于此,当运行守护程序的多个实例(无论是2还是8)时,实际上只有一个正在运行,cpu%约为11%。 我想要实现的是让所有实例都以相似的cpu使用率启动并运行。
除了启动守护程序的不同实例外,我还尝试过:
有关体系结构的详细信息:
- EC2实例c5.2xlarge(8个vcpu,8 GB内存,仅EBS实例存储,网络带宽高达10 Gbps,EBS带宽高达4750 Mbps);
- x86_64-pc-linux-gnu上的Postgresql 11.6,由gcc(GCC)4.8.5 20150623(Red Hat 4.8.5-11)编译,64位。最大连接值:5000。
实现了守护程序类:
import sys,os,time,atexit,signal
class Daemon:
"""A generic daemon class.
Usage: subclass the daemon class and override the run() method."""
def __init__(self,pidfile,stdin,stdout,stderr):
self.pidfile = pidfile
self.stdin = stdin
self.stdout = stdout
self.stderr = stderr
def daemonize(self):
"""Deamonize class. UNIX double fork mechanism."""
try:
pid = os.fork()
if pid > 0:
# exit first parent
sys.exit(0)
except OSError as err:
sys.stderr.write('fork #1 Failed: {0}\n'.format(err))
sys.exit(1)
# decouple from parent environment
os.chdir('/')
os.setsid()
os.umask(0)
# do second fork
try:
pid = os.fork()
if pid > 0:
# exit from second parent
sys.exit(0)
except OSError as err:
sys.stderr.write('fork #2 Failed: {0}\n'.format(err))
sys.exit(1)
# redirect standard file descriptors
sys.stdout.flush()
sys.stderr.flush()
si = open(self.stdin,'r')
so = open(self.stdout,'a+')
se = open(self.stderr,'a+')
os.dup2(si.fileno(),sys.stdin.fileno())
os.dup2(so.fileno(),sys.stdout.fileno())
os.dup2(se.fileno(),sys.stderr.fileno())
# write pidfile
atexit.register(self.delpid)
pid = str(os.getpid())
with open(self.pidfile,'w+') as f:
f.write(pid + '\n')
def delpid(self):
os.remove(self.pidfile)
def start(self):
"""Start the daemon."""
# Check for a pidfile to see if the daemon already runs
try:
with open(self.pidfile,'r') as pf:
pid = int(pf.read().strip())
except IOError:
pid = None
if pid:
message = "pidfile {0} already exist. " + \
"Daemon already running?\n"
sys.stderr.write(message.format(self.pidfile))
sys.exit(1)
# Start the daemon
self.daemonize()
self.run()
def stop(self):
"""Stop the daemon."""
# Get the pid from the pidfile
try:
with open(self.pidfile,'r') as pf:
pid = int(pf.read().strip())
except IOError:
pid = None
if not pid:
message = "pidfile {0} does not exist. " + \
"Daemon not running?\n"
sys.stderr.write(message.format(self.pidfile))
return # not an error in a restart
# Try killing the daemon process
try:
while 1:
os.kill(pid,signal.SIGTERM)
time.sleep(0.1)
except OSError as err:
e = str(err.args)
if e.find("No such process") > 0:
if os.path.exists(self.pidfile):
os.remove(self.pidfile)
else:
print (str(err.args))
sys.exit(1)
def restart(self):
"""Restart the daemon."""
self.stop()
self.start()
def run(self):
"""You should override this method when you subclass Daemon.
It will be called after the process has been daemonized by
start() or restart()."""
def ingestion(str: csv_fullpath_ec2):
psql_code = "\copy adobe_ls.adobe_livestream_stg(TABLE_COLUMNS) " \
+ "FROM '" + csv_fullpath_ec2 + "' " \
+ "WITH (FORMAT CSV,DELIMITER '" + delimiter + "',QUOTE E'\b');"
cmd = ['/usr/bin/psql','-U',DB_USERNAME,'-h',HOST_NAME,'-p',PORT,'-d',DB_NAME,'-c','{}'.format(psql_code)]
result = subprocess.run(cmd,env,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
解决方法
暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!
如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。
小编邮箱:dio#foxmail.com (将#修改为@)