问题描述
我是生物学家,并且对并行处理不熟悉。一些重要的背景是我的某些脚本可能需要15个小时才能执行。当主要功能(in_command)正在运行时,我试图并行运行一个将对硬件使用情况(CPU,RAM等)进行快照的功能。我遇到的问题是,计时器(get_stats)上的递归脚本分别运行时正确执行,但是一旦我使用多处理并行运行它,计时器似乎就无法工作。即使我有一个300秒的计时器,该函数也大约每秒运行一次。在其他脚本完成后,它确实停止了,但是我获得的快照比需要的更多。我也不喜欢我目前的方法,因此,如果有更好的方法,我愿意学习。我只是不能让它对其他脚本产生太大影响,因此快照方法只需要大致了解正在发生的事情。谢谢!
import psutil
import platform
from datetime import datetime
import multiprocessing
import time
import os
from threading import Timer
import multiprocessing as mp
def get_size(bytes,suffix="B"):
"""
Scale bytes to its proper format
e.g:
1253656 => '1.20MB'
1253656678 => '1.17GB'
"""
factor = 1024
for unit in ["","K","M","G","T","P"]:
if bytes < factor:
return f"{bytes:.2f}{unit}{suffix}"
bytes /= factor
def get_stats(switch,snapshot_dict,beg_time):
df =snapshot_dict
start_time = beg_time
df['time'].append(time.time() - start_time)
# Get core information
df['total_cores'].append(psutil.cpu_count(logical=True))
df['physical_cores'].append(psutil.cpu_count(logical=False))
cpufreq = psutil.cpu_freq()
# cpu frequency in Mhz
df['max_frequency'].append(cpufreq.max)
df['min_frequency'].append(cpufreq.min)
df['current_frequency'].append(cpufreq.current)
cpu_core = {}
for i,percentage in enumerate(psutil.cpu_percent(percpu=True,interval=1)):
cpu_core[str(i)] = percentage
df['cpu_core'].append(cpu_core)
# get ram information
svmem = psutil.virtual_memory()
df['total_memory'].append(get_size(svmem.total))
df['available_memory'].append(get_size(svmem.available))
df['used_memory'].append(get_size(svmem.used))
df['percent_memory'].append(svmem.percent)
# swap memory if it exists
swap = psutil.swap_memory()
df['swap_total'].append(get_size(swap.total))
df['swap_free'].append(get_size(swap.free))
df['swap_used'].append(get_size(swap.used))
df['swap_percentage'].append(swap.percent)
print(df)
#Call the code on a recursive function
t = Timer(300,get_stats(switch,df,beg_time))
t.start()
print('Switch: ',switch.value)
if switch.value == 1:
t.cancel()
def in_command(file,switch):
f = open(file,'r')
f_lines = f.readlines()
for line in f_lines:
print(line)
os.system(line)
f.close()
switch.value += 1
if __name__ == "__main__":
manager= mp.Manager()
df = {'time': [],'total_cores': [],'physical_cores': [],'max_frequency': [],'min_frequency': [],'current_frequency': [],'cpu_core': [],'total_memory': [],'available_memory': [],'used_memory': [],'percent_memory': [],'swap_total': [],'swap_free': [],'swap_used': [],'swap_percentage': []}
process_switch = manager.Value('i',0)
start_time = time.time()
p1 = mp.Process(target=get_stats,args = (process_switch,start_time))
p2 = mp.Process(target=in_command,args=('text_command.txt',process_switch))
p1.start()
p2.start()
p1.join()
p2.join()
print('finished')
解决方法
谢谢迈克尔。因此,对于可能正在阅读本文的每个人,我都不会在调用函数中包含参数,而是将它们分开放置。
不正确:
t = Timer(300,get_stats(switch,df,beg_time))
正确:
t = Timer(5,function=get_stats,args=[switch,snapshot_dict,beg_time])