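Problem description
My problem is that Ray does not distribute work across my workers.
I have 16 cores in total, because each of my Ubuntu EC2 instances on AWS has 8 CPUs.
However, when I start my Ray cluster and submit my Python script, it only runs on 8 cores: only 8 distinct PIDs show up as being used.
Also worth noting: I cannot access the Ray dashboard on the EC2 instance, so printing the PIDs in use is the only way I can get this information.
How do I get my script to use all 16 CPUs, so that 16 PIDs show up executing the script?
Here is my script: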
import os
import ray
import time
import xgboost
from xgboost.sklearn import XGBClassifier


def printer():
    print("INSIDE WORKER " + str(time.time()) + " PID : " + str(os.getpid()))


# decorators allow for futures to be created for parallelization
@ray.remote
def func_1():
    #model = XGBClassifier()
    count = 0
    for i in range(100000000):
        count += 1
    printer()
    return count


@ray.remote
def func_2():
    #model = XGBClassifier()
    count = 0
    for i in range(100000000):
        count += 1
    printer()
    return count


@ray.remote
def func_3():
    count = 0
    for i in range(100000000):
        count += 1
    printer()
    return count


def main():
    #model = XGBClassifier()
    start = time.time()
    results = []
    ray.init(address='auto')
    # append function futures
    for i in range(10):
        results.append(func_1.remote())
        results.append(func_2.remote())
        results.append(func_3.remote())
    # run in parallel and get the aggregated list
    a = ray.get(results)
    b = 0
    # add all values in the list together
    for j in range(len(a)):
        b += a[j]
    print(b)
    # time to complete
    end = time.time()
    print(end - start)


if __name__ == '__main__':
    main()
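One limitation of the PID printout above: a PID is only unique within a single machine, so processes on two different nodes cannot be told apart by PID alone. A minimal sketch, assuming each task reports its hostname together with its PID. `worker_identity` and `summarize_by_host` are hypothetical helper names for illustration, not part of Ray:

```python
import os
import socket
from collections import defaultdict


def worker_identity():
    """Return (hostname, pid) for the process executing this call."""
    return (socket.gethostname(), os.getpid())


def summarize_by_host(identities):
    """Group (hostname, pid) pairs into {hostname: sorted unique pids}."""
    pids_by_host = defaultdict(set)
    for host, pid in identities:
        pids_by_host[host].add(pid)
    return {host: sorted(pids) for host, pids in pids_by_host.items()}
```

In the script, each remote function could return `worker_identity()` alongside `count`, and the driver could pass the collected pairs to `summarize_by_host` after `ray.get`. If only one hostname shows up, the tasks never left that node.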
Here is my configuration:
# A unique identifier for the head node and workers of this cluster.
cluster_name: basic-ray-123454

# The maximum number of workers nodes to launch in addition to the head
# node. This takes precedence over min_workers. min_workers defaults to 0.
max_workers: 2 # this means zero workers
min_workers: 2 # this means zero workers

# Cloud-provider specific configuration.
provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a

file_mounts_sync_continuously: False

auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem

head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxxxa6b31fd2c
    KeyName: aws_ubuntu_test
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200

worker_nodes:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx26a6b31fd2c
    KeyName: aws_ubuntu_test

file_mounts: {
    "/home/ubuntu": "/home/user/RAY_AWS_DOCKER/ray_example_2_4/conda_env.yaml"
}

setup_commands:
    - echo "start initialization_commands"
    - sudo apt-get update
    - sudo apt-get upgrade
    - sudo apt-get install -y python-setuptools
    - sudo apt-get install -y build-essential curl unzip psmisc
    - pip install --upgrade pip
    - pip install ray[all]
    - echo "all files :"
    - ls
    # - conda install -c conda-forge xgboost

head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
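Without the dashboard, one way to check whether the worker nodes from this configuration actually joined the cluster is to inspect the node list from the driver. A sketch, assuming the input is a list of dicts shaped like the output of `ray.nodes()` (with `Alive`, `NodeManagerAddress`, and `Resources` keys). `alive_cpus_by_node` is a hypothetical helper, and the sample data and addresses are made up for illustration:

```python
def alive_cpus_by_node(nodes):
    """Map each alive node's address to the CPU count it registered."""
    result = {}
    for node in nodes:
        if not node.get("Alive", False):
            continue  # skip nodes that have left or never came up
        address = node.get("NodeManagerAddress", "unknown")
        result[address] = node.get("Resources", {}).get("CPU", 0)
    return result


# Hypothetical sample shaped like ray.nodes() output. On a real cluster,
# the list would come from:
#   import ray; ray.init(address="auto"); nodes = ray.nodes()
sample_nodes = [
    {"Alive": True, "NodeManagerAddress": "10.0.0.1", "Resources": {"CPU": 8.0}},
    {"Alive": False, "NodeManagerAddress": "10.0.0.2", "Resources": {}},
]
cpus = alive_cpus_by_node(sample_nodes)
print(cpus, "total CPUs:", sum(cpus.values()))
```

If the total reported on the real cluster is 8 rather than the expected 16, the second instance has not joined or is not alive, which would be consistent with seeing only 8 PIDs.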