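Why does Julia's "Distributed.interrupt" terminate the worker process?

Problem

I am confused about what the function Distributed.interrupt() does. The documentation says it will "interrupt the current executing task on the specified workers", but it also seems to terminate the worker.

Example: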

using Distributed

addprocs(1)  # Adding one local worker 
my_worker = workers()[1]

# Check number of processes
println("Processes: ",nprocs())

# Define a function
@everywhere function just_sleep(time)
    println("Sleeping...")
    sleep(time)
end

# Execute on the worker
remote_do(just_sleep,my_worker,100)

# Wait a bit and interrupt
sleep(5)
interrupt(my_worker)

# Check number of processes again
sleep(5)
println("Processes: ",nprocs())

I get this output:

> julia testing.jl
Processes: 2
      From worker 2:    Sleeping...
Worker 2 terminated.
Processes: 1

I expected worker 2 to stay alive and the final process count to still be two. Adding exception handling to the body of just_sleep() does not help either:

function just_sleep(time)
    println("Sleeping...")
    try
        sleep(time)
    catch e
        if isa(e,InterruptException)
            println("interrupted")
        else
            println(e)
        end
    end
end

As it stands, interrupt() seems to behave like Distributed.rmprocs(). I am running Julia 1.5.3 on Windows 10.


Edit

I also tried it on WSL Ubuntu. There is more information in the output, but the worker is terminated there as well:

Processes: 2
      From worker 2:    Sleeping...
      From worker 2:    fatal: error thrown and no exception handler available.
      From worker 2:    InterruptException()
      From worker 2:    jl_mutex_unlock at /buildworker/worker/package_linux64/build/src/locks.h:144 [inlined]
      From worker 2:    jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:476
      From worker 2:    poptask at ./task.jl:704
      From worker 2:    wait at ./task.jl:712 [inlined]
      From worker 2:    task_done_hook at ./task.jl:442
      From worker 2:    _jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
      From worker 2:    jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
      From worker 2:    jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
Worker 2 terminated.      From worker 2:        jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:196

      From worker 2:    start_task at /buildworker/worker/package_linux64/build/src/task.c:715
      From worker 2:    unknown function (ip: (nil))
Processes: 1

Interestingly, it works in an interactive REPL session (on Linux only):

Processes: 2
workers() = [2]
      From worker 2:    Sleeping for 100 s...
      From worker 2:    interrupted!
Processes: 2
workers() = [2]
1-element Array{Int64,1}:
 2
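
One way to sidestep the problem in a script is cooperative cancellation instead of interrupt(). The following is a minimal sketch under stated assumptions, not a fix for interrupt() itself: the function name sleep_until_stopped, the stop channel, and the 0.1 s polling step are hypothetical choices made for illustration. The worker polls a RemoteChannel between short sleeps and returns on its own once the master puts a value into it, so no exception has to be delivered to the worker.

```julia
using Distributed

addprocs(1)                      # one local worker, as in the example above
my_worker = workers()[1]

# Hypothetical replacement for just_sleep: sleeps in small steps and
# checks a stop flag between steps instead of relying on interrupt().
@everywhere function sleep_until_stopped(stop, total; step = 0.1)
    waited = 0.0
    while waited < total
        isready(stop) && return :stopped   # master asked us to stop
        sleep(step)
        waited += step
    end
    return :finished
end

# The channel lives on the master; the worker only queries it.
stop = RemoteChannel(() -> Channel{Bool}(1))

# Start the long-running call and keep a Future for its result.
fut = remotecall(sleep_until_stopped, my_worker, stop, 100)

sleep(1)
put!(stop, true)                 # request cancellation
result = fetch(fut)              # returns once the worker notices the flag

println("Result: ", result)
println("Processes: ", nprocs())
```

The trade-off is that the remote function has to be written to check the flag, so this only helps for code you control; a call that blocks for a long time without returning to the polling loop would not notice the request.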
