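Problem description
Currently, I can create a shell on a compute node with srun [variety of settings] bash. But what if my SSH connection drops for some reason and I want to get back to that shell?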
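Solution
Assuming the unstable link is the SSH connection from your laptop to the cluster's login node, you can use a terminal multiplexer such as screen or tmux, depending on what is installed on the login node.
Typically, a session looks like this: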
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux # to enter a persistent tmux session
[you@cluster ~]$ srun [...] bash # to get a shell on a compute node
[you@computenode ~]$ # some work, then...
some SSH error (e.g. Write failed: Broken pipe)
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux a # to re-attach to the persistent tmux session
[you@computenode ~]$ # resume work
With screen, you would use screen -r instead of tmux a. Otherwise the procedure is the same.
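The attach-or-create step can be wrapped in a small helper so that the same command works both on first login and after a dropped connection. The following is a minimal sketch, not part of the original answer: the function name persistent_session_cmd and the default session name "work" are hypothetical. It relies on tmux new-session -A (attach if the named session exists, otherwise create it) and screen -d -R (reattach, creating the session if needed), and it only prints the command rather than running it.

```shell
# Hypothetical helper: print the command that attaches to (or creates)
# a persistent multiplexer session on the login node.
# The session name defaults to "work" (an assumption for illustration).
persistent_session_cmd() {
    name="${1:-work}"
    if command -v tmux >/dev/null 2>&1; then
        # -A: attach if the session already exists, otherwise create it
        echo "tmux new-session -A -s $name"
    elif command -v screen >/dev/null 2>&1; then
        # -d -R: reattach to the session, creating it first if necessary
        echo "screen -d -R $name"
    else
        echo "no terminal multiplexer found" >&2
        return 1
    fi
}
```

On the login node, the printed command can then be run before starting srun, and run again after reconnecting.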
If you want to join the job from another terminal instance (shown on the right below), you can use Slurm's sattach command.
[you@yourlaptop ~]$ ssh cluster-frontend            |
[you@cluster ~]$ srun [...] bash                    |
srun: job ******* queued and waiting for resources  |
srun: job ******* has been allocated resources      | [you@yourlaptop ~]$ ssh cluster-frontend
[you@computenode ~]$                                | [you@cluster ~]$ sattach --pty ********
[you@computenode ~]$ echo OK                        | [you@computenode ~]$ echo OK
[you@computenode ~]$ OK                             | [you@computenode ~]$ OK
The original terminal and the terminal running sattach are now fully synchronized.
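sattach takes a job step identifier of the form jobid.stepid. Below is a minimal sketch for building that argument, assuming the interactive srun is the first step of its job (step 0), which is the usual case when srun creates the allocation itself. The helper name step_spec is hypothetical, and it only handles plain numeric job IDs. The job ID itself can be looked up with squeue -u "$USER".

```shell
# Hypothetical helper: build the "jobid.stepid" argument for sattach.
# Assumes step 0 unless a step ID is passed as the second argument.
step_spec() {
    jobid="$1"
    stepid="${2:-0}"
    case "$jobid" in
        # Reject empty or non-numeric job IDs
        ''|*[!0-9]*) echo "usage: step_spec JOBID [STEPID]" >&2; return 1 ;;
    esac
    echo "${jobid}.${stepid}"
}

# Example (on the login node, not executed here; 1234567 is a made-up job ID):
#   squeue -u "$USER"                       # find the job ID of the srun job
#   sattach --pty "$(step_spec 1234567)"    # attach to step 0 of that job
```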
Note that none of the above protects against an unexpected termination of srun itself; whenever srun terminates, the job terminates too.