Snakemake-based software does not work on a SLURM cluster but works on the login node

Problem description

I am trying to run the Snakemake-based software https://github.com/MeHelmy/princess with SLURM on our cluster https://gitlab.com/cigene/computational/orion-support

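In short, the software works when I run it on the login node. However, when I submit a SLURM script that runs the same software, it fails with an error saying that a Snakemake temp file is not found. Running the workflow without the cluster would take an impractically long time, so I would really appreciate your help.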

So far, we, the IT team, and the developers have tried:

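(1) Checked the YAML file settings together with our cluster administrator: they look fine

(2) Added --latency-wait 120: no noticeable change

(3) Checked whether the cluster has any restriction that deletes temporary files: there is no such restriction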
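For context on attempt (2): Snakemake's --latency-wait option sets how many seconds Snakemake keeps checking for a job's expected files after the job finishes, to tolerate shared-filesystem delays. The default is 5 seconds, which matches the "Waiting at most 5 seconds for missing files" lines in the logs. The loop below is only a minimal sketch of that idea under that assumption, not Snakemake's own code; the helper name wait_for_files and its command-line usage are hypothetical.

```python
import os
import sys
import time


def wait_for_files(paths, latency_wait=5, poll_interval=1.0):
    """Poll until every path exists or latency_wait seconds have elapsed.

    Hypothetical helper illustrating what --latency-wait tolerates.
    Returns the paths still missing when the wait ends
    (an empty list means every file appeared in time).
    """
    deadline = time.monotonic() + latency_wait
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_interval)
        # Re-check only the paths that have not appeared yet.
        missing = [p for p in missing if not os.path.exists(p)]
    return missing


if __name__ == "__main__":
    # Hypothetical usage: python wait_for_files.py SECONDS PATH [PATH ...]
    still_missing = wait_for_files(sys.argv[2:], latency_wait=float(sys.argv[1]))
    for p in still_missing:
        print("missing:", p)
    sys.exit(1 if still_missing else 0)
```

If a path never becomes visible on the node doing the checking, no wait time will help, which is one reason raising the value may show no change.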

What I ran:

(base) [mariesai@cn-4 ~]$ module load PyYAML/5.1.2-GCCcore-8.3.0
(base) [mariesai@cn-4 ~]$ module load Miniconda3/4.7.10
(base) [mariesai@cn-4 ~]$ module load snakemake/5.30.1
(base) [mariesai@cn-4 ~]$ module load htslib 

The following have been reloaded with a version change:
  1) GCCcore/8.3.0 => GCCcore/9.3.0
  2) XZ/5.2.4-GCCcore-8.3.0 => XZ/5.2.5-GCCcore-9.3.0
  3) binutils/2.32-GCCcore-8.3.0 => binutils/2.34-GCCcore-9.3.0
  4) bzip2/1.0.8-GCCcore-8.3.0 => bzip2/1.0.8-GCCcore-9.3.0
  5) zlib/1.2.11-GCCcore-8.3.0 => zlib/1.2.11-GCCcore-9.3.0

(base) [mariesai@cn-4 princesstest]$ python3.7  /net/fs-1/Transpose/Software/princess/princess all -d $PWD/0406/2.without.e -r ont -s $PWD/test.fastq.gz --chr ssa18 -f /mnt/SCRATCH/kristenl/Reference/Simon_Sept2020.fasta -j 200  --rerun-incomplete --verbose 

Error summary

[Tue Apr  6 09:05:56 2021]
Error in rule readsstat:
    jobid: 6
    output: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/statitics/raw_reads/reads_stat.txt
    conda-env: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake/conda/1b58da75
    shell:
        
        python scripts/rawcoverage.py -i /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/test.fastq.gz -o /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/statitics/raw_reads/reads_stat.txt -t 5
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 12727829

Error executing rule readsstat on cluster (jobid: 6, external: 12727829, jobscript: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake/tmp.4xwny_7b/snakejob.readsstat.6.sh). For error details see the cluster log and the log files of the involved rule(s).

Log summary: slurm-12727829.out

Waiting at most 5 seconds for missing files.
Missing files after 5 seconds:
/net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake/tmp.4xwny_7b
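The missing path in this log is the directory that holds the job script (the error summary shows the jobscript at .snakemake/tmp.4xwny_7b/snakejob.readsstat.6.sh). One hedged way to narrow this down is to check whether the node that runs the job can see that directory at all. The short script below is a hypothetical diagnostic, not part of princess or Snakemake: it prints the host name and whether each given path exists on that host.

```python
import os
import socket
import sys


def check_paths(paths):
    """Return a dict mapping each path to whether it exists on this host."""
    return {p: os.path.exists(p) for p in paths}


if __name__ == "__main__":
    # Hypothetical usage: run once on the login node and once inside a
    # SLURM job, then compare the two reports.
    print("host:", socket.gethostname())
    for path, ok in check_paths(sys.argv[1:]).items():
        print(("visible " if ok else "MISSING ") + path)
```

For example, saved under the hypothetical name check_paths.py, it could be submitted with sbatch --wrap "python3 check_paths.py /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake" and compared with the output of the same command on the login node.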

slurm-12727823.out

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   readsstat
    1
Select jobs to execute...

[Tue Apr  6 09:05:21 2021]
Job 0: Calculating read coverage statitics for: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/test.fastq.gz


        python scripts/rawcoverage.py -i /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/test.fastq.gz -o /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/statitics/raw_reads/reads_stat.txt -t 5
        
Activating conda environment: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake/conda/1b58da75
Waiting at most 5 seconds for missing files.
MissingOutputException in line 10 of /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/modules/stat.smk:
Job Missing files after 5 seconds:
/net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/statitics/raw_reads/reads_stat.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/cluster/software/snakemake/5.30.1/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 575, in handle_job_success
  File "/cluster/software/snakemake/5.30.1/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 254, in handle_job_success
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

slurm-12727825.out

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   readsstat
    1
Select jobs to execute...

[Tue Apr  6 09:05:46 2021]
Job 0: Calculating read coverage statitics for: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/test.fastq.gz


        python scripts/rawcoverage.py -i /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/test.fastq.gz -o /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/statitics/raw_reads/reads_stat.txt -t 1
        
Activating conda environment: /net/cn-1/mnt/SCRATCH/princesstest/0406/2.without.e/.snakemake/conda/1b58da75
[Tue Apr  6 09:06:02 2021]
Finished job 0.
1 of 1 steps (100%) done
