Snakemake --use-conda with --cluster and NFS4 storage

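Problem description

I am using snakemake in cluster mode to submit a simple single-rule workflow to an HPCC that runs Torque with several compute nodes. NFSv4 storage is mounted at /data. There is a symlink /PROJECT_DIR -> /data/PROJECT_DIR/

I submit the job using: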

snakemake --verbose --use-conda --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
--rerun-incomplete --printshellcmds --latency-wait 60 \
--configfile /PROJECT_DIR/config.yaml -s '/data/WORKFLOW_DIR/Snakefile' --jobs 100 \
--cluster-config '/PROJECT_DIR/cluster.json' \
--cluster 'qsub -j oe -l mem={cluster.mem} -l walltime={cluster.time} \
                      -l nodes={cluster.nodes}:ppn={cluster.ppn}'
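
For reference, the {cluster.*} placeholders in the --cluster string are filled from the file passed to --cluster-config. The sketch below writes a minimal example of such a file. The values are the ones visible in the "cluster" entry of the jobscript properties shown further down; the file name cluster.example.json and the use of the __default__ key are assumptions, since the actual cluster.json is not shown.

```shell
#!/bin/sh
# Hypothetical stand-in for /PROJECT_DIR/cluster.json (the real file is not shown).
# "__default__" applies to every rule unless a rule-specific entry overrides it.
CLUSTER_JSON="${CLUSTER_JSON:-cluster.example.json}"

cat > "$CLUSTER_JSON" <<'EOF'
{
    "__default__": {
        "nodes": 1,
        "ppn": 4,
        "time": "01:00:00",
        "mem": "32gb"
    }
}
EOF

# With these values the qsub template would expand to:
#   qsub -j oe -l mem=32gb -l walltime=01:00:00 -l nodes=1:ppn=4
echo "wrote $CLUSTER_JSON"
```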

The job fails:

Error in rule fastqc1:                                      
    jobid: 1                                          
    output: /PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html                                    
    conda-env: /data/software/miniconda3-ngs/envs/snakemake/74019bbc                     
    shell: 
                                                                                
        fastqc -o /PROJECT_DIR/OUTPUT_DIR/ -t 4 -f fastq /PROJECT_DIR/INPUT/SAMPLE.fastq.gz 
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 211078.CLUSTER

    Error executing rule fastqc1 on cluster (jobid: 1,external: 211078.CLUSTER,jobscript:
    PROJECT_DIR/.snakemake/tmp.t5a2dpxe/snakejob.fastqc1.1.sh). For error details see the cluster
    log and the log files of the involved rule(s).

The submitted jobscript looks like this:

Jobscript: 
#!/bin/sh                                             
# properties = {"type": "single","rule": "fastqc1","local": false,"input": 
  ["/PROJECT_DIR/INPUT_DIR/SAMPLE.fastq.gz"],"output": ["/PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html"],"wildcards": {"sample": "SAMPLE","read": "1"},"params": {},"log": [],"threads": 4,"resources": {},"jobid": 1,"cluster": {"nodes": 1,"ppn": 4,"time": "01:00:00","mem": "32gb"}}                                         
  
  cd /data/PROJECT_DIR && \
  PATH='/data/software/miniconda3-ngs/envs/snakemake-5.32.2/bin':$PATH \
  /data/software/miniconda3-ngs/envs/snakemake-5.32.2/bin/python3.8 \
  -m snakemake /PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html --snakefile /data/WORKFLOW_DIR/Snakefile \
  --force -j --keep-target-files --keep-remote --max-inventory-time 0 \
  --wait-for-files /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe \
  /PROJECT_DIR/INPUT/SAMPLE.fastq.gz /data/software/miniconda3-ngs/envs/snakemake/74019bbc --latency-wait 60 \
  --attempt 1 --force-use-threads --scheduler ilp \
  --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ \
  --configfiles /PROJECT_DIR/config.yaml -p --allowed-rules fastqc1 --nocolor --notemp --no-hooks --nolock \
  --mode 2 --use-conda --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
  && touch /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe/1.jobfinished || \
  (touch /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe/1.jobfailed; exit 1) 

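Somehow, the problem does not occur when the workflow is run locally on a single compute node from an interactive qsub shell. It only happens when jobs are submitted from the login node to the whole compute cluster.

Snakemake versions tested: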

  • 5.10.0
  • 5.32.2
  • 6.0.5
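
Since the failure only shows up in batch jobs, one way to narrow it down is to compare what a batch shell and an interactive shell can see. The following is a minimal diagnostic sketch, not part of the original workflow: report_tool is a hypothetical helper, and the list of tools checked is an assumption based on the commands that appear in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is reachable from this shell.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH"
    fi
}

# Print where we are running and what the shell can resolve.
echo "host: $(uname -n)"
echo "PATH: $PATH"
for tool in conda snakemake fastqc; do
    report_tool "$tool"
done
```

Saved as, say, check_env.sh, the script can be run once inside the interactive qsub shell and once as a batch job (qsub -j oe check_env.sh). Differences between the two outputs would point to what the batch environment is missing.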

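Solution

Solved by supplying a custom jobscript (--jobscript SCRIPT) that sources conda's shell setup and activates the snakemake environment before the job command runs: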

#!/bin/bash
# properties = {properties}
set +u;
source /data/software/miniconda3-ngs/etc/profile.d/conda.sh;
conda activate snakemake-5.32.2
set -u;
{exec_job}
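
To put this to use, the template is saved to a file and its path is passed with --jobscript on top of the original submit command. The sketch below is one way to do that; the file name jobscript.sh and the submit_workflow wrapper are assumptions for illustration. The set +u / set -u pair is presumably there because conda's activation scripts may reference unset variables, which would abort the job while unset-variable checking is on.

```shell
#!/bin/sh
# Write the jobscript template. The quoted 'EOF' keeps the shell from touching
# the {properties} and {exec_job} placeholders, which Snakemake fills in per job.
JOBSCRIPT="${JOBSCRIPT:-jobscript.sh}"

cat > "$JOBSCRIPT" <<'EOF'
#!/bin/bash
# properties = {properties}
set +u;
source /data/software/miniconda3-ngs/etc/profile.d/conda.sh;
conda activate snakemake-5.32.2
set -u;
{exec_job}
EOF

# Hypothetical wrapper: the original submit command plus --jobscript.
submit_workflow() {
    snakemake --verbose --use-conda \
        --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
        --rerun-incomplete --printshellcmds --latency-wait 60 \
        --configfile /PROJECT_DIR/config.yaml -s /data/WORKFLOW_DIR/Snakefile --jobs 100 \
        --cluster-config /PROJECT_DIR/cluster.json \
        --jobscript "$JOBSCRIPT" \
        --cluster 'qsub -j oe -l mem={cluster.mem} -l walltime={cluster.time} -l nodes={cluster.nodes}:ppn={cluster.ppn}'
}
```

Calling submit_workflow from the login node then submits jobs whose scripts begin by activating the snakemake-5.32.2 environment.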
