Lambda通配符在BWA MEM或BWA-MEM2Snakemake包装器中不起作用 Snakefile Config.yaml samples.txt

问题描述

我正在将Snakemake Shell:移植到Snakemake Wrappers的过程中,并注意到我已成功使用lambda wildcards: BWA MEM wrapper的其他包装失败了。

我只能设法使包装器像下面这样硬编码:

input:
        reads=["trimming/trimmomatic/{sample}.1.fastq","trimming/trimmomatic/{sample}.2.fastq"]

但是,我更喜欢使用lambda wildcards: getTrims(wildcards.sample)[0]lambda wildcards: getTrims(wildcards.sample)[1];类似于trimming规则的输入(如下)。

最小示例

Snakefile

# Directories------------------------------------------------------------------
configfile: "config.yaml"

# Setting the names of all directories
dir_list = ["REF_DIR","LOG_DIR","BENCHMARK_DIR","QC_DIR","TRIM_DIR","ALIGN_DIR","MARKDUP_DIR","CALLING_DIR","ANNOT_DIR"]
dir_names = ["refs","logs","benchmarks","qc","trimming","alignment","mark_duplicates","variant_calling","annotation"]
dirs_dict = dict(zip(dir_list,dir_names))

import os
import pandas as pd
# getting the samples @R_662_4045@ion (names,path to r1 & r2) from samples.txt
samples_@R_662_4045@ion = pd.read_csv("samples.txt",sep='\t',index_col=False)
# get a list of the sample names
sample_names = list(samples_@R_662_4045@ion['sample'])
sample_locations = list(samples_@R_662_4045@ion['location'])
samples_dict = dict(zip(sample_names,sample_locations))
# get number of samples
len_samples = len(sample_names)


# Rules -----------------------------------------------------------------------

rule all:
    input:
        expand('{TRIM_DIR}/{TRIM_TOOL}/{sample}_{pair}_trim_{paired}.fq.gz',TRIM_DIR=dirs_dict["TRIM_DIR"],TRIM_TOOL=config["TRIM_TOOL"],sample=sample_names,pair=['R1','R2'],paired=['paired','unpaired']),expand('{ALIGN_DIR}/{ALIGN_TOOL}/{sample}.bam',ALIGN_DIR=dirs_dict['ALIGN_DIR'],ALIGN_TOOL=config['ALIGN_TOOL'],sample=sample_names),def getHome(sample):
  return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))

rule trimming:
    input:
        r1 = lambda wildcards: getHome(wildcards.sample)[0],r2 = lambda wildcards: getHome(wildcards.sample)[1]
    output:
        r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fq.gz"),r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R1_trim_unpaired.fq.gz"),r2 = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R2_trim_paired.fq.gz"),r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R2_trim_unpaired.fq.gz")
    log: os.path.join(dirs_dict["LOG_DIR"],"{sample}.log")
    threads: 32
    params:
        # list of trimmers (see manual)
        trimmer=["MINLEN:36"],# optional parameters
        extra="",compression_level="-9"
    resources:
        mem = 1000,time = 120
    message: """--- Trimming FASTQ files with Trimmomatic."""
    wrapper:
        "0.64.0/bio/trimmomatic/pe"

trim_dir = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"])
trims_locations = [trim_dir] * len_samples
trims_dict = dict(zip(sample_names,trims_locations))

def getTrims(sample):
  return(list(os.path.join(trims_dict[sample],"{0}_{1}_trim_paired.fq.gz".format(sample,'R2']))

rule alignment:
    input:
        reads=["trimming/trimmomatic/{sample}_R1_trim_paired.fq.gz","trimming/trimmomatic/{sample}_R2_trim_paired.fq.gz"]
    output:
        os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
    log: os.path.join(dirs_dict["LOG_DIR"],"{sample}.log")
    message: """--- Alignment with BWA."""
    threads: 8
    resources:
        mem = 2500,time = 100
    params:
        index=os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",sort="none",sort_order="queryname",sort_extra=""            
    wrapper:
        "0.64.0/bio/bwa/mem"

Config.yaml

# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"

# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"

samples.txt

MTG325

这有效:

(snakemake)$ snakemake -n -r 
Building DAG of jobs...
Job counts:
    count   jobs
    1   alignment
    1   all
    2

[Wed Sep  2 08:17:16 2020]
Job 2: --- Alignment with BWA.
Reason: Missing output files: alignment/bwa/MTG325.bam

[Wed Sep  2 08:17:16 2020]
localrule all:
    input: trimming/trimmomatic/MTG325_R1_trim_paired.fq.gz,trimming/trimmomatic/MTG325_R1_trim_unpaired.fq.gz,trimming/trimmomatic/MTG325_R2_trim_paired.fq.gz,trimming/trimmomatic/MTG325_R2_trim_unpaired.fq.gz,alignment/bwa/MTG325.bam
    jobid: 0
    reason: Input files updated by another job: alignment/bwa/MTG325.bam

Job counts:
    count   jobs
    1   alignment
    1   all
    2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

这不是:

rule alignment:
    input:
        reads=["lambda wildcards: getTrims(wildcards.sample)[0]","lambda wildcards: getTrims(wildcards.sample)[1]"]
    output:
        os.path.join(dirs_dict["ALIGN_DIR"],sort_extra=""
    wrapper:
        "0.64.0/bio/bwa/mem"

结果:

(snakemake) [moldach@arc wrappers]$ snakemake -n -r
Building DAG of jobs...
MissingInputException in line 65 of /home/moldach/wrappers/Snakefile:
Missing input files for rule alignment:
lambda wildcards: getTrims(wildcards.sample)[1]
lambda wildcards: getTrims(wildcards.sample)[0]

解决方法

input:
    reads=["lambda wildcards: getTrims(wildcards.sample)[0]","lambda wildcards: getTrims(wildcards.sample)[1]"]

您在此处提供了字符串列表,因此snakemake实际上将"lambda wildcards: getTrims(wildcards.sample)[0]"查找为输入文件,而不将其视为输入函数。

您的rule alingment需要两个输入读取文件的列表,这应该适合您getTrims(sample)函数的输出。

您尝试过吗:

input:
    reads=lambda wildcards: getTrims(wildcards.sample)

稍后将R1R2放回同一列表中时,这里不需要分开。