问题描述
我正在将Snakemake
Shell:
移植到Snakemake Wrappers的过程中,并注意到我已成功使用lambda wildcards:
BWA MEM wrapper的其他包装失败了。
我只能设法使包装器像下面这样硬编码:
input:
reads=["trimming/trimmomatic/{sample}.1.fastq","trimming/trimmomatic/{sample}.2.fastq"]
但是,我更喜欢使用lambda wildcards: getTrims(wildcards.sample)[0]
和lambda wildcards: getTrims(wildcards.sample)[1]
;类似于trimming
规则的输入(如下)。
最小示例
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR","LOG_DIR","BENCHMARK_DIR","QC_DIR","TRIM_DIR","ALIGN_DIR","MARKDUP_DIR","CALLING_DIR","ANNOT_DIR"]
dir_names = ["refs","logs","benchmarks","qc","trimming","alignment","mark_duplicates","variant_calling","annotation"]
dirs_dict = dict(zip(dir_list,dir_names))
import os
import pandas as pd
# getting the samples @R_662_4045@ion (names,path to r1 & r2) from samples.txt
samples_@R_662_4045@ion = pd.read_csv("samples.txt",sep='\t',index_col=False)
# get a list of the sample names
sample_names = list(samples_@R_662_4045@ion['sample'])
sample_locations = list(samples_@R_662_4045@ion['location'])
samples_dict = dict(zip(sample_names,sample_locations))
# get number of samples
len_samples = len(sample_names)
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{TRIM_DIR}/{TRIM_TOOL}/{sample}_{pair}_trim_{paired}.fq.gz',TRIM_DIR=dirs_dict["TRIM_DIR"],TRIM_TOOL=config["TRIM_TOOL"],sample=sample_names,pair=['R1','R2'],paired=['paired','unpaired']),expand('{ALIGN_DIR}/{ALIGN_TOOL}/{sample}.bam',ALIGN_DIR=dirs_dict['ALIGN_DIR'],ALIGN_TOOL=config['ALIGN_TOOL'],sample=sample_names),def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule trimming:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0],r2 = lambda wildcards: getHome(wildcards.sample)[1]
output:
r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fq.gz"),r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R1_trim_unpaired.fq.gz"),r2 = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R2_trim_paired.fq.gz"),r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],"{sample}_R2_trim_unpaired.fq.gz")
log: os.path.join(dirs_dict["LOG_DIR"],"{sample}.log")
threads: 32
params:
# list of trimmers (see manual)
trimmer=["MINLEN:36"],# optional parameters
extra="",compression_level="-9"
resources:
mem = 1000,time = 120
message: """--- Trimming FASTQ files with Trimmomatic."""
wrapper:
"0.64.0/bio/trimmomatic/pe"
trim_dir = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"])
trims_locations = [trim_dir] * len_samples
trims_dict = dict(zip(sample_names,trims_locations))
def getTrims(sample):
return(list(os.path.join(trims_dict[sample],"{0}_{1}_trim_paired.fq.gz".format(sample,'R2']))
rule alignment:
input:
reads=["trimming/trimmomatic/{sample}_R1_trim_paired.fq.gz","trimming/trimmomatic/{sample}_R2_trim_paired.fq.gz"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
log: os.path.join(dirs_dict["LOG_DIR"],"{sample}.log")
message: """--- Alignment with BWA."""
threads: 8
resources:
mem = 2500,time = 100
params:
index=os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",sort="none",sort_order="queryname",sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
Config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
samples.txt
MTG325
这有效:
(snakemake)$ snakemake -n -r
Building DAG of jobs...
Job counts:
count jobs
1 alignment
1 all
2
[Wed Sep 2 08:17:16 2020]
Job 2: --- Alignment with BWA.
Reason: Missing output files: alignment/bwa/MTG325.bam
[Wed Sep 2 08:17:16 2020]
localrule all:
input: trimming/trimmomatic/MTG325_R1_trim_paired.fq.gz,trimming/trimmomatic/MTG325_R1_trim_unpaired.fq.gz,trimming/trimmomatic/MTG325_R2_trim_paired.fq.gz,trimming/trimmomatic/MTG325_R2_trim_unpaired.fq.gz,alignment/bwa/MTG325.bam
jobid: 0
reason: Input files updated by another job: alignment/bwa/MTG325.bam
Job counts:
count jobs
1 alignment
1 all
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
这不是:
rule alignment:
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]","lambda wildcards: getTrims(wildcards.sample)[1]"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
结果:
(snakemake) [moldach@arc wrappers]$ snakemake -n -r
Building DAG of jobs...
MissingInputException in line 65 of /home/moldach/wrappers/Snakefile:
Missing input files for rule alignment:
lambda wildcards: getTrims(wildcards.sample)[1]
lambda wildcards: getTrims(wildcards.sample)[0]
解决方法
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]","lambda wildcards: getTrims(wildcards.sample)[1]"]
您在此处提供了字符串列表,因此snakemake实际上将"lambda wildcards: getTrims(wildcards.sample)[0]"
查找为输入文件,而不将其视为输入函数。
您的rule alingment
需要两个输入读取文件的列表,这应该适合您getTrims(sample)
函数的输出。
您尝试过吗:
input:
reads=lambda wildcards: getTrims(wildcards.sample)
稍后将R1
和R2
放回同一列表中时,这里不需要分开。