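Problem description
As the title says, my Snakefile gives me a syntax error on the expand function in rule all. I know this is usually caused by whitespace/indentation errors, but I have confirmed that there are no tabs in the file. I have gone through and deleted every stray space, and searched the file with grep. I would appreciate any suggestions.
Error message: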
SyntaxError in line 14 of /PATH/to/Snakefile:
Unexpected keyword expand in rule definition (Snakefile, line 14)
Code:
from glob import glob
from numpy import unique

reads = glob('{}/*'.format(config['readDir']))
samples = []
for i in reads:
    sampleName = i.replace('{}/'.format(config['readDir']), '')
    sampleName = sampleName.replace('{}'.format(config['readSuffix1']), '')
    sampleName = sampleName.replace('{}'.format(config['readSuffix2']), '')
    samples.append(sampleName)
samples = unique(samples)

rule all:
    expand('fastqc/{sample}_1_fastqc.html', sample=samples),
    expand('gene_count/{sample}.count', sample=samples)

rule fastqc:
    input:
        r1 = config['readDir'] + '/{sample}' + config['readSuffix1'],
        r2 = config['readDir'] + '/{sample}' + config['readSuffix2']
    output:
        o1 = 'fastqc/{sample}_1_fastqc.html',
        o2 = 'fastqc/{sample}_2_fastqc.html'
    params:
        'fastqc'
    shell:
        'fastqc {input.r1} {input.r2} -o {params}'

rule trim:
    input:
        r1 = config['readDir'] + '/{sample}' + config['readSuffix1'],
        r2 = config['readDir'] + '/{sample}' + config['readSuffix2']
    output:
        'trimmed_reads/{sample}_val_1.fq',
        'trimmed_reads/{sample}_val_2.fq'
    params:
        outDir = 'trimmed_reads',
        suffix = '{sample}',
        minPhred = config['minPhred'],
        minOverlap = config['minOverlap']
    shell:
        'trim_galore --paired --quality {params.minPhred} '
        '--stringency {params.minOverlap} --basename {params.suffix} '
        '--output_dir {params.outDir} {input.r1} {input.r2}'

rule align:
    input:
        r1 = 'trimmed_reads/{sample}_val_1.fq',
        r2 = 'trimmed_reads/{sample}_val_2.fq'
    output:
        sam = temp('aligned_reads/{sample}.sam'),
        bam = 'aligned_reads/{sample}.bam'
    params:
        ref = config['hisatRef']
    threads:
        config['threads']
    log:
        'logs/{sample}_hisat2.log'
    shell:
        'hisat2 --dta -p {threads} -x {params.ref} '
        '-1 {input.r1} -2 {input.r2} -S {output.sam} 2> {log}; '
        'samtools sort -@ {threads} -o {output.bam} {output.sam}; '

rule sort_name:
    input:
        'aligned_reads/{sample}.bam'
    output:
        bam = temp('aligned_reads/{sample}_name_sorted.bam'),
        index = temp('aligned_reads/{sample}_name_sorted.bam.bai')
    threads:
        config['threads']
    shell:
        'samtools sort -n -@ {threads} -o {output.bam} {input}; '

rule count:
    input:
        bam = 'aligned_reads/{sample}.bam'
    output:
        'gene_count/{sample}.count'
    params:
        annotations = config['annotations'],
        minMapq = config['minMapq'],
        stranded = config['stranded']
    shell:
        'htseq-count -s {params.stranded} -a {params.minMapq} '
        '--additional_attr=gene_name --additional_attr=gene_type '
        '{input.bam} {params.annotations} > {output}'
Solution
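The error is raised while the Snakefile is being parsed: as posted, rule all lists its expand calls with no input: keyword in front of them, so expand is read as a keyword of the rule definition. Once the targets sit under input:, the two comma-separated expand calls are valid as they are. If you would rather pass a single list, you can also replace , with + as shown below.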
expand('fastqc/{sample}_1_fastqc.html',sample=samples) + expand('gene_count/{sample}.count',sample=samples)
You can also merge the two into a single expand call, as shown below:
expand(['fastqc/{sample}_1_fastqc.html','gene_count/{sample}.count'],sample=samples)
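To see that the two spellings above ask for the same targets, here is a minimal pure-Python sketch. Note that expand_sketch is a hypothetical stand-in written for this post, not Snakemake's own expand, and the sample names are made up.

```python
from itertools import product


def expand_sketch(patterns, **wildcards):
    """Fill every pattern with every combination of the given wildcard values.

    Hypothetical stand-in for illustration; it is not Snakemake's implementation.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    names = list(wildcards)
    results = []
    for pattern in patterns:
        for combo in product(*(wildcards[n] for n in names)):
            results.append(pattern.format(**dict(zip(names, combo))))
    return results


samples = ["A", "B"]  # made-up sample names

# Two calls joined with +
concatenated = (
    expand_sketch("fastqc/{sample}_1_fastqc.html", sample=samples)
    + expand_sketch("gene_count/{sample}.count", sample=samples)
)

# One call with a list of patterns
merged = expand_sketch(
    ["fastqc/{sample}_1_fastqc.html", "gene_count/{sample}.count"],
    sample=samples,
)

print(concatenated == merged)
print(merged)
```

In this sketch both forms end up as one flat list of file names, which is what rule all needs as its targets.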
The following code will fix the problem:
rule all:
    input:
        expand('fastqc/{sample}_1_fastqc.html', sample=samples),
        expand('gene_count/{sample}.count', sample=samples)
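As a rough picture of why adding input: makes the difference, here is a toy check in plain Python. It is only an illustration under my own assumptions: Snakemake's real parser is token-based and much more thorough, and both RULE_KEYWORDS (a partial, hand-picked list) and first_unexpected_keyword are hypothetical names made up for this sketch.

```python
# Toy illustration only; this is not how Snakemake is implemented.
# RULE_KEYWORDS is a partial, hand-picked list for this sketch.
RULE_KEYWORDS = {"input", "output", "params", "threads", "log", "shell", "run"}


def first_unexpected_keyword(rule_body_lines):
    """Return the first name at rule-keyword indentation that is not a known keyword."""
    keyword_indent = None
    for line in rule_body_lines:
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        if keyword_indent is None:
            keyword_indent = indent  # the first body line sets the keyword level
        if indent > keyword_indent:
            continue  # deeper lines are values of the previous keyword
        name = line.strip().split("(")[0].split(":")[0].strip()
        if name not in RULE_KEYWORDS:
            return name
    return None


# Body of rule all as posted in the question (no input: keyword)
broken = ["    expand('fastqc/{sample}_1_fastqc.html', sample=samples)"]

# Body of rule all with the targets under input:
fixed = [
    "    input:",
    "        expand('fastqc/{sample}_1_fastqc.html', sample=samples),",
    "        expand('gene_count/{sample}.count', sample=samples)",
]

print(first_unexpected_keyword(broken))
print(first_unexpected_keyword(fixed))
```

In the toy check, expand sits where a keyword such as input or output is expected in the first case, while in the second case it is just a value belonging to input:.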