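Problem description
I have very little experience with Python programming and have only just started learning this scripting language. I have a script and a bunch of .gff genome files. I am using the script to summarise the information in my gff files.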
#!/usr/bin/env python2
from Bio.SeqIO.FastaIO import SimpleFastaParser
from Bio.Seq import translate
from Bio.Seq import reverse_complement
import os
import string
import random
import sys
''' given all the gff files, summarise them to create a big CSV file with all the details of these genomes
the details:
1. Number of CDSs
2. Number of pseudogene
3. Number of other elements in GFF files
4. Genome size
After summary -> add ST, pathotype and Metadata and see how stratifying the data changes anything
and will give a general description of the pan-genome'''
def read_gff(gff_file):
    categories = ["hypothetical protein","transposase","pseudogene","conjuga","phage","fimbrial","plasmid","crispr","resistance","virulence","secretion system"]
    counts = {}
    for cat in categories:
        counts[cat] = 0
    try:
        f = open(gff_file)
    except IOError:
        print("Could not read file:",gff_file)
        return counts
    for line in f:
        line = line.lower()
        if line.startswith("##fasta"):
            break
        if line.startswith("#"):
            continue
        toks = line.strip().split()
        product = toks[2]
        if product not in counts:
            counts[product] = 0
        counts[product] += 1
        for cat in categories:
            if cat in line:
                counts[cat] += 1
    f.close()
    return counts
header = ["cds","trna","hypothetical protein","secretion system"]
out = open("gff_summary.csv","w")
out.write("ID,file_name," + ",".join(header) + "\n")
cnt = 0
with open(sys.argv[1]) as f:
    for line in f:
        toks = line.strip().split("\t")
        if line.startswith("ID"):
            annot_loc_index = toks.index("Annotation_Location")
            continue
        ID = toks[0]
        files = toks[annot_loc_index].split(",")
        for f1 in files:
            print(f1)
            counts = read_gff(f1)
            out.write(ID + "," + f1)
            for cat in header:
                out.write("," + str(counts[cat]))
            out.write("\n")
            cnt += 1
out.close()
I ran the script with the command below. I tried both a wildcard and a single file, but neither worked.
root@h:/home/fan/monas/script/gff_summaries# python summarise_gffs.py /home/fuan/monas/gff_combine/*.gff
But the following error keeps appearing.
Traceback (most recent call last):
  File "summarise_gffs.py", line 71, in <module>
    files = toks[annot_loc_index].split(",")
NameError: name 'annot_loc_index' is not defined
Thanks.
Solution
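Reading the script as posted, the likely cause is the input rather than the code. The main loop opens only sys.argv[1] and expects it to be a tab-separated table whose header line starts with "ID" and has an "Annotation_Location" column listing the gff paths. When a .gff file (or a wildcard, which the shell expands so that the first .gff file becomes sys.argv[1]) is passed instead, no line starts with "ID", so annot_loc_index is never assigned and the first data line raises the NameError. Below is a minimal sketch that builds such a table from a directory of gff files. The function name write_metadata_table, the output file name, and the choice of the file name without its extension as the ID are all assumptions made for illustration, not part of the original script.

```python
import glob
import os
import sys


def write_metadata_table(gff_dir, out_path):
    """Write a tab-separated table listing every .gff file in gff_dir.

    The header starts with "ID" and contains an "Annotation_Location"
    column, which is what the main loop of the posted script looks for.
    Using the file name without its extension as the ID is an assumption.
    Returns the number of gff files written.
    """
    gff_files = sorted(glob.glob(os.path.join(gff_dir, "*.gff")))
    with open(out_path, "w") as out:
        out.write("ID\tAnnotation_Location\n")
        for path in gff_files:
            genome_id = os.path.splitext(os.path.basename(path))[0]
            out.write(genome_id + "\t" + path + "\n")
    return len(gff_files)


if __name__ == "__main__":
    # Hypothetical usage: python make_metadata.py <gff_dir> <output.tsv>
    if len(sys.argv) == 3:
        n = write_metadata_table(sys.argv[1], sys.argv[2])
        print("Wrote %d rows to %s" % (n, sys.argv[2]))
```

With a table like this in place, the summary script would be called with the table as its only argument, for example python summarise_gffs.py metadata.tsv (metadata.tsv being a hypothetical name). Note that the script looks up "cds" and "trna" directly in the counts dictionary, so a gff file that contains neither feature type could still raise a KeyError.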