Problem description
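I am working on the Rosalind open reading frames exercise, and the result I get differs from the one given in the sample task. The exercise description can be found here.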
gencode = {"GCT": "A","GCC": "A","GCA": "A","GCG": "A","TGT": "C","TGC": "C","GAT": "D","GAC": "D","GAA": "E","GAG": "E","TTT": "F","TTC": "F","GGT": "G","GGC": "G","GGA": "G","GGG": "G","CAT": "H","CAC": "H","ATA": "I","ATT": "I","ATC": "I","AAA": "K","AAG": "K","TTA": "L","TTG": "L","CTT": "L","CTC": "L","CTA": "L","CTG": "L","ATG": "M","AAT": "N","AAC": "N","CCT": "P","CCC": "P","CCA": "P","CCG": "P","CAA": "Q","CAG": "Q","CGT": "R","CGC": "R","CGA": "R","CGG": "R","AGA": "R","AGG": "R","TCT": "S","TCC": "S","TCA": "S","TCG": "S","AGT": "S","AGC": "S","ACT": "T","ACC": "T","ACA": "T","ACG": "T","GTT": "V","GTC": "V","GTA": "V","GTG": "V","TGG": "W","TAT": "Y","TAC": "Y","TAA": "_","TAG": "_","TGA": "_"}
seq = 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG'
rev_seq = seq[::-1]

def get_orf_proteins(seq):
    proteins = []
    for i in range(len(seq)-2):
        if gencode[seq[i:i+3]] == 'M':
            print(i)
            prot = ''
            k = i
            while gencode[seq[k:k+3]] != '_' and k < len(seq)-3:
                prot += gencode[seq[k:k+3]]
                k += 3
            proteins.append(prot)
    return list(set(proteins))

print(get_orf_proteins(seq))
print(get_orf_proteins(rev_seq))
This returns the following protein sequences:
['MGMTPRLGLESLLE','MTPRLGLESLLE','M','MIRVAS']
['MY','MSLVSPNKVFSEIRFSAPVGVHWTQSMY']
Am I missing something, or is the sample solution incorrect?
Solution
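The reverse complement of a DNA string is not just the reversed DNA string: every base must also be replaced by its complement (A↔T, C↔G). `seq[::-1]` only does the reversing step.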
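A minimal sketch of a corrected version follows. It assumes the standard genetic code, rebuilt compactly here as `CODON_TABLE` (meant to be equivalent to the question's `gencode`, with `_` for stop codons). The helper names `reverse_complement` and `orf_proteins` are illustrative and do not come from the original post. The sketch also keeps a protein only when its reading frame actually reaches a stop codon. The question's loop does not check this, which appears to be why `'MIRVAS'` shows up: the ATG it starts from runs off the end of the string without hitting a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# "_" marks a stop codon, matching the gencode dict in the question.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Swap each base for its complement: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Helper names below are illustrative, not taken from the original question.


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def orf_proteins(dna: str) -> set:
    """Distinct proteins encoded by ORFs on both strands."""
    proteins = set()
    for strand in (dna, reverse_complement(dna)):
        for start in range(len(strand) - 2):
            if strand[start:start + 3] != "ATG":
                continue
            protein = []
            # Step codon by codon; the range stops before an incomplete trailing codon.
            for k in range(start, len(strand) - 2, 3):
                aa = CODON_TABLE[strand[k:k + 3]]
                if aa == "_":
                    # Only a frame that actually reaches a stop codon counts as an ORF.
                    proteins.add("".join(protein))
                    break
                protein.append(aa)
    return proteins


SAMPLE = "AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG"

if __name__ == "__main__":
    for protein in sorted(orf_proteins(SAMPLE)):
        print(protein)
```

For the sample string, `sorted(orf_proteins(SAMPLE))` gives `['M', 'MGMTPRLGLESLLE', 'MLLGSFRLIPKETLIQVAGSSPCNLS', 'MTPRLGLESLLE']`. The long protein comes from the reverse complement strand, which `seq[::-1]` never produces.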