How can I use the Biopython library in Python 3 to find mutations in reverse-strand genes such as pncA from a TB sequencing FASTA file?

Problem description

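To find a mutation such as S104R (associated with pyrazinamide; the gene spans 2288681 to 2289241), we first have to remove the "-" characters (to strip the insertions/deletions present in the FASTA file), then reverse-complement the sequence, and then look at the codon with the given number (here 104) for the specific mutation. I have already found the answer using basic string functions, but I would like something cleaner and simpler if that is possible with the Biopython library.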
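The codon arithmetic in the steps above can be sketched on its own, without any sequence data. This is only an illustration: the helper names `codon_slice` and `forward_positions` are hypothetical, and the second one assumes that 2289241 is a 1-based, inclusive coordinate marking the first base of the coding sequence on the reverse strand.

```python
def codon_slice(codon_number):
    """Return 0-based [start, stop) offsets of a 1-based codon within the gene."""
    start = (codon_number - 1) * 3
    return start, start + 3


def forward_positions(gene_end, codon_number):
    """Return the 1-based forward-strand positions of a codon in a reverse-strand gene.

    Assumption: gene_end is the 1-based forward coordinate of the first coding
    base, so codon positions count downwards from it (listed in reading order).
    """
    first = gene_end - (codon_number - 1) * 3
    return [first, first - 1, first - 2]


print(codon_slice(104))                 # (309, 312)
print(forward_positions(2289241, 104))  # [2288932, 2288931, 2288930]
```

The first helper gives the same offsets as the `(104-1)*3:(104-1)*3+3` slice, applied after the gene has been reverse-complemented.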

Solution

The following code works fine for me:

from Bio import SeqIO
# SeqIO.parse returns an iterator, so build a list before indexing.
# The file holds two records: the reference and the patient sequence.
sample_file=list(SeqIO.parse('fasta_file_location','fasta'))

ref=str(sample_file[0].seq).replace('-','')[2288681:2289241].replace('A','t').replace('T','a').replace('C','g').replace('G','c')[::-1].upper()[(104-1)*3:(104-1)*3+3]
pat=str(sample_file[1].seq).replace('-','')[2288681:2289241].replace('A','t').replace('T','a').replace('C','g').replace('G','c')[::-1].upper()[(104-1)*3:(104-1)*3+3]

print("ref: ",ref,"pat: ",pat)  # output -> ref: AGC, pat: CGG

But the following code does not work for me:

ref=sample_file[0].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
pat=sample_file[1].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]

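I would prefer a more convenient approach along the lines of the second snippet, since it uses Biopython functions. If you know how to make this better, please help.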
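A possible explanation for the failing snippet, offered as a sketch rather than a definitive answer: `Seq.strip("-")` behaves like `str.strip` and only trims gap characters at the two ends of the sequence, so gaps inside the alignment are left in place and shift the coordinates. In addition, `SeqIO.parse` returns an iterator, which cannot be indexed with `[0]`. The helper `codon_at` and the toy FASTA data below are hypothetical, and the gaps are removed through a plain string replace so the example does not depend on a particular Biopython version.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq


def codon_at(seq, start, end, codon_number):
    """Return one codon (as a Seq) from a reverse-strand gene.

    seq is a Bio.Seq.Seq; start and end are slice bounds on the ungapped
    forward strand (same convention as the slices in the question), and
    codon_number is 1-based.
    """
    ungapped = Seq(str(seq).replace("-", ""))  # drops every gap, not only those at the ends
    gene = ungapped[start:end].reverse_complement()
    offset = (codon_number - 1) * 3
    return gene[offset:offset + 3].upper()


# Toy two-record alignment standing in for the real file (made-up data).
toy_fasta = StringIO(">ref\nAA-CCGGCTTT\n>pat\nAA-CCGCCTTT\n")
records = list(SeqIO.parse(toy_fasta, "fasta"))  # make the iterator indexable

ref_codon = codon_at(records[0].seq, 2, 8, 1)
pat_codon = codon_at(records[1].seq, 2, 8, 1)

# Translate each codon to report the amino-acid change.
print("ref:", ref_codon, ref_codon.translate())  # ref: AGC S
print("pat:", pat_codon, pat_codon.translate())  # pat: AGG R
```

For the real data, the call would presumably look like `codon_at(records[0].seq, 2288681, 2289241, 104)`, using the same slice bounds as in the question.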