Finding a motif in a FASTA file with Python

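Problem description

Can someone help me with this Python code? When I run it, nothing happens: no errors or anything strange. It reads and opens the file just fine. I have a set of protein sequences in FASTA format, and I have to find a motif in my sequences like "RRTxSKxxxxAxxRxG". I need to find the sequences that match it, whatever residue appears at each x position.

Here is my Python code: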

import re
userinput = input("Please provide a FASTA file.")
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            readprotein = protein.read()
        matches = re.findall('RTxSKxxxxAxxRxG',readprotein)
        for match in matches:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")

Solution

My input is fasta.fasta:

>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKWFGRAGGRXG','RTGSKPPPPAGGRXG']
['RTXSKXXXXAXXRXG']
['RTGSKXXXXAGGRXG']
[]
['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']

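In a regular expression, a literal x matches only the letter x, so your pattern never matches real residues. Replace each x with a character class such as [A-Z], and change your code to: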

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 12 14:48:00 2021

@author: Pietro


https://stackoverflow.com/questions/67948483/find-motif-from-in-between-fasta-file-from-python


"""

import re


# userinput = input("Please provide a FASTA file.")

userinput = 'fasta.fasta'


pattern = re.compile(r"(RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G)")

matchz = []
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput,mode = 'r') as protein:
            for line in protein:  #memory efficient way
            #readprotein = protein.readlines()
            #for line in readprotein:
                # print(line)
                line = line.upper().strip("\n")
                if line.startswith('>'):
                    name=line
                else:
                    matches = re.findall(pattern,line)
                    print(name,matches)
                    matchz.append(matches)
        for match in matchz:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")

The output is:

>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG','RTGSKPPPPAGGRXG']
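
One caveat about searching line by line: FASTA sequences are often wrapped over several lines, and a motif that straddles a line break would be missed. The following is a minimal sketch, not part of the original answer, that joins the lines of each record before searching and builds the regular expression from the x notation. The helper names motif_to_regex, read_fasta and find_motifs are hypothetical, the sample records are made up, and treating x as "any single uppercase letter" is an assumption.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'RTxSKxxxxAxxRxG' into a compiled regex.

    Assumption: a lowercase 'x' stands for any single uppercase letter;
    every other character must match literally.
    """
    parts = ("[A-Z]" if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile("".join(parts))


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_motifs(lines, motif):
    """Map each record header to the non-overlapping matches in its sequence."""
    pattern = motif_to_regex(motif)
    return {header: pattern.findall(seq) for header, seq in read_fasta(lines)}


if __name__ == "__main__":
    # Made-up records for illustration only; the motif in seq1 is split
    # across two lines on purpose.
    sample = [
        ">seq1",
        "MKRTGSKLLL",
        "LAGGRSGAA",
        ">seq2",
        "MKVLAAGG",
    ]
    for header, hits in find_motifs(sample, "RTxSKxxxxAxxRxG").items():
        print(header, hits)
```

Because find_motifs accepts any iterable of lines, an open file object can be passed directly, for example inside `with open("fasta.fasta") as fh:`. Note that findall returns non-overlapping matches only, so overlapping occurrences of the motif would need a different approach.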