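How do I create multiple subclasses from a list in Python?

Problem description

I am trying to organize my data hierarchically, so I decided to turn to subclasses. The file I read the data from has the following format: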

2WQZ_chain_A
score = 338.0
53-164
208-317
327-595
611-654

2WQZ_chain_B
score = 344.0
53-164
205-317
327-595
611-655

2XB6_chain_A
score = 319.0
64-163
211-317
327-596
613-654

2XB6_chain_B
score = 329.0
53-163
212-317
327-596
613-654

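What I want is a top-level class named after the PDB (i.e. 2WQZ), with subclasses called chain_A, chain_B, and so on. Each of these subclasses should contain an object called "score" and a third subclass called "intervals" holding the possible intervals. The overall idea is a tree: the PDB at the top, its chains below it, and each chain's score and intervals below that.

At the moment I am trying a dictionary, but I end up with the correct PDB class holding only the second chain. My code: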

class PDB(object):
    def __init__(self,pdbname):
        self.pdbid = pdbname

class Chain(PDB):
    def __init__(self,chainame,score,pdbname):
        self.chainid = chainame
        self.score = score
        super().__init__(pdbname)



making_class = open("covered_intervals.txt","r").readlines()

pdblist = []

for i in making_class:
    if "chain" in i:
        pdblist.append(i[:4])

pdblist = list(dict.fromkeys(pdblist))
pdblist2 = dict.fromkeys(pdblist)

for i in pdblist:
    pdblist2[i] = PDB(i)
    for j in making_class:
        if i in j:
            chainame = j[5:12]
            pdblist2[i] = Chain(chainame,4,i)

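The 4 is a placeholder. I understand why I only get the last chain, but I don't know how to keep both chains under the same PDB.

Solution

First, I would suggest writing something that parses one block of text from the file into usable variables, for example: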

def parse_block(lines):
    pdb_name = lines[0][:4]
    chain = lines[0][5:]
    score = lines[1].split("=")[1].strip()
    intervals = lines[2:]
    return (pdb_name,chain,score,intervals)
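
As a quick check of what parse_block returns, here is a minimal sketch that runs it on the first block from the question. The block is pasted in as a string instead of being read from a file, which is an assumption made only to keep the example self-contained.

```python
def parse_block(lines):
    # First line looks like "2WQZ_chain_A": 4-character PDB id, underscore, chain name.
    pdb_name = lines[0][:4]
    chain = lines[0][5:]
    # Second line looks like "score = 338.0"; the score is kept as a string here.
    score = lines[1].split("=")[1].strip()
    # Every remaining line is an interval such as "53-164".
    intervals = lines[2:]
    return (pdb_name, chain, score, intervals)


block = """2WQZ_chain_A
score = 338.0
53-164
208-317
327-595
611-654"""

pdb_name, chain, score, intervals = parse_block(block.splitlines())
print(pdb_name)   # 2WQZ
print(chain)      # chain_A
print(score)      # 338.0 (still a string)
print(intervals)  # ['53-164', '208-317', '327-595', '611-654']
```

Note that the function returns four values, so the caller has to unpack all four.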

With this you can either build your classes or use a nested dictionary, which also suits this data structure well.

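from collections import defaultdict

with open("pdbdata", "r") as f:
    content = f.read()

# Maps PDB name -> chain name -> {"score": ..., "intervals": [...]}
pdb_dict = defaultdict(dict)

# Blocks are separated by blank lines; strip() avoids an empty trailing block.
for block in content.strip().split("\n\n"):
    # parse_block returns four values, so unpack all four.
    pdb_name, chain, score, intervals = parse_block(block.splitlines())
    pdb_dict[pdb_name][chain] = {"score": score, "intervals": intervals}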

The resulting nested dictionary looks like this:

{'2WQZ': {'chain_A': {'intervals': ['53-164', '208-317', '327-595', '611-654'],
                      'score': '338.0'},
          'chain_B': {'intervals': ['53-164', '205-317', '327-595', '611-655'],
                      'score': '344.0'}},
 '2XB6': {'chain_A': {'intervals': ['64-163', '211-317', '327-596', '613-654'],
                      'score': '319.0'},
          'chain_B': {'intervals': ['53-163', '212-317', '327-596', '613-654'],
                      'score': '329.0'}}}
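
Reading values back out of such a nested dictionary could look like the sketch below. The scores and intervals are stored as strings, so they are converted on access. The helper to_bounds is a hypothetical name introduced here for illustration, and the dictionary literal is a shortened copy of the structure above.

```python
# Shortened copy of the nested dictionary: PDB name -> chain name -> fields.
pdb_dict = {
    "2WQZ": {
        "chain_A": {"score": "338.0",
                    "intervals": ["53-164", "208-317", "327-595", "611-654"]},
        "chain_B": {"score": "344.0",
                    "intervals": ["53-164", "205-317", "327-595", "611-655"]},
    },
}


def to_bounds(interval):
    """Turn a string such as '53-164' into the tuple (53, 164)."""
    start, end = interval.split("-")
    return int(start), int(end)


chain_a = pdb_dict["2WQZ"]["chain_A"]
score = float(chain_a["score"])
bounds = [to_bounds(i) for i in chain_a["intervals"]]

print(score)                     # 338.0
print(bounds[0])                 # (53, 164)
print(sorted(pdb_dict["2WQZ"]))  # ['chain_A', 'chain_B']
```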

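In this case you can use a dictionary for the top-level nodes, and since this is a fixed-depth tree there is no need for nested classes. The Chain class holds three components: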

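  1. Chain name
  2. Score
  3. A list of ranges (I have implemented a class for the ranges)

class Chain:
    def __init__(self, chainame, score=None):
        self.chainid = chainame
        self.score = score
        self.ranges = []

    def add_range(self, rng):
        # rng is a range1 instance; the name avoids shadowing the built-in range
        self.ranges.append(rng)

    def add_score(self, score):
        self.score = score


class range1:
    def __init__(self, text):
        # text looks like "53-164"; the name avoids shadowing the built-in str
        start, end = text.split("-")
        self.start = int(start)
        self.end = int(end)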

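data = {}     # maps PDB id -> list of Chain objects
ch = None     # chain currently being parsed
pdb = ""
counter = 0   # line position inside the current block

with open("covered_intervals.txt", "r") as f:
    for line in f:
        line = line.strip()
        if line == "":
            # A blank line ends the current block; store the chain only once,
            # even if several blank lines follow each other.
            if ch is not None:
                data.setdefault(pdb, []).append(ch)
                ch = None
            counter = 0
        elif counter == 0:
            # Header line such as "2WQZ_chain_A".
            pdb, chainname = line.split("_", 1)
            ch = Chain(chainname)
            counter += 1
        elif counter == 1:
            # Score line such as "score = 338.0".
            ch.add_score(float(line.split("=")[1]))
            counter += 1
        else:
            # Every remaining line is an interval such as "53-164".
            ch.add_range(range1(line))

# Store the last chain if the file does not end with a blank line.
if ch is not None:
    data.setdefault(pdb, []).append(ch)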
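
Once the data dict is filled, querying it might look like the sketch below. To keep the example self-contained, SimpleNamespace objects stand in for the Chain and range1 instances and only a few values from the file are included. The helpers best_chain and covered_length are hypothetical names, and covered_length assumes that both endpoints of an interval are included, which the question does not state.

```python
from types import SimpleNamespace

# Stand-in for the `data` dict built by the parsing loop: PDB id -> list of chains.
# SimpleNamespace mimics the chainid/score/ranges and start/end attributes.
data = {
    "2WQZ": [
        SimpleNamespace(chainid="chain_A", score=338.0,
                        ranges=[SimpleNamespace(start=53, end=164),
                                SimpleNamespace(start=208, end=317)]),
        SimpleNamespace(chainid="chain_B", score=344.0,
                        ranges=[SimpleNamespace(start=53, end=164)]),
    ],
}


def best_chain(chains):
    """Return the chain with the highest score."""
    return max(chains, key=lambda c: c.score)


def covered_length(chain):
    """Sum of interval lengths, assuming inclusive endpoints."""
    return sum(r.end - r.start + 1 for r in chain.ranges)


for pdb_id, chains in data.items():
    top = best_chain(chains)
    print(pdb_id, top.chainid, top.score, covered_length(top))
```

Both chains of a PDB stay available in the list under its key, which is what the original inheritance-based attempt lost by overwriting the dictionary entry.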