Modifying a GenBank file

Problem description

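Hi, I'm trying to search a file for a specific list of words. If one of these words is found, I want to add a line break below it followed by the phrase /colour = 1 (I don't want to delete the original word I'm searching for).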

An extract of the file for context and format:
LOCUS       contig_2_pilon_pilon 5558986 bp    DNA     linear   BCT 16-JUN-2020
DEFINITION  Escherichia coli O157:H7 strain (270078)
ACCESSION   
VERSION
KEYWORDS    .
SOURCE      Escherichia coli 270078
  ORGANISM  Escherichia coli 270078
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Escherichia.
COMMENT     Annotated using prokka 1.14.6 from
            https://github.com/tseemann/prokka.
FEATURES             Location/Qualifiers
     source          1..5558986
                     /organism="Escherichia coli 270078"
                     /mol_type="genomic DNA"
                     /strain="strain"
                     /db_xref="taxon:562"
     CDS             61523..61744
                     /gene="pspD"
                     /locus_tag="JCCJNNLA_00057"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA sequence:RefSeq:EG10779-MONOMER"
                     /codon_start=1
                     /transl_table=11
                     /product="peripheral inner membrane heat-shock protein"
                     /translation="MNTRWQQAGQKVKPGFKLAGKLVLLTALRYGPAGVAGWAIKSVA
                     RRPLKMLLAVALEPLLSRAANKLAQRYKR"

Here is one of the word lists I am searching for throughout the file:

regulation_list=["anti-repressor","anti-termination","antirepressor","antitermination","antiterminator","anti-terminator","cold-shock","cold shock","heat-shock","heat shock","regulation","regulator","regulatory","helicase","antibiotic resistance","repressor","zinc","sensor","dipeptidase","deacetylase","5-dehydrogenase","glucosamine kinase","glucosamine-kinase","dna-binding","dna binding","methylase","sulfurtransferase","acetyltransferase","control","ATP-binding","ATP binding","Cro","Ren protein","CII","inhibitor","activator","derepression","protein Sxy","sensing","Tir chaperone","Tir-cytoskeleton","Tir cytoskeleton","Tir protein","EspD"]

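As you can see, the extract contains one of the phrases I'm looking for ("heat-shock" in the /product line), and I want to add a new line below it containing /colour = 1.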

Any help would be great!

Solution

# Create a simple input file for testing:
cat > foo.txt <<EOF
foo
foo anti-termination
bar anti-repressor anti-termination
baz
EOF

python -c '
import re

# Using a shortened version of your list:
regulation_list = ["anti-repressor", "anti-termination", "etc"]

# For speed and simplicity, compile the regular expression once, then reuse it.
# re.escape() makes any regex metacharacters in the words (such as "." or "(")
# match literally.
regulation_re = re.compile("|".join(re.escape(word) for word in regulation_list))

with open("foo.txt", "r") as in_file:
    for line in in_file:
        # Remove only the trailing newline; strip() would also remove the
        # leading spaces that GenBank uses to indent qualifiers.
        line = line.rstrip("\n")
        print(line)
        if regulation_re.search(line):
            print("/colour = 1")
' > bar.txt

cat bar.txt

This prints:

foo
foo anti-termination
/colour = 1
bar anti-repressor anti-termination
/colour = 1
baz
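One caveat with plain substring matching: short entries in the list such as "control", "zinc" or "CII" also match inside longer words (e.g. "uncontrolled"), and upper-case entries like "CII" can even match inside a protein sequence. The sketch below wraps the same logic in a function with an optional whole-word mode based on the regex word boundary \b. The function name add_colour_lines and the whole_words parameter are hypothetical, chosen for illustration; \b behaves as intended here because every entry in the list starts and ends with a letter or digit.

```python
import re


def add_colour_lines(lines, words, marker="/colour = 1", whole_words=False):
    """Return `lines` with `marker` added after every line that contains
    one of `words`. Hypothetical helper for illustration."""
    alternation = "|".join(re.escape(w) for w in words)
    if whole_words:
        # \b requires a word boundary on each side, so "control" no longer
        # matches inside "uncontrolled" and "CII" not inside "MCIIK".
        alternation = r"\b(?:" + alternation + r")\b"
    pattern = re.compile(alternation)
    out = []
    for line in lines:
        out.append(line)
        if pattern.search(line):
            out.append(marker)
    return out


sample = ["foo", "foo anti-termination", "bar anti-repressor anti-termination", "baz"]
print(add_colour_lines(sample, ["anti-repressor", "anti-termination"]))
print(add_colour_lines(["uncontrolled growth"], ["control"], whole_words=True))
```

With whole_words=False the first call reproduces the six output lines shown above; the second call returns the input unchanged because "control" only occurs inside a longer word.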

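You may want to add leading spaces to the /colour = 1 string so that it lines up with the other qualifiers (your question doesn't make the exact format clear). In the extract, qualifiers start after 21 spaces, so something like this:

print(" " * 21 + "/colour = 1")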
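Searching every line of the real file has a further risk: if a match lands inside a multi-line value such as /translation, the new line is inserted in the middle of that value and the record is no longer valid GenBank. The sketch below is one possible GenBank-aware variant, under these assumptions: only the /product qualifier is searched (including wrapped continuation lines), qualifiers start after 21 spaces as in the extract, and the new qualifier is written as /colour=1, following the usual /name=value qualifier syntax without spaces around "=". colour_products and QUALIFIER_INDENT are hypothetical names, not part of any library.

```python
import re

# Assumption: feature qualifiers start after 21 spaces, as in the extract.
QUALIFIER_INDENT = " " * 21


def colour_products(lines, words, colour="1"):
    """Return a copy of `lines` (strings without newlines) with a
    /colour=<colour> qualifier added after every /product qualifier whose
    text contains one of `words`. Hypothetical helper, not a library API."""
    pattern = re.compile("|".join(re.escape(w) for w in words))
    out = []
    in_product = False  # inside a (possibly wrapped) /product value
    product_text = ""
    for line in lines:
        stripped = line.strip()
        # A new qualifier, feature key or section ends the /product value.
        ends_product = stripped.startswith("/") or not line.startswith(QUALIFIER_INDENT)
        if in_product and ends_product:
            if pattern.search(product_text):
                out.append(QUALIFIER_INDENT + "/colour=" + colour)
            in_product = False
        if stripped.startswith("/product="):
            in_product = True
            product_text = stripped
        elif in_product:
            # Continuation line of a wrapped /product value.
            product_text += " " + stripped
        out.append(line)
    # The input may end while still inside a /product value.
    if in_product and pattern.search(product_text):
        out.append(QUALIFIER_INDENT + "/colour=" + colour)
    return out


extract = [
    "     CDS             61523..61744",
    QUALIFIER_INDENT + '/gene="pspD"',
    QUALIFIER_INDENT + '/product="peripheral inner membrane heat-shock protein"',
    QUALIFIER_INDENT + '/translation="MNTRWQQAGQKVKPGFKLAGKLVLLTALRYGPAGVAGWAIKSVA',
    QUALIFIER_INDENT + 'RRPLKMLLAVALEPLLSRAANKLAQRYKR"',
]
for line in colour_products(extract, ["heat-shock", "heat shock", "CII"]):
    print(line)
```

Here the /colour=1 line is placed after the /product line and before /translation. For a whole file, one option is to pass open(path).read().splitlines() and write the result back joined with "\n".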