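Problem description
Hi, I'm trying to search a file for a specific list of words. If one of these words is found, I want to add a new line below it containing the phrase /colour = 1 (I don't want to delete the original word I'm searching for).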
An extract of the file for context and format:
LOCUS       contig_2_pilon_pilon   5558986 bp    DNA     linear   BCT 16-JUN-2020
DEFINITION  Escherichia coli O157:H7 strain (270078)
ACCESSION
VERSION
KEYWORDS    .
SOURCE      Escherichia coli 270078
  ORGANISM  Escherichia coli 270078
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae; Escherichia.
COMMENT     Annotated using prokka 1.14.6 from https://github.com/tseemann/prokka.
FEATURES             Location/Qualifiers
     source          1..5558986
                     /organism="Escherichia coli 270078"
                     /mol_type="genomic DNA"
                     /strain="strain"
                     /db_xref="taxon:562"
     CDS             61523..61744
                     /gene="pspD"
                     /locus_tag="JCCJNNLA_00057"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA sequence:RefSeq:EG10779-MONOMER"
                     /codon_start=1
                     /transl_table=11
                     /product="peripheral inner membrane heat-shock protein"
                     /translation="MNTRWQQAGQKVKPGFKLAGKLVLLTALRYGPAGVAGWAIKSVA
                     RRPLKMLLAVALEPLLSRAANKLAQRYKR"
Here is one of the word lists I'm searching for across the whole file:
regulation_list=["anti-repressor","anti-termination","antirepressor","antitermination","antiterminator","anti-terminator","cold-shock","cold shock","heat-shock","heat shock","regulation","regulator","regulatory","helicase","antibiotic resistance","repressor","zinc","sensor","dipeptidase","deacetylase","5-dehydrogenase","glucosamine kinase","glucosamine-kinase","dna-binding","dna binding","methylase","sulfurtransferase","acetyltransferase","control","ATP-binding","ATP binding","Cro","Ren protein","CII","inhibitor","activator","derepression","protein Sxy","sensing","Tir chaperone","Tir-cytoskeleton","Tir cytoskeleton","Tir protein","EspD"]
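As you can see, the extract contains one of the phrases I'm looking for (heat-shock), and I'd like to add a new line below it containing the phrase /colour = 1.
Any help would be great!
Solution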
# Create simple input file for testing:
cat > foo.txt <<EOF
foo
foo anti-termination
bar anti-repressor anti-termination
baz
EOF
python -c '
import re
# Using a shortened version of your list:
regulation_list = ["anti-repressor", "anti-termination", "etc"]
# For speed and simplicity, compile the regular expression once, then reuse it later.
# re.escape() guards against list entries that contain regex metacharacters.
regulation_re = re.compile("|".join(re.escape(word) for word in regulation_list))
with open("foo.txt", "r") as in_file:
    for line in in_file:
        # Remove only the trailing newline so any leading indentation is kept.
        line = line.rstrip("\n")
        print(line)
        if regulation_re.search(line):
            print("/colour = 1")
' > bar.txt
cat bar.txt
Output:
foo
foo anti-termination
/colour = 1
bar anti-repressor anti-termination
/colour = 1
baz
You may want to add an extra newline and some leading spaces to the /colour = 1 string for alignment (your question doesn't make this clear), like this:
print("\n /colour = 1")
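If the goal is to keep the GenBank layout intact, one option is to copy the indentation of the matching line instead of hard-coding spaces. The sketch below is a minimal illustration, not the method from the answer above, and it rests on assumptions not stated in the question: only /product qualifiers are searched, each /product value fits on one line, and the input lines have already had their trailing newlines removed. The function name add_colour and the constant QUALIFIER_INDENT are hypothetical. Restricting the search to /product also avoids false hits elsewhere in the record; for example, an uppercase entry such as "CII" could otherwise match inside an amino-acid string in /translation, and inserting a line there would break that qualifier.

```python
import re

# Shortened, illustrative version of regulation_list from the question.
regulation_list = ["heat-shock", "heat shock", "regulator", "CII"]
regulation_re = re.compile("|".join(re.escape(word) for word in regulation_list))

# Hypothetical constant: GenBank feature qualifiers start at column 22.
QUALIFIER_INDENT = " " * 21


def add_colour(lines, pattern, marker="/colour = 1"):
    """Return a copy of lines with marker inserted after each matching /product line.

    Assumes lines carry no trailing newlines and that every /product value
    sits on a single line. The marker reuses the indentation of the line it
    follows so it lines up with the surrounding qualifiers.
    """
    out = []
    for line in lines:
        out.append(line)
        stripped = line.lstrip(" ")
        if stripped.startswith("/product=") and pattern.search(stripped):
            indent = line[: len(line) - len(stripped)]
            out.append(indent + marker)
    return out


# Part of the CDS feature from the question's extract.
record = [
    "     CDS             61523..61744",
    QUALIFIER_INDENT + '/gene="pspD"',
    QUALIFIER_INDENT + '/product="peripheral inner membrane heat-shock protein"',
    QUALIFIER_INDENT + '/translation="MNTRWQQAGQKVKPGFKLAGKLVLLTALRYGPAGVAGWAIKSVA',
    QUALIFIER_INDENT + 'RRPLKMLLAVALEPLLSRAANKLAQRYKR"',
]

for line in add_colour(record, regulation_re):
    print(line)
```

To process a whole file, read it with in_file.read().splitlines() and write the result back with "\n".join(...). Whether /colour = 1 or /colour=1 is the exact spelling your downstream viewer expects is worth checking against its documentation.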