Problem description
AC_peg_0698__[locus_tag=Adeh_0700] Bac_export_2
AC_peg_0699__[locus_tag=Adeh_0701] Bac_export_1
AC_peg_0700__[locus_tag=Adeh_0702] Bac_export_3
AC_peg_0701__[locus_tag=Adeh_0703] FliP
AC_peg_0702__[locus_tag=Adeh_0704] FliO
AC_peg_0703__[locus_tag=Adeh_0705] FliMN_C
AC_peg_0704__[locus_tag=Adeh_0706] YscJ_FliF
AC_peg_0705__[locus_tag=Adeh_0707] No_Domain
AC_peg_0706__[locus_tag=Adeh_0708] No_Domain
.
.
.
AC_peg_1378__[locus_tag=Adeh_1382] CheR_N CheR
AC_peg_1379__[locus_tag=Adeh_1383] Response_reg CheB_methylest
AC_peg_1380__[locus_tag=Adeh_1384] MotB_plug OmpA
AC_peg_1381__[locus_tag=Adeh_1385] MotA_ExbB
AC_peg_1382__[locus_tag=Adeh_1386] FlbD
AC_peg_1383__[locus_tag=Adeh_1387] Flg_bb_rod FlaE Flg_bbr_C
AC_peg_1384__[locus_tag=Adeh_1388] FlgD FlgD_ig
I have a list that I need to compare against the file above. The list has about 38 elements.
Bac_export_1
Bac_export_2
Bac_export_3
Bac_export_4
ChapFlgA
CheC
FHIPEP
Flg_hook
FlgD
FlgD_ig
Flgi
Flgi
FlgM
FlgN
FlhC
FlhD
FlhE
FliD
FliD
FliG
FliH
FliJ
FliL
FliM
FliN_N
FliO
FliP
FliS
FliT
FliW
FliX
glucosaminidase
MotA_ExbB
MotB_plug
MotY_N
Rod-binding
T3SS_ATPase_C
YscJ_FliF
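I want to extract every line that matches any element of the list. For example, for the list element MotB_plug, I want to extract all lines that match it and write them to a separate file. I want one separate file for each element of the list.
The final file should look like this: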
ES114__peg_0708___VF_0715 MotB_plug OmpA
ES114__peg_2773___VF_A0187 MotB_plug OmpA
MJ11__peg_0701___VFMJ11_RS10770 MotB_plug OmpA
MJ11__peg_2789___VFMJ11_RS01075 MotB_plug OmpA
SR5__peg_0707___VFSR5_RS03610 MotB_plug OmpA
SR5__peg_2770___VFSR5_RS14190 MotB_plug OmpA
LFI1238__peg_0951___VSAL_RS05180 MotB_plug OmpA
Solution
This may not be very efficient, since it rereads the data file once per list element, but it works:
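    # Read the domain names, one per line, skipping blank lines
    # (an empty pattern would match every line of data.txt).
    with open("protein_domain.txt") as f:
        lst = [el.strip() for el in f if el.strip()]

    # Write one output file per domain name, rescanning data.txt each time.
    for pattern in lst:
        with open(f"{pattern}.txt", "w") as new_file:
            with open("data.txt") as f:
                for line in f:
                    # Substring test: "FlgD" also matches lines containing "FlgD_ig".
                    if pattern in line:
                        new_file.write(line)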
For a more efficient approach that collects all matches into a single file instead of one file per element:
grep -E "Bac_export_1|Bac_export_2|Bac_export_3|Bac_export_4|ChapFlgA|CheC|FHIPEP|Flg_hook|FlgD|FlgD_ig|FlgI|FlgI|FlgM|FlgN|FlhC|FlhD|FlhE|FliD|FliD|FliG|FliH|FliJ|FliL|FliM|FliN_N|FliO|FliP|FliS|FliT|FliW|FliX|Glucosaminidase|MotA_ExbB|MotB_plug|MotY_N|Rod-binding|T3SS_ATPase_C|YscJ_FliF" data.txt > output.txt
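The two approaches above can be combined: read data.txt only once and still produce one file per domain. The following is a minimal sketch, not the answerer's method. It assumes each data line is whitespace-separated with the gene ID in the first field and domain names in the remaining fields, and it compares whole domain names instead of substrings. The function name split_by_domain and its parameters are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def split_by_domain(data_path, domain_path, out_dir="."):
    """Write one <domain>.txt per listed domain, reading data_path once."""
    # Domain names of interest; the set drops duplicates, blank lines are skipped.
    text = Path(domain_path).read_text()
    wanted = {ln.strip() for ln in text.splitlines() if ln.strip()}

    matches = defaultdict(list)
    with open(data_path) as data:
        for line in data:
            fields = line.split()
            # Assumed layout: first field is the gene ID, the rest are domains.
            for domain in set(fields[1:]) & wanted:
                matches[domain].append(line)

    # Only domains with at least one matching line get a file.
    out = Path(out_dir)
    for domain, lines in matches.items():
        (out / f"{domain}.txt").write_text("".join(lines))

    # Number of lines written per domain, for a quick sanity check.
    return {domain: len(lines) for domain, lines in matches.items()}
```

Calling split_by_domain("data.txt", "protein_domain.txt") would write MotB_plug.txt, FlgD.txt and so on into the current directory. Because matching is exact and case-sensitive, a list entry such as FlgD no longer picks up lines that only contain FlgD_ig, and an entry must be spelled exactly as it appears in data.txt to match. Unlike the loop above, no empty file is created for a domain that never occurs.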