Problem description
I am writing a Python script to extract a few features from a text file.
The input file has the following structure:
ENTRY M00001 Pathway Module
NAME Glycolysis (Embden-Meyerhof pathway),glucose => pyruvate
CLASS Pathway modules; Carbohydrate Metabolism; Central carbohydrate Metabolism
PATHWAY     map00010  Glycolysis / gluconeogenesis
            map01200  Carbon Metabolism
            map01100  Metabolic pathways
///
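I am extracting the values of the "ENTRY" and "PATHWAY" fields. However, when I write the contents to a PostgreSQL 11.0 table, I get the result shown below. Both columns are of type "character varying".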
id map_id
{M00001} {map00010,map01200,map01100}
{M00002} {map00010,map01230,map01100}
{M00003} {map00010,map00020,map01100}
{M00004} {map00030,map01100,map01120}
{M00005} {map00030,map00230,map01100}
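import re

# conn is assumed to be an already-open database connection
cursor = conn.cursor()
fields = {}
with open('file') as f:
    for line in f:
        if re.search(r"^[A-Z]", line):
            # New field: the first word is the key, the rest is the value
            key, value = re.split(r"\s+", line, maxsplit=1)
            fields[key] = value
        elif re.search(r"^\s+", line):
            # Continuation line: append it to the most recent field
            fields[key] = fields[key] + line
        elif re.search(r"^///", line):
            # End of record: extract the values and insert one row
            e = fields['ENTRY']
            string = ''.join(e)
            # re.findall returns a list of matches
            id = re.findall(r"(^[A-Za-z+]\d+)", string)
            map_id = re.findall(r"(map\d+)\s+.*", fields['PATHWAY'])
            cursor.execute("INSERT INTO tbl (id,map_id) VALUES (%s,%s)", (id, map_id))
conn.commit()
cursor.close()
conn.close()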
The expected output is:
id map_id
M00001 map00010,map01100
M00002 map00010,map01100
M00003 map00010,map01100
M00004 map00030,map01120
M00005 map00030,map01100
Any help is highly appreciated.
Solution
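One possible explanation, offered as a sketch rather than a confirmed diagnosis: `re.findall` returns a list, and if the database driver is psycopg2 (an assumption, since the question does not name the driver), Python lists are adapted to PostgreSQL arrays, whose text form is `{...}`. Passing plain strings instead should avoid the braces. The helper name `parse_records` below is hypothetical. It takes the first match from the ENTRY field and joins the PATHWAY IDs with commas. Note that the expected output in the question lists only two IDs per row, while this sketch keeps every ID it finds, so an extra filtering step may be needed if that omission is intentional.

```python
import re


def parse_records(lines):
    """Yield (entry_id, map_ids) pairs of plain strings, one per record."""
    fields = {}
    key = None
    for line in lines:
        if line.startswith("///"):
            # End of record: reduce each field to a plain string.
            entry = re.match(r"[A-Za-z]+\d+", fields.get("ENTRY", ""))
            map_ids = re.findall(r"\bmap\d+\b", fields.get("PATHWAY", ""))
            if entry:
                yield entry.group(0), ",".join(map_ids)
            fields, key = {}, None
        elif re.match(r"[A-Z]", line):
            # New field: the first word is the key, the rest is the value.
            parts = line.split(None, 1)
            key = parts[0]
            fields[key] = parts[1] if len(parts) > 1 else ""
        elif key is not None and re.match(r"\s+\S", line):
            # Continuation line: append it to the current field.
            fields[key] += line


# Sample record modelled on the input shown in the question.
sample = """\
ENTRY       M00001            Pathway   Module
NAME        Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
PATHWAY     map00010  Glycolysis / gluconeogenesis
            map01200  Carbon metabolism
            map01100  Metabolic pathways
///
"""

rows = list(parse_records(sample.splitlines(keepends=True)))
print(rows)  # [('M00001', 'map00010,map01200,map01100')]
```

Each pair can then be passed to the statement from the question, for example `cursor.execute("INSERT INTO tbl (id,map_id) VALUES (%s,%s)", (entry_id, map_ids))`, where `entry_id` and `map_ids` are the two strings of one pair. Because both values are `str` rather than `list`, they should be stored as ordinary text.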