Escaping a regular expression in Python

Problem description

I am extracting the id tag from data fields like the following:

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}

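The regular expression I am using, '"id":\s*"(.*?)"', breaks when it encounters this field.

This is because only some fields carry the extra onhold tag; others look like this: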

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \n ","id":"7462764"}

The whole file has the following format:

{"info":[{"purchased_at":"","product_desc":"","id":""}{..}]}

Solution

You can import the json library and read the value of the desired key (id) directly, instead of using a regular expression:

import json
line = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
# Parse the record into a dict, then look up the top-level key
js = json.loads(line)
print(js['id'])  # prints 8745485

Update: if you need a regex-based approach, the search function of the re library with a suitable pattern may help.
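The code that originally accompanied the update is not available here, so the following is only a minimal sketch of how re.search could be used with the pattern from the question. The helper name first_id and the constant ID_PATTERN are hypothetical, introduced purely for illustration.

```python
import re

line = ('{"purchased_at":"2020-04-21T05:55:30.000Z",'
        '"product_desc":"Garnier 2019 Shampoo",'
        '"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},'
        '"id":"8745485"}')

# Raw string so that \s reaches the regex engine unchanged.
ID_PATTERN = re.compile(r'"id":\s*"(.*?)"')

def first_id(text):
    """Return the first id value found in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    return match.group(1) if match else None

print(first_id(line))  # prints 8745485
```

Unlike findall, search stops at the first match and returns a match object (or None), so the None check matters for records that have no id key.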

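Just use the findall method of the re module to extract the data.

import re
line='{"purchased_at":"2020-04-21T05:55:30.000Z","id":"8745485"}'
# Raw string so that \s is passed to the regex engine as-is
print(re.findall(r'"id":\s*"(.*?)"', line))

Output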

['8745485']
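For the whole-file layout shown in the question, a JSON-based approach might look like the sketch below. It assumes the file is valid JSON, with the records under "info" separated by commas, and it uses a hypothetical helper name extract_ids and an inline sample string in place of the real file.

```python
import json

# Hypothetical sample in the {"info": [...]} layout from the question.
# Raw strings keep the JSON escape \n intact for json.loads to decode.
raw = (r'{"info":[{"purchased_at":"2020-04-21T05:55:30.000Z",'
       r'"product_desc":"Garnier 2019 Shampoo",'
       r'"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},'
       r'"id":"8745485"},'
       r'{"purchased_at":"2020-04-21T05:55:30.000Z",'
       r'"product_desc":"All clear 2019 \n ","id":"7462764"}]}')

def extract_ids(text):
    """Collect the top-level id of every record under the "info" key."""
    records = json.loads(text)["info"]
    # Skip records without an id instead of raising KeyError.
    return [record["id"] for record in records if "id" in record]

ids = extract_ids(raw)
print(ids)  # prints ['8745485', '7462764']
```

Because the parser understands nesting, the extra onhold object does not affect the lookup, whether or not a given record contains it.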
