Problem description
{"cdate":"2020-09-01T23:11:46-02:00","email":"[email protected]","phone":"+66 9988 1234","firstName":"John","lastName":"Smith","orgid":"0","orgname":"","segmentio_id":"","bounced_hard":"0","bounced_soft":"0","bounced_date":"0000-00-00","ip":"1234567","ua":"","hash":"jfepfjepjfewfe87","socialdata_lastcheck":"0000-00-00 00:00:00","email_local":"","email_domain":"","sentcnt":"27","rating_tstamp":"2019-09-22","gravatar":"1","deleted":"0","anonymized":"0","adate":"2020-08-21T04:11:09-05:00","udate":"2020-02-01T21:01:21-06:00",
This text is all on one line.
I want to extract three values: "email", "firstName" and "lastName". I used cut -d ":" -f 6,8,9.
This gives:
"[email protected]","phone":"John","orgid"
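I can then clean that up.
The problem is that I have hundreds of similar entries in the file, and they are not all spaced the same way, so I can't say that the next use of cut should be at +50 (or whatever).
I have looked at grep, but I don't know how to achieve my goal with it. Ideally, I want to extract:
[email protected] John Smith
I don't care whether it ends up on one line or three.
Thanks!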
Solution
Probably the best approach is to use a JSON parser, which takes advantage of the format. But, for fun, this might work:
grep -o '"\(email\|firstName\|lastName\)":"[^"]*"' input_file
For anything beyond a quick hack, it is worth checking out a proper JSON tool.
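The grep command above prints each match as a "key":"value" pair on its own line. To get closer to the one-line form the question asks for, the output can be post-processed with sed and paste. The following is only a rough sketch: the function name extract_fields is made up for illustration, it assumes one record per line in the input file, it assumes the values contain no escaped double quotes, and it uses grep -E with -o (widely supported, but -o is not part of POSIX grep).

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
# prints the email, firstName and lastName values of each record
# on one line, separated by spaces, in the order they appear.
# Assumption: $1 is a file with one JSON-like record per line.
extract_fields() {
    while IFS= read -r line; do
        printf '%s\n' "$line" |
            # keep only the three "key":"value" pairs, one per line
            grep -E -o '"(email|firstName|lastName)":"[^"]*"' |
            # drop the key and the surrounding quotes, keep the value
            sed 's/^"[^"]*":"\(.*\)"$/\1/' |
            # join the values of this record with single spaces
            paste -sd ' ' -
    done < "$1"
}
```

Calling extract_fields input_file would then print one line per record. Because it is still pattern matching rather than real parsing, it will break on values that contain escaped quotes, which is another reason to prefer a JSON parser when one is available.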