Join lines to the previous line if they don't start with a timestamp in a UNIX shell

Problem description

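I have a tool that outputs logs with a timestamp prefix, but a log entry can contain newlines. I want to merge every line that has no timestamp into the previous line.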

Example:

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

With awk, I can do the following to merge lines that don't start with a [ (square bracket):

awk -v RS="[" 'NR>1{$1=$1; print RS,$0}'

However, you can see where this fails on the "twist" line above: that line starts with a [ that is not part of a timestamp.

Is there a way to use a regular expression for the timestamp prefix? Or is there a better command-line tool for this task?

Solution

Could you please try the following, written and tested with the sample shown at https://ideone.com/PXVCh2:

awk '
{
  printf("%s%s",$0~/^\[ [0-9]{4}\/[0-9]{2}\/[0-9]{2}/\
          ?(FNR!=1?ORS:""):OFS,$0)
}
END{ print "" }
' Input_file

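As per Ed's comment, I added an END block with a print statement so that the output ends with a newline (printf "%s%s" never prints one itself); if you don't need that trailing newline, you can leave that part out.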

Note: I wrote this on my phone, so I couldn't check how it looks on a big screen; that's why I split the single printf statement across two lines here.
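If your awk does not support interval expressions such as {4} (some implementations, notably older mawk versions, do not), the same timestamp test can be written with the digits spelled out. The minimal sketch below also buffers each entry and prints it once the next timestamp (or end of input) is seen, instead of printing separators up front. It assumes every entry starts with "[ YYYY/MM/DD" exactly as in the sample; join_by_timestamp is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: join_by_timestamp [file...]
# Assumes each entry starts with "[ YYYY/MM/DD" as in the sample log.
# The digits are spelled out instead of using {4}/{2}, so awks without
# interval-expression support accept the regex too.
join_by_timestamp() {
  awk '
    /^\[ [0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/ {
      if (have) print buf          # a new entry starts: flush the previous one
      buf = $0
      have = 1
      next
    }
    {
      # continuation line: append it to the pending entry with a space
      buf = have ? (buf " " $0) : $0
      have = 1
    }
    END { if (have) print buf }    # flush the last entry
  ' "$@"
}
```

Usage: `join_by_timestamp Input_file` or `your_tool | join_by_timestamp`. The "[19] to confuse you" line is treated as a continuation because the character after its [ is a digit, not the space the timestamp pattern requires.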


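It looks to me like your real problem is that your quoted strings can contain newlines, so this GNU awk solution (GNU awk for multi-char RS and RT), which looks for the quoted strings, is probably more robust than looking for a timestamp at the start of each line: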

$ awk -v RS='"[^"]*"' '{gsub("\n"," ",RT); ORS=RT} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

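This is better than checking for lines that start with a timestamp if your quoted strings can contain a timestamp that might appear at the start of a line (note the timestamp inside the "four lines with a twist..." block):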

$ cat file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist
[ 2020/08/12 11:40] to confuse you
repeatedly and
in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"

$ awk -v RS='"[^"]*"' '{ORS=gensub(/\n/," ","g",RT)} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist [ 2020/08/12 11:40] to confuse you repeatedly and in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"

Assuming log contains your sample file:

$ cat log

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

The following code counts the double quotes (") on each line; if a line contains only one, it joins that line with the next one:

$ gawk 'gsub("\"","\"") == 1 {x=$0; getline; print x " " $0;} gsub("\"","\"") == 2 {print}' log

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
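The command above only handles entries that span exactly two lines, and it drops lines that contain no quotes at all. A minimal sketch that generalizes the same quote-counting idea keeps appending lines until the running count of " is even, so it also handles the four-line "horrible" example from the previous answer. It assumes every message has balanced double quotes (a doubled "" counts as two, so it stays balanced) and uses only POSIX awk features; join_by_quotes is a hypothetical name used for illustration.

```shell
# Hypothetical helper: join_by_quotes [file...]
# Assumes every message has balanced double quotes; a doubled "" counts as
# two quotes, so it keeps the running count even.
join_by_quotes() {
  awk '
    {
      n += gsub(/"/, "&")          # gsub returns how many " were on this line
      buf = have ? (buf " " $0) : $0
      have = 1
      if (n % 2 == 0) {            # quotes balanced: the entry is complete
        print buf
        buf = ""; have = 0; n = 0
      }
    }
    END { if (have) print buf }    # unterminated last entry, print as-is
  ' "$@"
}
```

Usage: `join_by_quotes log`. Because gsub replaces each " with itself ("&" is the matched text), the line is unchanged and only the count is used.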
