Remove lines from a file that correspond to blank lines in another file

Problem description

I have two files with the same number of rows and columns, delimited by ;. Example:

file_a:

1;1;1;1;1
2;2;2;2;2
3;3;3;3;3
4;4;4;4;4

file_b:

A;A;A;A;A
B;B;;;B
;;;;
D;D;D;D;D

Ignoring the delimiters, line 3 of file_b is empty. So I also want to remove line 3 from file_a before running this command:

paste -d ';' file_a file_b

so as to get output like this:

1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

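Edit: The number of columns is 93. It is the same for every line and in both files, so the two files have exactly the same row-and-column layout.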

Solution

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS=OFS=";"
}
FNR==NR{
  arr[FNR]=$0
  next
}
!/^;+$/{
  print arr[FNR],$0
}
' file_a file_b

Explanation: a detailed, line-by-line explanation of the above.

awk '                 ##Starting awk program from here.
BEGIN{                ##Starting BEGIN section from here.
  FS=OFS=";"          ##Setting field separator and output field separator as ; here.
}
FNR==NR{              ##Checking condition if FNR==NR which will be TRUE when file_a is being read.
  arr[FNR]=$0         ##Creating arr with index FNR and value is current line.
  next                ##next will skip all further statements from here.
}
!/^;+$/{              ##If the line does NOT consist solely of ; characters, do the following.
  print arr[FNR],$0   ##Printing arr with index of FNR and current line.
}
' file_a file_b       ##Mentioning Input_file names here.
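
To try the program without touching real data, the sample inputs from the question can be recreated in a scratch directory. This is only a sketch: the temporary directory and the shell variable names workdir and result are illustrative assumptions, not part of the original answer.

```shell
# Work in a throwaway directory so no real files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample inputs shown in the question.
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > file_a
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > file_b

# Same program as above: buffer file_a by line number, then for every
# file_b line that is not made up solely of semicolons, print the
# matching file_a line and the file_b line joined by ";".
result=$(awk '
BEGIN { FS = OFS = ";" }
FNR == NR { arr[FNR] = $0; next }
!/^;+$/   { print arr[FNR], $0 }
' file_a file_b)

printf '%s\n' "$result"
```

With the sample data this should print the three joined lines requested in the question. Note that the whole of file_a is held in memory, which is usually fine for files of moderate size.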

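Since you mentioned that both files have the same number of lines, getline is a good fit here (f1 and f2 below stand in for file_a and file_b):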

$ awk '(getline line < "f2")==1 && line ~ /[^;]/' f1
1;1;1;1;1
2;2;2;2;2
4;4;4;4;4

You can also do the work of paste inside awk:

$ awk '(getline line < "f2")==1 && line ~ /[^;]/{print $0 ";" line}' f1
1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

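getline returns 1 when a line is read successfully. line ~ /[^;]/ checks whether the line contains any character other than ;. When both conditions hold, the desired result is printed.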
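
The explicit comparison with 1 matters because getline has two other return values: 0 at end of file and -1 on error (for example, when the file cannot be opened). The following sketch illustrates this; the scratch directory, the two-line stand-in file, the name no_such_file, and the variables rc_eof and rc_err are assumptions made for the demonstration.

```shell
# Scratch directory with a two-line stand-in for the second file (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'A;A' ';' > f2

# Count the lines read while getline returns 1, then capture the value
# it returns once the file is exhausted.
rc_eof=$(awk 'BEGIN {
    while ((getline line < "f2") == 1) n++
    rc = (getline line < "f2")
    print n, rc
}')

# Reading from a file that does not exist makes getline return -1.
rc_err=$(awk 'BEGIN { rc = (getline line < "no_such_file"); print rc }')

printf 'lines read, return value at end of file: %s\n' "$rc_eof"
printf 'return value on error: %s\n' "$rc_err"
```

Because -1 is non-zero, a bare test such as (getline line < file) without the == 1 would treat a read error as success, which is why the answer compares against 1.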

Essentially a modification of @RavinderSingh13's solution, but I store only the NR of the empty records:

$ awk '
NR==FNR {            # process the b file
    if($0~/^;+$/)    # when empty record met
        a[NR]        # hash the record number NR
    next
}
!(FNR in a)          # print non-empty matches of a file
' fileb filea

Output:

1;1;1;1;1
2;2;2;2;2
4;4;4;4;4
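
This command prints only the filtered lines of file_a. To arrive at the pasted output requested in the question, file_b needs the same lines removed before paste runs. One possible way to complete the workflow is sketched below; the intermediate file names file_a.kept and file_b.kept and the scratch directory are hypothetical choices, not part of the original answer.

```shell
# Scratch directory with the sample inputs from the question (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > file_a
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > file_b

# Pass 1 over file_b records the line numbers that hold only semicolons;
# pass 2 over file_a prints every line whose number was not recorded.
awk '
NR == FNR { if ($0 ~ /^;+$/) skip[NR]; next }
!(FNR in skip)
' file_b file_a > file_a.kept

# Drop the same semicolon-only lines from file_b itself.
grep -v '^;;*$' file_b > file_b.kept

# Both filtered files now have matching line counts and can be pasted.
result=$(paste -d ';' file_a.kept file_b.kept)
printf '%s\n' "$result"
```

Only the line numbers of the empty records are kept in memory here, so this variant scales to a large file_a at the cost of two intermediate files.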

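It is easier to filter after paste. Assuming the input lines to be excluded have exactly the format shown in the question, you can filter the output of paste with grep, using a pattern anchored to the end of the line. (The line ends with 5 empty fields.)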

paste -d ';' file_a file_b | grep -v ';;;;;$'

With the input files shown in the question, this prints exactly the desired output.

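Edit:
To meet the additional requirement from the comments, the grep command can be modified to state the number of semicolons, which corresponds to the number of empty columns. For different input files, simply change the number 5 accordingly.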

paste -d ';' file_a file_b | grep -v ';\{5\}$'

With the column count of 93 now specified in the question, the command would be

paste -d ';' file_a file_b | grep -v ';\{93\}$'

Edit2:
You can also obtain the required number of semicolons from the first line of file_b

SEMICOLONS=$(head -1 file_b | sed 's/[^;]*//g')
paste -d ';' file_a file_b | grep -v ";$SEMICOLONS"'$'

or combined as

paste -d ';' file_a file_b | grep -v ';'$(head -1 file_b | sed 's/[^;]*//g')'$'
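
As an alternative to counting semicolons, the pasted line can be split into fields and only the file_b half inspected. The sketch below is not from the original answers; it relies on the question's statement that both files have the same number of columns, and the scratch directory and variable names are illustrative.

```shell
# Scratch directory with the sample inputs from the question (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > file_a
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > file_b

# After paste, the second half of the fields comes from file_b.
# Print a line as soon as one of those fields turns out to be non-empty.
result=$(paste -d ';' file_a file_b | awk -F';' '
{
    half = NF / 2                      # assumes equal column counts in both files
    for (i = half + 1; i <= NF; i++)
        if ($i != "") { print; next }  # non-empty file_b field: keep the line
}')

printf '%s\n' "$result"
```

The column count is derived from each line, so the same command should work unchanged for the 93-column files mentioned in the question.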
