How can I put this AWK function into a for loop to extract columns?

Problem description

I have dozens of files (e.g. fA.txt, fB.txt and fC.txt) and want to get the output shown in fALL.txt:

fA.txt:  
id    V   W   X   Y   Z  
a     1   2   4   8   16   
b     3   6   13  17  18  
c     5   1   20  4   8  
   
fB.txt:  
id    F    G   H   J   K  
a     2    5   9   7   12  
b     4    9   12  3   19  
c     6    13  2   40  7  
 
fC.txt:  
id    L    M   N    O  P  
a     7    2   19   8  16  
b     8    6   12  23  47  
c     91   11  15  19  80  

Desired output
  
fALL.txt:
  
id    fA_V   fB_F   fC_L    
a     1      2      7    
b     3      4      8    
c     5      6      91    

id    fA_W   fB_G   fC_M  
a     2      5      2  
b     6      9      6  
c     1      13     11  

id    fA_X   fB_H   fC_N  
a     4      9      19  
b     13     12     12  
c     20     2      15  

id    fA_Y   fB_J   fC_O
a     8      7      8  
b     17     3      23  
c     4      40     19  

id    fA_Z   fB_K   fC_P
a     16     12     16  
b     18     19     47  
c     8      7      80  

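I found the following AWK code on this site; it works for input files that have only two columns:

awk 'NR==FNR{a[FNR]=$0; next} {a[FNR] = a[FNR] OFS $2} END{for (i=1;i<=FNR;i++) print a[i]}' file1 file2 file3

For my files I modified it as follows, and it works for extracting the second column:

awk 'NR==FNR{a[FNR]=$1 OFS $2; next} {a[FNR] = a[FNR] OFS $2} END{for (i=1; i<=FNR; i++) print a[i]}' file1 file2 file3

I tried putting this inside a for loop to extract the subsequent columns, but without success. Any helpful hints would be appreciated.

The first block of the desired output is the second column of each input file, with a header built by joining the file name and that file's column header (e.g. fA_V). The subsequent blocks are the third, fourth, fifth and sixth columns of each input file.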
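If you want to keep the question's own approach, one possibility is to pass the column number into the one-liner with -v and do the looping in the shell, running awk once per column. This is a minimal sketch in POSIX sh and POSIX awk, assuming every file lists the same ids in the same order and has the same number of columns; paste_column and all_columns are hypothetical helper names, not part of the original.

```shell
# paste_column COL FILE...   (hypothetical helper)
# Prints one block: the id column plus column COL of every FILE, side by side.
# Assumes all files list the same ids in the same order.
paste_column() {
    pc_col=$1
    shift
    awk -v col="$pc_col" '
        FNR == 1 { tag = FILENAME; sub(/\.[^.]+$/, "_", tag) }  # fA.txt -> fA_
        { val = (FNR == 1 ? tag : "") $col }                    # prefix only the header
        NR == FNR { a[FNR] = $1 OFS val; next }                 # first file: id + value
        { a[FNR] = a[FNR] OFS val }                             # other files: append
        END { for (i = 1; i <= FNR; i++) print a[i] }
    ' "$@"
}

# all_columns FILE...   (hypothetical helper)
# Calls paste_column for columns 2..NF of the first file's header line,
# printing a blank line after each block.
all_columns() {
    ac_ncols=$(awk 'NR == 1 { print NF; exit }' "$1")
    ac_col=2
    while [ "$ac_col" -le "$ac_ncols" ]; do
        paste_column "$ac_col" "$@"
        echo
        ac_col=$((ac_col + 1))
    done
}
```

Usage: all_columns fA.txt fB.txt fC.txt > fALL.txt. Fields come out separated by single spaces rather than aligned, and every file is re-read once per column, which should be acceptable for dozens of small files; the gawk answer reads each file only once.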

Solution

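GNU awk (gawk 4.0 or later) should work for you. The script relies on gawk's arrays of arrays (map[$1][i]) and the gawk-only ARGIND variable, so it will not run in mawk or a strictly POSIX awk.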

cat tab.awk
BEGIN {
   OFS = "\t"
   for (i=1; i<ARGC; ++i) {
      fn = ARGV[i]
      sub(/\.[^.]+$/,"_",fn)
      fhdr[i] = fn
   }
}
!seen[$1]++ {
   keys[++k] = $1
}
{
   for (i=2; i<=NF; ++i)
      map[$1][i] = map[$1][i] (map[$1][i] == "" ? "" : OFS) (FNR == 1 ? fhdr[ARGIND] : "") $i
}
END {
   for (i=2; i<=NF; ++i) {
      for (j=1; j<=k; j++) {
         key = keys[j]
         print key,map[key][i]
      }
      print ""
   }
}

Then run it as:

awk -f tab.awk f{A,B,C}.txt
id  fA_V  fB_F  fC_L
a   1     2     7
b   3     4     8
c   5     6     91

id  fA_W  fB_G  fC_M
a   2     5     2
b   6     9     6
c   1     13    11

id  fA_X  fB_H  fC_N
a   4     9     19
b   13    12    12
c   20    2     15

id  fA_Y  fB_J  fC_O
a   8     7     8
b   17    3     23
c   4     40    19

id  fA_Z  fB_K  fC_P
a   16    12    16
b   18    19    47
c   8     7     80


Explanation:

BEGIN {
   OFS = "\t"                   # use a tab as the output field separator
   for (i=1; i<ARGC; ++i) {     # for each input filename (ARGV[1] .. ARGV[ARGC-1])
      fn = ARGV[i]
      sub(/\.[^.]+$/,"_",fn)    # replace the extension (last dot onward) with _, e.g. fA.txt -> fA_
      fhdr[i] = fn              # and save it in the fhdr array, indexed like ARGIND
   }
}
!seen[$1]++ {                   # if this id has not been seen before
   keys[++k] = $1               # store it in keys, preserving first-seen order
}
{
   for (i=2; i<=NF; ++i)        # for each field starting from 2nd column
      map[$1][i] = map[$1][i] (map[$1][i] == "" ? "" : OFS) (FNR == 1 ? fhdr[ARGIND] : "") $i
   # build a 2-dimensional array map indexed by [$1][i] that holds the text for column i of that id
   # on the first record of each file (the header line), prefix the value with the file-name prefix from fhdr
   # each new value is appended to the existing text, separated by OFS
}
END {                           # do this in the end
   for (i=2; i<=NF; ++i) {      # for each column position from 2 onwards (NF keeps the last record's value)
      for (j=1; j<=k; j++) {    # for each id stored in keys array 
         key = keys[j]
         print key,map[key][i] # print the id followed by the values collected for column i
      }
      print ""                  # print a blank line between blocks
   }
}
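The two idioms the explanation depends on, the three-argument sub() that turns a file name into a header prefix and the !seen[$1]++ test that records each id once, can be tried in isolation. A small sketch in POSIX sh/awk; show_prefixes and first_seen_ids are hypothetical helper names. A BEGIN-only awk program never reads its arguments, so the files do not need to exist.

```shell
# show_prefixes FILE...   (hypothetical helper)
# Prints the header prefix that the BEGIN block derives from each argument.
show_prefixes() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) {            # ARGV[1] .. ARGV[ARGC-1]
            fn = ARGV[i]
            sub(/\.[^.]+$/, "_", fn)            # three-argument form edits fn, not $0
            print fn
        }
    }' "$@"
}

# first_seen_ids   (hypothetical helper)
# Reads stdin and prints each distinct $1 once, in first-seen order.
first_seen_ids() {
    awk '!seen[$1]++ { keys[++k] = $1 }
         END { for (j = 1; j <= k; j++) print keys[j] }'
}

show_prefixes fA.txt fB.txt fC.txt                  # prints fA_, fB_, fC_
printf 'id x\na 1\nb 2\na 3\n' | first_seen_ids      # prints id, a, b
```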