awk: regrouping lines by pattern

Problem description

I have this input file:
     1.00 3 4
     93.00 2 3
     105.00 0 2
     119.00 0 2
     122.00 1 4
     202.00 1 3
     207.00 1 2
     210.00 1 4
     236.00 0 1
     237.00 0 4
     237.00 0 2
     240.00 1 3
     243.00 2 3
     243.00 3 4
     243.00 0 3
     275.00 0 4
     275.00 2 4
     353.00 0 3
     361.00 1 4
     411.00 0 1
     412.00 1 3
     425.00 0 3
     426.00 0 4
     455.00 1 4
     464.00 0 3
     520.00 0 4
     560.00 1 3
     561.00 1 4
     581.00 0 2
I would like output like this, with the following information computed:
    field1 field2 nbrepeated time1 time2  time3   time4
    3      4      1          1.00  243.0  0       0
    2      3      1          93.0  243.0  0       0
    0      2      2          93.00 119.00 237.00  581.00 
    :      :      :          :     :      :       :
    :      :      :          :     :      :       :
    :      :      :          :     :      :       :




    <field1> <field2> <nbrepeated> <time1> <time2> <time3> <time4> are columns
    

Solutions

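Perl version:
use strict;
use warnings;

# Group the times by the (field1, field2) pair found on each input line.
my %data;
while (my $line = <DATA>) {
    chomp($line);
    my @row = split(' ', $line);   # awk-style split, ignores leading whitespace
    next unless @row == 3;         # skip blank or malformed lines
    my $key = "$row[1] $row[2]";   # the separator keeps multi-digit fields apart
    push @{$data{$key}}, $row[0];
}

# The longest list of times decides how many time columns are printed.
my $max = 0;
for my $key (keys %data) {
    if (scalar @{$data{$key}} > $max) {
        $max = scalar @{$data{$key}};
    }
}

{
    my @times;
    push @times, "time" . $_ for (1 .. $max);
    myFormat("field1", "field2", "nbrepeated", @times);
}

for my $key (keys %data) {
    my ($f1, $f2) = split(' ', $key);
    my $nr = $#{$data{$key}};      # repeats = occurrences - 1
    my @times = @{$data{$key}};
    # Pad short rows with zeros up to $max columns.
    for (my $i = 0; $i < $max; $i++) {
        if (! defined $times[$i] ) {
            $times[$i] = 0;
        }
    }
    myFormat($f1, $f2, $nr, @times);
}

# Print the three leading columns, then one column per remaining argument.
sub myFormat {
    printf "%-8s %-8s %-12s ", shift, shift, shift;
    for my $line (@_) {
        printf "%-8s ", $line;
    }
    print "\n";
}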

__DATA__
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2
This produces the output:
field1   field2   nbrepeated   time1    time2    time3    time4    time5
0        1        1            236.00   411.00   0        0        0
0        4        3            237.00   275.00   426.00   520.00   0
1        2        0            207.00   0        0        0        0
1        4        4            122.00   210.00   361.00   455.00   561.00
0        2        3            105.00   119.00   237.00   581.00   0
3        4        1            1.00     243.00   0        0        0
0        3        3            243.00   353.00   425.00   464.00   0
2        4        0            275.00   0        0        0        0
2        3        1            93.00    243.00   0        0        0
1        3        3            202.00   240.00   412.00   560.00   0
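The output is not sorted. If you specify how it should be sorted, sorting it would be no problem.

You can do this easily in a shell script using Bash 4 associative arrays: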
#!/bin/bash

declare -A times

# Create an associative array containing the times
# for each combination of field1,field2.
while read -r line
do
  [ -n "$line" ] || continue   # an empty key would be a bad array subscript
  # $line is left unquoted on purpose so runs of spaces collapse for cut.
  time=$(echo $line | cut -d' ' -f1)
  key=$(echo $line | cut -d' ' -f2,3)
  times["$key"]="${times[$key]} $time"
done < data.txt

# Print the header.
echo "field1 field2 nbrepeated time1 time2 time3 time4"

# Iterate over the associative array and print one row per key.
for key in "${!times[@]}"
do
    data=(${times[$key]})
    reps=$((${#data[@]}-1))
    # If there are fewer than 4 time entries, add zeros.
    while [ ${#data[@]} -lt 4 ]
    do
        data[${#data[@]}]=0
    done

    echo "$key $reps ${data[@]}"
done
Output:
field1 field2 nbrepeated time1 time2 time3 time4
1 3 3 202.00 240.00 412.00 560.00
1 2 0 207.00 0 0 0
1 4 4 122.00 210.00 361.00 455.00 561.00
2 3 1 93.00 243.00 0 0
2 4 0 275.00 0 0 0
0 1 1 236.00 411.00 0 0
0 2 3 105.00 119.00 237.00 581.00
0 3 3 243.00 353.00 425.00 464.00
0 4 3 237.00 275.00 426.00 520.00
3 4 1 1.00 243.00 0 0

Bash version:
declare -A t

# Collect the times for each field1:field2 pair.
while read -r tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < times.txt

# Find the largest number of times recorded for any pair.
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo ' "avg"'

    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        printf "%d %d %d" $f1 $f2 $(($# - 1))
        # Print each time, padding with 0 once the list runs out.
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done

        # calculate average gap between consecutive times
        # (for a single time, the time itself is printed)
        set -- ${t[$key]}
        n=$(( $# - 1 ))
        if [[ $n -eq 0 ]]; then
            avg=$1
        else
            prev=$1
            shift
            total="0"
            while [[ $# -gt 0 ]]; do
                total="$total + ($1 - $prev)"
                prev=$1
                shift
            done
            avg=$( echo "scale=1; ($total)/$n" | bc )
        fi
        printf " %.1f\n" $avg
    done
} | column -t
This produces the output:
field1  field2  nbrepeated  time1  time2  time3  time4  time5  \"avg\"
2       4       0           275.0  0.0    0.0    0.0    0.0    275.0
2       3       1           93.0   243.0  0.0    0.0    0.0    150.0
1       3       3           202.0  240.0  412.0  560.0  0.0    119.3
1       2       0           207.0  0.0    0.0    0.0    0.0    207.0
0       4       3           237.0  275.0  426.0  520.0  0.0    94.3
0       2       3           105.0  119.0  237.0  581.0  0.0    158.6
0       3       3           243.0  353.0  425.0  464.0  0.0    73.6
0       1       1           236.0  411.0  0.0    0.0    0.0    175.0
1       4       4           122.0  210.0  361.0  455.0  561.0  109.7
3       4       1           1.0    243.0  0.0    0.0    0.0    242.0
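
The title asks about awk, but the answers above use Perl and Bash. The same grouping can be sketched in awk as follows. This is a minimal sketch and not one of the original answers: the function name `regroup` is hypothetical, the input is assumed to be whitespace-separated "time field1 field2" lines, and rows are printed in the order each pair is first seen, since the question does not specify a sort order.

```shell
#!/bin/sh
# regroup (hypothetical name): group "time field1 field2" lines by the
# (field1, field2) pair and print every time recorded for that pair.
regroup() {
    awk '
    NF == 3 {
        key = $2 " " $3
        if (!(key in n)) order[++nkeys] = key   # remember first-seen order
        times[key, ++n[key]] = $1               # append this time to the pair
        if (n[key] > max) max = n[key]          # widest row so far
    }
    END {
        printf "field1 field2 nbrepeated"
        for (i = 1; i <= max; i++) printf " time%d", i
        printf "\n"
        for (k = 1; k <= nkeys; k++) {
            key = order[k]
            printf "%s %d", key, n[key] - 1     # repeats = occurrences - 1
            for (i = 1; i <= max; i++)          # pad short rows with 0
                printf " %s", (i <= n[key] ? times[key, i] : 0)
            printf "\n"
        }
    }' "$@"
}

# Small demonstration on a few lines taken from the question's input.
regroup <<'EOF'
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
207.00 1 2
237.00 0 2
243.00 2 3
243.00 3 4
EOF
```

For a whole file, `regroup data.txt | column -t` would align the columns the same way the Bash version does; `data.txt` is the file name used in the shell answer above.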
    
