Problem description
I have this input file:
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2
I would like output like the following, with this information computed:
field1 field2 nbrepeated time1 time2 time3 time4
3 4 1 1.00 243.0 0 0
2 3 1 93.0 243.0 0 0
0 2 2 93.00 119.00 237.00 581.00
: : : : : : :
: : : : : : :
: : : : : : :
<field1> <field2> <nbrepeated> <time1> <time2> <time3> <time4> are columns
Solution
Perl version:
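use strict;
use warnings;

my %data;

# Group the times by the (field1, field2) pair.
while (my $line = <DATA>) {
    chomp($line);
    next unless $line =~ /\S/;          # skip blank lines
    my @row = split(' ', $line);
    my $key = "$row[1] $row[2]";        # space-separated key also works for multi-digit fields
    push @{$data{$key}}, $row[0];
}

# Find the largest number of times recorded for any pair.
my $max = 0;
for my $key (keys %data) {
    if (scalar @{$data{$key}} > $max) {
        $max = scalar @{$data{$key}};
    }
}

# Print the header.
{
    my @times;
    push @times, "time" . $_ for (1 .. $max);
    myFormat("field1", "field2", "nbrepeated", @times);
}

# Print one row per pair, padding missing times with 0.
for my $key (keys %data) {
    my ($f1, $f2) = split(' ', $key);
    my $nr = $#{$data{$key}};           # repeats = number of times - 1
    my @times = @{$data{$key}};
    for (my $i = 0; $i < $max; $i++) {
        if (! defined $times[$i]) {
            $times[$i] = 0;
        }
    }
    myFormat($f1, $f2, $nr, @times);
}

sub myFormat {
    # The first three arguments are field1, field2 and nbrepeated.
    printf "%-8s %-8s %-12s ", shift, shift, shift;
    for my $time (@_) {
        printf "%-8s ", $time;
    }
    print "\n";
}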
__DATA__
1.00 3 4
93.00 2 3
105.00 0 2
119.00 0 2
122.00 1 4
202.00 1 3
207.00 1 2
210.00 1 4
236.00 0 1
237.00 0 4
237.00 0 2
240.00 1 3
243.00 2 3
243.00 3 4
243.00 0 3
275.00 0 4
275.00 2 4
353.00 0 3
361.00 1 4
411.00 0 1
412.00 1 3
425.00 0 3
426.00 0 4
455.00 1 4
464.00 0 3
520.00 0 4
560.00 1 3
561.00 1 4
581.00 0 2
This produces the output:
field1   field2   nbrepeated   time1    time2    time3    time4    time5
0        1        1            236.00   411.00   0        0        0
0        4        3            237.00   275.00   426.00   520.00   0
1        2        0            207.00   0        0        0        0
1        4        4            122.00   210.00   361.00   455.00   561.00
0        2        3            105.00   119.00   237.00   581.00   0
3        4        1            1.00     243.00   0        0        0
0        3        3            243.00   353.00   425.00   464.00   0
2        4        0            275.00   0        0        0        0
2        3        1            93.00    243.00   0        0        0
1        3        3            202.00   240.00   412.00   560.00   0
The output is not sorted. If you specify how you want it sorted, sorting it would be no problem.
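As one possible way to do the sorting mentioned above, the report can be post-processed in the shell. The following is a minimal sketch, assuming that numeric order by field1 and then field2 is what is wanted. The function name sort_report is hypothetical, and the demo rows are a subset of the columns shown in the output above.

```shell
#!/bin/bash
# sort_report: hypothetical helper. Reads a report on stdin, prints the
# header line unchanged, then sorts the remaining rows numerically by
# field1 and then by field2.
sort_report() {
    local header
    IFS= read -r header
    printf '%s\n' "$header"
    sort -k1,1n -k2,2n
}

# Demo on three rows (first two time columns only)
printf '%s\n' \
    'field1 field2 nbrepeated time1 time2' \
    '3 4 1 1.00 243.00' \
    '0 4 3 237.00 275.00' \
    '0 1 1 236.00 411.00' | sort_report
```

If the Perl script were saved as, say, report.pl (an assumed file name), then piping it through the helper with `perl report.pl | sort_report` should give the same rows in ascending order of the two key fields.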
You can do this easily in a shell script using Bash 4 associative arrays:
#!/bin/bash
declare -A times

# Create an associative array containing the times
# for each combination of field1,field2
while read -r line
do
    time=$(echo "$line" | cut -d' ' -f1)
    key=$(echo "$line" | cut -d' ' -f2,3)
    times["$key"]="${times[$key]} $time"
done < data.txt

# Print header
echo "field1 field2 nbrepeated time1 time2 time3 time4"

# Iterate over the associative array and print
for key in "${!times[@]}"
do
    data=(${times[$key]})
    reps=$((${#data[@]}-1))
    # If there are fewer than 4 time entries, add zeros
    while [ ${#data[@]} -lt 4 ]
    do
        data[${#data[@]}]=0
    done
    echo "$key $reps ${data[*]}"
done
Output:
field1 field2 nbrepeated time1 time2 time3 time4
1 3 3 202.00 240.00 412.00 560.00
1 2 0 207.00 0 0 0
1 4 4 122.00 210.00 361.00 455.00 561.00
2 3 1 93.00 243.00 0 0
2 4 0 275.00 0 0 0
0 1 1 236.00 411.00 0 0
0 2 3 105.00 119.00 237.00 581.00
0 3 3 243.00 353.00 425.00 464.00
0 4 3 237.00 275.00 426.00 520.00
3 4 1 1.00 243.00 0 0
Bash version:
declare -A t

# Collect the times for each field1:field2 pair
while read -r tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < times.txt

# Find the largest number of times recorded for any pair
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" "time$i"; done
    echo ' "avg"'
    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        printf "%d %d %d" "$f1" "$f2" $(($# - 1))
        # print the times, padding with 0 up to $max columns
        for i in $(seq $max); do
            printf " %.1f" "${1-0}"
            shift
        done
        # calculate average
        set -- ${t[$key]}
        n=$(( $# - 1 ))
        if [[ $n -eq 0 ]]; then
            avg=$1
        else
            prev=$1
            shift
            total="0"
            while [[ $# -gt 0 ]]; do
                total="$total + ($1 - $prev)"
                prev=$1
                shift
            done
            avg=$( echo "scale=1; ($total)/$n" | bc )
        fi
        printf " %.1f\n" "$avg"
    done
} | column -t
This produces the output:
field1  field2  nbrepeated  time1  time2  time3  time4  time5  "avg"
2       4       0           275.0  0.0    0.0    0.0    0.0    275.0
2       3       1           93.0   243.0  0.0    0.0    0.0    150.0
1       3       3           202.0  240.0  412.0  560.0  0.0    119.3
1       2       0           207.0  0.0    0.0    0.0    0.0    207.0
0       4       3           237.0  275.0  426.0  520.0  0.0    94.3
0       2       3           105.0  119.0  237.0  581.0  0.0    158.6
0       3       3           243.0  353.0  425.0  464.0  0.0    73.6
0       1       1           236.0  411.0  0.0    0.0    0.0    175.0
1       4       4           122.0  210.0  361.0  455.0  561.0  109.7
3       4       1           1.0    243.0  0.0    0.0    0.0    242.0
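The "avg" column is the mean gap between consecutive times for a pair. Since the individual differences telescope, this is the same as (last time - first time) / nbrepeated, and a pair with a single time simply shows that time. A minimal sketch of that shortcut follows, assuming awk is available. The function name avg_gap is hypothetical. Note that awk's printf rounds, whereas bc with scale=1 truncates, so the last digit can differ from the table above (158.7 here versus 158.6 for the pair 0 2).

```shell
#!/bin/bash
# avg_gap: hypothetical helper. Prints the mean gap between the
# consecutive times given as arguments, or the time itself when only
# one time is given (the same convention as the script above).
avg_gap() {
    printf '%s\n' "$*" | LC_ALL=C awk '{
        n = NF - 1                        # number of gaps
        if (n == 0) printf "%.1f\n", $1
        else        printf "%.1f\n", ($NF - $1) / n
    }'
}

avg_gap 93.00 243.00                   # prints 150.0
avg_gap 105.00 119.00 237.00 581.00    # prints 158.7
avg_gap 275.00                         # prints 275.0
```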