Perl: printing the differences between two files

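Problem description

I am trying to compare two files and print the differences. My code below works for items that are present in file1 but missing from file2, but not for items in file2 that are missing from file1. I tried swapping file1 and file2, but that did not work. Thanks in advance.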

use warnings;
use strict;
my $file1 = '1.txt';
my $file2 = '2.txt';



open my $fh,'<',$file2 or die $!;
my $file = [<$fh>];
open $fh,$file1 or die $!;
while(my $line = <$fh>) {
    chomp($line);
#print "$line\n";

    my $status = 0;
    for (@{$file}) {
        chomp;
        if (/$line/) {
            $status = 1;
            last;
        }
    }
    print $line,$/ if $status == 0 
}

File 1:

15122
16070
61
15106
16704
15105
7303
15201
21
16712
7308
16029
16008
16023
16025
16044
16045
16042
16043
16040
16041
16226
15112
16914
16915
31
16910
16911
16912
16913
16114
7505
1103
16018
16916

File 2:

1103 
15105 
15106 
15112 
15201 
15211 
16024 
16029 
16044 
16051 
16070 
16201 
16225 
16350 
21 
31 
61 
7303 
7505 

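Solution

Your code has a few problems.

Looking at your files, I found that file2 has trailing spaces. Since file1 does not have them, '1103 ' from file2 can never match '1103' from file1.

chomp only removes the trailing newline (if there is one), so it does not help with trailing spaces.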
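A minimal sketch of that difference, using a made-up sample string shaped like the lines in file2 (a value, a trailing space, then a newline). The variable names are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample line: value + trailing space + newline.
my $chomped = "1103 \n";
chomp $chomped;            # removes only the trailing newline

my $trimmed = "1103 \n";
$trimmed =~ s/\s+$//;      # removes all trailing whitespace, newline included

printf "chomp: '%s'\n", $chomped;   # chomp: '1103 '
printf "regex: '%s'\n", $trimmed;   # regex: '1103'
```

After chomp the string still ends in a space, so it can never be equal to a clean '1103'.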

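Instead, I would use a regular expression to remove all whitespace characters at the end of the line. You can use s/\s+$// for that.

In addition, you are comparing the lines with a regular expression. Unless you add word boundaries, this can go wrong: a 1 in the first file would match 123 in the second file, which is not correct.

I would use eq to compare the two lines.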
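To make the false match concrete, here is a small sketch with two made-up values. It compares the unanchored regex from the question, a word-boundary variant, and eq:

```perl
use strict;
use warnings;

my $line      = '1';     # hypothetical value from the first file
my $candidate = '123';   # hypothetical value from the second file

my $regex_match    = $candidate =~ /$line/         ? 1 : 0;  # substring match
my $boundary_match = $candidate =~ /\b\Q$line\E\b/ ? 1 : 0;  # whole word only
my $eq_match       = $candidate eq $line           ? 1 : 0;  # exact equality

# prints "regex: 1, boundary: 0, eq: 0"
print "regex: $regex_match, boundary: $boundary_match, eq: $eq_match\n";
```

For lines that hold a single value each, eq is the simplest choice; the word-boundary form is mainly useful when a line can contain several values.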

So here is the script with those changes:

use warnings;
use strict;
my $file1 = '2.txt';  # Exchanged files to test the non-working case
my $file2 = '1.txt';

open my $fh,'<',$file2 or die $!;
my $file = [<$fh>];
open $fh,'<',$file1 or die $!;    # three-argument open, as on the first open
while(my $line = <$fh>) {
    $line =~ s/\s+$//;    # changed to remove all space-like trailing characters

    my $status = 0;
    for (@{$file}) {
        s/\s+$//;    # changed to remove all space-like trailing characters
        if ($_ eq $line) {    # changed to use a regular comparison
            $status = 1;
            last;
        }
    }
    print $line,$/ if $status == 0 
}

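Extra tip:

You do not really need an array reference to hold the file contents. You can simply use an array, which saves you from dereferencing it in the for loop.

So you can change the following lines: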

...
my @file_content = <$fh>;
...
for (@file_content) {
...

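Another tip:

For large files the code may be too slow, because the algorithm's cost is O(n^2): every line of one file is compared against every line of the other.

You may want to use one of the techniques described here.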
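One common way to avoid the quadratic cost is a hash lookup. This is not necessarily the technique referred to above. The sketch below assumes one value per line; the helper name missing_from and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Return the items of @$left that do not occur in @$right, after
# stripping trailing whitespace from both. (Hypothetical helper.)
sub missing_from {
    my ($left, $right) = @_;
    my %seen;
    for my $item (@$right) {
        (my $key = $item) =~ s/\s+$//;
        $seen{$key} = 1;
    }
    my @missing;
    for my $item (@$left) {
        (my $key = $item) =~ s/\s+$//;
        push @missing, $key unless $seen{$key};
    }
    return @missing;
}

# Small made-up samples standing in for the lines read from each file.
my @lines1 = ("15122\n", "16070\n", "61\n");
my @lines2 = ("61 \n", "16070 \n", "15211 \n");

my @only_in_1 = missing_from(\@lines1, \@lines2);
my @only_in_2 = missing_from(\@lines2, \@lines1);
print "only in file1: @only_in_1\n";   # only in file1: 15122
print "only in file2: @only_in_2\n";   # only in file2: 15211
```

Each file is scanned once per call, so the cost grows roughly linearly with the number of lines, and calling the helper in both directions covers both cases from the question.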

,

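As I understand it, this is not a line-by-line match but a comparison of the numbers in one file against the numbers in the other.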

1) Open the files

2) Store the contents in two arrays

3) Simply compare the two arrays.

   use Array::Utils qw(:all);

   my @file_arr1 = qw(15122 16070 61 15106 16704 15105 7303 15201 21 16712 7308 16029 16008 16023 16025 16044 16045 16042 16043 16040 16041 16226 15112 16914 16915 31 16910 16911 16912 16913 16114 7505 1103 16018 16916);

   my @file_arr2 = qw(1103 15105 15106 15112 15201 15211 16024 16029 16044 16051 16070 16201 16225 16350 21 31 61 7303 7505);

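   # array_diff returns the symmetric difference: items found in only one of the two arrays
   my @unmatched_arr = array_diff(@file_arr1,@file_arr2);

   # intersect returns the items present in both arrays (unique would return the union instead)
   my @matched_arr = intersect(@file_arr1,@file_arr2);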

   print join "\n",@unmatched_arr;

Thanks.

,

Another way to meet the requirement above is to use the List::Compare module: https://metacpan.org/pod/List::Compare

Script

use strict;
use warnings;

use File::Grep qw( fmap );
use String::Util qw(trim);
use List::Compare;
use Data::Dumper;

my $file_1 = "file1.txt";
my $file_2 = "file2.txt";

# fmap BLOCK LIST : Performs a map operation on the files in LIST,
# using BLOCK as the mapping function. The results from BLOCK will be
# appended to the list that is returned at the end of the call.
# trim : Returns the string with all leading and trailing whitespace removed.
my @data1= fmap { trim($_)  } $file_1;
my @data2= fmap { trim($_)  } $file_2;

#Create a List::Compare object. Put the two lists into arrays (named or anonymous) 
# and pass references to the arrays to the constructor.
my $diff_file1 = List::Compare->new(\@data1,\@data2);
#get_unique() : Get those items which appear (at least once) only in the first list.
my @data_missing_file2 = $diff_file1->get_unique;

my $diff_file2 = List::Compare->new(\@data2,\@data1);
my @data_missing_file1 = $diff_file2->get_unique;

print "Data missing in file2 which present in file1 : ",Dumper(\@data_missing_file2),"\n";
print "Data missing in file1 which present in file2: ",Dumper(\@data_missing_file1),"\n";

Output

Data missing in file2 which present in file1 : $VAR1 = [
          '15122','16008','16018','16023','16025','16040','16041','16042','16043','16045','16114','16226','16704','16712','16910','16911','16912','16913','16914','16915','16916','7308'
        ];

Data missing in file1 which present in file2: $VAR1 = [
          '15211','16024','16051','16201','16225','16350'
        ];