Perl Code

by anirudh nair

Hi guys

I wrote this Perl code to find duplicate files in a directory.
The code checks the MD5 digests of files that have the same size. If the
digests match, it reports the pair.

While testing it on the Linux kernel source tree, I found that the same
duplicate is reported more than once. On some occasions it was reported five
times.



#!/usr/bin/perl -w
# file_name: dup_search.pl
# usage: ./dup_search.pl <directory to search> [<directory to search>]
# How it works: the program compares the sizes of the files. If the sizes
# are equal, it calculates the MD5 digest.

use 5.010;
use File::Find;
use Digest::MD5;

find(\&wanted, @ARGV);    # traverses the directories specified in the arguments

sub wanted {
    unless (-z $File::Find::name) {
        unless (-d $File::Find::name) {
            my $file = $File::Find::name;    # $file stores the path of the file
            my $file_size = -s $file;

            # Two hashes are maintained: one stores the sizes of files,
            # the other stores the MD5 digests of the selected files

            while ((my $file_path_1, my $size_value) = each %size_hash) {
                $test = 0;
                if ($size_value == $file_size) {
                    $md5_digest = &md5_finder($File::Find::name);

                    # this while loop checks if the MD5 digest of
                    # $file_path_1 is already present in the hash

                    while (((my $file_path_2, my $md5_value) = each %digest_hash)
                           and $test == 0) {
                        if ($file_path_1 eq $file_path_2) {
                            $test = 1;
                            $hash_digest = $md5_value;
                            $file_path = $file_path_2;
                        }
                    }
                    if ($test == 0) {
                        $hash_digest = &md5_finder($file_path_1);
                        $digest_hash{$file_path_1} = $hash_digest;
                        $file_path = $file_path_1;
                    }
                    print "$file\n$file_path\n\n" if $hash_digest eq $md5_digest;
                }
            }
            $size_hash{$file} = $file_size;
        }
    }
}

sub md5_finder {
    open(FILE, $_[0]) or die "Can't open $_[0]: $!";
    binmode(FILE);
    Digest::MD5->new->addfile(*FILE)->hexdigest;
}


Cheers
Anirudh Nair
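Reading the code, the repeated reports most likely come from its pairwise design: each new file is compared with every earlier file of the same size, and one pair is printed per match. A set of k identical files therefore produces k(k-1)/2 printed pairs, and each file in the set appears k-1 times, so a file reported five times would have five identical copies elsewhere in the tree. Two smaller issues: the digest of the current file is recomputed for every same-size candidate, and because File::Find chdir()s into each directory by default, `$File::Find::name` only works with `-z`, `-d`, `-s` and `open` when the starting directory is an absolute path; with a relative one those tests look at the wrong path.

A minimal sketch of a different approach (not the original poster's code) is below: collect every path by size first, hash only the sizes shared by two or more files, then print each group of identical files once. It passes `no_chdir => 1` to `find` so the paths stay valid. The helper names `md5_of_file` and `find_duplicate_groups` are hypothetical, and skipping symlinks and unreadable files are assumptions of this sketch.

```perl
#!/usr/bin/perl
# Sketch of an alternative duplicate finder (not the original dup_search.pl).
# usage: perl this_script.pl <directory to search> [<directory to search>]
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of one file, or undef if it cannot be opened.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: returns a list of array refs, each holding the sorted
# paths of one set of identical, non-empty regular files found under @dirs.
sub find_duplicate_groups {
    my @dirs = @_;
    my %paths_by_size;    # size in bytes => [ paths with that size ]

    # Pass 1: record every non-empty regular file by size (no hashing yet).
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a path
            wanted   => sub {
                my $path = $File::Find::name;
                return if -l $path || !-f $path;    # skip symlinks and non-plain files
                my $size = -s $path;
                return unless $size;                 # skip empty files
                push @{ $paths_by_size{$size} }, $path;
            },
        },
        @dirs
    );

    # Pass 2: hash only files whose size is shared, then group by digest.
    my @groups;
    for my $size (sort { $a <=> $b } keys %paths_by_size) {
        my $paths = $paths_by_size{$size};
        next if @$paths < 2;    # a unique size cannot have a duplicate
        my %paths_by_digest;
        for my $path (@$paths) {
            my $digest = md5_of_file($path);
            next unless defined $digest;    # unreadable file: skip it
            push @{ $paths_by_digest{$digest} }, $path;
        }
        for my $digest (sort keys %paths_by_digest) {
            my @same = sort @{ $paths_by_digest{$digest} };
            push @groups, \@same if @same > 1;
        }
    }
    return @groups;
}

# When run as a script with arguments, print each group exactly once.
if (!caller && @ARGV) {
    for my $group (find_duplicate_groups(@ARGV)) {
        print join("\n", @$group), "\n\n";
    }
}
```

Each group is printed once, with its paths one per line and a blank line between groups, so three identical files give one block of three lines instead of three separate pairs.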


Re: Perl Code

by Saifi Khan-3

On Wed, 29 Jul 2009, anirudh nair wrote:

> Hii guys
>
> I wrote this Perl code to find duplicate files in a directory.
>      
> sub md5_finder{
>     open(FILE, $_[0]) or die "Can't open $_[0]: $!";
>           binmode(FILE);
>         Digest::MD5->new->addfile(*FILE)->hexdigest;
> }

Anirudh, how does the md5_finder() function work?


thanks
Saifi.

Re: Perl Code

by Jagadeesh N. Malakannavar

I have only one comment: your code needs to be indented properly.
Please check your editor's indentation settings.

Thanks
Jagadeesh

On Wed, 29 Jul 2009, anirudh nair wrote:

> I wrote this Perl code to find duplicate files in a directory.

Peace,
jagadeesh

===========
Jagadeesh N.Malakannavar.         Bangalore, India.
GSM: 91 99010 01180               Software Tools Engineer.

Re: Perl Code

by anirudh nair

On Wed, Jul 29, 2009 at 9:11 AM, Saifi Khan <saifi.khan@...> wrote:

> On Wed, 29 Jul 2009, anirudh nair wrote:
>
> > Hii guys
> >
> > I wrote this Perl code to find duplicate files in a directory.
> >
> > sub md5_finder{
> > open(FILE, $_[0]) or die "Can't open $_[0]: $!";
> > binmode(FILE);
> > Digest::MD5->new->addfile(*FILE)->hexdigest;
> > }
>
> Aniruddh, how does the md5_finder() function work ?

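md5_finder() takes the file path as its argument.
A filehandle FILE is opened on $_[0] (which contains the path), and
binmode(FILE) puts it in binary mode so no newline translation alters the bytes.
Then Digest::MD5->new->addfile(*FILE)->hexdigest creates a digest object,
reads the whole file into it and calculates the MD5 digest. Since this is the
last statement in the sub, its value is what the sub returns.
hexdigest returns the digest in hexadecimal form.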
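The same steps can be written with a lexical filehandle and three-argument `open` instead of the global bareword `FILE`, plus an explicit `close`. This is a sketch of that variant, not the code from the thread; like the original it dies if the file cannot be opened. `md5_hex`, which Digest::MD5 exports on request, digests an in-memory string and gives a quick sanity check against the RFC 1321 test value for "abc".

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same steps as md5_finder(): open the file, switch it to binary mode,
# feed it to a Digest::MD5 object, and return the 32-character hex digest.
sub md5_finder {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# md5_hex() digests an in-memory string instead of a file.
print md5_hex("abc"), "\n";    # prints 900150983cd24fb0d6963f7d28e17f72
```

A file containing exactly the three bytes "abc" should give the same digest from md5_finder().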

Cheers
Anirudh

