Perl Code

Hi guys,

I wrote this Perl code to find duplicate files in a directory. For files of the same size, the code checks their md5 digests, and if the digests match it reports the pair.

While testing it, I found that the same duplicate is reported more than once; on some occasions it was reported five times. I tested it on the Linux kernel source tree.

```perl
#!/usr/bin/perl -w
# file_name: dup_search.pl
# usage: ./dup_search.pl <directory to search> [<directory to search>]
# How it works: the program compares the sizes of the files. If the sizes
# are equal, it calculates the md5 digest.

use 5.010;
use File::Find;
use Digest::MD5;

find(\&wanted, @ARGV);    # traverses the directories given as arguments

sub wanted {
    unless (-z $File::Find::name) {
        unless (-d $File::Find::name) {
            my $file      = $File::Find::name;    # $file stores the path of the file
            my $file_size = -s $file;

            # Two hashes are maintained: %size_hash stores the size of each file,
            # %digest_hash stores the md5 digest of the selected files.
            while ((my $file_path_1, my $size_value) = each %size_hash) {
                $test = 0;
                if ($size_value == $file_size) {
                    $md5_digest = &md5_finder($File::Find::name);

                    # this while loop checks if the md5 digest of
                    # $file_path_1 is already present in the hash
                    while (((my $file_path_2, my $md5_value) = each %digest_hash) and $test == 0) {
                        if ($file_path_1 eq $file_path_2) {
                            $test        = 1;
                            $hash_digest = $md5_value;
                            $file_path   = $file_path_2;
                        }
                    }
                    if ($test == 0) {
                        $hash_digest = &md5_finder($file_path_1);
                        $digest_hash{$file_path_1} = $hash_digest;
                        $file_path = $file_path_1;
                    }
                    print "$file\n$file_path\n\n" if $hash_digest eq $md5_digest;
                }
            }
            $size_hash{$file} = $file_size;
        }
    }
}

sub md5_finder {
    open(FILE, $_[0]) or die "Can't open $_[0]: $!";
    binmode(FILE);
    Digest::MD5->new->addfile(*FILE)->hexdigest;
}
```

Cheers
Anirudh Nair
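A likely cause of the repeated reports: `wanted` compares each new file with every earlier file of the same size and prints one pair per match, so a file with k identical predecessors is printed k times; with six identical copies, the last one appears in five reports. Separately, the inner `each %digest_hash` loop exits early and leaves the hash iterator part-way through, so later lookups can miss digests that are already cached; that costs extra MD5 computations but does not by itself duplicate the output. Also, `find` changes into each directory by default, so file tests on `$File::Find::name` are only reliable when the starting directories are given as absolute paths.

The following is a minimal sketch of an alternative, not the original script: it buckets files by size, computes digests only for sizes shared by at least two files, and prints each set of identical files once. The helper names `find_duplicates` and `md5_of`, the use of `no_chdir => 1`, and skipping symbolic links are assumptions made for this sketch.

```perl
#!/usr/bin/perl
# Hypothetical rewrite of dup_search.pl: report each set of identical files once.
# usage: ./dup_groups.pl <directory to search> [<directory to search> ...]
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Return the hex MD5 digest of the file at $path.
sub md5_of {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Collect non-empty regular files under @dirs and return a list of
# array refs, each holding the paths of two or more identical files.
sub find_duplicates {
    my @dirs = @_;
    my %by_size;    # size in bytes => [paths with that size]

    find({
        no_chdir => 1,    # do not chdir, so $File::Find::name stays valid for file tests
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path;        # skip symbolic links (assumption of this sketch)
            return unless -f $path;    # regular files only
            my $size = -s $path;
            return unless $size;       # skip empty files, as the original does
            push @{ $by_size{$size} }, $path;
        },
    }, @dirs);

    my @groups;
    for my $paths (values %by_size) {
        next if @$paths < 2;           # a unique size cannot have a duplicate
        my %by_digest;                 # digest => [paths with that digest]
        push @{ $by_digest{ md5_of($_) } }, $_ for @$paths;
        push @groups, grep { @$_ > 1 } values %by_digest;
    }
    return @groups;
}

# Print each set of identical files once, separated by blank lines.
if (@ARGV) {
    for my $group (find_duplicates(@ARGV)) {
        print "$_\n" for sort @$group;
        print "\n";
    }
} else {
    warn "usage: $0 <directory to search> [<directory to search> ...]\n";
}
```

Because all files are grouped before anything is printed, each set of identical files appears once no matter how many copies it contains, and each file is hashed at most once.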
Re: Perl Code
On Wed, 29 Jul 2009, anirudh nair wrote:
> Hii guys
>
> I wrote this Perl code to find duplicate files in a directory.
>
> sub md5_finder{
> open(FILE, $_[0]) or die "Can't open $_[0]: $!";
> binmode(FILE);
> Digest::MD5->new->addfile(*FILE)->hexdigest;
> }

Aniruddh, how does the md5_finder() function work?

thanks
Saifi.
Re: Perl Code
I have only one comment: your code needs to be indented properly. Please configure your editor's indentation settings.

Thanks
Jagadeesh

Peace,
jagadeesh
===========
Jagadeesh N.Malakannavar.
Bangalore, India.
GSM: 91 99010 01180
Software Tools Engineer.
Re: Perl Code
On Wed, Jul 29, 2009 at 9:11 AM, Saifi Khan <saifi.khan@...> wrote:
> Aniruddh, how does the md5_finder() function work?

md5_finder() takes the file path as its argument. It opens a filehandle, FILE, on $_[0] (which contains the path); then Digest::MD5->new->addfile(*FILE)->hexdigest calculates the md5 digest of the file's contents. Because a Perl sub returns the value of its last evaluated expression, that digest is what md5_finder() returns. hexdigest returns the digest in hexadecimal form.

Cheers
Anirudh
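To illustrate the explanation above, here is a sketch of the same helper written with a lexical filehandle and three-argument `open`, with an explicit `close` added; these changes are suggestions, not part of the original code. Three-argument `open` avoids having characters such as a leading `<` or a trailing `|` in a filename treated as mode characters. `addfile` reads the handle to end-of-file and returns the Digest::MD5 object itself, which is what allows `->hexdigest` to be chained onto it.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 digest of the file at the given path as a 32-character hex string.
sub md5_finder {
    my ($path) = @_;    # same argument the original reads as $_[0]
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);       # digest the raw bytes, with no newline translation
    # addfile() reads $fh to end-of-file and returns the Digest::MD5 object,
    # so hexdigest() can be chained onto it.
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Example: print "digest  path" for every file named on the command line.
printf "%s  %s\n", md5_finder($_), $_ for @ARGV;
```

For example, running this as a (hypothetically named) `md5_demo.pl` with a list of files prints one digest and path per line, in the same layout as `md5sum`.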