The script below will find duplicate files (files with the same md5sum) in a specified directory and output a new shell script containing commented-out rm statements for deleting them. You can then edit this output to decide which to keep.
OUTF=rem-duplicates.sh;
echo "#! /bin/sh" > $OUTF;
find "$@" -type f -print0 |
xargs -0 -n1 md5sum |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
chmod a+x $OUTF; ls -l $OUTF











1 comments:
Useful script, I remember writing a similar one with SHA-512 instead of MD5.
Is the "-n1" option of xargs really needed? The man page of md5sum states that it accepts more than one file on the command line.
"rm \1" probably could be changed to support file names containing spaces.
Post a Comment