Simple brute force duplicate file identification
Here is a way to identify files that have duplicates.
find dir -type f -print0 | xargs -0 md5sum > filelist.txt
sort filelist.txt > filesort.txt
uniq -w 33 -D filesort.txt
uniq -w 33 --all-repeated=separate filesort.txt   # more legible; --all-rep=sep also works
This will show which files have duplicates. I saved the intermediate results to files instead of piping everything through, so that you can go back to filesort.txt later and look up the other files sharing a given md5.
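The pipeline above can be exercised end to end on a throwaway directory. This is a minimal sketch; the file names and contents here are made up for the demonstration, and the list files are kept outside the scanned directory so find does not pick them up.

```shell
dir=$(mktemp -d)
list=$(mktemp)
sorted=$(mktemp)
printf 'alpha\n' > "$dir/one.txt"
printf 'alpha\n' > "$dir/two.txt"    # byte-for-byte duplicate of one.txt
printf 'beta\n'  > "$dir/three.txt"
find "$dir" -type f -print0 | xargs -0 md5sum > "$list"
sort "$list" > "$sorted"
dups=$(uniq -w 33 -D "$sorted")      # only the lines whose first 33 chars repeat
echo "$dups"                         # prints the two duplicate entries
rm -rf "$dir" "$list" "$sorted"
```

The -w 33 tells uniq to compare only the md5 field (32 hex digits plus the following space), so two entries count as duplicates exactly when their hashes match.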
Make sure you actually compare the candidate files byte for byte. Two different files can share an md5sum (a collision); they will usually differ in size, but two files of the same size can still collide. For fewer false positives, use sha256sum instead (slower).
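A byte-for-byte check is a one-liner with cmp. This sketch fabricates a matching pair in a temp directory; in practice the two paths would come from the uniq output above.

```shell
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a"
printf 'same content\n' > "$dir/b"
# cmp -s is silent and sets the exit status: 0 only if the files are identical
result=$(cmp -s "$dir/a" "$dir/b" && echo identical || echo different)
echo "$result"
rm -rf "$dir"
```

If cmp reports a difference, you have found an md5 collision (or, far more likely, a bookkeeping mistake) rather than a true duplicate.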