I am in the process of cleaning up and organizing 150GB worth of ebooks in various formats (i.e. ...). I have been using DupeGuru for years, and it finds exact duplicates, which is great. My issue is that I keep running into very similar files (not exact dupes) that DupeGuru does not flag. I am running the DupeGuru "Content" scan type.

For example, I have 3 files with the same file name, format and size (example: Alice In Wonderland.epub, 17.5MB). DupeGuru is not flagging these as dupes. Looking at the files in the Calibre reader, they look exactly the same to my eyes. I have also run the duplicate plug-in in Calibre, and it is not flagging the files as dupes either.

Is there any software that can find similar files by searching the content of the file, where the files may have a slight difference, like an extra page or cover, so they are close to being duplicates but not 100%? I have tried searching and tried other apps, but I am unable to find anything that can solve my problem.

I think your best bet would be to extract just the text and then run something like a similarity hash to compare the output.

shash is "a sample implementation of Charikar's hash for identification [...]

[...] trouble building it under FreeBSD and Linux:

    gcc -O2 -std=c99 -I/usr/local/include -c -o shash.o shash.c
    gcc -O2 -std=c99 -I/usr/local/include -c -o simi.o simi.c
    gcc -O2 -std=c99 -I/usr/local/include -c -o simiw.o simiw.c
    gcc -O2 -std=c99 -I/usr/local/include -c -o lookup3.o lookup3.c
    gcc -L/usr/local/lib shash.o simi.o simiw.o lookup3.o -o shash

[...] versions of the GNUPlot documentation:

    me% cd /src/graphics/gnuplot/doc

These docs are not identical, but they're similar enough that I wouldn't [...]

Define "similar". You are asking for an algorithmic approach to something people would give different answers to when asked if two images are similar. If you're doing it with text, look at NLP methods like tf-idf or technologies like BERT. For ebooks specifically, you're better off looking at the metadata. For images, duplicate-check tools like Duplicate File Finder likely implement modern image comparison algorithms, but even they miss similar files (false negatives) and have high false positive rates. There are libraries that can access all those formats.
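To make the "extract just the text, then compare similarity hashes" suggestion concrete, here is a minimal Python sketch using only the standard library. It is not shash's code: the function names `epub_text`, `simhash` and `hamming` are made up for illustration, and it assumes DRM-free EPUB files (which are zip archives of XHTML pages). It follows the general idea of Charikar's hash: hash overlapping word groups and let them vote on each bit of a 64-bit fingerprint.

```python
import hashlib
import re
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores the tags around it."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def epub_text(path):
    """Return the text of every (X)HTML file inside an EPUB.

    Assumption: the book is a DRM-free EPUB, i.e. a zip archive.
    """
    parser = _TextOnly()
    with zipfile.ZipFile(path) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser.feed(book.read(name).decode("utf-8", errors="ignore"))
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.parts)


def simhash(text, shingle=3, bits=64):
    """Charikar-style fingerprint built from overlapping word groups."""
    words = re.findall(r"\w+", text.lower())
    votes = [0] * bits
    for i in range(max(1, len(words) - shingle + 1)):
        feature = " ".join(words[i:i + shingle])
        digest = hashlib.blake2b(feature.encode("utf-8"),
                                 digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    # A fingerprint bit is set where the positive votes won.
    return sum(1 << b for b in range(bits) if votes[b] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

To compare two books, compute `hamming(simhash(epub_text(a)), simhash(epub_text(b)))` for a pair of files. A distance of 0 means the fingerprints are identical; a small distance out of 64 bits suggests near-duplicates, such as the same book with an extra page or a different cover. Where to put the cut-off is a judgment call that you would need to tune on your own library.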