WordSmith Tools Manual

Find duplicates by examining file contents

The point of it

The idea is to check whether you have near-duplicate text files, i.e. files with the same or similar contents. The search compares every text file with every other one, regardless of their file-names.

 

How to do it

Specify the folder to search and the file-type(s), choose a percentage of difference, then press "Search". The search goes through that folder and any sub-folders and reports any duplicates found.

 

Example

 

duplicate_contents_search

In this example the search found two text files whose sizes are within 4% of each other and whose word-type and word-token counts also match to within 4%.
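To make the idea of "within 4%" concrete, here is a minimal Python sketch of a comparison along the lines described above. It is not WordSmith's own code: the function names (pct_diff, profile, find_near_duplicates), the simple \w+ tokeniser, the choice to measure the difference against the larger of the two values, and the UTF-8 assumption are all illustrative guesses.

```python
import re
from itertools import combinations
from pathlib import Path


def pct_diff(a: int, b: int) -> float:
    """Percentage difference between two counts, relative to the larger one."""
    larger = max(a, b)
    return 0.0 if larger == 0 else abs(a - b) / larger * 100


def profile(path: Path) -> tuple[int, int, int]:
    """Return (size in bytes, token count, type count) for one text file."""
    # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
    text = path.read_text(encoding="utf-8", errors="replace").lower()
    tokens = re.findall(r"\w+", text)  # crude tokeniser, for illustration only
    return path.stat().st_size, len(tokens), len(set(tokens))


def find_near_duplicates(folder, pattern="*.txt", max_diff=4.0):
    """Compare every pair of matching files under folder, sub-folders included.

    File-names play no part in the comparison; only the three profile
    figures are compared, and all three must be within max_diff percent.
    """
    files = sorted(p for p in Path(folder).rglob(pattern) if p.is_file())
    profiles = {p: profile(p) for p in files}
    return [
        (a, b)
        for a, b in combinations(files, 2)
        if all(pct_diff(x, y) <= max_diff for x, y in zip(profiles[a], profiles[b]))
    ]
```

Calling find_near_duplicates("my_corpus", max_diff=4.0) would return a list of file pairs to inspect. Matching counts only suggest similar contents, so each pair reported still needs to be viewed before anything is deleted.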

 

This is the original of text A0A:

A0A_unaltered

and here is an altered version I used for the test:

A0A_altered

View, Copy or Delete them?

 

To view, double-click the file-name in either list. To copy results or delete any file(s) selected, right-click the left or right list.

 

See also: duplicate file-names