
WordSmith Tools Help

The point of it

The idea is to check whether you have files with the same name, or copies of a file, in different folders. You may often make copies of files and a few weeks later be unable to remember where you put them.

 

By default this program only checks whether the files it is comparing have almost the same name, but dates and file sizes can be compared too. It handles lots of folders, the point being to locate unnecessarily duplicated files or confusing reuse of the same file-names. To count as a match, names must be identical or differ only in a number or label added by the operating system to mark a copy.
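As a rough illustration of the name rule just described, the sketch below strips a trailing " - Copy" label or bracketed number before comparing names. The regular expression, the function names `base_name` and `same_name`, and the requirement that extensions agree are all assumptions made for this example; the program's actual matching rules may differ.

```python
import re
from pathlib import PurePath

# Hypothetical copy markers: " - Copy", " - Copy (2)", " (1)", "(1)".
# The real program may recognise a different set.
_COPY_MARKER = re.compile(r"( - Copy( \(\d+\))?| ?\(\d+\))$", re.IGNORECASE)


def base_name(filename: str) -> str:
    """Return the lower-cased stem with any trailing copy marker removed."""
    stem = PurePath(filename).stem
    return _COPY_MARKER.sub("", stem).lower()


def same_name(a: str, b: str) -> bool:
    """True if the names are identical or differ only by a copy marker.

    Assumption: the extensions must also agree (case-insensitively).
    """
    same_ext = PurePath(a).suffix.lower() == PurePath(b).suffix.lower()
    return same_ext and base_name(a) == base_name(b)
```

With these definitions, `same_name("ST00086.txt", "ST00086 - Copy.txt")` is true, while `same_name("GUA.TXT", "GUB.TXT")` is false.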

 

How to do it

Specify your Folder 1 and simply press "Search". The Duplicate Names procedure will go through that folder and any sub-folders and will report any duplicates found.

Or you can specify two different folders (e.g. on different drives), and the process compares one set of files with the other.

 

Example 1: one start folder

[Screenshot: test_duplicate_names_1folder]

 

[Screenshot: duplicate_names_results_1folder]

The cluster shows that the two text files were identified as copies (when you copy a file, Windows inserts - Copy into the name).

 

Example 2: different start folders

 

These test files occupy different folders:

[Screenshot: test_duplicate_folders]

Results:

 

[Screenshot: duplicate_names_results_two_folders]

Cluster 1 shows a match where the difference is a bracketed number.

Clusters 3 and 4 show matches where the difference is the - Copy label.

Because we are comparing only what is in the \test1 folder with what is in the \test2 folder, the program cannot find duplicates within \test1 (ST00086.txt and ST00086 - Copy.txt).
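The two-folder behaviour can be pictured as pairing every name from the first folder with every name from the second, and never pairing two names from the same folder. The sketch below is only an illustration: `cross_folder_matches` and `loose_match` are hypothetical names, the matching rule is deliberately simplified, and the contents given for the second folder are invented for the example.

```python
from itertools import product


def loose_match(a: str, b: str) -> bool:
    """Simplified, hypothetical rule: ignore case and a ' - Copy' label."""
    def strip(name: str) -> str:
        return name.lower().replace(" - copy", "")
    return strip(a) == strip(b)


def cross_folder_matches(names1, names2, is_match=loose_match):
    """Return (name1, name2) pairs that match across the two folders.

    Only cross-folder pairs are examined, so two matching names that
    both sit in the first folder are never reported.
    """
    return [(a, b) for a, b in product(names1, names2) if is_match(a, b)]


# Folder 1 holds a file and its copy; folder 2 (invented) holds neither.
folder1 = ["ST00086.txt", "ST00086 - Copy.txt"]
folder2 = ["ST00090.txt"]
print(cross_folder_matches(folder1, folder2))
```

Here nothing is reported, even though the first folder contains a file and its copy, because the two names are never compared with each other.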

 

Requirements

check sub-folders: includes the contents of sub-folders of your main folder.

date and size must match: when comparing text files, checks that their dates and sizes match, not just their names.

chars to match: how many characters in the file-name must match when comparing text files. For example, if chars to match = 2, then GUA.TXT and GUB.TXT do get compared to see whether their file-names are just variants of each other. Chars to match = 0 will allow GUA.TXT to be compared with GUARDIAN_NEWS_TODAY.TXT. The higher the number, the quicker the program will run, on average.

The program has to work hard unless you require date, size or character matches. For example, if examining ten text files, it must check each one against all the others, which means (10 * 9) / 2 = 45 checks. For 10,000 files it would mean 49,995,000 checks. If you require a number of chars to match, you will speed things up dramatically.
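The arithmetic above, and the reason a chars-to-match requirement helps, can be sketched as follows. This assumes that "chars to match" refers to the leading characters of the file-name and that the comparison is case-insensitive; `pair_count` and `candidate_pairs` are hypothetical names, not part of the program.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n files: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def candidate_pairs(names, chars_to_match=0):
    """Yield only the pairs whose first chars_to_match characters agree.

    Names are bucketed by their leading characters, so files in
    different buckets are never compared with each other.
    """
    buckets = defaultdict(list)
    for name in names:
        buckets[name[:chars_to_match].lower()].append(name)
    for group in buckets.values():
        yield from combinations(group, 2)


names = ["GUA.TXT", "GUB.TXT", "GUARDIAN_NEWS_TODAY.TXT"]
print(pair_count(10), pair_count(10_000))
print(len(list(candidate_pairs(names, 0))), len(list(candidate_pairs(names, 3))))
```

With chars to match set to 0 every file lands in one bucket and all pairs are checked; a larger value splits the files into smaller buckets, so far fewer pairs remain.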

 

Sub-folders to exclude

Useful if there are some sub-folders you know you're not interested in. In the example below, any folder whose name ends _old or whose name is demo or examples will be ignored, as will any sub-folder below it.
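One way to picture the exclusion rule is a folder walk that skips any sub-folder whose name matches an exclusion pattern, together with everything beneath it. The wildcard syntax, the case-insensitive comparison, and the names `EXCLUDE`, `is_excluded` and `walk_files` are assumptions for this sketch, not the program's own settings format.

```python
import fnmatch
import os

# Hypothetical patterns mirroring the description above (lower case).
EXCLUDE = ["*_old", "demo", "examples"]


def is_excluded(folder_name, patterns=EXCLUDE):
    """True if the folder name matches any exclusion pattern."""
    name = folder_name.lower()
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def walk_files(root, patterns=EXCLUDE):
    """Yield file paths under root, skipping excluded sub-folders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # the removed folders, so their sub-folders are skipped too.
        dirnames[:] = [d for d in dirnames if not is_excluded(d, patterns)]
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

Pruning `dirnames` in place is what makes the rule apply to everything below an excluded folder, not just to the folder itself.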

 

See also: duplicate contents, text similarity

 

  
