This section helps you look for repeated sections of text within articles. Sometimes articles contain numerous repeated sections, perhaps because of a mechanical copying problem in the original download.
How to do it
Choose a length in characters; here the setting is 250. The procedure will then look for any repeated 250-character string within each article. Press Find repeated chunks. Any articles containing verbatim repetition of that kind will be listed, and the offending articles are copied to a TEMPORARY sub-folder. You can re-load this list at any time.
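The check described above can be sketched in Python as follows. This is a hedged illustration, not the program's actual implementation: it assumes the articles are plain `.txt` files in a single folder, and the function names and the sub-folder name `repeated_chunks_temp` are hypothetical. Matching is exact, so spaces, punctuation and line breaks all count.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def first_repeated_chunk(text: str, chunk_len: int = 250) -> tuple[int, int] | None:
    """Return (first_offset, repeat_offset) of the first chunk_len-character
    string that occurs twice verbatim in text, or None if there is none.

    Matching is exact: spaces, punctuation and line breaks all count.
    Windows may overlap, so a run of more than chunk_len identical
    characters (e.g. padding spaces) also counts as a repeat.
    """
    seen: dict[str, int] = {}
    for start in range(len(text) - chunk_len + 1):
        chunk = text[start:start + chunk_len]
        if chunk in seen:
            return seen[chunk], start
        seen[chunk] = start
    return None


def copy_articles_with_repeats(folder: str, chunk_len: int = 250,
                               subfolder: str = "repeated_chunks_temp") -> list[Path]:
    """Scan every .txt file directly inside folder; copy each article that
    contains a repeated chunk into a (hypothetical) sub-folder and return
    the list of flagged files."""
    src = Path(folder)
    dest = src / subfolder
    flagged: list[Path] = []
    for path in sorted(src.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if first_repeated_chunk(text, chunk_len) is not None:
            dest.mkdir(exist_ok=True)
            shutil.copy2(path, dest / path.name)  # copy, leaving the original in place
            flagged.append(path)
    return flagged
```

For example, `copy_articles_with_repeats("articles", chunk_len=250)` would return the files that contain at least one repeated 250-character string. Keeping every window in a dictionary is simple but uses memory roughly proportional to the text length times `chunk_len`; a rolling hash would be leaner for very long articles.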
Why 250 characters?
The idea is to look for gross repetition that is unlikely to occur in normal text. If you have researched clusters (n-grams), you will know that if you look for repeated 10- or 15-word strings, for example, you find very few or none at all. The assumption is that 250-character strings which match perfectly (including spaces, punctuation and carriage returns) do not get there by chance but through errors in copying text by computer ... or through gross plagiarism.
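To check the point about clusters on your own texts, the sketch below counts word n-grams that occur more than once in an article. It is an illustration under stated assumptions, not a real concordancer: it simply lowercases the text and splits on whitespace, so punctuation stays attached to words, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def repeated_word_clusters(text: str, n: int) -> dict[tuple[str, ...], int]:
    """Return every n-word cluster that occurs more than once, with its count.

    Tokenization is deliberately crude (lowercase, split on whitespace).
    """
    words = text.lower().split()
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: c for gram, c in counts.items() if c > 1}
```

Following the reasoning above, calling it with `n` set to 10 or 15 on a normal article should return very few entries, if any; many hits suggest mechanical duplication rather than ordinary writing.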