
WordSmith Tools Manual

Corpus Checker






The purpose of the Corpus Checker is to check your corpus for corruption, relevance and duplicates. A text file may be a problem because:



it has become corrupted, so that what used to be good text is now just random characters, or it has been cut much shorter because of disk problems

it isn't even in the same language as the rest of the corpus

it is a copy of another text file or has very similar text

it contains too much boilerplate text

it is or is not relevant to a particular area of enquiry


The tool works in any language. It checks for corruption by taking a known sample of good text (in whatever language) and comparing that sample with every text in your corpus.
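The idea of comparing a good sample against each text can be illustrated with a short sketch. This is not WordSmith's own method, which is not described here; it simply assumes that a text's character frequencies should resemble those of the good sample, and measures the resemblance with cosine similarity. The function names, the file names and the 0.8 threshold are all hypothetical choices made for the illustration.

```python
from collections import Counter
import math


def char_profile(text):
    """Return the relative frequency of each (lowercased) character."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two character-frequency profiles."""
    dot = sum(value * q.get(ch, 0.0) for ch, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def flag_suspect_texts(good_sample, corpus, threshold=0.8):
    """Return the names of texts whose profile is unlike the good sample.

    `corpus` maps a file name to its text; `threshold` is an assumed
    cut-off, not a value taken from WordSmith.
    """
    reference = char_profile(good_sample)
    return [
        name
        for name, text in corpus.items()
        if cosine_similarity(reference, char_profile(text)) < threshold
    ]


# Hypothetical data: one readable text and one made of junk bytes.
good = "The quick brown fox jumps over the lazy dog. " * 20
corpus = {
    "ok.txt": "A lazy dog sleeps while the brown fox hunts over the hill.",
    "bad.txt": "\x00\xff#@~^|}{\x01" * 30,
}
print(flag_suspect_texts(good, corpus))  # ['bad.txt']
```

Because the comparison uses only character frequencies, a sketch like this is independent of language as long as the good sample is in the same language as the corpus. A text in a different language or script would also tend to score low, so a low score does not by itself tell you which of the problems listed above applies.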


See also: How to check corruption