A tool that goes through your corpus and seeks out any text files that may have become corrupted. It works with texts in any language.
See also: detecting corpus corruption
Page url: http://www.lexically.net/wordsmith/step_by_step_Chinese/?corpus_corruption_overview.htm