purpose

 

The point of it...

 

Character Profiler is a tool to help find out which characters are most frequent in a text or a set of texts.

 

 

You might use it to see which characters are most frequent (in ordinary English text, for example, E is the most frequent letter, followed by T, as shown below), or to check whether your text collection contains any oddities, such as accented characters or curly apostrophes you weren't expecting.
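As a rough illustration of the kind of count involved, here is a minimal Python sketch. It is not how Character Profiler itself works; the function names char_profile and non_ascii are invented for this example, and "oddities" is assumed here to mean anything outside the basic ASCII range.

```python
from collections import Counter

def char_profile(text, top=10):
    """Return (character, count, percent) for the `top` most frequent characters."""
    counts = Counter(text)
    total = sum(counts.values())
    # most_common() sorts by descending count; an empty text yields an empty list
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common(top)]

def non_ascii(text):
    """Count characters above code 127, e.g. accented letters or curly apostrophes."""
    return Counter(ch for ch in text if ord(ch) > 127)

if __name__ == "__main__":
    sample = "It’s a café, isn’t it?"
    for ch, n, pct in char_profile(sample.lower(), top=5):
        print(repr(ch), n, f"{pct:.1f}%")
    print(dict(non_ascii(sample)))
```

Lower-casing the text first, as in the usage lines, merges E with e; leave that step out if you want upper and lower case counted separately.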

 

The first 32 codes (0 to 31) used in computer storage of text are "control characters" such as tabs, line-feeds and carriage-returns. A plain .txt version of a text should contain only letters, numbers, punctuation, spaces, tabs, line-feeds and carriage-returns. If there are other symbols you do not recognise, you may have a .txt file which is really an old WordPerfect or Word .doc in disguise.
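The check described above can be sketched in Python along these lines. This is only an illustration under stated assumptions: the name unexpected_controls is hypothetical, and the only control characters treated as legitimate are tab, line-feed and carriage-return.

```python
from collections import Counter

# Control characters a plain .txt file may reasonably contain
ALLOWED_CONTROLS = {"\t", "\n", "\r"}

def unexpected_controls(text):
    """Map each unexpected control code (0-31) to the number of times it occurs."""
    found = Counter(
        ord(ch) for ch in text
        if ord(ch) < 32 and ch not in ALLOWED_CONTROLS
    )
    return dict(found)

if __name__ == "__main__":
    suspect = "Chapter 1\r\n\x00\x00Some text\x0b\n"
    for code, n in sorted(unexpected_controls(suspect).items()):
        print(f"code {code}: {n} occurrence(s)")
```

To try this on a file of uncertain origin, one option is to read it with open(path, encoding="latin-1"), which decodes every byte without raising an error. A result that is not empty suggests the file may not be plain text after all.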

 

It also lets you discover the most used characters across languages, as in this screenshot:

 

[Screenshot: top 10 characters in 11 languages]

For further details see http://lexically.net/downloads/corpus_linguistics/1984_characters.xls.
