WordList display

 

Each WordList display shows:

 

the word

its frequency

its frequency as a percent of the running words in the text(s) the word list was made from

the number of texts each word appeared in

that number as a percentage of the whole corpus of texts

 

The Frequency display might look like this:

 

wordlist_display

 

Here you see the top 7 word-types in a word list based on 480 texts. There are 72,028 word-types altogether, but in the screenshot we can only see the first few. The Freq. column shows how often each word cropped up (THE looks as if it appeared 72,010 times in the 480 texts), and the % column tells us that its frequency represents 6.07% of the running words in those texts. The Texts column shows that THE comes in 480 texts, that is, 100% of the texts used for the word list.
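As a rough illustration of how the five columns relate to one another, here is a minimal Python sketch. It is not WordSmith's own code: the function name build_word_list is hypothetical, and splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from collections import Counter

def build_word_list(texts):
    """Return rows of (word, freq, % of running words, texts, % of texts).

    Assumption: words are whatever is separated by whitespace, upper-cased.
    """
    freq = Counter()        # total occurrences of each word (tokens)
    text_count = Counter()  # number of texts each word appears in
    for text in texts:
        tokens = text.upper().split()
        freq.update(tokens)
        text_count.update(set(tokens))  # count each word once per text

    total_tokens = sum(freq.values())
    rows = []
    for word, n in freq.most_common():  # most frequent first
        rows.append((
            word,
            n,
            100 * n / total_tokens,
            text_count[word],
            100 * text_count[word] / len(texts),
        ))
    return rows

# Three tiny "texts" stand in for the 480 texts of the example.
rows = build_word_list(["the cat sat", "the dog sat on the mat", "a bird"])
for row in rows[:3]:
    print("%-6s %3d %6.2f%% %3d %6.2f%%" % row)
```

The first row corresponds to the most frequent word-type, just as THE heads the Frequency display.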

 

If we pull the Freq. column a little wider (place the cursor at the header edge, then pull right) so that the 72,010 no longer has any purple marks beside it,

wordlist display freq_expanded

we see that the true frequency value is actually 172,010.

 

Another thing to note is that there seems to be a word #, with over 50 thousand occurrences.

 

hash_representing_numbers

That represents a number or any word with a number in it such as EX658.
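The rule described here (a number, or any word with a number in it, is shown as #) can be sketched in a few lines of Python. This only illustrates the rule as stated above; the function name mask_numbers is hypothetical and the program's own handling of numbers may differ in detail.

```python
import re

def mask_numbers(token):
    """Return '#' for any token containing a digit, else the token unchanged."""
    return "#" if re.search(r"\d", token) else token

tokens = ["THE", "EX658", "1999", "CAT"]
masked = [mask_numbers(t) for t in tokens]
print(masked)
```

Because every such item collapses into the single entry #, its frequency is the sum of all numbers and number-containing words in the texts, which would explain why it comes so high in the list.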

 

wordlist_display alpha

 

The Alphabetical listing also shows us some of the words but now they're in alphabetical order. ABANDON comes 43 times altogether, and in 37 of the 480 texts (less than 8%). ABANDONED, on the other hand, not only comes more often (78 times) but also in more texts (14% of them).

 

Now let's examine the statistics.

 

statistics_display

 

In all 480 texts, there are 72,028 word types (as pointed out above). The total number of running words (tokens) is 2,833,815. The mean word length is about 4.57 characters. There are 107,073 sentences altogether, on average 26.47 words in length. In the text of a00.txt, there are only 1,571 different word types and that interview is under 7,000 words in length. This is explained in more detail in the Statistics page.
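To make the relation between these figures concrete, here is a minimal Python sketch that computes the same kinds of statistics for a short piece of text. The function name basic_statistics is hypothetical, and both the word pattern and the sentence boundaries (full stop, exclamation mark, question mark) are simplifying assumptions, so results need not match what the Statistics display reports.

```python
import re

def basic_statistics(text):
    """Compute tokens, types, mean word length and mean sentence length.

    Assumptions: a word is a run of letters or apostrophes; a sentence
    ends at '.', '!' or '?'.
    """
    tokens = re.findall(r"[A-Z']+", text.upper())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = len(tokens)
    return {
        "tokens": n_tokens,                      # running words
        "types": len(set(tokens)),               # different words
        "mean_word_length":
            sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
        "sentences": len(sentences),
        "mean_sentence_length":
            n_tokens / len(sentences) if sentences else 0.0,
    }

stats = basic_statistics("The cat sat. The dog sat on the mat!")
print(stats)
```

The figures quoted above are consistent with this kind of calculation: 2,833,815 running words divided by 107,073 sentences gives about 26.47 words per sentence.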

 

Finally, here is a screenshot of the same word list sorted "reverse alphabetically". In the part which we can see, all the words end in -IC.

 

reverse_alphabetical

 

To do a reverse alphabetical sort, I had the Alphabetical window visible, then chose Edit | Other sorts | Reverse Word sort in the menu. To revert to an ordinary alphabetical sort, press F6.
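The idea behind a reverse word sort can be sketched in Python as sorting each word by its letters read from the end, so that words sharing an ending (such as -IC) fall together. This is an illustration of the general technique, not the program's own code, and the function name reverse_word_sort is hypothetical.

```python
def reverse_word_sort(words):
    """Sort words by their spelling read from the last letter to the first."""
    return sorted(words, key=lambda w: w[::-1])

words = ["MUSIC", "ABANDON", "BASIC", "THE", "COMIC"]
result = reverse_word_sort(words)
print(result)
```

Here COMIC, BASIC and MUSIC end up next to each other because their reversed spellings all begin with CI.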

 

See also: Consistency, Lemmatisation
