WordList display



Each WordList display shows


the word
its frequency
its frequency as a percent of the running words in the text(s) the word list was made from
the number of texts each word appeared in
that number as a percentage of the whole corpus of texts
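The five columns above can be computed with a few counters. The Python sketch below is illustrative only. `make_word_list` and its tokenizer are assumptions, not WordSmith's own code, and WordSmith's real token rules are considerably richer. It counts each word's total frequency, expresses that as a percentage of all running words, and counts how many texts each word occurs in.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, apostrophes and hyphens, upper-cased.
# WordSmith's real tokenization settings are richer than this.
TOKEN_RE = re.compile(r"[A-Za-z'-]+")


def make_word_list(texts):
    """Return (word, freq, freq %, texts, texts %) rows, most frequent first."""
    freq = Counter()        # total occurrences of each word in all texts
    text_count = Counter()  # number of texts each word occurs in
    for text in texts:
        tokens = [t.upper() for t in TOKEN_RE.findall(text)]
        freq.update(tokens)
        text_count.update(set(tokens))  # each word counted once per text
    running_words = sum(freq.values())
    n_texts = len(texts)
    return [
        (word, f, 100 * f / running_words,
         text_count[word], 100 * text_count[word] / n_texts)
        for word, f in freq.most_common()
    ]


rows = make_word_list(["The cat sat on the mat.", "The dog barked."])
for row in rows[:3]:
    print(row)
```

Counting `set(tokens)` per text is what separates the Texts column from the Freq. column: THE occurs three times here but in only two texts.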


The Frequency display might look like this:




Here you see the top 7 words in a word list based on 480 texts. There are 72,028 different words (word types) altogether, but the screenshot shows only the first few. The Freq. column shows how often each word cropped up (THE looks as if it appeared 72,010 times in the 480 texts), and the % column tells us that this frequency represents 6.07% of the running words in those texts. The Texts column shows that THE occurs in all 480 texts, that is, 100% of the texts used for the word list.


If we pull the Freq. column a little wider (place the cursor on the edge of the column header, then drag to the right) so that the 72,010 no longer has any purple marks beside it,



we see the true frequency value is actually 172,010.
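The % column offers a quick arithmetic check on which frequency is real. This small sketch uses the running-word total of 2,833,815 given in the statistics. `percent_of_running_words` is a hypothetical helper name. The calculation shows that only 172,010 produces the 6.07% displayed.

```python
# Figures taken from the display and statistics described in this section.
RUNNING_WORDS = 2_833_815


def percent_of_running_words(freq, running_words=RUNNING_WORDS):
    """Frequency as a percentage of all running words (the % column)."""
    return 100 * freq / running_words


print(f"{percent_of_running_words(172_010):.2f}")  # 6.07, matching the % column
print(f"{percent_of_running_words(72_010):.2f}")   # 2.54, which would not match
```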


Another thing to note is that there seems to be a "word" #, with over 50,000 occurrences.



This represents any number, or any word with a digit in it, such as EX658.
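This help page does not describe WordSmith's exact rule for numbers, such as how it treats decimals or hyphens. The sketch below assumes the simplest reading: any token containing a digit is counted under #. `normalise_token` is a hypothetical name, and the example sentence is made up. The sketch shows why # can pile up a large frequency, since every number and code shares one entry.

```python
import re
from collections import Counter

HAS_DIGIT = re.compile(r"\d")


def normalise_token(token):
    """Assumed rule: any token containing a digit is counted as '#'."""
    return "#" if HAS_DIGIT.search(token) else token.upper()


tokens = "Form EX658 was filed in 1999 and again in 2001".split()
counts = Counter(normalise_token(t) for t in tokens)
print(counts["#"])  # 3: EX658, 1999 and 2001 all fall under #
```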




The Alphabetical listing also shows some of the words, but now they're in alphabetical order. ABANDON occurs 43 times altogether, in 37 of the 480 texts (less than 8%). ABANDONED, on the other hand, not only occurs more often (78 times) but also in more texts (14% of them).
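The "less than 8%" figure for ABANDON is simply 37 out of 480 texts. This sketch computes that percentage and re-sorts word-list rows alphabetically. Only ABANDON's figures come from the display. ZEST and ABACUS are hypothetical rows, and `texts_percent` and `alphabetical` are illustrative names. Python's default sort uses plain code-point order, which may differ from WordSmith's language-aware alphabetical sorting.

```python
N_TEXTS = 480  # number of texts in the example word list


def texts_percent(n_texts_with_word, n_texts=N_TEXTS):
    """Share of texts a word occurs in, as a percentage (the Texts % column)."""
    return 100 * n_texts_with_word / n_texts


def alphabetical(rows):
    """Sort (word, freq, texts) rows by word, using plain code-point order."""
    return sorted(rows, key=lambda row: row[0])


# ABANDON's figures come from the display; the other two rows are hypothetical.
rows = [("ZEST", 3, 2), ("ABANDON", 43, 37), ("ABACUS", 2, 2)]
for word, freq, texts in alphabetical(rows):
    print(word, freq, texts, f"{texts_percent(texts):.1f}%")
```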


Now let's examine the statistics.




Across all 480 texts there are 72,028 word types (as pointed out above) and 2,833,815 running words (tokens). The mean word length is about 4.57 characters. There are 107,073 sentences altogether, with a mean length of 26.47 words. The text a00.txt, by contrast, contains only 1,571 different word types, and that interview is under 7,000 words long. These figures are explained in more detail on the Statistics page.
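The main figures can be reproduced roughly for a single text. The mean sentence length is running words divided by sentences, so 2,833,815 / 107,073 ≈ 26.47. This sketch is naive. `text_statistics` is a hypothetical function, and its word and sentence patterns are assumptions rather than WordSmith's settings. WordSmith's own definitions of words and sentence ends are configurable and will give somewhat different counts.

```python
import re


def text_statistics(text):
    """Naive per-text statistics; assumes at least one word and one sentence."""
    words = re.findall(r"[A-Za-z']+", text)  # assumed word pattern
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]  # assumed sentence ends
    return {
        "tokens": len(words),
        "types": len({w.upper() for w in words}),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "sentences": len(sentences),
        "mean_sentence_length": len(words) / len(sentences),
    }


print(text_statistics("The cat sat. The cat ran!"))

# Corpus-level mean sentence length from the figures above.
print(f"{2_833_815 / 107_073:.2f}")  # 26.47
```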


Finally, here is a screenshot of the same word list sorted "reverse alphabetically". In the visible part, all the words end in -IC.




To do a reverse alphabetical sort, I had the Alphabetical window visible, then chose Edit | Other sorts | Reverse Word sort in the menu. To revert to an ordinary alphabetical sort, press F6.
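You can think of a reverse word sort as an ordinary sort applied to each word spelled backwards, which is why words sharing an ending such as -IC end up together. This minimal sketch illustrates the idea. `reverse_word_sort` is a hypothetical name, not a WordSmith function, and the sample words are made up.

```python
def reverse_word_sort(words):
    """Sort words by their spelling read right to left, so shared endings cluster."""
    return sorted(words, key=lambda w: w[::-1])


print(reverse_word_sort(["BASIC", "TABLE", "MAGIC", "TOPIC", "CABLE"]))
# ['MAGIC', 'TOPIC', 'BASIC', 'CABLE', 'TABLE']
```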


See also: Consistency, Lemmatisation

Page url: http://www.lexically.net/downloads/version5/HTML/?wordlistdisplay.htm