Detailed Consistency Relations

With a detailed consistency list such as this, of five versions of the fairy story Little Red Riding Hood,

 

[Screenshot: detailed consistency list for the five Red Riding Hood versions]

the most long-winded story appears to be version 5 (red5.lst). If you click the detailed cons. relation tab

 

[Screenshot: the detailed cons. relation tab]

you can see the relevant statistics more usefully:

 

[Screenshot: detailed consistency relation statistics]

where it can be seen that red5 has a type-count of 462, more than any other version, and that red2 and red3 are the most closely related pair, with a relation statistic of 0.487.

This relation is the Dice coefficient, based on the joint frequency and the type-counts of the two texts. The type-count is the number of different word types in each text. The joint frequency is the number of word types the two word lists have in common: there are 138 matches in the vocabulary of these two versions, which means that 138 distinct word types occur in both word lists. (If, for example, "book" appeared 20 times in one list and 3 times in the other, that would count as 1 match.)
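The calculation can be sketched in a few lines of Python. This is only an illustration, assuming the standard Dice formula (twice the joint frequency divided by the sum of the two type-counts); the function name and the miniature word lists are hypothetical and are not taken from the program or from the Red Riding Hood data.

```python
def dice_coefficient(freqs_a, freqs_b):
    """Standard Dice coefficient over the distinct word types of two word lists.

    Each argument maps a word type to its frequency. Only the presence of a
    type matters here; the frequencies themselves are ignored.
    """
    types_a = set(freqs_a)
    types_b = set(freqs_b)
    joint = len(types_a & types_b)        # joint frequency: types found in both lists
    total = len(types_a) + len(types_b)   # sum of the two type-counts
    if total == 0:
        return 0.0                        # two empty lists: treat as no overlap
    return 2 * joint / total


# Hypothetical miniature word lists (word type -> frequency)
list_a = {"book": 20, "wolf": 4, "forest": 2}
list_b = {"book": 3, "wolf": 1, "basket": 5, "grandmother": 2, "red": 1}

# "book" counts as 1 match even though it occurs 20 times and 3 times.
print(dice_coefficient(list_a, list_b))  # 2 matches, type-counts 3 and 5 -> 0.5
```

Under the same assumption, 138 matches and a coefficient of 0.487 would imply that the type-counts of red2 and red3 sum to roughly 567 (2 × 138 / 0.487).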

A Dice coefficient ranges between 0 (no shared vocabulary) and 1 (identical vocabularies). The 0.487 can be thought of as a percentage, i.e. there's about a 49% overlap between the vocabularies of the two versions of the same story.

 

See also: Detailed Consistency.
