Single words v. Clusters (Cluster) in WordList

 

WordList clusters

A word list does not have to consist of single words. You can ask for a word list with clusters of two, three, or up to eight words on each line. To process clusters in WordList, first make an index.
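To give a rough idea of what a cluster list contains, here is a minimal Python sketch. It is not WordSmith's own code: the function name count_clusters, the very simple tokenizer and the sample text are all assumptions made for this illustration, and the sketch ignores punctuation and sentence boundaries.

```python
import re
from collections import Counter


def count_clusters(text, n):
    """Count every sequence of n consecutive words in text (hypothetical helper)."""
    if not 2 <= n <= 8:
        raise ValueError("cluster size should be between 2 and 8 words")
    # Very simple tokenizer (an assumption): lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text and tally each window.
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


# Invented sample text, used only for illustration.
sample = "He gave his hand to her. She gave his hand a squeeze."
for cluster, freq in count_clusters(sample, 3).most_common(3):
    print(freq, cluster)
```

Each entry in the result is one line of a cluster list: the cluster itself and how often it occurs.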

 

How to see clusters…

Open the index. Now choose Compute | Clusters.

 

cluster_choices_for_index

 

Words to make clusters from

 

What you see

 

Working constraints

 

Phrase frames

These are what William H. Fletcher has defined as phrase-frames, i.e. "groups of wordgrams identical but for a single word", in his kfNgram program.
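The idea of "wordgrams identical but for a single word" can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method used by kfNgram or WordSmith: the function name phrase_frames, the input format (a dictionary of cluster frequencies) and the sample figures are all invented for the example.

```python
from collections import defaultdict


def phrase_frames(cluster_counts):
    """Group equal-length clusters that differ in exactly one word position.

    cluster_counts maps a cluster string to its frequency (hypothetical input).
    Returns {frame: {variant: frequency}}, with * marking the variable word.
    """
    frames = defaultdict(dict)
    for cluster, freq in cluster_counts.items():
        words = cluster.split()
        # Replace each position in turn with the wildcard to get candidate frames.
        for i in range(len(words)):
            frame = " ".join(words[:i] + ["*"] + words[i + 1:])
            frames[frame][cluster] = freq
    # Keep only frames shared by at least two different variants.
    return {f: v for f, v in frames.items() if len(v) > 1}


# Invented frequencies, used only for illustration (not real corpus counts).
counts = {
    "gave his hand to": 4,
    "put his hand to": 3,
    "held his hand to": 2,
    "at the same time": 9,
}
for frame, variants in phrase_frames(counts).items():
    print(frame, sum(variants.values()), sorted(variants))
```

With this sample, the three "... his hand to" clusters fall under one frame, * his hand to, while "at the same time" has no partner and forms no frame.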

 

Here, processing 23 Dickens novels produces many phrase frames, in which the variable (wildcard) word is shown as *.

 

phrase_frames

Double-click the Lemmas column (highlighted here in yellow) to see the detail.

 

lemma_clusters

The process joins all the variants of the phrase and lists them in the Lemmas column. In the word list itself the variants appear as deleted, because they have been joined to another item, the phrase frame. You can unjoin them if you want (Edit | Joining | Unjoin or Unjoin all).

 

Omit phrase frames?

If you don't want to see phrase frames, select the omit phrase frames option.

 

omit_phrase_frames

 

In the listing below, all the "his hand" sequences appear together, but not "drawing his hand across", "gave his hand to", etc., which were shown in the phrase frame view above.

 

clusters involving hand

 

Here is a small set of 3-word clusters involving "rabies" from the British National Corpus (BNC).

 

clusters_of_rabies

 

Some of them are plausible multi-word units.
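Picking out the clusters that involve one particular word can be sketched in Python like this. The function name clusters_containing and the sample clusters and frequencies are invented for the illustration; they are not taken from the BNC or from WordSmith.

```python
def clusters_containing(cluster_counts, word):
    """Return only the clusters in which word occurs as a whole word (hypothetical helper)."""
    word = word.lower()
    return {
        cluster: freq
        for cluster, freq in cluster_counts.items()
        # Compare whole words, so that "rabies" does not match e.g. "scabies".
        if word in cluster.lower().split()
    }


# Invented 3-word clusters and frequencies, used only for illustration.
counts = {
    "the rabies virus": 5,
    "risk of rabies": 4,
    "one of the": 120,
    "cases of scabies": 2,
}
# Print the matching clusters, most frequent first.
for cluster, freq in sorted(clusters_containing(counts, "rabies").items(),
                            key=lambda item: -item[1]):
    print(freq, cluster)
```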

 

It's a word list

 

Finally, remember this listing is just like a single-word word list. You can save it as a .lst file and open it again at any time, separately from the index.

 

See also: find the files for specific clusters, clusters in Concord
