Making an Index List


index files

Two files are created for each index:

.tokens file: a large file containing information about the position of every word token in your text files.

.types file: a file recording the individual word types.

 

To create an index, first go to the main Controller and choose Adjust Settings | Index. You must specify a basic filename for the index at this stage, because WordSmith needs the filename before it starts work (unlike a concordance, where you save the results only after the concordance has been computed). In the screenshot below, the basic filename is new_one; WordSmith adds .tokens and .types to this basic filename as it works.

index_filename

 

If you choose a basic filename which you have already used, WordList will ask whether you want to add to the existing index or start it afresh:

over-write index_filename

Next, select your text files in the usual way. WordList will go through your selected texts and store information about the position of every instance of every word-type using the .tokens and .types files.
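As a rough illustration of what "storing the position of every instance of every word-type" means, here is a minimal Python sketch of a positional index. It is not WordSmith's actual file format or code: the function name build_index, the simple tokenising rule, and the in-memory dictionary are all assumptions made for the example.

```python
import re
from collections import defaultdict

def build_index(texts):
    """Map each word type to a list of (text_id, token_position) pairs.

    Hypothetical helper: a toy stand-in for the information an index holds.
    """
    positions = defaultdict(list)
    for text_id, text in enumerate(texts):
        # Assumed tokenising rule: runs of letters and apostrophes, lower-cased.
        tokens = re.findall(r"[a-z']+", text.lower())
        for pos, token in enumerate(tokens):
            positions[token].append((text_id, pos))
    return dict(positions)

texts = ["the cat sat on the mat", "the dog sat"]
index = build_index(texts)

# The sorted keys play the role of a list of word types;
# the position lists play the role of the per-token information.
types = sorted(index)
token_count = sum(len(locs) for locs in index.values())
```

In this toy index the type "the" has three recorded positions, two in the first text and one in the second, so every occurrence can be located again without re-reading the texts.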

 

An index permits the computation of word clusters and Mutual Information scores for each word type. The screenshot below shows the progress bars for an index of the BNC World corpus; on a desktop PC with 1 GB of RAM it took nearly one hour to do 96% of the work, a rate of about 1.8 million words per minute. The resulting BNC Words.tokens file was 1.6 GB in size and the BNC Words.types file was 26 MB. On a basic laptop with 512 MB of RAM the job took about 3 hours 15 minutes.
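To show why stored positions make clusters and Mutual Information scores possible, the sketch below counts two-word clusters from a toy positional index and computes a textbook pointwise mutual information score for adjacent words. The index literal, the function names, and the formula log2(joint × N / (f1 × f2)) are assumptions for illustration; WordSmith's own cluster settings and Mutual Information calculation may differ.

```python
import math
from collections import Counter

# Toy positional index: word type -> list of (text_id, token_position).
index = {
    "the": [(0, 0), (0, 4), (1, 0)],
    "cat": [(0, 1)],
    "sat": [(0, 2), (1, 2)],
    "on":  [(0, 3)],
    "mat": [(0, 5)],
    "dog": [(1, 1)],
}

def two_word_clusters(index):
    """Count adjacent word pairs using only the stored positions."""
    # Invert the index so each location can be looked up directly.
    word_at = {loc: word for word, locs in index.items() for loc in locs}
    counts = Counter()
    for (text_id, pos), word in word_at.items():
        following = word_at.get((text_id, pos + 1))
        if following is not None:  # pairs never cross a text boundary
            counts[(word, following)] += 1
    return counts

def pmi(index, clusters, w1, w2):
    """Textbook pointwise mutual information for the adjacent pair (w1, w2)."""
    total = sum(len(locs) for locs in index.values())
    joint = clusters[(w1, w2)]
    if joint == 0:
        return None  # the pair never occurs, so no score is defined
    return math.log2(joint * total / (len(index[w1]) * len(index[w2])))

clusters = two_word_clusters(index)
score = pmi(index, clusters, "the", "cat")
```

Because the positions are already on file, neither calculation needs to read the source texts again, which is the practical benefit of building the index once.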

 

making an index

 

adding to an index

To add to an existing index, just choose some more texts and choose File | New | Index. If the basic filename is already in use for an index, you will be asked whether to add the new texts to it ('Yes') or start it afresh ('No').
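The filename and add-or-restart logic described on this page can be summarised in a short Python sketch. The helper names index_paths and plan_index_action are hypothetical and only model the decision; they do not reproduce how WordSmith itself checks for or updates its index files.

```python
from pathlib import Path

def index_paths(basic_filename):
    """Return the two filenames derived from a basic filename."""
    base = str(basic_filename)
    return Path(base + ".tokens"), Path(base + ".types")

def plan_index_action(basic_filename, answer_yes):
    """Model the choice offered when a basic filename is chosen.

    answer_yes stands for the user's reply to the add-or-restart prompt
    and only matters when an index with that name already exists.
    """
    tokens_file, types_file = index_paths(basic_filename)
    if not (tokens_file.exists() or types_file.exists()):
        return "create"  # nothing on disk yet, so no prompt is needed
    return "add" if answer_yes else "start afresh"

tokens_file, types_file = index_paths("new_one")
```

With the basic filename new_one, the two derived names are new_one.tokens and new_one.types, matching the example used earlier on this page.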

 

See also Using Index Lists, Viewing Index Lists, WordList Help Contents.