Stop Lists (stoplist)

  Previous topic Next topic JavaScript is required for the print function  

 

Stop lists are lists of words which you don't want to include in analysis. For example you might want to make a word list or analyse key words excluding common function words like the, of, was, is, it.

 

To use stop lists, you first prepare a file, using Notepad or any plain text word processor, which specifies all the words you wish to ignore. Separate each word using commas, or else place each one on a new line. You can use capital letters or lower-case as you prefer. You can use a semi-colon for comment lines. There is no limit to the number of words.

There is a file called stop_wl.stp (in your \wsmith5 folder) which you could use as a basis and save under a new name. Or just make your own in Notepad and save it with .stp as the file-extension. If that is difficult, rename the .txt as .stp.

 

Example

 

; My stop list for test purposes.

THE,THIS,IS

IT

WILL

 

Then select Stop List in the menu to specify the stop list(s) you wish to use. Separate stop lists can be used for the WordList, Concord and KeyWords programs. If the stop list is activated, it is in effect: that is, the words in it will be stopped from being included in a word list. If you wish always to use the same stop list(s) you can specify them in wordsmith.ini as defaults.

 

stoplist_choices

To choose your stop list, click the small yellow button in the screenshot, find the stop list file, then press Load. You will see how many entries were correctly found and be shown the first few of them.

 

 stop_list_entries_loaded

With a stop list thus loaded, start a new word list. The words in your stop list should now not appear in the word list.

 

continuous

Normally, every word is read in while making the word list and stored in the computer's memory without checking whether it's the stop list. Eventually the set of words is checked in your stop list and omitted if it is present. That is much quicker. However, it means that for the most part, any statistics are computed on the whole text, disregarding your stop list.

If you choose continuous the processing will slow down dramatically since as every word is read in while making the word list, it will be checked against the stop list and ignored if found. In other words, every single case of THE and OF and IS etc. will be looked at as the texts are read in and sought in your stop list. The effect will be to give you detailed statistics which ignore the words in the stop lists.

 

subtract wordlengths in statistics

If you have not chosen continuous processing as explained above, you may want the statistics of your word list to attempt to deal in part with the stop list work done. With this choice, after the word list is computed, all the statistics concerning the number of types and tokens and 3-letter, 4-letter words etc. will be adjusted for the overall column (but not for the column for each single text) in your statistics.

 

See Match List for a more detailed explanation, with screenshots.

 

Another method of making a stop list file is to use WordList on a large corpus of text, setting a high minimum frequency if you want only the high-frequency words. Then save it as a text file. Next, use the Text Converter to format it, using stoplist.cod as the Conversion file.

 

See also: Making a Tag File, Match List, Lemmatisation.

Page url: http://www.lexically.net/downloads/version5/HTML/?stop_lists.htm