Choosing a lemma match file



The point of it…

You can lemmatise all the items in the current word list using a standard text file which groups together the words that belong together (be -> was, is, were, etc.). Producing the text file is time-consuming the first time, but it is very useful if you want to lemmatise many word lists, and it is much less "hit-and-miss" than auto-joining.

There is an English-language lemma list from Yasumasa Someya at


How to do it

In the main Controller, Settings | Adjust Settings | Lemma,Match,Stop lists, you will see a screen like this:




Choose the appropriate button (for Concord, KeyWords or WordList) and type the file name or browse for it.


The file should contain a plain text list of lemmas with items like this:





WordSmith then reads the file and displays its contents (or a sample if the list is long). The format allows any alphabetic or numerical characters in the language the list is for, plus the apostrophe, the space and the underscore. Any line containing another symbol is skipped: if you mistakenly type GO = GOES, for example, that line won't be included because of the = symbol.
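The rule above can be tried out on a lemma file before loading it. The following is only a rough sketch in Python, not WordSmith's own code: it assumes each line has the shape HEADWORD -> FORM, FORM, ... (the arrow comes from the "be -> was, is, were" example earlier, while the comma separator and the function name parse_lemma_line are assumptions made for illustration).

```python
import re

# Characters allowed inside one item: letters, digits and underscore (\w),
# plus the apostrophe and the space, as described in the text above.
ITEM = re.compile(r"[\w' ]+")


def parse_lemma_line(line):
    """Return (headword, [forms]), or None if the line would be skipped.

    Assumed line shape (illustrative only): HEADWORD -> FORM1, FORM2, ...
    """
    head, arrow, rest = line.partition("->")
    if not arrow:
        return None  # no "->" at all, e.g. "GO = GOES"
    head = head.strip()
    forms = [form.strip() for form in rest.split(",")]
    for item in [head] + forms:
        # Reject empty items and items containing any other symbol.
        if not item or not ITEM.fullmatch(item):
            return None
    return head, forms


if __name__ == "__main__":
    for sample in ["BE -> AM, ARE, WAS, WERE, IS", "GO = GOES"]:
        print(repr(sample), "=>", parse_lemma_line(sample))
```

Lines for which the function returns None are the ones worth checking by hand before you point WordSmith at the file.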


The actual processing of the list only takes place when you choose the menu option Match Lemmas (LEMMAS) in WordList, Concord or KeyWords. See Match List for a more detailed explanation, with screenshots.


What if my text files don't contain BE?

Suppose you are matching AM, ARE etc. with BE as in the list above, but your texts don't actually contain the word BE. WordList won't find BE, so it has nothing to link the other forms to. The best way around this is to make a new word list from a plain text file (in which you include BE and any other base forms wanted), save it, and then merge it with your existing word list. WordList should then find the form BE and be able to add AM, ARE, WAS etc. to it.
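If your lemma file is long, typing out the plain text file of base forms by hand is tedious. As a possible shortcut, the sketch below (Python, outside WordSmith) collects the headword of every line and writes them to a text file, one per line. It assumes the same HEADWORD -> FORM, FORM line shape as the example list; the function names and any file names you pass in are hypothetical.

```python
def base_forms(lemma_lines):
    """Collect each headword once, in order of first appearance.

    Assumes lines shaped like "BE -> AM, ARE, WAS" (illustrative format).
    """
    heads = []
    for line in lemma_lines:
        head, arrow, _ = line.partition("->")
        head = head.strip()
        if arrow and head and head not in heads:
            heads.append(head)
    return heads


def write_base_forms(lemma_path, out_path, encoding="utf-8"):
    """Write the headwords of the lemma file to out_path, one per line."""
    with open(lemma_path, encoding=encoding) as src:
        heads = base_forms(src)
    with open(out_path, "w", encoding=encoding) as dst:
        dst.write("\n".join(heads) + "\n")
    return heads
```

The resulting file can then serve as the plain text file from which you make the extra word list to merge with your existing one.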


See also: Lemmatisation, Match List, Stop List