Choosing a lemma file

The point of it…

You can lemmatise all items in the current word list using a standard text file which groups words that belong together (be -> was, is, were, etc.). Producing the text file is time-consuming the first time, but it will be very useful if you want to lemmatise lots of word lists, and it is much less "hit-and-miss" than auto-joining using a template.

There is an English-language lemma list from Yasumasa Someya at http://www.lexically.net/downloads/BNC_wordlists/e_lemma.txt.

How to do it

In the main Controller, choose Settings | Adjust Settings | Lemma,Match,Stop lists. You will see a screen like this:

[Screenshot: choosing a lemma, match or stop file]

Choose the appropriate button (for Concord, KeyWords or WordList) and type the file name or browse for it, then Load it.

The file should contain a plain text list of lemmas with items like this:

BE -> AM, ARE, WAS, WERE, IS

GO -> GOES, GOING, GONE, WENT

WordSmith then reads the file and displays its entries (or a sample if the list is long). Each item may contain any alphabetic or numeric characters of the language the list is for, plus the single apostrophe, the space and the underscore. In other words, if you mistakenly put GO = GOES, that line won't be included, because of the = symbol.
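The format rules above can be illustrated with a minimal Python sketch. This is not WordSmith's own code: the function names are hypothetical, and the sketch assumes one entry per line, with the headword and its forms separated by -> and the forms separated by commas, as in the examples above.

```python
import re

# Assumption: an item may contain letters, digits, the apostrophe,
# the space and the underscore (\w covers letters, digits and underscore).
_ITEM = re.compile(r"[\w' ]+")

def parse_lemma_line(line):
    """Return (headword, [forms]), or None if the line is not well formed."""
    head, sep, rest = line.partition("->")
    if not sep:
        return None  # no "->" separator, e.g. "GO = GOES"
    head = head.strip()
    forms = [form.strip() for form in rest.split(",")]
    for item in [head] + forms:
        if not item or not _ITEM.fullmatch(item):
            return None  # empty item or disallowed character
    return head, forms

def parse_lemma_text(text):
    """Collect all well-formed lines into a dict of headword -> forms."""
    lemmas = {}
    for line in text.splitlines():
        parsed = parse_lemma_line(line)
        if parsed is not None:
            head, forms = parsed
            lemmas[head] = forms
    return lemmas
```

With this sketch, a line such as GO = GOES is skipped without an error, while blank lines between entries are simply ignored.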

The actual processing of the list takes place when you compute your word list, key word list or concordance, or when you choose the menu option Match Lemmas in WordList, Concord or KeyWords. See Match List for a more detailed explanation, with screenshots. Lemmatising occurs before any stop list is processed.

What if my text files don't contain the headword of the lemma?

Suppose you are matching AM, ARE, etc. with BE, as in the list above, but your texts don't actually contain the word BE. In that case the tool will insert BE with zero frequency and add AM, ARE, etc. as needed.
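The zero-frequency behaviour can be sketched in Python as follows. The data structures and the function name are hypothetical and only illustrate the idea; how WordSmith stores and displays lemmatised entries internally is not described here. The sketch also assumes that a lemma with neither its headword nor any of its forms in the texts is left out altogether.

```python
def lemmatise(freqs, lemmas):
    """Group the listed forms under their headword.

    freqs:  dict of word -> frequency (a stand-in for a word list)
    lemmas: dict of headword -> list of forms
    Returns a dict of headword -> (headword frequency, {form: frequency}).
    """
    result = {}
    for head, forms in lemmas.items():
        # Keep only the forms that actually occur in the texts.
        found = {form: freqs[form] for form in forms if form in freqs}
        if head in freqs or found:
            # A headword absent from the texts is entered with frequency 0.
            result[head] = (freqs.get(head, 0), found)
    return result


freqs = {"AM": 3, "ARE": 5, "THE": 10}
lemmas = {"BE": ["AM", "ARE", "WAS", "WERE", "IS"]}
entries = lemmatise(freqs, lemmas)
```

Here BE does not occur in freqs, so it is entered with a frequency of 0 and AM and ARE are attached to it.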
