Stoplist

Whole 1990s BNC stoplist

Lemma lists

lemma list 5 extracted from the BNC using the File Utilities.

It contains 20,437 head words which occur at least 5 times in the whole BNC corpus. Even with that threshold, it includes rare items such as:

KOONKY -> KOONKIES
KOOP -> KOOPS
KORYAK -> KORYAKS
KOSKOTA -> KOSKOTAS
KOSMO -> KOSMOS
KOURO -> KOUROS
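As an illustration of how entries like these might be read into a program, here is a minimal Python sketch. It assumes each line has the shape shown above: a head word, then "->", then a comma-separated list of forms. The function names parse_lemma_line and build_form_lookup are hypothetical, and the real files may need extra handling (for example encoding or comment lines) that is not covered here.

```python
def parse_lemma_line(line):
    """Split 'HEAD -> FORM1,FORM2' into (head, [forms]).

    Assumes the 'HEAD -> FORM,FORM' layout shown in the examples above.
    """
    head, sep, forms = line.partition("->")
    if not sep:
        raise ValueError(f"no '->' separator in line: {line!r}")
    return head.strip(), [f.strip() for f in forms.split(",") if f.strip()]


def build_form_lookup(lines):
    """Map every form (and each head word itself) to its head word.

    If a form appears under more than one head word, the first one wins.
    """
    lookup = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        head, forms = parse_lemma_line(line)
        lookup.setdefault(head, head)
        for form in forms:
            lookup.setdefault(form, head)
    return lookup


sample = ["KOONKY -> KOONKIES", "KOOP -> KOOPS"]
lookup = build_form_lookup(sample)
print(lookup["KOONKIES"])  # KOONKY
```

A lookup table of this kind lets any form in a text be replaced by its head word in a single dictionary access.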

lemma list 10 from the BNC made using the File Utilities.

It contains 15,843 head words which occur at least 10 times in the whole BNC corpus. It still includes rare items such as these two:

KOON -> KOONING,KOONS
KOURO -> KOUROS

lemma list 10 with c5 from the BNC also made using the File Utilities.

This list required a lemma frequency of at least 10, and a frequency of at least 3 for each member of the lemma. It also includes c5 (part-of-speech tag) information:

ABANDON -> <VVB>ABANDON,<VVD>ABANDONED
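The tagged entries can be split into (tag, form) pairs in much the same way. The sketch below assumes every form may be preceded by one tag in angle brackets, as in the ABANDON example; the name parse_tagged_entry is hypothetical, and forms without a tag are kept with a tag of None rather than rejected.

```python
import re

# One optional leading <TAG>, then the word form itself.
TAGGED_FORM = re.compile(r"^<([^>]+)>(.+)$")


def parse_tagged_entry(line):
    """Split 'HEAD -> <TAG>FORM,<TAG>FORM' into (head, [(tag, form), ...]).

    Assumes the layout of the ABANDON example above.
    """
    head, sep, rest = line.partition("->")
    if not sep:
        raise ValueError(f"no '->' separator in line: {line!r}")
    forms = []
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue
        match = TAGGED_FORM.match(item)
        if match:
            forms.append((match.group(1), match.group(2)))
        else:
            forms.append((None, item))  # untagged form
    return head.strip(), forms


head, forms = parse_tagged_entry("ABANDON -> <VVB>ABANDON,<VVD>ABANDONED")
print(head, forms)
```

Keeping the tag alongside each form makes it possible to filter by word class later, for example keeping only the verb forms of a lemma.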

Note: these lists rely on the BNC's own parsing, which is not 100% accurate, so you will find oddities such as BES under BE, and HAVED, HAVEING and OF under HAVE:

BE -> AM,ARE,BEEN,BEING,BES,IS,WAS,WERE
HAVE -> HAD,HAS,HAVED,HAVEING,HAVES,HAVING,OF

I'm grateful to Sungmin Lee for pointing this out.

Yasumasa Someya's lemma list

A lemma list for English, made in 1998 by Yasumasa Someya, which "currently contains 40,569 words (tokens) in 14,762 lemma groups. It is still far from complete, but I hope you find the list useful in preparing your own more complete lemma list. If you have any questions or comments about this lemma list, feel free to contact me (ysomeya@gol.com)".