
Handling the British National Corpus 


Lemmatising the word list


 

In the File Utilities you can build a lemma list automatically. The utility reads each original BNC file (not the simplified copy you made to build your word list of verbs) and uses the "hw=" attribute of each word to determine which head-word that word belongs to.

 

You can get a list I made here.

 

The result should look like this:

 

ENTRAIN -> <VVD>ENTRAINED,<VVI>ENTRAIN,<VVN>ENTRAINED

ENTRANCE -> <VVD>ENTRANCED,<VVG>ENTRANCING,<VVI>ENTRANCE,<VVN>ENTRANCED

ENTRAP -> <VVB>ENTRAP,<VVD>ENTRAPPED,<VVG>ENTRAPPING,<VVI>ENTRAP,<VVN>ENTRAPPED
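To make the format above concrete, here is a minimal Python sketch of how such a list could be derived from the "hw=" attributes. It is not WordSmith's own code. It assumes the BNC XML edition, where each word is a <w> element carrying "c5" (word-class tag) and "hw" (head-word) attributes; the function name build_lemma_list, the tag_prefix parameter and the SAMPLE text are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed input shape (BNC XML edition):
#   <w c5="VVD" hw="entrap" pos="VERB">entrapped </w>
W_ELEMENT = re.compile(r'<w\b([^>]*)>([^<]*)</w>')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def build_lemma_list(xml_text, tag_prefix="VV"):
    """Group tagged word forms under their head-word.

    Only words whose c5 tag starts with tag_prefix are kept
    (the default "VV" keeps lexical verbs). Hypothetical helper.
    """
    forms = defaultdict(set)
    for attr_text, word in W_ELEMENT.findall(xml_text):
        attrs = dict(ATTRIBUTE.findall(attr_text))
        tag = attrs.get("c5", "")
        head = attrs.get("hw", "")
        if not head or not tag.startswith(tag_prefix):
            continue
        forms[head.upper()].add("<%s>%s" % (tag, word.strip().upper()))
    # One line per head-word: HEAD -> <TAG>FORM,<TAG>FORM,...
    return ["%s -> %s" % (head, ",".join(sorted(forms[head])))
            for head in sorted(forms)]


# Invented sample text, not taken from a real BNC file.
SAMPLE = (
    '<s n="1"><w c5="PNP" hw="they" pos="PRON">They </w>'
    '<w c5="VVD" hw="entrap" pos="VERB">entrapped </w>'
    '<w c5="PNP" hw="it" pos="PRON">it </w></s>'
    '<s n="2"><w c5="VVG" hw="entrap" pos="VERB">Entrapping </w>'
    '<w c5="PNP" hw="it" pos="PRON">it </w></s>'
)

lemma_lines = build_lemma_list(SAMPLE)
```

With the invented sample, lemma_lines holds a single ENTRAP line listing the VVD and VVG forms; the pronouns are skipped because their tags do not start with VV.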

 

If you now lemmatise (Compute | Lemmas | Lemma matches) you should get results like this:

 

[Screenshot: lemmatised BNC verbs]

The <VVD>SAID line looks deleted because it is a member of the first line, SAY. You can see its lemmas by double-clicking any entry in the Lemmas column.
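The joining of entries described here can be sketched in Python as follows. This is only an illustration of the idea, not how WordSmith implements lemma matching: the function names, the SAY lemma line and all the frequencies are invented for the example.

```python
from collections import defaultdict


def parse_lemma_list(lines):
    """Map each member form to its head-word, given lines shaped like
    'HEAD -> <TAG>FORM,<TAG>FORM'. Hypothetical helper."""
    form_to_head = {}
    for line in lines:
        if "->" not in line:
            continue
        head, members = line.split("->", 1)
        for member in members.split(","):
            member = member.strip()
            if member:
                form_to_head[member] = head.strip()
    return form_to_head


def lemmatise(word_freqs, form_to_head):
    """Add each member's frequency to its head entry and record
    which forms were joined to which head."""
    totals = defaultdict(int)
    joined = defaultdict(list)
    for form, freq in word_freqs.items():
        head = form_to_head.get(form, form)  # unmatched forms stay as they are
        totals[head] += freq
        if head != form:
            joined[head].append(form)
    return dict(totals), dict(joined)


# Invented lemma line and frequencies, for illustration only.
lemma_lines = ["SAY -> <VVB>SAY,<VVD>SAID,<VVG>SAYING"]
word_freqs = {"<VVD>SAID": 5, "<VVB>SAY": 3, "<VVG>SAYING": 2, "<VVD>WENT": 4}

totals, joined = lemmatise(word_freqs, parse_lemma_list(lemma_lines))
```

In this sketch the three SAY forms are summed under the single head entry SAY, and the joined mapping plays the role of the Lemmas column, while <VVD>WENT is left untouched because no lemma line mentions it.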

 

See also: relative entropy