The point of it

The idea is to build up your own corpus of texts, by downloading web pages with the help of a search engine.


What you do

Just type a word or phrase, check the language, and press Download.


How it works




WebGetter visits the search engine you specify and downloads the first 1000 sources or so. It uses the search engine just as you would yourself, to get a list of useful references. It then sends out a robot to each web address to download the page from the original website, not from the search engine's cache. Several robots may be out searching for you at once; the advantage is that one slow download doesn't hold up all the others.
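
To illustrate why several robots at once help, here is a minimal Python sketch of the idea. It is not WebGetter's own code: the names fetch, download_all and max_robots are invented for this example, and the thread pool is just one plausible way to run downloads side by side.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one page from its original site (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetcher=fetch, max_robots=8):
    """Run up to max_robots downloads at once.

    Returns a dict mapping each URL to its bytes, or to None on failure.
    max_robots is an assumed setting, not a documented WebGetter option.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_robots) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        # as_completed yields each download as soon as it finishes,
        # so a slow site never blocks the pages that are already done.
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None  # one failure does not stop the rest
    return results
```

Passing the fetcher in as a parameter keeps the sketch easy to try out with a stand-in function instead of real network access.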


After downloading a web page, the WebGetter robot checks that it meets your requirements (in Settings) and cleans up the resulting text. If the page is big enough, it is saved to your hard disk as a file with a name very similar to the web address.
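
The check, clean and save steps might look something like the Python sketch below. Everything in it is an assumption made for illustration: the min_words threshold stands in for whatever your Settings require, and the clean-up and file-naming rules are simple guesses, not WebGetter's actual ones.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_text(html):
    """Strip the markup and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def filename_for(url):
    """Build a file name that stays close to the web address (assumed rule)."""
    name = re.sub(r"^[a-z]+://", "", url, flags=re.IGNORECASE)
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return name + ".txt"


def process_page(url, html, min_words=300):
    """Return (file name, text) if the page is big enough, else None.

    min_words is a hypothetical stand-in for the size requirement.
    """
    text = clean_text(html)
    if len(text.split()) < min_words:
        return None
    return filename_for(url), text
```

A page that passes comes back as a name and its cleaned text, ready to be written to disk; a page that is too small is simply dropped.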


When it runs out of references, WebGetter re-visits the search engine and gets some more.
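
The "go back for more" loop can be pictured as in the short Python sketch below. The names crawl, search_page and handle_url are hypothetical, and how WebGetter really asks the search engine for its next batch is not described here; the sketch only assumes the engine hands out results one page at a time.

```python
def crawl(search_page, handle_url, max_pages=50):
    """Keep asking the search engine for references until none are left.

    search_page(n) is assumed to return the list of web addresses on
    results page n, and an empty list when there are no more.
    handle_url(url) is called once for each new address.
    Returns the number of distinct addresses handled.
    """
    seen = set()
    for page_number in range(max_pages):
        urls = search_page(page_number)
        if not urls:
            break  # the search engine has nothing more to offer
        for url in urls:
            if url not in seen:  # skip addresses already visited
                seen.add(url)
                handle_url(url)
    return len(seen)
```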


See also: Settings, Display, Limitations

Page url: http://www.lexically.net/downloads/version5/HTML/?webgetteroverview.htm