A Tool for Investigating Awareness

Mike Scott

University of Liverpool

This paper outlines and demonstrates a piece of software which analyses the key words of a large corpus of text: the Guardian newspaper's entire output for the past 18 years, a total of over 800,000 texts.

Using patterns of repetition, the procedure first identifies the key words (KWs) in each text and builds these into a series of databases. It then becomes possible to determine the "associates" of each KW and to distinguish different "clumps" of use. Christmas, for example, is found to belong to several different clumps, concerned with toys and presents; food consumed at Christmas time; shopping; druids, yule, etc. This procedure provides a demonstrable insight into "the world of the Guardian", which in turn suggests the awarenesses of the journalists working for the paper and leads to an analysis of stereotype.
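
The procedure can be illustrated with a minimal sketch in Python. It is not the software described here, and it rests on assumptions that the abstract does not state: a word is treated as key in a text when it occurs unusually often there compared with a reference word list, as measured by a log-likelihood score, and two KWs are treated as associates when they are key in the same text. The function names, the `min_freq` cut-off and the `threshold` value are all hypothetical, and the grouping of associates into clumps is not attempted.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def key_words(text, reference_counts, reference_total, min_freq=2, threshold=3.84):
    """Return the words that are unusually frequent in `text`.

    Assumption: keyness is a log-likelihood comparison of a word's
    frequency in the text with its frequency in a reference word list.
    `min_freq` and `threshold` are illustrative values only.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    keys = set()
    for word, a in counts.items():
        if a < min_freq:
            continue  # a word must be repeated in the text to qualify
        b = reference_counts.get(word, 0)
        combined = total + reference_total
        expected_text = total * (a + b) / combined
        expected_ref = reference_total * (a + b) / combined
        score = 2 * (
            a * math.log(a / expected_text)
            + (b * math.log(b / expected_ref) if b else 0.0)
        )
        # Keep only words that are over-represented in the text.
        if score >= threshold and a / total > b / reference_total:
            keys.add(word)
    return keys


def associates(kw_sets):
    """Count, for each KW, how often every other KW is key in the same text."""
    links = {}
    for kws in kw_sets:
        for x, y in combinations(sorted(kws), 2):
            links.setdefault(x, Counter())[y] += 1
            links.setdefault(y, Counter())[x] += 1
    return links
```

Given one set of KWs per text, `associates(kw_sets)["christmas"].most_common(10)` would list the words most often key alongside Christmas; separating those into clumps such as presents, food or shopping would need a further grouping step.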