These settings are found in the main Controller, marked KeyWords.
They are kept in the Controller because some of the choices may affect other Tools. KeyWords and WordList both use similar routines: KeyWords to calculate the key words of a text file, and WordList when comparing word-lists.
Max. p value
The default level of significance. See p value for more details.
Max. wanted (500), Min. frequency (3), Min. % of texts (5%), Min. log ratio (2.0) and BIC score (2.5)
You may want to restrict the number of key words (KWs) identified, for example to find the ten most "key" for each text. The program will identify all the key words, sort them by keyness, and then throw away any excess. It will thus favour positive key words over negative ones.
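The sort-then-truncate step can be sketched as follows. This is only an illustration, not WordSmith's own code: it assumes each key word carries a signed keyness score (negative for negative key words), and the names `keep_most_key` and `max_wanted` are hypothetical.

```python
def keep_most_key(scored, max_wanted=500):
    """Keep at most max_wanted items, highest keyness first.

    scored: list of (word, keyness) pairs. Assumption: negative key
    words have a negative score, so they sort to the bottom and are
    the first to be discarded.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return ranked[:max_wanted]
```

Under that assumption, asking for the top two of `[("a", 50.0), ("b", -30.0), ("c", 12.0)]` keeps "a" and "c" and drops the negative item "b".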
The minimum frequency setting helps to eliminate any words or clusters which are unusual but infrequent. For example, a proper noun such as the name of a village will usually be extremely infrequent in your reference corpus, and if it is mentioned only once in the text you're analysing, it is unlikely to be "key". The default setting of 3 mentions as a minimum helps reduce spurious hits here. In the case of short texts, fewer than 600 words long, a minimum of 2 will automatically be used.
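The short-text rule can be written out as a small function. This is a minimal sketch of the rule as described here; the function and parameter names are hypothetical, and how the automatic value interacts with a user setting below 2 is not stated, so the sketch simply returns 2 for short texts.

```python
def effective_min_frequency(text_length_words, min_frequency=3):
    """Return the minimum frequency applied to a text of the given length.

    Texts under 600 words get a minimum of 2, as the help text states;
    longer texts use the configured setting (default 3).
    """
    if text_length_words < 600:
        return 2
    return min_frequency
```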
The minimum percentage of texts (default = 5%) allows you to ignore words which are not found in many texts. Here the percentage is of the text files in the set you are comparing against a reference corpus. If you're comparing a word-list based on one text, each word in it will occur in 100% of the texts and thus won't get ignored. If you compare a word-list based on 200 texts against your reference corpus, the default of 5% would mean that only words which occur in at least 10 of those texts will be considered for keyness. The KeyWords display shows the number of texts each KW was found in. (If you see ?? that is because the data were computed before that facility came into WordSmith.)
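The arithmetic behind the 200-text example can be checked with a short sketch. The names are hypothetical, and rounding up to a whole number of texts is an assumption; the help text only gives the exact case of 5% of 200.

```python
import math

def min_texts_required(n_texts, min_percent=5.0):
    """Number of texts a word must occur in to be considered for keyness.

    Assumption: fractional results are rounded up to a whole text.
    """
    return math.ceil(n_texts * min_percent / 100)

def passes_text_filter(texts_found_in, n_texts, min_percent=5.0):
    """True if the word occurs in enough of the texts."""
    return texts_found_in >= min_texts_required(n_texts, min_percent)
```

With 200 texts the threshold is 10; with a single text it is 1, so every word in that text passes.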
Min. log ratio acts as a threshold (for positive KWs only). The ratio is a base-2 logarithm, so 2.0 omits any KWs which are not at least 4 times as frequent in the text as in your reference corpus.
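The link between a log ratio of 2.0 and "4 times as frequent" can be illustrated as follows. This is a sketch under stated assumptions: the comparison is taken to be between relative frequencies (occurrences divided by corpus size), the reference frequency is assumed to be above zero, and the function names are hypothetical.

```python
import math

def log_ratio(freq_text, size_text, freq_ref, size_ref):
    """Base-2 log of the ratio of relative frequencies.

    Assumes freq_ref > 0; how zero reference frequencies are handled
    is not described in the help text.
    """
    # (freq_text / size_text) / (freq_ref / size_ref), rearranged
    ratio = (freq_text * size_ref) / (freq_ref * size_text)
    return math.log2(ratio)

def meets_min_log_ratio(freq_text, size_text, freq_ref, size_ref,
                        min_log_ratio=2.0):
    """True if a positive KW reaches the log ratio threshold."""
    return log_ratio(freq_text, size_text, freq_ref, size_ref) >= min_log_ratio
```

A word occurring 40 times in a 10,000-word text and 100 times in a 100,000-word reference corpus is 4 times as frequent in relative terms, which gives a log ratio of 2 and so just meets the default threshold.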
Exclude negative KWs
If this is checked, KeyWords will not compute negative key words (ones which occur significantly less often in the text than in the reference corpus).
If this is checked, KeyWords will not compute plots, links or KW clusters as it computes the key words (they can always be computed later assuming you do not move or delete the original text files). This is useful if computing a lot of KW files in a batch, e.g. to make a database.
If this is checked, KeyWords will compute the keyness of each lemmatised item. For example, if GO represents WENT, GOES etc. and GO alone had a frequency of 10 but the whole set GO, WENT, GONE etc. totalled 100:
• if full lemma processing is checked, GO would count only 10, and WENT, GONE etc. will be checked for keyness independently.
• if it is not checked, the frequency of GO will be counted as 100 and WENT, GONE etc. won't get checked for keyness.
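The two cases can be sketched as a choice of which items, with which frequencies, are put forward for the keyness calculation. This follows the description above and is not WordSmith's own code; the function name is hypothetical, and the per-form frequencies used in the usage note are invented for illustration (the help text gives only GO = 10 and the total of 100).

```python
def keyness_candidates(lemma_forms, headword, full_lemma_processing):
    """Return {item: frequency} to be tested for keyness.

    lemma_forms: frequencies of each form in the lemma set,
    including the headword itself.
    """
    if full_lemma_processing:
        # Each form is tested on its own frequency.
        return dict(lemma_forms)
    # Only the headword is tested, carrying the total of the set.
    return {headword: sum(lemma_forms.values())}
```

For example, with the illustrative set `{"GO": 10, "WENT": 40, "GONE": 30, "GOES": 20}`, the unchecked case yields `{"GO": 100}`, while the checked case yields all four forms with their own frequencies.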
show BIC in plots
When a plot is computed, it will show BIC scores if this option is checked, and log likelihood scores otherwise.
If checked, this computes text dispersion keyness.
Links and Clusters
Max. link calc. frequency: computing a plot is hard work, as all the KWs have to be concordanced so as to work out where they crop up. Computing links between each KW can then take time, especially if your KWs include some which occur hundreds or thousands of times in the text. To keep this process more manageable, you can set a maximum. Here 2000 means that any KW which occurs more than 2000 times in the text will not be used for computing links. (It will still appear in the plots and list of KWs, of course.)
min. cluster frequency: your threshold for link clusters to be shown.
min. linked types: threshold for number of KW types linked to each key item.
link span: the default is 1 to 5, checking all positions up to 5 words away from the key item (2 to 3 would only consider key items occurring either 2 or 3 positions either side of each key item).
min. link strength: how many instances of each linked key item are required for it to be listed.
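How the frequency cap, link span and link strength settings might combine is sketched below. This is an illustration under assumptions, not the program's actual algorithm: a "link" is taken to be one co-occurrence of two KWs within the span, positions are token offsets in the text, and all names are hypothetical. The default of 1 for `min_link_strength` is a placeholder, since no default is given above.

```python
from collections import Counter

def linked_key_words(positions, key_item, max_link_freq=2000,
                     span=(1, 5), min_link_strength=1):
    """Count links between key_item and the other key words.

    positions: dict mapping each KW to the list of token positions
    where it occurs in the text.
    """
    nearest, farthest = span
    # KWs above the frequency cap are left out of link computation only.
    eligible = {w: p for w, p in positions.items() if len(p) <= max_link_freq}
    if key_item not in eligible:
        return {}
    strength = Counter()
    for here in eligible[key_item]:
        for other, other_positions in eligible.items():
            if other == key_item:
                continue
            for there in other_positions:
                # Distance is measured on either side of the key item.
                if nearest <= abs(here - there) <= farthest:
                    strength[other] += 1
    return {w: n for w, n in strength.items() if n >= min_link_strength}
```

With `positions = {"wine": [10, 50], "grape": [12, 53, 90], "cellar": [17]}`, the default span of 1 to 5 links "grape" to "wine" twice (distances 2 and 3), while "cellar" at distance 7 is not linked.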