Custom settings

Custom Tagsets

In the main Settings | Tags window you will see a Custom Tagsets choice, but you won't find "Shakespeare" as one of the options until you have defined it yourself.


The point of it...

The point of this choice is to change a whole series of settings according to the type of corpus you wish to process.

When you change this setting, any valid data defined as explained below is loaded into your defaults.


How to do it

1. Create a plain text file called "custom_tag_settings.txt" and save it in your Documents\wsmith5 folder. The format is like this:


Each entry starts with <n> and ends with </n>, where n is a number up to 20.
An entry must contain a label (such as Shakespeare) and may contain any of the other markers specified below:

<label> </label>

<default> </default> (use this for one entry only to determine which label is selected when WordSmith starts)

<entity_file> </entity_file>

<tag_file> </tag_file>

<tags_exclude_file> </tags_exclude_file>

<ignore_string> </ignore_string>

<header_string> </header_string>

<sentence_begin> </sentence_begin>

<sentence_end> </sentence_end>

<paragraph_begin> </paragraph_begin>

<paragraph_end> </paragraph_end>

<heading_begin> </heading_begin>

<heading_end> </heading_end>

<section_begin> </section_begin>

<section_end> </section_end>

<lemma_file> </lemma_file>

<matchlist_file> </matchlist_file>

<stoplist_file> </stoplist_file>


Leading and trailing spaces are removed from all of these values.
Use auto for automatic processing, e.g. of sentence ends.
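
To make the format concrete, here is a minimal sketch in Python of how a file laid out this way could be read. It is not WordSmith's own code: the function name parse_custom_tag_settings is made up for illustration, and skipping an entry that has no label is my assumption about how "must contain a label" would be enforced.

```python
import re

# Marker names listed in the help text above.
MARKERS = [
    "label", "default", "entity_file", "tag_file", "tags_exclude_file",
    "ignore_string", "header_string", "sentence_begin", "sentence_end",
    "paragraph_begin", "paragraph_end", "heading_begin", "heading_end",
    "section_begin", "section_end", "lemma_file", "matchlist_file",
    "stoplist_file",
]


def parse_custom_tag_settings(text):
    """Return {entry_number: {marker: value}} for entries that have a label.

    Hypothetical helper: illustrates the file format only.
    """
    entries = {}
    # An entry runs from <n> to </n>; \1 makes the closing number match.
    for match in re.finditer(r"<(\d+)>(.*?)</\1>", text, flags=re.DOTALL):
        number = int(match.group(1))
        if number > 20:  # "n is a number up to 20"
            continue
        body = match.group(2)
        fields = {}
        for name in MARKERS:
            found = re.search(rf"<{name}>(.*?)</{name}>", body, flags=re.DOTALL)
            if found:
                # Leading and trailing spaces are removed from every value.
                fields[name] = found.group(1).strip()
        # Assumption: an entry with no label is ignored.
        if fields.get("label"):
            entries[number] = fields
    return entries
```

Called on the text of custom_tag_settings.txt, this returns one dictionary per numbered entry, so a value such as auto for sentence_end can be looked up by entry number and marker name.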



I wanted a "Shakespeare" choice which would determine which tags were used and how sentences, paragraphs etc. would be recognised in my Shakespeare corpus.

Here is how I made "Shakespeare":


<label> Shakespeare </label>

<entity_file> sgmltrns.tag </entity_file>

<tag_file> Shakespeare.tag </tag_file>

<tags_exclude_file> Shakespeare exclusion tags.tag </tags_exclude_file>

<ignore_string> <*> </ignore_string>

<header_string> </Header> </header_string>

<sentence_begin> </sentence_begin>

<sentence_end> auto </sentence_end>

<paragraph_begin> </paragraph_begin>

<paragraph_end> </paragraph_end>

<heading_begin> </heading_begin>

<heading_end> </heading_end>

<section_begin> </section_begin>

<section_end> </section_end>


There were also entries <2> ... </2>, <3> ... </3> etc., but they are not shown here.

There was no point in trying to recognise paragraph breaks in Shakespeare plays, so the paragraph markers are left empty. I did want sentences to be recognised, simply by full stops etc., which is why sentence_end is set to auto.
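
If you would rather generate the file than type it, the Shakespeare entry above can be assembled along these lines. This is only a sketch under stated assumptions: the helper names format_entry and write_settings are hypothetical, wrapping the entry in <1> ... </1> is inferred from the rule that each entry starts with <n>, and UTF-8 is an assumed encoding (the help text says only "plain text").

```python
from pathlib import Path

# Marker values copied from the Shakespeare listing above.
# Empty strings stand for the markers that were left blank.
shakespeare = {
    "label": "Shakespeare",
    "entity_file": "sgmltrns.tag",
    "tag_file": "Shakespeare.tag",
    "tags_exclude_file": "Shakespeare exclusion tags.tag",
    "ignore_string": "<*>",
    "header_string": "</Header>",
    "sentence_begin": "",
    "sentence_end": "auto",
    "paragraph_begin": "",
    "paragraph_end": "",
    "heading_begin": "",
    "heading_end": "",
    "section_begin": "",
    "section_end": "",
}


def format_entry(number, fields):
    """Render one entry: <n>, one marker per line, then </n>."""
    lines = [f"<{number}>"]
    for name, value in fields.items():
        inner = f" {value} " if value else " "
        lines.append(f"<{name}>{inner}</{name}>")
    lines.append(f"</{number}>")
    return "\n".join(lines) + "\n"


def write_settings(folder, entries):
    """Write all entries, in numeric order, to custom_tag_settings.txt."""
    path = Path(folder) / "custom_tag_settings.txt"
    text = "".join(format_entry(n, f) for n, f in sorted(entries.items()))
    # Assumption: UTF-8 is acceptable as the plain text encoding.
    path.write_text(text, encoding="utf-8")
    return path
```

For example, write_settings(folder, {1: shakespeare}) with folder pointing at your Documents\wsmith5 folder would write a one-entry file; further entries can be added to the dictionary under 2, 3 and so on.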


See also: Tags as text selectors
