Custom settings


Custom Tagsets

In the main Settings | Tags window you will see a choice of custom settings, but you won't find "Shakespeare" as one of the options.


The point of it...

customising tag settings

The point of this choice is to change a whole series of settings according to the type of corpus you wish to process.

When you change the setting above, any valid data defined as explained below is loaded into your defaults.


How to do it

1. Create a plain text file called "custom_tag_settings.txt" and save it in your \wsmith4 folder. The format is like this:


Each entry starts <n> and ends </n>, where n is a number up to 20.
An entry must contain a label and may contain any of the other markers specified below:

<label> </label>

<default> </default> (this can be used for one entry only and will determine which label is selected)

<entity_file> </entity_file>

<tag_file> </tag_file>

<tags_exclude_file> </tags_exclude_file>

<ignore_string> </ignore_string>

<header_string> </header_string>

<sentence_begin> </sentence_begin>

<sentence_end> </sentence_end>

<paragraph_begin> </paragraph_begin>

<paragraph_end> </paragraph_end>

<heading_begin> </heading_begin>

<heading_end> </heading_end>

<section_begin> </section_begin>

<section_end> </section_end>

All of these will have leading and trailing spaces removed.
Use auto for automatic processing, e.g. of sentence ends.
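
To check a settings file before relying on it, the format described above can be read with a short script. This is a minimal sketch in Python, not WordSmith's own code: the function name parse_custom_tag_settings, the sample label "My corpus" and the file name mycorpus.tag are hypothetical, and it assumes entry numbers run from 1 to 20.

```python
import re

# Marker names as listed in the format description above.
MARKERS = [
    "label", "default", "entity_file", "tag_file", "tags_exclude_file",
    "ignore_string", "header_string", "sentence_begin", "sentence_end",
    "paragraph_begin", "paragraph_end", "heading_begin", "heading_end",
    "section_begin", "section_end",
]


def parse_custom_tag_settings(text):
    """Return {entry number: {marker: value}} for every entry with a label."""
    entries = {}
    # Each entry runs from <n> to the matching </n>.
    for match in re.finditer(r"<(\d+)>(.*?)</\1>", text, re.DOTALL):
        number = int(match.group(1))
        if not 1 <= number <= 20:  # assumption: numbering starts at 1
            continue
        body = match.group(2)
        fields = {}
        for name in MARKERS:
            found = re.search(rf"<{name}>(.*?)</{name}>", body, re.DOTALL)
            if found:
                # Leading and trailing spaces are removed, as stated above.
                fields[name] = found.group(1).strip()
        if fields.get("label"):  # an entry must contain a label
            entries[number] = fields
    return entries


# Hypothetical sample entry, for illustration only.
sample = """
<1>
<label> My corpus </label>
<tag_file> mycorpus.tag</tag_file>
<ignore_string> <*> </ignore_string>
<sentence_end> auto </sentence_end>
</1>
"""

settings = parse_custom_tag_settings(sample)
print(settings[1])
```

Entries without a label are skipped rather than reported, which keeps the sketch short; a stricter checker might list them instead.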



I wanted a choice of Shakespeare to determine which tags were chosen and how sentences, paragraphs etc. would be recognised in my Shakespeare corpus.

Here is how I made "Shakespeare":


<label> Shakespeare </label>

<entity_file> sgmltrns.tag</entity_file>

<tag_file> Shakespeare.tag</tag_file>

<tags_exclude_file> Shakespeare exclusion tags.tag</tags_exclude_file>

<ignore_string> <*> </ignore_string>

<header_string> </Header></header_string>

<sentence_begin> </sentence_begin>


<paragraph_begin> </paragraph_begin>

<paragraph_end> </paragraph_end>

<heading_begin> </heading_begin>

<heading_end> </heading_end>

<section_begin> </section_begin>

<section_end> </section_end>


There were also entries <2>...</2>, <3>...</3> etc., but they aren't shown here.

There was no point in trying to recognise paragraph breaks in Shakespeare plays, but I did want an idea of sentences, to be recognised simply by full stops etc.
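
If you prefer to generate such an entry rather than type it, something like the following Python sketch could write it out. The helper name format_entry is hypothetical and is not part of WordSmith. The <1>...</1> wrapper follows the format description, and the sentence_end value of auto is an assumption based on the remark about full stops; it does not appear in the listing above.

```python
def format_entry(number, fields):
    """Build one <n>...</n> entry from a dict of marker names and values."""
    if not 1 <= number <= 20:
        raise ValueError("entry number must be between 1 and 20")
    if not fields.get("label", "").strip():
        raise ValueError("every entry needs a label")
    lines = [f"<{number}>"]
    for name, value in fields.items():
        # Surrounding spaces are harmless: they are removed on loading.
        lines.append(f"<{name}> {value.strip()} </{name}>")
    lines.append(f"</{number}>")
    return "\n".join(lines) + "\n"


entry = format_entry(1, {
    "label": "Shakespeare",
    "entity_file": "sgmltrns.tag",
    "tag_file": "Shakespeare.tag",
    "tags_exclude_file": "Shakespeare exclusion tags.tag",
    "ignore_string": "<*>",
    "header_string": "</Header>",
    "sentence_end": "auto",  # assumption: not shown in the original example
})
print(entry)
```

The resulting text would then be saved as custom_tag_settings.txt in the \wsmith4 folder, as described under "How to do it".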


See also: Tags as text selectors