Making a Tag File


 

Tag Syntax

Each tag is case sensitive.

Tags conventionally begin with < and end with >, but the first and last characters of a tag can be any symbol.

You can use

       * to mean any sequence of characters;

       ? to mean any one character;

       # to mean any numerical digit.

 

Do not use [ to insert comments in a tag file, because [ may itself be needed as a tag symbol. Use # to represent a digit (e.g. <h#> will pick up <h5>, <h1>, etc.), ? to represent any single character (e.g. <?> will pick up <s>, <p>, etc.), and * to represent any number of characters (e.g. <u*> will pick up <u who=Fred>, <u who=Mariana>, etc.). Otherwise, prepare your tag list file in the same way as for Stop Lists.
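The wildcard rules above can be illustrated with a short Python sketch. This is not WordSmith's own code: the function names are hypothetical, and it assumes that * may match an empty sequence and that a pattern must match the whole tag.

```python
import re

def tag_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a tag-file pattern into a case-sensitive regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")          # any sequence of characters (assumed: may be empty)
        elif ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "#":
            parts.append(r"\d")          # exactly one numerical digit
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts))

def matches(pattern: str, tag: str) -> bool:
    """Return True if the whole tag fits the pattern (assumed matching rule)."""
    return tag_pattern_to_regex(pattern).fullmatch(tag) is not None

print(matches("<h#>", "<h5>"))          # digit wildcard
print(matches("<u*>", "<u who=Fred>"))  # sequence wildcard
print(matches("<h#>", "<H5>"))          # tags are case sensitive
```

Under these assumptions <u*> would also pick up <u> and <ul>, so a pattern such as <u *> may be safer where a following space is always present.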

 

Use Notepad or any other plain text editor to create a new .tag file. Write one entry on each line.

Any number of pre-defined tags can be stored, but the more you use, the more work WordSmith has to do, and the more time and memory it will need.

 

Mark-up to EXclude

 


A tag file of mark-up to exclude is for stretches of mark-up like this:

<SCENE>A public library in London. A bald-headed man is sitting reading the News of the World.</SCENE>

Use it where you want to exclude the whole stretch from your concordance or word list, e.g. because you are processing a play and want only the actors' words. Mark-up to exclude cuts out the whole string from the opening tag to the closing tag, inclusive.

 

The syntax requires ></ or >*</ to be present.

Legal syntax examples would be:

<SCENE></SCENE>

<SCENE>*</SCENE>

<SCENE #>*</SCENE>

<HELLO?? #>*</GOODBYE>

(In this last example, the stretch is cut only if <HELLO is followed by two characters, a space, a digit and then >, and if </GOODBYE> is found somewhere after that.)
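As a rough illustration of how such entries behave, the Python sketch below removes each matching stretch from a text. It is only an approximation under stated assumptions: the function names are hypothetical, ></ is treated as shorthand for >*</, and the shortest possible stretch is cut.

```python
import re

def exclusion_regex(entry: str) -> re.Pattern:
    """Build a regex for one mark-up-to-exclude entry."""
    if ">*</" not in entry:
        if "></" not in entry:
            raise ValueError("entry must contain '></' or '>*</'")
        # Assumption: '></' means the same as '>*</'.
        entry = entry.replace("></", ">*</", 1)
    translated = []
    for ch in entry:
        if ch == "*":
            translated.append(".*?")          # shortest run of any characters
        elif ch == "?":
            translated.append(".")            # any one character
        elif ch == "#":
            translated.append(r"\d")          # any one digit
        else:
            translated.append(re.escape(ch))  # literal character
    # DOTALL lets a stretch run across line breaks.
    return re.compile("".join(translated), re.DOTALL)

def strip_excluded(text: str, entries: list[str]) -> str:
    """Cut every stretch, from opening to closing tag inclusive."""
    for entry in entries:
        text = exclusion_regex(entry).sub("", text)
    return text

play = "<SCENE>A public library in London.</SCENE>Good morning."
print(strip_excluded(play, ["<SCENE>*</SCENE>"]))
```

Here only the text "Good morning." survives, because everything from <SCENE> to </SCENE> is removed.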

 

Mark-up to INclude

A tag file for tags to retain contains a simple list of all the tags you want to retain. Sample tag list files for BNC handling (e.g. bnc world.tag) are included with your installation (in your \wsmith4 folder). You could make a new tag file by reading one of them in, altering it, and saving it under a new name.

 

By default, tags are displayed in a standard tag colour (default=grey), but you can specify the foreground and background for tags which you want displayed differently by adding

/colour="foreground on background"

e.g. <noun> /colour="yellow on red"

Available colours:

'Black','White','Cream',

'Red','Maroon',

'Yellow',

'Navy','Blue','Light Blue','Sky Blue',

'Green','Olive','Dollar Green','Grey-Green','Lime',

'Purple','Light Purple',

'Grey','Silver','Light Grey','Dark Grey','Medium Grey'.

 

The colour names are not case sensitive (though the tags are). Note the UK spelling of "grey" and "colour".

 

You can also add "/play media" to a tag entry if you want WordSmith to be able to attempt to play a sound or video file when that tag is found in your text files. For example, with a tag like

<sound *> /colour="blue on yellow" /play media

and a text occurrence like

<sound c:\windows\Beethoven's 5th Symphony.wav>

or

<sound http://www.political_speeches.com/Mao_Ze_Dung.mp3>

you will be able to choose to hear the .wav or .mp3 file.

 

Finally, you can put in a descriptive label, using /description "label" like this:

<w NN*> /description "noun" /colour="Cream on Purple"

<ABSTRACT> /description "section"

<INTRODUCTION> /description "section"

<SECTION 1> /description "section"
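To make the layout of such entries concrete, here is a minimal Python sketch that splits one line of a tag file into the tag and its options. It is an illustration only: the names TagEntry and parse_tag_entry are hypothetical, and it assumes that options begin at the first /colour=, /play media or /description that follows whitespace.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the tag ends where the first recognised option begins.
OPTION_START = re.compile(r'\s+/(?:colour=|play media|description\s)')
COLOUR = re.compile(r'/colour="([^"]+?) on ([^"]+)"', re.IGNORECASE)
DESCRIPTION = re.compile(r'/description\s+"([^"]*)"')

@dataclass
class TagEntry:
    tag: str
    foreground: Optional[str] = None
    background: Optional[str] = None
    description: Optional[str] = None
    play_media: bool = False

def parse_tag_entry(line: str) -> TagEntry:
    """Split one tag-file line into the tag and its display options."""
    line = line.strip()
    start = OPTION_START.search(line)
    tag = line if start is None else line[:start.start()]
    options = "" if start is None else line[start.start():]
    entry = TagEntry(tag=tag)
    colour = COLOUR.search(options)
    if colour:
        # Colour names are not case sensitive, so normalise them.
        entry.foreground = colour.group(1).lower()
        entry.background = colour.group(2).lower()
    label = DESCRIPTION.search(options)
    if label:
        entry.description = label.group(1)
    entry.play_media = "/play media" in options
    return entry

print(parse_tag_entry('<w NN*> /description "noun" /colour="Cream on Purple"'))
```

The tag itself is kept exactly as written, since tags are case sensitive and may contain spaces and wildcards.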

 

Section tag

In the examples using "section", Concord's "Nearest Tag" will find the section, however remote it may be in the text file.

This is particularly useful if, for example, you want to identify the speech of all the characters in a play: you have a list of the characters, and they are marked up appropriately in the text file.

<Romeo> /description "section"

<Mercutio> /description "section"

<Benvolio> /description "section"
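The idea can be sketched in Python as follows. This is not how Concord is implemented: the function name is hypothetical, and it assumes that the "nearest" section tag is the last one occurring before the position of the concordance hit.

```python
from typing import Optional

def nearest_section_tag(text: str, position: int, section_tags: list[str]) -> Optional[str]:
    """Return the section tag that most closely precedes a position in the text."""
    best_tag, best_start = None, -1
    for tag in section_tags:
        # Last occurrence of this tag that ends before the hit position.
        start = text.rfind(tag, 0, position)
        if start > best_start:
            best_tag, best_start = tag, start
    return best_tag

play = ("<Romeo>But soft, what light <Mercutio>A plague o' both your houses "
        "<Romeo>I am fortune's fool")
tags = ["<Romeo>", "<Mercutio>", "<Benvolio>"]
print(nearest_section_tag(play, play.index("plague"), tags))
```

The search looks back through the whole text, so the speaker is found however far away the tag is.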

 

 

Here is an example of what you see after selecting a tag file and pressing "Load". The first tag is a "play media" tag, as is shown by the icon. You can see the cream on purple colour for nouns too. The tag file (BNC World.tag) is included in your installation.

 

[Screenshot: viewing a loaded tag file]

 

Entity File (entities to be translated)

 


A tag file for translation of one entity reference into another uses the following syntax: entity reference to be found + space + replacement. For example:

&Eacute; É

&eacute; é

A sample tag file for translation (\wsmith4\sgmltrns.tag) is included with your installation: you could make a new one by reading it in, altering it, and saving it under a new name.
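The entity syntax can be illustrated with a small Python sketch. The function names are hypothetical, and it assumes that each line is split at its first space and that matching is case sensitive, as it is for tags.

```python
def parse_entity_file(lines: list[str]) -> dict[str, str]:
    """Read 'entity reference + space + replacement' entries into a table."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        reference, _, replacement = line.partition(" ")
        table[reference] = replacement
    return table

def translate_entities(text: str, table: dict[str, str]) -> str:
    """Replace every entity reference in the text with its replacement."""
    for reference, replacement in table.items():
        text = text.replace(reference, replacement)
    return text

table = parse_entity_file(["&Eacute; É", "&eacute; é"])
print(translate_entities("caf&eacute; and &Eacute;cole", table))
```

Because the replacements are applied one after another, this sketch assumes that no replacement itself contains another entity reference from the file.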

 

See also: Overview of Tags, Handling Tags, Showing Nearest Tags in Concord, Tag Concordancing, Types of Tag, Viewing the Tags, Using Tags as Text Selectors