Start and end of text segments


 

WordSmith attempts to recognise four types of text segment: sentences, paragraphs, headings and sections. Processing is case-sensitive, so <P> and <p> are treated as different markers. In your markers you can use the strings <Enter> and <Tab> to stand for an end of paragraph or a tab in your texts. For sentence ends, auto is a further option.

 

Define these in your language settings.
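The sketch below illustrates how such markers might be interpreted. The special strings <Enter> and <Tab> are expanded into real characters, and then a case-sensitive search is run. This is not WordSmith's own code. The names SPECIAL_STRINGS, expand_marker and find_marker are hypothetical. Treating <Enter> as a single "\n" is also an assumption, since Windows texts may end lines with "\r\n".

```python
# Hypothetical mapping from WordSmith's special marker strings to characters.
# Assumption: <Enter> is treated as "\n" (Windows files may use "\r\n").
SPECIAL_STRINGS = {"<Enter>": "\n", "<Tab>": "\t"}


def expand_marker(marker: str) -> str:
    """Replace <Enter> and <Tab> in a marker with the characters they stand for."""
    for name, char in SPECIAL_STRINGS.items():
        marker = marker.replace(name, char)
    return marker


def find_marker(text: str, marker: str) -> int:
    """Return the index of the first occurrence of the marker, or -1.

    str.find is case-sensitive, so '</P>' will not match '</p>'.
    """
    return text.find(expand_marker(marker))


text = "First paragraph.\nSecond paragraph.</p>"
print(find_marker(text, "<Enter>"))  # 16: the newline after "First paragraph."
print(find_marker(text, "</P>"))     # -1: case differs from "</p>"
print(find_marker(text, "</p>"))     # 34
```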

 

Sentences

For example, <s> might represent the beginning of a sentence and </s> the end. If you leave the setting as auto, a sentence end is recognised wherever a full stop, question mark or exclamation mark is followed by a capital letter.
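The auto rule can be approximated with a regular expression. This is a minimal sketch, not WordSmith's actual algorithm. It assumes that whitespace separates the punctuation mark from the following capital, and it only recognises the ASCII capitals A to Z. The function name split_sentences is hypothetical.

```python
import re

# Assumption: a sentence end is ".", "?" or "!" followed by whitespace and
# then an ASCII capital letter. The lookahead keeps the capital in the next sentence.
SENTENCE_END = re.compile(r"[.?!](?=\s+[A-Z])")


def split_sentences(text: str) -> list:
    """Split text at auto-detected sentence ends, keeping the end punctuation."""
    sentences = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("It rained. Was it cold? Yes! see below. Then it stopped."))
# ['It rained.', 'Was it cold?', 'Yes! see below.', 'Then it stopped.']
```

In this example, "Yes!" does not end a sentence because a lower-case "see" follows it. A rule like this will also split after abbreviations such as "Dr. Smith", which is one reason explicit <s> and </s> tags can be preferable where a corpus provides them.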

 

Paragraphs

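For example, <p *> or <p> might represent the beginning of a paragraph and </p> the end. The asterisk acts as a wildcard, so <p *> also covers opening tags that carry attributes, such as <p n=3>.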
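One way to implement a wildcard marker such as <p *> is to turn it into a regular expression in which * matches any characters up to the closing >. The following is a sketch under that assumption. The helper names tag_pattern and is_paragraph_start are hypothetical, and this is not how WordSmith necessarily does it.

```python
import re


def tag_pattern(tag: str) -> "re.Pattern":
    """Compile a marker like '<p *>' into a regex, treating '*' as 'any text up to >'."""
    # Escape the marker literally, then turn each escaped '\*' back into a wildcard.
    return re.compile(re.escape(tag).replace(r"\*", r"[^>]*"))


# Both forms from the example above: <p *> (with attributes) and plain <p>.
PARAGRAPH_STARTS = [tag_pattern("<p *>"), tag_pattern("<p>")]


def is_paragraph_start(token: str) -> bool:
    """True if the token exactly matches one of the paragraph-start markers."""
    return any(p.fullmatch(token) for p in PARAGRAPH_STARTS)


print(is_paragraph_start('<p n="12">'))  # True, via <p *>
print(is_paragraph_start("<p>"))         # True, via <p>
print(is_paragraph_start("<P>"))         # False: matching is case-sensitive
print(is_paragraph_start("<pb>"))        # False: <p *> needs a space after p
```

Because <p *> requires a space after the p, a different tag such as <pb> is not mistaken for a paragraph start. This is why the plain <p> form is listed separately.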

 

Headings

For example, <head> might represent the beginning of a heading and </head> the end. Note that the British National Corpus marks sentences within headings, as in this extract from text HXL:

<head>
<s n="2"><w NN1>Introduction
</head>

It seems odd for the single word Introduction to count as a sentence, so WordSmith does not use sentence tags within headings.
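The heading rule can be sketched as a counter that skips any <s> tag found between <head> and </head>. This is an illustrative sketch only. The function name count_sentences and the sample text are invented. Following the BNC extract above, which has no </s> tag, it counts opening <s> tags.

```python
import re

# Matches any tag, capturing an optional "/" and the tag name, e.g. </head> -> ("/", "head").
TAG = re.compile(r"<(/?)(\w+)[^>]*>")


def count_sentences(text: str) -> int:
    """Count opening <s> tags, ignoring those that occur inside <head>...</head>."""
    in_heading = False
    count = 0
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if name == "head":
            in_heading = not closing  # <head> switches on, </head> switches off
        elif name == "s" and not closing and not in_heading:
            count += 1
    return count


# Invented BNC-style fragment: one <s> inside the heading, one after it.
sample = (
    '<head>\n<s n="2"><w NN1>Introduction\n</head>\n'
    '<s n="3"><w AT0>The <w NN1>report <w VVZ>begins.\n'
)
print(count_sentences(sample))  # 1
```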

 

Sections

For example, <section *> might represent the beginning of a section and </section> the end.

 

Where possible, each segment is counted when its closing tag (</s>, </p> etc.) is encountered. If the entire text contains no closing </p> tags, paragraphs are instead counted each time the opening paragraph tag is found.
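The fallback rule for paragraphs amounts to this: count closing tags if the text contains any, and otherwise count opening tags. Below is a minimal sketch of that rule. The function name count_segments and the two patterns are hypothetical. The opening pattern assumes paragraph starts look like <p> or <p ...attributes...>.

```python
import re

# Hypothetical patterns: <p> or <p followed by whitespace and attributes>, and </p>.
P_OPEN = re.compile(r"<p(?:\s[^>]*)?>")
P_CLOSE = re.compile(r"</p>")


def count_segments(text: str, opening: "re.Pattern", closing: "re.Pattern") -> int:
    """Count by closing tags if the text has any; otherwise fall back to opening tags."""
    closes = len(closing.findall(text))
    if closes:
        return closes
    return len(opening.findall(text))


with_closes = "<p>One.</p><p>Two.</p>"
without_closes = "<p>One.<p n=2>Two.<p>Three."
print(count_segments(with_closes, P_OPEN, P_CLOSE))     # 2, from the </p> tags
print(count_segments(without_closes, P_OPEN, P_CLOSE))  # 3, from the opening tags
```

The check covers the whole text, as in the rule above. A single </p> anywhere therefore switches counting to closing tags for that text.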

 

 

See also: Overview of Tags, Handling Tags, Showing Nearest Tags in Concord, Tag Concordancing, Types of Tag, Viewing the Tags, Using Tags as Text Selectors, Guide to handling the BNC.

 

Page url: http://www.lexically.net/downloads/version5/HTML/?startandendoftextsegments.htm