Start and end of text segments

 

WordSmith attempts to recognise four types of text segment: sentences, paragraphs, headings and sections. Processing is case sensitive. You can use <Enter> and <Tab> as strings standing for an end of paragraph and a tab character in your texts. For sentence ends, auto is another option.

 

Define these in your language settings.
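
As a rough illustration of how the <Enter> and <Tab> placeholders might be turned into real characters, the Python sketch below expands them and then splits a text on a paragraph-end marker. The names PLACEHOLDERS, expand_marker and split_paragraphs are hypothetical; they do not reflect how WordSmith actually stores or applies its language settings. Python's str.split is case sensitive, which mirrors the case-sensitive processing mentioned above.

```python
# Hypothetical sketch: not WordSmith's own code or settings format.
PLACEHOLDERS = {"<Enter>": "\n", "<Tab>": "\t"}

def expand_marker(marker: str) -> str:
    """Replace the <Enter> and <Tab> placeholders with the characters they stand for."""
    for name, char in PLACEHOLDERS.items():
        marker = marker.replace(name, char)
    return marker

def split_paragraphs(text: str, end_marker: str = "<Enter>") -> list[str]:
    """Split text wherever the (expanded) end-of-paragraph marker occurs.

    The match is case sensitive: '</P>' does not match '</p>'.
    """
    separator = expand_marker(end_marker)
    # Drop empty pieces, e.g. after a trailing end marker.
    return [p for p in text.split(separator) if p.strip()]

print(split_paragraphs("First paragraph.\nSecond paragraph.\n"))
# ['First paragraph.', 'Second paragraph.']
print(split_paragraphs("One</p>Two</p>", "</p>"))
# ['One', 'Two']
print(split_paragraphs("One</p>Two</p>", "</P>"))
# ['One</p>Two</p>']
```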

 

Sentences

For example, <s> might represent the beginning of a sentence and </s> the end. If you leave the choice as auto, sentence ends are determined according to the definition of a sentence, which gives an approximation. (There is no 100% accurate way of handling sentence recognition.)
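
To see why automatic detection can only approximate, consider the deliberately naive rule below. It is an assumption made for illustration, not WordSmith's own definition of a sentence: a sentence ends at ., ! or ? when whitespace and then an uppercase letter follow. An abbreviation such as Dr. is enough to fool it.

```python
import re

# Naive rule (illustrative assumption): break after ., ! or ?
# when whitespace and then an uppercase letter follow.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def auto_sentences(text: str) -> list[str]:
    """Split text into sentences using the naive rule above."""
    return [s for s in SENTENCE_BREAK.split(text.strip()) if s]

print(auto_sentences("It rained. We left early! Did Dr. Smith stay?"))
# ['It rained.', 'We left early!', 'Did Dr.', 'Smith stay?']
```

The last two items show the typical failure: the abbreviation is wrongly treated as a sentence end.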

 

Paragraphs

For example, <p *> or <p> might represent the beginning of a paragraph and </p> the end.
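
One plausible reading of a marker such as <p *> is that the asterisk stands for any attributes inside the tag, which is why <p> is listed separately. Under that assumption (not confirmed on this page), a marker could be compiled into a regular expression as follows; tag_pattern is a hypothetical helper.

```python
import re

def tag_pattern(marker: str) -> re.Pattern:
    """Compile a marker such as '<p *>' into a regex.

    Assumption: '*' matches any run of characters other than '>'.
    The resulting match is case sensitive.
    """
    literal_parts = [re.escape(part) for part in marker.split("*")]
    return re.compile("[^>]*".join(literal_parts))

with_attrs = tag_pattern("<p *>")
print(bool(with_attrs.fullmatch('<p n="3">')))    # True
print(bool(with_attrs.fullmatch("<p>")))          # False: the space is required
print(bool(tag_pattern("<p>").fullmatch("<P>")))  # False: case sensitive
```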

 

Headings

For example, <head> might represent the beginning and </head> the end. Note that the British National Corpus marks sentences within headings, as in this extract from text HXL:

<head>
<s n="2"><w NN1>Introduction
</head>

It seems odd for the single word Introduction to count as a sentence, so WordSmith does not use sentence tags within headings.
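
The heading rule can be sketched as a small scanner that tracks whether it is inside <head>...</head> and skips any <s> tags found there. This is a hypothetical illustration, not WordSmith's code, and it assumes tags can be picked out with a simple regular expression. It counts opening <s> tags because the BNC extract above has no closing </s>.

```python
import re

# Matches an opening or closing tag, capturing the slash and the tag name.
TAG = re.compile(r"<(/?)([A-Za-z]+)[^>]*>")

def count_sentences(text: str) -> int:
    """Count opening <s> tags, ignoring those inside <head>...</head>."""
    in_heading = False
    count = 0
    for match in TAG.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if name == "head":
            in_heading = not is_closing
        elif name == "s" and not is_closing and not in_heading:
            count += 1
    return count

sample = (
    '<head>\n<s n="2"><w NN1>Introduction\n</head>\n'
    '<p><s n="3"><w AT0>The <w NN1>study begins.</p>'
)
print(count_sentences(sample))  # 1: the <s> inside the heading is skipped
```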

 

Sections

For example, <section *> might represent the beginning and </section> the end.

 

Each of these segment types is counted, by preference, when its closing tag (such as </s> or </p>) is encountered. If the entire text contains no closing </p> tags, paragraphs are counted each time the opening paragraph tag is found.
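
The counting rule just described might be sketched as follows. count_segments is a hypothetical, simplified helper that assumes tags can be found with plain regular expressions: it counts closing tags when the text contains any, and falls back to opening tags only when the whole text has none.

```python
import re

def count_segments(text: str, name: str) -> int:
    """Count segments of one type, e.g. name='p' for paragraphs.

    Closing tags are preferred; opening tags are counted only when
    the entire text contains no closing tag of that type.
    """
    tag = re.escape(name)
    closing = re.findall(rf"</{tag}>", text)
    if closing:
        return len(closing)
    # Opening tag, with or without attributes: <p> or <p n=3>.
    opening = re.findall(rf"<{tag}(?:\s[^>]*)?>", text)
    return len(opening)

print(count_segments("<p>One</p><p>Two</p>", "p"))       # 2
print(count_segments("<p>One<p>Two<p n=3>Three", "p"))   # 3 (no </p>, so opening tags)
print(count_segments("<p>One</p><p>Two", "p"))           # 1 (closing tags preferred)
```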

 

 

See also: Overview of Tags, Handling Tags, Showing Nearest Tags in Concord, Tag Concordancing, Types of Tag, Viewing the Tags, Using Tags as Text Selectors, Guide to handling the BNC.

 
