Only part of file: selecting within texts
Cut start of each line/paragraph
Some corpora (e.g. LOB) have a fixed number of line-detail codings at the start of each line, and you will usually want to cut them out. The cut is applied at the start of each line, that is, after every <Enter>. Choose the number of characters to cut, up to 100; the default is 0 (nothing is cut). Use -1 to cut everything up to the first alphabetical character in each line, and -2 to cut everything up to the first tab.
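The effect of these settings can be pictured with a small Python sketch. This is only an illustration, not the program's own code: the function names cut_line_start and cut_text are hypothetical, and it assumes that with -2 the tab itself is removed too, and that a line with no tab is left unchanged.

```python
def cut_line_start(line: str, n: int) -> str:
    """Hypothetical helper imitating the 'cut start of each line' setting."""
    if n == -1:
        # Drop everything before the first alphabetical character.
        for i, ch in enumerate(line):
            if ch.isalpha():
                return line[i:]
        return ""  # assumption: a line with no letters is cut entirely
    if n == -2:
        # Drop everything up to the first tab (assumption: the tab goes too).
        head, sep, tail = line.partition("\t")
        return tail if sep else line  # assumption: no tab, no change
    # n >= 0: drop a fixed number of characters (0 leaves the line as it is).
    return line[n:]


def cut_text(text: str, n: int) -> str:
    """Apply the cut to every line of a text."""
    return "\n".join(cut_line_start(line, n) for line in text.split("\n"))


print(cut_text("A01 1 First line\nA01 2 Second line", 6))
```

Here a value of 6 removes a six-character line reference such as "A01 1 " from every line, leaving only the running text.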
Sections to Cut
If you are using text files with SGML, XML or HTML headers (e.g. the British National Corpus) you may simply want to cut out the header from your word lists, concordances, etc. as shown in the Document header example.
For more complex choices, you can specify here what is to be cut: where the section starts (for example <HEAD>) and where it ends (e.g. </HEAD>). You can cut out up to 3 different and separate sections (for example <HEAD> to </HEAD> or <BODY> to </BODY>). Each section is cut every time it is found within the whole text.
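As a rough illustration of cutting sections, here is a minimal Python sketch. It is not the program's own implementation: cut_sections is a hypothetical name, and the sketch assumes that the start and end markers are removed along with the text between them and that matching is case sensitive.

```python
import re


def cut_sections(text: str, pairs: list[tuple[str, str]]) -> str:
    """Remove every stretch of text running from a start marker to its end marker."""
    for start, end in pairs:
        # re.escape treats the markers as literal text; .*? stops at the
        # nearest end marker; DOTALL lets a section span several lines.
        pattern = re.escape(start) + r".*?" + re.escape(end)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text


sample = "<HEAD>title, date</HEAD>First text.<HEAD>more details</HEAD>Second text."
print(cut_sections(sample, [("<HEAD>", "</HEAD>")]))
```

Both <HEAD> sections are removed, because each section is cut wherever it occurs in the text.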
Sections to Keep (contexts)
Use this when you want to select one section of a text and cut out the rest. Specify one tag to define the desired start and one to specify the end, e.g. <Intro> to <Body> (this would analyse only text introductions), or Mary: to Peter: (this would get all of Mary's contributions in the discourse but nothing else).
Naturally you must be sure that there is something unique, like a < or > symbol, to define each section. For example, in the case of Mary: and Peter: you'd want to be sure that every contribution made by Mary has a colon immediately following her name, and that each of her contributions is followed by Peter:. This function is case sensitive (so it would not find MARY:).
If you used <H1> to </H1> with this function in HTML text, you'd get all the major headings in your texts, however many, but nothing else.
You can choose to use 2 different sections, e.g. <Intro> to </Intro> to get the introduction and <Conclusion> to </Conclusion> to get the conclusion as well. The "off" switch doesn't have to look like the "on" switch -- you could keep, for example, <INTRO> to </BODY> and thereby cut out the conclusion if that comes after the </BODY>.
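The keeping of sections can be sketched in Python in much the same way. Again this is only an illustration under stated assumptions: keep_sections is a hypothetical name, the markers themselves are assumed not to be kept, and results are grouped by start and end pair rather than by their order in the text.

```python
import re


def keep_sections(text: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Return only the stretches of text between each start and end marker."""
    kept = []
    for start, end in pairs:
        # Case-sensitive literal match, so "Mary:" will not find "MARY:".
        pattern = re.escape(start) + r"(.*?)" + re.escape(end)
        for match in re.finditer(pattern, text, flags=re.DOTALL):
            kept.append(match.group(1).strip())
    return kept


talk = "Mary: Hello. Peter: Hi. Mary: How are you? Peter: Fine. MARY: Bye. Peter: Bye."
print(keep_sections(talk, [("Mary:", "Peter:")]))
```

Only Mary's two contributions are returned; the one introduced by MARY: is missed, which shows why consistent marking matters.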