Text Converter Syntax



The syntax for a Conversion File is:


Only lines beginning with / or " are used. All other lines are ignored completely.

Every string for conversion is of the form "A" -> "B". That is, the original string (the one you're searching for) is enclosed in double quotes and followed by a space, a hyphen, the > symbol, another space, and the replacement string, also enclosed in double quotes.
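The two rules above can be pictured with a minimal Python sketch. It is not the Text Converter's own code: the names RULE and parse_rules are hypothetical, and the sketch only handles plain "A" -> "B" lines, leaving out option letters, wildcards and control codes.

```python
import re

# Hypothetical pattern for a plain rule line of the form "A" -> "B".
RULE = re.compile(r'^"(?P<search>[^"]*)" -> "(?P<replace>[^"]*)"\s*$')

def parse_rules(conversion_text):
    """Return (search, replacement) pairs from the lines that look like rules."""
    rules = []
    for line in conversion_text.splitlines():
        # Lines that do not begin with / or " are ignored completely.
        if not line.startswith(('/', '"')):
            continue
        match = RULE.match(line)
        if match:  # lines starting with / are not handled in this sketch
            rules.append((match.group('search'), match.group('replace')))
    return rules

sample = '''This line is ignored because it starts with a letter.
"colour" -> "color"
"<*>" -> ""
'''
print(parse_rules(sample))  # [('colour', 'color'), ('<*>', '')]
```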


Removing all tags

To remove all tags, use "<*>" -> "" as your conversion line: the search string matches anything in angle brackets and the empty replacement deletes it.


Control Codes

Control codes can be symbolised like this: {CHR(xxx)} where xxx is the number of the code. Examples: {CHR(13)} is a carriage-return, {CHR(10)} is a line-feed, {CHR(9)} is a tab, {CHR(12)} is a printer form-feed. To represent <Enter> which comes at the end of paragraphs and sometimes at the end of each line, you'd type {CHR(13)}{CHR(10)} which is carriage-return followed immediately by line-feed.

Use {CHR(34)} if you need to refer to a double quotation mark (double inverted commas) inside a string.
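As an illustration of the {CHR(xxx)} notation, the following Python sketch expands each code into the character it stands for. The function name expand_control_codes is hypothetical, and the sketch assumes the codes are ordinary decimal character numbers, as the examples above suggest.

```python
import re

# Matches {CHR(13)}, {CHR(10)}, {CHR(34)} and so on.
CHR_CODE = re.compile(r'\{CHR\((\d+)\)\}')

def expand_control_codes(text):
    """Replace every {CHR(n)} with the character whose code is n."""
    return CHR_CODE.sub(lambda m: chr(int(m.group(1))), text)

# <Enter> is carriage-return followed immediately by line-feed.
print(repr(expand_control_codes('{CHR(13)}{CHR(10)}')))  # '\r\n'
```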


Wildcards (*, #, ? and ~)

*        You can use the asterisk as a wildcard. Thus "<*>" -> "" will delete any string in < > brackets from your text. Used with an empty replacement, "<head */head>" will delete any string starting "<head " and ending "/head>", even if there are hundreds of characters between them. The default search distance is 1,000 characters, with a maximum of 25,000. (The text is read chunk by chunk into a 30,000 character buffer, so the maximum will work fine at the start of the text; after this only 1,000 characters of search-space are guaranteed.) Because a wildcard deletion can remove more text than you expect if the text is not properly marked up in the first place, you will probably need to override the default search distance by specifying it in round brackets after the asterisk, e.g. "<head*(100)/head>". The asterisk may not be the first or last symbol between the double quotation marks in the search-string.

The asterisk also remembers the characters it matched (up to 1,000) so that they can be reused in the replacement. "<div*(100)>" remembers all the characters up to the >. Thus "<div*(100)>" -> "[section *]" will produce [section 1 They Meet Again] if the original has <div1 They Meet Again>. "<div*>" -> "[section *]" will do the same thing but allows a search of up to 1,000 characters for the >.

#        Use # to symbolise any number. "<div#>" will find <div1>, <div2>, <div468>, etc. If # is in the replacement too, the exact same number will be used in the replacement. Thus "<div#>" -> "[section #]" will produce [section 468] if the original has <div468>.

?        The question mark stands for any single character, except a space. Up to ten ?s can be used in the replacement string to reproduce the characters matched by the ?s in the search-string.

~        The tilde means "except". The exception, in double quotes, follows the tilde and comes before the search string: ~"<p>" "<*>" -> "" means delete everything in between angle brackets, except a case of <p>.


Use {CHR(42)} if you need to refer to *, {CHR(35)} for #, {CHR(63)} for ? and {CHR(126)} for ~.
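The wildcard behaviour described in this section can be approximated with regular expressions. The Python sketch below is only an illustration under stated assumptions, not the program's actual algorithm: to_regex and convert are hypothetical names, the ~ exception is passed in as a keep argument, the 30,000 character buffer is not modelled, and the limit of ten ?s is not enforced.

```python
import re

DEFAULT_DISTANCE = 1000  # default search distance for * given in the text

# One token per wildcard: * (optionally followed by (n)), # and ?.
WILDCARD = re.compile(r'(\*)(?:\((\d+)\))?|(#)|(\?)')

def to_regex(search):
    """Translate a search string with *, # and ? into a compiled regex."""
    pattern, kinds, pos = '', [], 0
    for m in WILDCARD.finditer(search):
        pattern += re.escape(search[pos:m.start()])
        if m.group(1):                    # * : bounded run of any characters
            limit = int(m.group(2)) if m.group(2) else DEFAULT_DISTANCE
            pattern += '(.{0,%d}?)' % limit
            kinds.append('*')
        elif m.group(3):                  # # : a whole number
            pattern += r'(\d+)'
            kinds.append('#')
        else:                             # ? : one character that is not a space
            pattern += r'([^ ])'
            kinds.append('?')
        pos = m.end()
    pattern += re.escape(search[pos:])
    return re.compile(pattern, re.DOTALL), kinds

def convert(text, search, replacement, keep=()):
    """Apply one rule; matches listed in keep are left alone (the ~ case)."""
    regex, kinds = to_regex(search)

    def fill(match):
        if match.group(0) in keep:
            return match.group(0)
        # Queue the captured text per wildcard kind, in order of appearance.
        captured = {'*': [], '#': [], '?': []}
        for kind, value in zip(kinds, match.groups()):
            captured[kind].append(value)

        def reuse(w):
            queue = captured[w.group(0)]
            return queue.pop(0) if queue else ''

        return re.sub(r'[*#?]', reuse, replacement)

    return regex.sub(fill, text)

print(convert('<div468>', '<div#>', '[section #]'))          # [section 468]
print(convert('<p>Hi <b>there</b></p>', '<*>', '', ('<p>',)))  # <p>Hi there
```

The lazy, bounded quantifier stands in for the search distance: with "<head*(100)/head>" a head section longer than 100 characters is simply not matched, which is the safety net the text recommends.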



Whole word, case Insensitive, Confirm, redundant Spaces

/C stops to confirm you wish to go ahead before each change.

/W does a whole word search, ensuring the alteration only happens if there's a word separator on either side: /W "the" finds the but not other or then or bathe.

/I does a case insensitive search (/I "restaurant" -> "hotel" replaces restaurant with hotel and RESTAURANT with HOTEL and Restaurant with Hotel, i.e. respecting case as far as possible).

You can combine these, e.g.

/IWC "the" -> "this"
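A minimal Python sketch of how /W and /I might combine is given below. It is an illustration only: replace_word and match_case are hypothetical names, the regex word boundary \b is assumed to stand in for "word separator", and the interactive /C confirmation is left out.

```python
import re

def match_case(original, replacement):
    """Shape the replacement like the matched text, as far as possible."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement

def replace_word(text, search, replacement, whole_word=False, ignore_case=False):
    pattern = re.escape(search)
    if whole_word:
        # Assumption: \b approximates a word separator on either side.
        pattern = r'\b' + pattern + r'\b'
    flags = re.IGNORECASE if ignore_case else 0

    def substitute(m):
        return match_case(m.group(0), replacement) if ignore_case else replacement

    return re.sub(pattern, substitute, text, flags=flags)

# Roughly what /IW "the" -> "this" would do:
print(replace_word('The cat saw the other', 'the', 'this',
                   whole_word=True, ignore_case=True))  # This cat saw this other
```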

/S cuts out all redundant spaces. That is, it will reduce any sequence of two or more spaces to one, and it also removes some common formatting problems such as a lone space after a carriage-return or before punctuation marks such as .,; and ). /S can be used on a line of its own or in combination with other searches.
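The clean-up performed by /S can be sketched in Python as three substitutions. This is a rough approximation, not the program's rule set: squeeze_spaces is a hypothetical name, and only the cases named above are covered (runs of spaces, a lone space after a line break, and a space before . , ; or a closing bracket).

```python
import re

def squeeze_spaces(text):
    """Approximate /S: remove the redundant spaces described in the text."""
    text = re.sub(r' {2,}', ' ', text)             # two or more spaces become one
    text = re.sub(r'(\r\n|\r|\n) ', r'\1', text)   # lone space after a line break
    text = re.sub(r' ([.,;)])', r'\1', text)       # space before . , ; or )
    return text

print(repr(squeeze_spaces('Hello   world .\r\n Next  line ;')))
# 'Hello world.\r\nNext line;'
```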


Additions (/A, /T and {v})

/A means add text. /A "Ulan" START inserts Ulan at the start of the text, and /A "Bator" END inserts Bator at the end of the text. See \wsmith4\convert.txt for an example in use.

/T means add title. So /T "<title>*</title>" -> "*" looks for <title> </title> and, if the pair is found, inserts the wording given into the file. This will make your browser show the title at the top of the screen.

{v="..."} means remember the quoted text and use it wherever {v} is found in another line of the conversion file. "26 Dec." -> "Boxing Day" {v="Xmas"} stores the reference Xmas, and "1 May" -> "Mayday" {v="after Easter"} stores after Easter, for use in a later line such as "/celebration/" -> "{v}". Assuming that your text mentions both 26 Dec. and 1 May, this example, on finding /celebration/ in the text, will put Xmas if the most recent mention in the text was 26 Dec. and after Easter if the most recent mention was 1 May.
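The "most recent mention" behaviour of {v} can be sketched in Python as follows. This is a guess at the semantics rather than the program's method: convert_with_memory is a hypothetical name, each rule is written as a (search, replacement, remembered) tuple, and the sketch assumes a single left-to-right pass over the text.

```python
import re

def convert_with_memory(text, rules):
    """rules: (search, replacement, remembered) tuples; remembered may be None."""
    # One alternation so that the text is scanned once, left to right.
    regex = re.compile('|'.join('(%s)' % re.escape(s) for s, _, _ in rules))
    state = {'v': ''}

    def substitute(match):
        _, replacement, remembered = rules[match.lastindex - 1]
        if remembered is not None:
            state['v'] = remembered      # store the value given in {v="..."}
        return replacement.replace('{v}', state['v'])

    return regex.sub(substitute, text)

rules = [
    ('26 Dec.', 'Boxing Day', 'Xmas'),
    ('1 May', 'Mayday', 'after Easter'),
    ('/celebration/', '{v}', None),
]
text = 'On 26 Dec. we rest; /celebration/ is fun. On 1 May too: /celebration/!'
print(convert_with_memory(text, rules))
# On Boxing Day we rest; Xmas is fun. On Mayday too: after Easter!
```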


See \wsmith4\convert.txt to see examples in use.


See also: Text Converter Contents.