Text Converter Syntax
The syntax for a Conversion File is:
Only lines beginning with / or " are used; all others are ignored completely. Every string for conversion is of the form "A" -> "B". That is, the original string (the one you're searching for), enclosed in double quotes, is followed by a space, a hyphen, the > symbol, another space, and the replacement string, also enclosed in double quotes.
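To make the line format concrete, here is a minimal Python sketch of how such a file could be read. It is an illustration only, not the Text Converter's own code: the names RULE and parse_rules are hypothetical, and the sketch only recognises simple lines of the form "A" -> "B" with an optional leading group of / options. Other lines that begin with / or " are skipped rather than interpreted.

```python
import re

# Hypothetical pattern: optional /OPTIONS prefix, then "A" -> "B".
RULE = re.compile(
    r'^(?:/(?P<flags>[A-Za-z]+)\s+)?'
    r'"(?P<search>[^"]*)"\s*->\s*"(?P<replace>[^"]*)"\s*$'
)

def parse_rules(text):
    """Return (flags, search, replacement) tuples for the simple rule lines."""
    rules = []
    for line in text.splitlines():
        # As described above, only lines beginning with / or " are used.
        if not line.startswith(('/', '"')):
            continue
        m = RULE.match(line)
        if m:  # this sketch ignores usable lines it does not understand
            rules.append((m.group('flags') or '', m.group('search'), m.group('replace')))
    return rules

sample = '''This comment line is ignored.
"<*>" -> ""
/IW "the" -> "this"
'''
print(parse_rules(sample))
# prints [('', '<*>', ''), ('IW', 'the', 'this')]
```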
Removing all tags
To remove all tags, use "<*>" -> "" as your conversion line.
Control Codes
Control codes can be symbolised like this: {CHR(xxx)} where xxx is the number of the code. Examples: {CHR(13)} is a carriage-return, {CHR(10)} is a line-feed, {CHR(9)} is a tab, {CHR(12)} is a printer form-feed.
To represent <Enter>, which comes at the end of paragraphs and sometimes at the end of each line, you'd type {CHR(13)}{CHR(10)}, which is carriage-return followed immediately by line-feed.
Use {CHR(34)} if you need to refer to double inverted commas.
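The {CHR(xxx)} notation can be pictured as a simple substitution. The following Python sketch is only an illustration under that assumption; the function name expand_control_codes is hypothetical and is not part of the Text Converter.

```python
import re

# Matches {CHR(n)} where n is a decimal character code.
CHR_CODE = re.compile(r'\{CHR\((\d+)\)\}')

def expand_control_codes(s):
    """Replace each {CHR(n)} with the character whose code is n."""
    return CHR_CODE.sub(lambda m: chr(int(m.group(1))), s)

# <Enter> is carriage-return followed by line-feed.
assert expand_control_codes('{CHR(13)}{CHR(10)}') == '\r\n'
# {CHR(34)} stands for a double quotation mark.
assert expand_control_codes('say {CHR(34)}hi{CHR(34)}') == 'say "hi"'
```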
Wildcards (*, ?, # and ~)
* You can use the asterisk as a wildcard. Thus "<*>" -> "" will delete any string in < > brackets from your text. "<head */head>" will delete any string starting "<head " and ending "/head>", even if there are hundreds of characters between them.
The default search distance is 1,000 characters, with a maximum of 25,000. (The text is read chunk by chunk into a 30,000 character buffer, so the maximum will work fine at the start of the text; after this only 1,000 characters of search-space are guaranteed.) Because a wildcard deletion can remove more text than you expect if the text is not properly marked up in the first place, you will probably need to override the default search distance by specifying it in brackets, e.g. "<head*(100)/head>". The asterisk may not be the first or last symbol between the double quotation marks in the search-string.
The asterisk also retains up to 1,000 characters. "<div*(100)>" remembers all the characters up to > and can use them in the replacement. Thus "<div*(100)>" -> "[section *]" will produce [section 1 They Meet Again] if the original has <div1 They Meet Again>. "<div*>" will do the same thing but would allow up to 1,000 characters' search for the >.
# Use # to symbolise any number. "<div#>" will find <div1>, <div2>, <div468>, etc. If # is in the replacement too, the exact same number will be used in the replacement. Thus "<div#>" -> "[section #]" will produce [section 468] if the original has <div468>.
? The question mark stands for any single character, except a space. Up to ten ?s can be used in the replacement string to reproduce the characters referred to by the ?s in the search-string.
~ The tilde means except. ~"<p>" "<*>" -> "" means delete everything in between angle brackets, except a case of <p>.
Use {CHR(42)} if you need to refer to *, {CHR(35)} for #, {CHR(63)} for ? and {CHR(126)} for ~.
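As a rough illustration of how *, # and ? behave, here is a minimal Python sketch that translates a search-string into a regular expression. It is not the Text Converter's own code. The names compile_search and convert are hypothetical, the sketch assumes that * may run across line breaks and stops at the nearest possible match, and it leaves out the ~ wildcard, the buffer limits, the ten-? limit and the rule that * may not come first or last.

```python
import re

# A wildcard token: * with an optional (distance), or #, or ?.
TOKEN = re.compile(r'\*(?:\((\d+)\))?|#|\?')

def compile_search(pattern, default_span=1000):
    """Translate a search-string into (compiled regex, list of wildcard kinds)."""
    parts, kinds, pos = [], [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        tok = m.group(0)[0]
        if tok == '*':
            span = int(m.group(1) or default_span)
            parts.append('(.{0,%d}?)' % span)   # up to `span` characters
        elif tok == '#':
            parts.append(r'(\d+)')              # any number
        else:
            parts.append('([^ ])')              # one character, not a space
        kinds.append(tok)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile(''.join(parts), re.DOTALL), kinds

def convert(text, search, replacement):
    """Apply one rule, reusing the matched text for wildcards in the replacement."""
    regex, kinds = compile_search(search)

    def fill(m):
        # Queue the captured text per wildcard kind, in order of appearance.
        captured = {'*': [], '#': [], '?': []}
        for kind, value in zip(kinds, m.groups()):
            captured[kind].append(value)

        def sub_token(t):
            queue = captured[t.group(0)]
            return queue.pop(0) if queue else t.group(0)

        return re.sub(r'[*#?]', sub_token, replacement)

    return regex.sub(fill, text)

print(convert('<div468>', '<div#>', '[section #]'))
# prints [section 468]
print(convert('<div1 They Meet Again>', '<div*(100)>', '[section *]'))
# prints [section 1 They Meet Again]
print(convert('a <b>bold</b> c', '<*>', ''))
# prints a bold c
```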
Whole word, case Insensitive, Confirm, redundant Spaces
/C stops to confirm that you wish to go ahead before each change.
/W does a whole-word search, ensuring the alteration only happens if there's a word separator on either side: /W "the" finds the but not other or then or bathe.
/I does a case-insensitive search: /I "restaurant" -> "hotel" replaces restaurant with hotel, RESTAURANT with HOTEL and Restaurant with Hotel, i.e. respecting case as far as possible.
You can combine these, e.g. /IWC "the" -> "this".
/S cuts out all redundant spaces. That is, it will reduce any sequence of two or more spaces to one, and it also removes some common formatting problems such as a lone space after a carriage-return or before punctuation marks such as .,; and ). /S can be used on a line of its own or in combination with other searches.
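The effect of /W, /I and /S can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the program's actual logic: the function names are hypothetical, a regular-expression word boundary stands in for "word separator", the case-matching rule is a guess at what "respecting case as far as possible" means, and the /S helper only collapses runs of spaces (it does not tidy spaces around line ends or punctuation). /C is interactive and is left out.

```python
import re

def match_case(original, replacement):
    """Shape the replacement like the matched text (assumed rule)."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement

def replace(text, search, replacement, flags=''):
    """Literal search and replace with optional W (whole word) and I (ignore case)."""
    pattern = re.escape(search)
    if 'W' in flags:
        # Assumption: \b approximates a word separator on either side.
        pattern = r'\b' + pattern + r'\b'
    if 'I' in flags:
        return re.sub(pattern,
                      lambda m: match_case(m.group(0), replacement),
                      text, flags=re.IGNORECASE)
    return re.sub(pattern, lambda m: replacement, text)

def squeeze_spaces(text):
    """Partial /S: reduce any run of two or more spaces to one."""
    return re.sub(r' {2,}', ' ', text)

print(replace('the other then bathe', 'the', 'this', 'W'))
# prints this other then bathe
print(replace('Restaurant, RESTAURANT, restaurant', 'restaurant', 'hotel', 'I'))
# prints Hotel, HOTEL, hotel
print(squeeze_spaces('a   b  c'))
# prints a b c
```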
Additions (/A, /T and {v})
/A means add text. /A "Ulan" START inserts Ulan at the start, /A "Bator" END inserts Bator at the end of the text. See \wsmith4\convert.txt to see one in use.
/T means add title. So /T "<title>*</title>" -> "*" looks for <title> … </title> and, if it's found, inserts the wording given into the file. This will make your browser show the title at the top of the screen.
{v="text"} means remember text and use it in another line of the conversion file wherever {v} appears. "26 Dec." -> "Boxing Day" {v="Xmas"} stores the reference Xmas, and "1 May" -> "Mayday" {v="after Easter"} stores after Easter, for use in a later line such as "/celebration/" -> "{v}". Assuming that your text mentions both 26 Dec. and 1 May, this example, on finding /celebration/ in the text, will put Xmas if the most recent mention in the text was 26 Dec. and after Easter if the most recent mention was 1 May.
See \wsmith4\convert.txt to see examples in use.
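The {v} mechanism can be pictured as a single remembered value that is updated as the text is read. The Python sketch below is an illustration only and assumes one left-to-right pass over the text; how the Text Converter actually orders its work is not stated here. The function name convert_with_memory and its arguments are hypothetical.

```python
import re

def convert_with_memory(text, storing_rules, recall_search):
    """storing_rules maps a search string to (replacement, value to remember).

    Whenever recall_search is found, it is replaced by the value stored
    by the most recent storing rule that matched earlier in the text.
    """
    alternatives = [re.escape(s) for s in storing_rules]
    alternatives.append(re.escape(recall_search))
    scanner = re.compile('|'.join(alternatives))
    remembered = ''

    def step(m):
        nonlocal remembered
        found = m.group(0)
        if found == recall_search:
            return remembered          # plays the role of {v}
        replacement, value = storing_rules[found]
        remembered = value             # plays the role of {v="..."}
        return replacement

    return scanner.sub(step, text)

rules = {
    '26 Dec.': ('Boxing Day', 'Xmas'),
    '1 May': ('Mayday', 'after Easter'),
}
text = 'Closed 26 Dec. for the /celebration/; open 1 May for the /celebration/.'
print(convert_with_memory(text, rules, '/celebration/'))
# prints Closed Boxing Day for the Xmas; open Mayday for the after Easter.
```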
See also: Text Converter Contents.