Text Converter Syntax


 

The syntax for a Conversion File is:

 

Only lines beginning / or " are used. Others are ignored completely.

Every string for conversion is of the form "A" -> "B". That is, the original string (the one you're searching for), enclosed in double quotes, is followed by a space, a hyphen, the > symbol, another space, and then the replacement string, also enclosed in double quotes.

You can use " (double quotes), hyphen and > where you like without any need to substitute them, but for obvious reasons there must not be a sequence like " -> " in your search or replace string.
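The rules above can be sketched in Python. This is only an illustration of the line format, not the converter's own code: the function name parse_line is hypothetical, and it assumes that any option letters after / run up to the first space.

```python
SEPARATOR = '" -> "'  # the sequence that divides search from replacement


def parse_line(line):
    """Parse one conversion-file line into (flags, search, replacement).

    Returns None for lines that would be ignored.
    """
    line = line.rstrip("\r\n")
    if not line.startswith(("/", '"')):
        return None  # only lines beginning / or " are used
    flags = ""
    if line.startswith("/"):
        # Assumption: the option letters run up to the first space.
        flags, _, line = line[1:].partition(" ")
        line = line.strip()
        if not line:
            return (flags, None, None)  # an option on a line of its own
    if SEPARATOR not in line or not (line.startswith('"') and line.endswith('"')):
        return None  # malformed; this sketch simply skips it
    # Split on the first separator, which is why " -> " must not
    # occur inside the search or replace string itself.
    search, replacement = line[1:-1].split(SEPARATOR, 1)
    return (flags, search, replacement)
```

For instance, parse_line('"A" -> "B"') gives ('', 'A', 'B'), and a line of ordinary commentary gives None.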

 

Removing all tags

To remove all tags, use the line "<*>" -> "" in your Conversion File.

 

Control Codes

Control codes can be symbolised like this: {CHR(xxx)} where xxx is the number of the code. Examples: {CHR(13)} is a carriage-return, {CHR(10)} is a line-feed, {CHR(9)} is a tab, {CHR(12)} is a printer form-feed. To represent <Enter> which comes at the end of paragraphs and sometimes at the end of each line, you'd type {CHR(13)}{CHR(10)} which is carriage-return followed immediately by line-feed.

Use {CHR(34)} if you need to refer to double inverted commas.
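As a rough illustration of what the {CHR(xxx)} notation stands for, here is a small Python sketch that swaps each code for the character it names. It assumes xxx is a decimal character number, which fits the examples above; the function name expand_control_codes is made up for this example.

```python
import re

# Matches {CHR(13)}, {CHR(9)} and so on, capturing the number.
CHR_CODE = re.compile(r"\{CHR\((\d+)\)\}")


def expand_control_codes(text):
    """Replace every {CHR(n)} with the character whose code is n."""
    return CHR_CODE.sub(lambda m: chr(int(m.group(1))), text)
```

So expand_control_codes("{CHR(13)}{CHR(10)}") yields a carriage-return followed by a line-feed, and {CHR(34)} becomes a double quote.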

 

Wildcards (*, }}}, #, ? and ~)

*        You can use the asterisk as a wildcard. Thus "<*>" -> "" will delete any single-word string in < > brackets from your text.

}}}        This is used to mean any string at all, whether containing one word or more. "<head }}}/head>" -> "" will delete any string starting "<head " and ending "/head>", even if there are hundreds of characters between them. The default search distance is 1,000 characters, with a maximum of 25,000. Because deleting a lot of text can remove more than you expect if the text is not properly marked up in the first place, you will probably need to override the default search distance by specifying it in brackets, e.g. "<head }}}(100)/head>". Neither }}} nor * may be the first or last symbol between the double quotation marks in the search-string.

#        Use # to symbolise any number. "<div#>" will find <div1>, <div2>, <div468>, etc. If # is in the replacement too, the exact same number will be used in the replacement. Thus "<div#>" -> "[section #]" will produce [section 468] if the original has <div468>.

?        The question mark stands for any single character, except a space. Up to ten ?s can be used in the replacement string to reproduce the characters matched by the ?s in the search-string.

~        The tilde means "except". ~"<p>" "<*>" -> "" means delete anything in between angle brackets, except where it is <p>.

 

 

Use {CHR(42)} if you need to refer to *, {CHR(35)} for #, {CHR(63)} for ? and {CHR(126)} for ~.
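To get a feel for how the *, }}}, # and ? wildcards behave, you can approximate them with Python regular expressions. The sketch below is not how the Text Converter itself works. It assumes * means a run of non-space characters, # means one or more digits, and }}} means up to the stated number of characters of any kind. The ~ exception and the {CHR(xxx)} escapes are left out, and the names wildcard_to_regex and convert are invented for the example.

```python
import re

# A wildcard token: }}} with an optional (distance), or one of * # ?
TOKEN = re.compile(r"\}\}\}(?:\((\d+)\))?|[*#?]")


def wildcard_to_regex(search, default_span=1000):
    """Turn a search-string into (regex pattern, list of captured wildcard kinds)."""
    parts, kinds, pos = [], [], 0
    for m in TOKEN.finditer(search):
        parts.append(re.escape(search[pos:m.start()]))  # literal text
        tok = m.group(0)
        if tok.startswith("}}}"):
            span = int(m.group(1)) if m.group(1) else default_span
            parts.append("[\\s\\S]{0,%d}?" % span)  # any characters, limited distance
        elif tok == "*":
            parts.append(r"\S*?")  # assumption: one "word" = no spaces
        elif tok == "#":
            parts.append(r"(\d+)")  # assumption: a number = a run of digits
            kinds.append("#")
        else:
            parts.append(r"(\S)")  # any single character except a space
            kinds.append("?")
        pos = m.end()
    parts.append(re.escape(search[pos:]))
    return "".join(parts), kinds


def convert(text, search, replacement):
    """Apply one search -> replacement pair, reusing matched # and ? values."""
    pattern, kinds = wildcard_to_regex(search)

    def fill(m):
        captured = {"#": [], "?": []}
        for kind, value in zip(kinds, m.groups()):
            captured[kind].append(value)
        out = []
        for ch in replacement:
            if ch in captured and captured[ch]:
                out.append(captured[ch].pop(0))  # reuse the matched value
            else:
                out.append(ch)
        return "".join(out)

    return re.sub(pattern, fill, text)
```

With these assumptions, convert("<div468>", "<div#>", "[section #]") produces [section 468], matching the example given for # above.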

 

 

Whole word, case Insensitive, Confirm, redundant Spaces, redundant <Enter>s

/C stops to confirm you wish to go ahead before each change.

/W does a whole word search, ensuring the alteration only happens if there's a word separator on either side. For example, /W "the" finds the but not other, then or bathe.

/I does a case insensitive search (/I "restaurant" -> "hotel" replaces restaurant with hotel and RESTAURANT with HOTEL and Restaurant with Hotel, i.e. respecting case as far as possible).

You can combine these, e.g.

/IWC "the" -> "this"
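The effect of /W and /I can be imitated in Python along the following lines. This is a sketch under assumptions, not the converter's actual logic: it treats a regular-expression word boundary as the "word separator", it only distinguishes all-capitals, initial-capital and other matches when respecting case, and it leaves out /C because that needs interaction with the user. The names match_case and replace are hypothetical.

```python
import re


def match_case(original, replacement):
    """Give the replacement the same broad case pattern as the matched text."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def replace(text, search, replacement, flags=""):
    """Replace search with replacement, honouring W and I in flags."""
    pattern = re.escape(search)
    if "W" in flags:
        pattern = r"\b" + pattern + r"\b"  # assumption: \b stands in for a word separator
    if "I" in flags:
        return re.sub(pattern,
                      lambda m: match_case(m.group(0), replacement),
                      text, flags=re.IGNORECASE)
    return re.sub(pattern, lambda m: replacement, text)
```

For example, replace("the other then bathe", "the", "this", "W") changes only the first word, and replace("Restaurant", "restaurant", "hotel", "I") gives Hotel.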

/S cuts out all redundant spaces. That is, it will reduce any sequence of two or more spaces to one, and it also removes some common formatting problems such as a lone space after a carriage-return or before punctuation marks such as .,; and ). /S can be used on a line of its own or in combination with other searches.

/E cuts out all redundant <Enter>s. That is, it will reduce any sequence of two or more carriage-return+line-feeds (what you get when you press Enter or Return) to one. /E can be used on a line of its own or in combination with other searches.
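What /S and /E do can be pictured with a couple of Python regular expressions. The exact list of "common formatting problems" handled by /S is not spelled out above, so this sketch covers only the cases mentioned: runs of spaces, a lone space after an <Enter>, and a space before . , ; or ). The function names are invented for the illustration.

```python
import re


def squeeze_spaces(text):
    """Roughly what /S does: tidy up redundant spaces."""
    text = re.sub(r" {2,}", " ", text)        # two or more spaces -> one
    text = re.sub(r"\r\n ", "\r\n", text)     # lone space after an <Enter>
    text = re.sub(r" +([.,;)])", r"\1", text)  # space before . , ; or )
    return text


def squeeze_enters(text):
    """Roughly what /E does: reduce repeated <Enter>s to a single one."""
    return re.sub(r"(?:\r\n){2,}", "\r\n", text)
```

Note that squeeze_enters assumes every <Enter> is a carriage-return plus line-feed pair, as described under Control Codes; text using a bare line-feed would pass through unchanged.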

 

 

See Documents\wsmith5\convert.txt for examples in use.

 

See also: Text Converter Contents.

Page url: http://www.lexically.net/wordsmith/step_by_step_Chinese/?converter_syntax_info.htm