Convert Text File Format


To convert a series of whole text files from one format to another, choose between these options:

 

[Image: Text Converter, into Unicode]

 

These options convert your texts into formats which are suited to text processing. (UTF-8, a format which was devised for many languages some years ago when disk space was limited and character encoding was problematic, is generally not suitable. That's because it uses a variable number of bytes, from 1 to 4, to represent the different characters. A to Z need only 1 byte each, accented letters such as é need 2, Japanese characters typically need 3, and some characters need 4.)
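To see the variable byte counts for yourself, here is a minimal Python sketch. It is only an illustration and is not part of Text Converter; the sample characters are arbitrary choices.

```python
# Count how many bytes UTF-8 needs for a few sample characters.
samples = ["A", "é", "日", "😀"]

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"{ch!r}: {n} byte(s) in UTF-8")
```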

 

DOS to Windows:

... choose the "codepage" that your old DOS texts were encoded with, e.g. DOS 850 Multilingual.
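The idea behind a codepage conversion can be sketched in Python as below. This is an illustration under assumptions, not Text Converter's own code: the function name dos_to_windows is hypothetical, and the Windows target is assumed to be codepage 1252 (Western European).

```python
def dos_to_windows(raw: bytes, dos_codepage: str = "cp850") -> bytes:
    """Re-encode bytes from a DOS codepage to Windows-1252 (assumed target)."""
    # Interpret the old bytes using the DOS codepage they were saved with.
    text = raw.decode(dos_codepage)
    # Write them out again using the Windows codepage. Characters that do not
    # exist there (e.g. DOS box-drawing symbols) are replaced with "?".
    return text.encode("cp1252", errors="replace")


# In DOS 850 the letter é is byte 0x82; in Windows-1252 it is byte 0xE9.
converted = dos_to_windows(b"caf\x82")
print(converted)
```

If you pick the wrong DOS codepage, accented letters will come out as the wrong characters, which is why the choice matters.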

 

Unix to Windows:

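... Unix-saved texts don't use the same codes for end-of-paragraph as Windows-saved ones: Unix marks each line ending with a line feed (LF) alone, whereas Windows uses a carriage return followed by a line feed (CR+LF).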
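As a rough sketch of what such a conversion involves (the function name unix_to_windows is hypothetical and this is not Text Converter's own code):

```python
def unix_to_windows(text: str) -> str:
    """Turn Unix line endings (LF) into Windows ones (CR+LF)."""
    # Normalise any CR+LF already present first, so it is not doubled.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


print(repr(unix_to_windows("first paragraph\nsecond paragraph\n")))
```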

 

into Unicode:

... this is a better standard than ANSI as it allows many more characters to be used, suiting lots of languages. This is UTF-16 Unicode, which uses 2 bytes for nearly every character (a few rare characters need 4).
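A minimal Python sketch of writing text as UTF-16 follows. It assumes the little-endian byte order with a byte-order mark, which is common on Windows; the function name to_utf16 is hypothetical.

```python
import codecs


def to_utf16(text: str) -> bytes:
    """Encode text as little-endian UTF-16 with a leading byte-order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


sample = "café 日本語"          # 8 characters
data = to_utf16(sample)
print(len(data))                # 2 bytes for the mark + 2 bytes per character
```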

 

from MS Word .doc

... like using "Save as Text" in Word.

 

HTML/BNC entities to characters

... converts symbols which are hard to read such as &eacute; to ones like é
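For standard HTML entities, the same kind of conversion can be sketched with Python's standard library. This is an illustration only: html.unescape knows the HTML entity names, so it is an assumption that it covers your texts, and any BNC-specific entities it does not recognise are left unchanged. The function name entities_to_characters is hypothetical.

```python
import html


def entities_to_characters(text: str) -> str:
    """Replace HTML entities such as &eacute; with the characters they stand for."""
    return html.unescape(text)


print(entities_to_characters("caf&eacute; &amp; cr&egrave;me"))
```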

 

from column tagged, using <> except column

... The Stuttgart TreeTagger produces output like this:

 

word          pos          lemma
The           DT           the
TreeTagger    NP           TreeTagger
is            VBZ          be
easy          JJ           easy
to            TO           to
use           VB           use
.             SENT         .

 

If you set the column to 1, Text Converter leaves column 1 (the word) as it is, encloses each of the other columns in <>, and converts this to:

 

The<DT><the> TreeTagger<NP><TreeTagger> is<VBZ><be> easy<JJ><easy> to<TO><to> use<VB><use> .<SENT><.>
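The conversion shown above can be sketched in Python as follows. This is an illustration, not Text Converter's own code: it assumes the tagger output is tab-separated with one token per line and no heading row, and the names columns_to_tags and keep_column are hypothetical.

```python
def columns_to_tags(lines, keep_column=1):
    """Join column-tagged lines into running text, wrapping every column
    except keep_column (counted from 1) in <>."""
    tokens = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        fields = line.rstrip("\n").split("\t")
        parts = [
            field if number == keep_column else f"<{field}>"
            for number, field in enumerate(fields, start=1)
        ]
        tokens.append("".join(parts))
    return " ".join(tokens)


tagged = [
    "The\tDT\tthe",
    "TreeTagger\tNP\tTreeTagger",
    "is\tVBZ\tbe",
    "easy\tJJ\teasy",
    "to\tTO\tto",
    "use\tVB\tuse",
    ".\tSENT\t.",
]
result = columns_to_tags(tagged, keep_column=1)
print(result)
```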

 

Lemmatised using ...

... converts each file using a lemma file. For example, if your source text has "she was tired" and your lemma file has BE -> AM, WAS, WERE, IS, ARE, then you will get "she be tired" in your converted text file. Initial capitals are kept: where your source text has "Was she tired?" you'll get "Be she tired?"
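The lemma replacement described above might be sketched like this in Python. It is a simplified illustration, not Text Converter's own method: it assumes each lemma-file line has the form LEMMA -> FORM, FORM, ... and the names build_lookup and lemmatise are hypothetical.

```python
import re


def build_lookup(lemma_lines):
    """Map each word form (lower-cased) to its lemma (lower-cased)."""
    lookup = {}
    for line in lemma_lines:
        if "->" not in line:
            continue  # ignore lines that are not lemma entries
        lemma, forms = line.split("->", 1)
        for form in forms.split(","):
            form = form.strip()
            if form:
                lookup[form.lower()] = lemma.strip().lower()
    return lookup


def lemmatise(text, lookup):
    """Replace every known word form with its lemma, keeping an initial capital."""
    def replace(match):
        word = match.group(0)
        lemma = lookup.get(word.lower())
        if lemma is None:
            return word  # not in the lemma file: leave unchanged
        return lemma.capitalize() if word[0].isupper() else lemma

    return re.sub(r"\w+", replace, text)


lookup = build_lookup(["BE -> AM, WAS, WERE, IS, ARE"])
print(lemmatise("she was tired", lookup))
print(lemmatise("Was she tired?", lookup))
```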