In WordSmith you need plain text files, such as you get when you save a Word .doc as Plain Text (.txt). The text format should be ASCII, ANSI or Unicode (UTF-16).
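
If a text was saved in some other encoding, you can re-save it as UTF-16. Here is a minimal Python sketch. It assumes the original file is "ANSI" in Windows code page 1252 (Western European), which may not be the code page your texts actually use. The function name convert_to_utf16 is only illustrative.

```python
def convert_to_utf16(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-save a plain text file as UTF-16 with a byte order mark (BOM).

    source_encoding is an assumption: cp1252 is the usual Western European
    Windows "ANSI" code page, but check what your files were saved in.
    """
    # newline="" stops Python from translating line endings on read or write,
    # so CR/LF pairs in the original survive unchanged.
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    # Python's "utf-16" codec writes a BOM followed by the machine's native
    # byte order (little-endian on typical Windows/x86 PCs).
    with open(dst, "w", encoding="utf-16", newline="") as f:
        f.write(text)
```

For example, convert_to_utf16("mytext.txt", "mytext_utf16.txt") would write a UTF-16 copy (the file names are hypothetical).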

Any Word .doc or .docx files will look crossed out and should not be used: convert them to .txt first.
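
Saving as Plain Text from Word is the simplest route. If you have many .docx files, a script can do the same job. The sketch below relies on the fact that a .docx is a .zip archive whose main text lives in word/document.xml. It is a rough illustration, not a full converter: it ignores headers, footers, footnotes and comments, drops tabs and line breaks inside paragraphs, may repeat text from text boxes, and cannot read the older binary .doc format. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(docx_path) -> str:
    """Return the body text of a .docx file, one line per paragraph (<w:p>)."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across several <w:t> runs; join them.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)

def save_docx_as_txt(docx_path, txt_path) -> None:
    # Write UTF-16 (with BOM), one of the formats WordSmith accepts.
    with open(txt_path, "w", encoding="utf-16") as out:
        out.write(docx_to_text(docx_path))
```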

Text files within .zip files can be used. They are coloured red in the Files available display, but WordSmith can read them and extract the texts you select from them.
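
A .zip archive simply stores ordinary files, so the texts inside can be read without unzipping them to disk first. This Python sketch is only an illustration of that idea, not how WordSmith does it internally. The function name is hypothetical, and the encoding you pass must match how the texts were saved.

```python
import zipfile

def read_texts_in_zip(zip_path, encoding: str):
    """Yield (member name, text) for every .txt file inside a .zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Skip anything that is not a plain text file.
            if name.lower().endswith(".txt"):
                yield name, zf.read(name).decode(encoding)
```

For example, for name, text in read_texts_in_zip("corpus.zip", "utf-16"): print(name, len(text)) (the archive name is hypothetical).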

Why not .PDF files?

Don't choose .pdf files either: they have a very special format. Essentially a PDF is a set of instructions telling a printer or browser where to place coloured dots. The plain text is usually hard to extract, even with Adobe Acrobat (and Adobe invented the format).

Why not .DOC files?

A .DOC is unsuitable even though it does contain the text. The screenshots below show a .DOC containing only the word hello, first as it looks in Word, then opened in Notepad, and finally the .PDF of the same text opened in Notepad.

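[Figure: a .doc containing only the word hello, as displayed in Word]

[Figure: the same .doc opened in Notepad]

[Figure: the .PDF of the same text opened in Notepad]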
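
Those screenshots show binary formats rather than text. Many such formats can be recognised from their first few bytes (their "magic number"), whatever the file extension says. The Python sketch below checks three well-known signatures: %PDF- for PDF, PK followed by bytes 3 and 4 for ZIP containers (which include .docx, and also ordinary .zip archives), and the OLE2 compound-file header used by old binary .doc files. It is a hypothetical helper for spotting problem files, not a WordSmith feature, and "no known signature" does not prove a file is clean text.

```python
# Leading bytes ("magic numbers") of some common non-text formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"PK\x03\x04": "ZIP container (.docx, .zip, ...)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (old binary .doc)",
}

def sniff_format(path: str) -> str:
    """Guess from its first 8 bytes whether a file is a known binary format."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "no known binary signature (possibly plain text)"
```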

Check the format is OK

In the file-choose window you can test the format of the texts you've chosen with the Test File Format (unicode) button.
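
To see roughly what such a test has to distinguish, here is a heuristic Python sketch. It is not the check WordSmith's button performs, and the function name is hypothetical. It looks for a byte order mark (BOM) first, then treats NUL bytes as a sign of UTF-16 without a BOM, then separates pure 7-bit ASCII from 8-bit text. UTF-8 is reported separately because it is not one of the formats listed at the top of this page.

```python
def guess_text_encoding(data: bytes) -> str:
    """Heuristic guess at the encoding of a text file's raw bytes."""
    if data.startswith(b"\xff\xfe"):
        return "Unicode (UTF-16, little-endian)"
    if data.startswith(b"\xfe\xff"):
        return "Unicode (UTF-16, big-endian)"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (with BOM)"
    if b"\x00" in data:
        # Western text in UTF-16 has a zero byte in almost every character.
        return "probably UTF-16 (no BOM)"
    if all(b < 128 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "probably UTF-8 (no BOM)"
    except UnicodeDecodeError:
        # 8-bit bytes that are not valid UTF-8: most likely a code page.
        return "probably ANSI (8-bit code page)"
```

For example, pass it the whole file: guess_text_encoding(open("mytext.txt", "rb").read()) (hypothetical file name). Reading only part of a file could cut a multi-byte character in half and skew the result.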