Mike Scott's Web

Frequently Asked Questions, WordSmith 3

Installation

Where should I put WS_PART1.EXE and WS_PART2.EXE and why?

When you download WordSmith over the Internet your browser will offer you a chance to store WS_PART1.EXE and WS_PART2.EXE wherever you like. I recommend putting them in a clean directory where there are no other files. The reason is that it will later be very easy to delete the lot to recover your valuable hard disk space, with no risk of deleting other files. A suitable place is c:\temp. Use this as a kind of rubbish bin where you keep files which you can later delete without worrying about whether they're important. If c:\temp does not yet exist, I suggest you "Create Directory" (File Manager in Windows 3.1), or "New Folder" (Explorer in Windows 95) now: it'll be useful for lots of purposes.

What are WS_PART1.EXE and WS_PART2.EXE and what are they for?

There are about 30 component files in WordSmith Tools, and in their normal state, after the installation process is finished, these take up about 4 megabytes of disk space. To speed up electronic transmission and reduce the risk of files getting missed out, we have put them all into 2 compressed files, WS_PART1.EXE and WS_PART2.EXE, which total about 2 megabytes in size and each of which will fit on one floppy. You simply run these two files and they unpack themselves, creating all the 30-odd component files within them on your hard disk, usually in the same directory that WS_PART1.EXE and WS_PART2.EXE are in at the time. At that point you will have used up about 6 MB of disk space (about 2 MB for the compressed files plus about 4 MB for the unpacked ones). After installation, once you've checked everything is working properly, you will want to delete all the files in c:\temp to get this space back.

Should I keep WS_PART1.EXE and WS_PART2.EXE?

You don't need to keep them unless you really want to, as you can always download a fresh copy, which will usually have improvements, from the website which is visible in WordSmith and specified in the readme.txt file which comes with it.

What is SETUP.EXE for? I seem to have loads of SETUP.EXE files all over the place!

You do. Nearly all software comes with an accompanying file called setup.exe or install.exe. These sub-programs manage the installation for you so that the main program you've bought will work smoothly, and will for example create any necessary sub-directories, visible icons and program groups. In many cases running setup.exe will copy files into various parts of your hard disk, often without you being told about changes made to your whole system! In the case of WordSmith Tools, there is a setup.exe program which you should use to manage the installation. It will copy the relevant files to a suitable directory on your hard disk. I suggest c:\wsmith but you can easily set it to a different directory. WordSmith Tools' setup.exe does NOT alter your basic system settings or copy any extra files into hidden places at all.

So what should I do now?

After you've downloaded WS_PART1.EXE and WS_PART2.EXE, extracted the component files, and run the setup.exe which WS_PART1.EXE and WS_PART2.EXE will have extracted with the other files, you will have a complete installed copy of WordSmith in the right directory of your hard disk. Now you should run WordSmith itself, in the \wsmith directory. The main controller is called WSHELL.EXE and that's what you should run. If you created an icon, it should automatically run wshell.exe. When you run it, your version will not yet be a full one (you haven't yet given it your name or registration code), so it will "complain" that it's in demo mode and will suggest you Update from Demo.

Why don't you just use disks? It'd be a lot easier than all this hassle!

Would it? You would have to wait for the disks to arrive, for a start. Bookshops don't like handling software and cannot give good support, and even big software shops don't stock much specialised software like WordSmith Tools. Customers outside the UK (that is MOST users of WordSmith judging by the feedback I get) might have to wait a long time, depending on 2 postal systems and customs formalities. The cost of WordSmith would be higher. It wouldn't fit onto one floppy; not everybody has a cd-rom drive available. And how would we make the frequent updates available to you? I make an updated version at least twice a month. The Internet is a great way of distributing software, actually; it's not so good for distributing hardware consumer goods like tv sets but is designed for distributing information which is what software is.

I have the Oxford University Press registration code; how do I update?

There's a menu item visible in WordSmith Tools, called Update from Demo. This will run UPDATER.EXE which, as its name suggests, allows you to type in your name and registration code, converting WordSmith from demo into full operational use. Make sure you type everything in exactly as specified by OUP. If it's right, you will be told so after you've clicked on OK. And you will no longer see the menu option to update or get bothered by demo mode messages.

My registration code didn't work...

If OUP have mis-spelled your name (they do try ever so hard not to!) you should register with the mis-spelt name anyway. WordSmith Tools should work okay with the name and code as supplied to you. Then, with a working copy of WordSmith, get back to OUP and ask for a new registration code. Make sure you give them the correct name legibly! For any other WS3 registration problems contact Oxford University Press.

Does WordSmith...

Can WordSmith handle Language X?

WordSmith 3 can handle most European languages and any which use a 1-byte-per-character system. A user can define their own alphabet and own alphabetical order but there are problems in getting a Windows pc to show things correctly.
WS3 uses alphabets which can be represented in one byte (a number between 0 and 255), a system which was usual in computers until recently. With such a one-byte representation system there needs to be a "codepage" (a table of 256 characters) which contains the symbols you need.
In practice you can handle English, French, Greek, Russian, etc.
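
To make the codepage idea concrete, here is a minimal Python sketch (Python is not part of WordSmith; the helper name fits_in_codepage is made up for this illustration). It shows that one and the same byte value stands for a different character depending on which codepage is in force, and that a character with no slot in the 256-entry table simply cannot be stored.

```python
# One byte value, several code pages: the byte itself says nothing about the language.
raw = bytes([0xE9])

for codepage in ("cp1252", "cp437", "cp1253", "cp1251"):
    # cp1252 = Windows Western, cp437 = original DOS,
    # cp1253 = Windows Greek, cp1251 = Windows Cyrillic
    print(codepage, "->", raw.decode(codepage))


def fits_in_codepage(text: str, codepage: str) -> bool:
    """Return True if every character of text has a slot in the code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_codepage("café", "cp1252"))      # French fits the Western table
print(fits_in_codepage("Ελληνικά", "cp1253"))  # Greek fits the Greek table
print(fits_in_codepage("日本語", "cp1252"))     # Japanese needs more than one byte
```

The practical consequence is the one described above: the text, the language setting and the codepage all have to agree, or the accented and non-Latin characters come out wrong.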

Does WordSmith tag texts?

No. You have to tag your own manually or use a tagger to tag them automatically.

Does WordSmith come with a corpus?

No. I cannot legally supply you with a body of texts. But you can easily build up your own using Internet resources. There are lots of corpora: some are freely accessible, others can be purchased cheaply, and others are extremely expensive. Try a Google search on "test corpus". Or visit newspaper web sites.

Text Handling & Display

Accents are not displayed right!

1) Check the format of 1 or 2 of your texts in both DOS & Windows. In Windows, look at them using Notepad. You will immediately see whether the accented characters look right. If they do, you have Windows-format texts (ANSI). If they don't look right, then go to MS-DOS, and then try EDIT xxx.txt -- if that works you'll see them in their DOS incarnation. Look again at the accented characters. If your pc cannot understand EDIT, then try TYPE xxx.txt -- the text will flash by fast and will be impossible to scrutinise, but if there are accented characters in the last few lines of xxx.txt you will be able to see them. If they look right in one of these two ways go to 3.

2) Using Tag File 2 to convert accents on the fly should not be necessary unless your accented characters are like this: "&eacute;" "&Agrave;" in the text. (If they are in this sort of format you have HTML or similar and WILL need to use Tag File 2 correctly and enough should be there in Help to guide you.)

3) Once you know which format they're in, in Text Characteristics set the Language to the right one, and the format to Windows or DOS accordingly.

4) If they're in DOS or HTML format, you could convert all your texts to the usual Windows format if you wanted, using Text Converter, but you would need to know the correct codes for each conversion needed -- the codes can be found in the Appendix of a DOS or Windows manual, but it can be a pain finding them accurately. Or you might be able to convert them successfully one by one in MS Word. Whether that'd be easy to do would depend on how many texts you had.
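
For anyone who prefers a script to looking up codes by hand, the conversions in step 4 can be sketched in Python. This is an illustration only, not what Text Converter does internally. The function names are invented here, and the choice of cp850 as the DOS codepage and cp1252 as the Windows one is an assumption: your texts may use a different pair.

```python
import html


def dos_to_windows(data: bytes,
                   dos_codepage: str = "cp850",
                   win_codepage: str = "cp1252") -> bytes:
    """Re-encode DOS-format bytes as Windows-format (ANSI) bytes.

    Raises UnicodeEncodeError if the DOS text contains a symbol
    (a box-drawing character, say) that the Windows code page lacks.
    """
    return data.decode(dos_codepage).encode(win_codepage)


def entities_to_text(text: str) -> str:
    """Turn HTML-style accent codes into the real characters."""
    return html.unescape(text)


dos_bytes = b"caf\x82"                    # "café" as stored by a DOS editor
print(dos_to_windows(dos_bytes))          # same word, Windows byte for the accent
print(entities_to_text("caf&eacute; &Agrave; la carte"))
```

Whichever route you take, check a couple of converted texts in Notepad afterwards, as in step 1.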

Concord

Trying to concordance "can", it keeps giving me "can't" as well. I've told the text settings that I don't want the apostrophe to be included in the word but the only difference that seems to make is that the resulting concordance treats them as different words in the sorting (it doesn't when the apostrophe is 'included in word').

The problem is one of ambiguity. The apostrophe is used for at least 3 purposes in English (genitives, irony and quotes). Here you're concerned with how Concord recognises the end of a word, since by default if you ask for "can" you want " can ". But there are other word separators besides the space symbol, including carriage returns and punctuation symbols such as apostrophes. In other words, the apostrophe in BROTHERS' must be seen as a word separator, just like the space after SISTER or the full stop after COUSIN.

One solution -- the easiest -- is to delete the unwanted concordance lines with "can't". Sort on the search word (F6) so that they all come together, then delete them all in one go. Another is to use a stop list. This will be a bit slower than solution 1, because it forces Concord to check every occurrence of "can" to see whether it is in the stop list.
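
The ambiguity and the stop-list remedy can be illustrated with a small Python sketch. It is not Concord's code; the function name find_hits and the sample sentence are invented for the purpose, and the regular expression merely imitates "the apostrophe counts as a word separator".

```python
import re

TEXT = "I can swim but she can't. Can you? The brothers' boat can float."


def find_hits(text: str, word: str) -> list[str]:
    """Find whole-word hits, treating the apostrophe as a word separator.

    For each hit, return the full surface form, i.e. the hit plus any
    apostrophe-joined tail such as 't, so that a stop list can see it.
    """
    hits = []
    for m in re.finditer(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
        tail = re.match(r"'[A-Za-z]+", text[m.end():])
        hits.append(m.group(0) + (tail.group(0) if tail else ""))
    return hits


hits = find_hits(TEXT, "can")
print(hits)  # the "can" inside "can't" is found too

stop_list = {"can't"}
wanted = [h for h in hits if h.lower() not in stop_list]
print(wanted)
```

The extra check against the stop list on every hit is the reason the second solution is a bit slower than simply deleting the unwanted lines.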

I wanted to concordance phrases like big, bold and beautiful so I entered *  ,  *  and  * but it didn't work. Why not?

Use *  *,  *  and  * (in other words put an asterisk before the comma). By default Concord assumes anything, even a comma, is a "whole word".
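
If you ever need the same kind of search outside Concord, a regular expression will do a similar job. The following Python sketch is an analogue only: it does not use Concord's wildcard syntax, and the pattern name TRIPLE and the sample sentence are made up here.

```python
import re

# word, comma, word, "and", word
TRIPLE = re.compile(r"\b(\w+), (\w+) and (\w+)\b")

text = "She was big, bold and beautiful, and he was tall, dark and handsome."
triples = TRIPLE.findall(text)
print(triples)
```

Here the comma is written as part of the pattern rather than as a separate "word", which is the difference the answer above is pointing at.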

Can the software work with large corpora of over 200M words? Would there be notable delays before concordance lines start appearing on a high-spec PC?

Yes. NO noticeable delays; concordance lines start appearing the millisecond they're found. You can stop at any time if you have enough. But it will take a long time to go through 200M words (= 1,200 MB if pure untagged text). On a fast pc I'd guess about 3 minutes, though this depends on the search-word etc., and more info is supplied on this in the Help. See also no. 3 below.
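
The "lines appear as they're found, stop when you have enough" behaviour can be sketched with a Python generator. This is an illustration of the idea under my own assumptions, not Concord's implementation; the function name concordance, the 30-character context width and the tiny corpus are invented.

```python
import re
from itertools import islice


def concordance(lines, word, width=30):
    """Yield a keyword-in-context line for each hit as soon as it is found."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for line in lines:
        for m in pattern.finditer(line):
            left = line[max(0, m.start() - width):m.start()]
            right = line[m.end():m.end() + width]
            yield f"{left:>{width}} [{m.group(0)}] {right}"


corpus = [
    "Yes we can do that.",
    "You can stop at any time.",
    "Can it cope with 200M words?",
]

# Take the first two hits only; the third line of the corpus is never read.
first_two = list(islice(concordance(corpus, "can"), 2))
for kwic_line in first_two:
    print(kwic_line)
```

Because nothing is collected in advance, the first hit is available immediately however large the corpus is; only a complete pass takes the full time.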

Is the corpus indexed?

Not usually. But now it can be, if you so choose. To do lots of concordances always using the same very big corpus you'd be best advised to make an index of it.
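
Why an index pays off for repeated searches can be seen from a toy version in Python. This is a sketch of the general technique (an inverted index), not a description of the index format WordSmith actually writes; build_index and the sample texts are invented here.

```python
import re
from collections import defaultdict


def build_index(texts):
    """Map each lower-cased word to a list of (text_number, character_offset) pairs."""
    index = defaultdict(list)
    for text_no, text in enumerate(texts):
        for m in re.finditer(r"[A-Za-z]+", text):
            index[m.group(0).lower()].append((text_no, m.start()))
    return index


texts = ["Yes we can.", "Can you? I can."]
index = build_index(texts)   # slow step, done once
print(index["can"])          # fast step, repeated for every search
```

Building the index means reading the whole corpus once; after that each search is a lookup rather than another pass through every text.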

I got a "General Protection Fault" message when running Concord.

The GPF message means that Concord tried to tread into some memory space it wasn't allowed access to. This is a pain, and even MS Word, Eudora, Windows Help and other programs occasionally do it.

Possibility 1) A GPF is most likely to happen if there is a straightforward error in the program. In the current OUP version this is not very likely in ordinary use; I would have had lots of angry messages since launch date if this usually happened! Solution: get a new version from my website and try again.

Possibility 2) A GPF could be caused by a shortage of memory whilst Concord is trying to do its job. This might happen if there were other sizeable programs such as MS Word loaded up at the same time, or else if there was something wrong with the hard disk. Windows uses the hard disk as a storage area for memory when it runs out of room on the chips in the machine. Solution: re-boot the machine so as to start with a nice fresh reset setup. Then run Scandisk to check whether the hard disk is screwed up at all and correct any faults which appear. Ensure there is at least 25 MB of room on the hard disk. Now run WordSmith again.

Possibility 3) There might be something special about the machine in question, especially if it is somehow different from the usual setup commonly found. I developed WordSmith Tools 1-3 using Pentium PCs running Win 95B, 98, NT and 2000. Things which might be special include: non-standard operating system, networked setup, unusual non-Intel CPU, very old CPU (eg. 386) or very latest CPU, laptop machine. Solution: try it on another machine if possible.
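
For Possibility 2, the free-space check can be done by eye in Windows, but here is a minimal Python sketch of the same test for those who script such things. The 25 MB figure comes from the answer above; the function name has_enough_room is made up, and checking the current directory's drive is an assumption.

```python
import shutil

MIN_FREE_MB = 25  # figure taken from Possibility 2 above


def has_enough_room(free_bytes: int, min_free_mb: int = MIN_FREE_MB) -> bool:
    """Return True if the free space meets the suggested minimum."""
    return free_bytes >= min_free_mb * 1024 * 1024


free = shutil.disk_usage(".").free  # free bytes on the drive holding the current directory
print("Enough room:", has_enough_room(free))
```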

WordList

I got a "List Index Out of Bounds" Message when trying to use the Match List function.

What the error means is that when trying to build up a list of words, WS3 has done something absurd, such as trying to read item -1 or item 5 if there are only 4 in the list. That is why the list item is out of bounds. The result is that the hourglass (which starts when WS3 begins to go through the list) doesn't disappear as it would if it managed to finish processing the list. Unfortunately this bug means that Match List from a text file doesn't work, though matching from a template does (explained in the Help).
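
For readers who have not met this kind of error, here is what "out of bounds" amounts to, as a Python sketch. It is not WS3's code; get_item and the word list are invented. (Python itself would quietly accept -1 as "the last item", so the check is written out explicitly to mirror the stricter behaviour described above.)

```python
def get_item(items, position):
    """Return items[position], refusing any position outside 0..len(items)-1."""
    if not 0 <= position < len(items):
        raise IndexError(f"list index out of bounds ({position})")
    return items[position]


words = ["the", "of", "and", "to"]   # 4 items, at positions 0 to 3

print(get_item(words, 3))            # last valid position

for bad_position in (-1, 5):
    try:
        get_item(words, bad_position)
    except IndexError as err:
        print(err)
```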