
WordSmith Tools Manual

The idea here is to mark up your corpus with clusters or phrases you want treated as single items.


You can do that in 2 ways:


1. insert an underscore, so that Los Angeles becomes Los_Angeles and New York becomes New_York (yellow ring)


2. annotate the text, so that Los Angeles becomes <mwu>Los Angeles</mwu> (grey ring)
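
As a rough illustration of what the two methods do to a text, here is a minimal Python sketch. It is not WordSmith's own code: the function names join_with_underscore and tag_phrases are hypothetical, and the sketch assumes every phrase begins and ends with a letter or digit so that word-boundary matching works.

```python
import re

def _phrase_pattern(phrases):
    # Try longer phrases first so "New York City" would win over "New York".
    ordered = sorted(phrases, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b")

def join_with_underscore(text, phrases):
    """Method 1: replace the spaces inside each phrase with underscores."""
    return _phrase_pattern(phrases).sub(
        lambda m: m.group(0).replace(" ", "_"), text)

def tag_phrases(text, phrases, tag="mwu"):
    """Method 2: wrap each phrase found in <mwu>...</mwu>."""
    return _phrase_pattern(phrases).sub(
        lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)

phrases = ["Los Angeles", "New York"]   # stands in for a phrase file
sample = "She flew from Los Angeles to New York."

print(join_with_underscore(sample, phrases))
# She flew from Los_Angeles to New_York.
print(tag_phrases(sample, phrases))
# She flew from <mwu>Los Angeles</mwu> to <mwu>New York</mwu>.
```

Both functions share one compiled pattern, so the only difference between the methods is what gets written back for each match.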






For either method you'll need a phrase file which contains the items you're interested in. Mine contained just this:



After processing (using the tag insertion method) my source text looked like this:



with <mwu> before and </mwu> after each item found.


Case sensitivity

Whether you choose a case-insensitive or a case-sensitive search, the replacement will match the case used in your phrase file. If your phrase file contains a lot, then a case-insensitive search will also find A lot or a LOT, and each will be written out as a lot.

Original: A lot of people ...

Conversion: <mwu>a lot</mwu> of people ...
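
The behaviour described above can be sketched in Python as follows. This is only an illustration under assumptions, not WordSmith's implementation: the function name tag_with_phrase_case and its case_sensitive parameter are hypothetical, and phrases are assumed to begin and end with a letter or digit.

```python
import re

def tag_with_phrase_case(text, phrases, case_sensitive=False):
    """Tag each phrase, writing it out in the case used in the phrase list."""
    ordered = sorted(phrases, key=len, reverse=True)
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b", flags)
    # Map a lower-cased match back to the spelling in the phrase list.
    canonical = {p.lower(): p for p in phrases}

    def replace(match):
        found = match.group(0)
        written = canonical.get(found.lower(), found)
        return f"<mwu>{written}</mwu>"

    return pattern.sub(replace, text)

print(tag_with_phrase_case("A lot of people ...", ["a lot"]))
# <mwu>a lot</mwu> of people ...
print(tag_with_phrase_case("A lot of people ...", ["a lot"], case_sensitive=True))
# A lot of people ...
```

In the case-sensitive call nothing is tagged, because A lot does not match a lot exactly.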


Handling the text now it has been modified

With method 1, you merely need to teach WordSmith that the underscore character is to be accepted as a valid character.

With method 2, you merely have to let WordSmith handle your mark-up so that it makes a single-word list in which each cluster counts as one item.
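
To show why these settings matter, here is a small Python sketch of how a word list might be counted for each method. It does not reproduce WordSmith's settings or internals: the function names are hypothetical, tokens are assumed to be runs of ASCII letters, and everything is lower-cased for counting.

```python
import re
from collections import Counter

def word_list_underscore(text):
    """Method 1: treat the underscore as a valid character inside a word."""
    tokens = re.findall(r"[A-Za-z_]+", text)
    return Counter(t.lower() for t in tokens)

def word_list_tagged(text):
    """Method 2: count everything inside <mwu>...</mwu> as one item."""
    tokens = []
    for m in re.finditer(r"<mwu>(.*?)</mwu>|[A-Za-z]+", text):
        inside = m.group(1)  # None when an ordinary word was matched
        tokens.append((inside if inside is not None else m.group(0)).lower())
    return Counter(tokens)

print(word_list_underscore("Los_Angeles is big. Los_Angeles is sunny.")["los_angeles"])
# 2
print(word_list_tagged("<mwu>a lot</mwu> of people like it <mwu>a lot</mwu>")["a lot"])
# 2
```

If the underscore were not accepted as a word character, Los_Angeles would be split back into Los and Angeles, which is what method 1 is meant to avoid.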