
WordSmith Tools Manual

The idea here is to mark up your corpus with clusters or phrases you want treated as single items.

 

You can do that in two ways:

 

insert an underscore (_), so that Los Angeles becomes Los_Angeles and New York becomes New_York (yellow ring)

and/or

annotate the text, so that Los Angeles becomes <mwu>Los Angeles</mwu> (grey ring)
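To make the difference between the two methods concrete, here is a minimal Python sketch of what each one does to a text. It is only an illustration and not how WordSmith itself does the job; the function names, the sample sentence and the phrase list are all made up for the example.

```python
def join_with_underscores(text, phrases):
    """Method 1: replace the spaces inside each phrase with underscores."""
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return text


def wrap_in_tags(text, phrases, tag="mwu"):
    """Method 2: put <mwu> before and </mwu> after each phrase."""
    for phrase in phrases:
        text = text.replace(phrase, f"<{tag}>{phrase}</{tag}>")
    return text


# Hypothetical phrase list and sample sentence, for illustration only.
phrases = ["Los Angeles", "New York"]
sample = "She flew from Los Angeles to New York."

print(join_with_underscores(sample, phrases))
print(wrap_in_tags(sample, phrases))
```

This sketch uses plain string replacement, so it is case sensitive and does not check word boundaries.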

 

 

 

[Screenshot: Text Converter MWU window]

 

For either method you'll need a phrase file which contains the items you're interested in. Mine contained just this:

 

[Screenshot: phrase file]

After processing (using the tag insertion method) my source text looked like this:

 

[Screenshot: source text after processing]

with <mwu> before and </mwu> after each item found.

 

Case sensitivity

Whether you choose a case-insensitive or a case-sensitive search, the replacement takes its case from your phrase file. If your phrase file contains a lot, a case-insensitive search will also find A lot or a LOT, and each is replaced with the form given in the phrase file.

Original: A lot of people ...

Conversion: <mwu>a lot</mwu> of people ...
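The same behaviour can be sketched in Python with a regular expression. This is an illustration under assumptions, not WordSmith's own code: the function name tag_phrase and its case_sensitive parameter are invented for the example. Note that the text written back is always the phrase as it appears in the phrase file, whatever the case of the match.

```python
import re


def tag_phrase(text, phrase, case_sensitive=False):
    """Wrap each occurrence of phrase in <mwu> tags.

    The replacement always uses the phrase-file form of the phrase,
    not the form that was found in the text.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", flags)
    # A function is used as the replacement so that any backslashes in
    # the phrase are inserted literally.
    return pattern.sub(lambda match: f"<mwu>{phrase}</mwu>", text)


original = "A lot of people ..."
print(tag_phrase(original, "a lot"))                       # case insensitive
print(tag_phrase(original, "a lot", case_sensitive=True))  # no match here
```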

 

Handling the text now it has been modified

With method 1, you merely need to teach WordSmith that the underscore is to be accepted as a valid character within a word.

With method 2, you merely have to let WordSmith handle your mark-up, so that the word list it makes includes the clusters alongside the single words.
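To show what these two settings amount to, here is a rough Python sketch of a word-list counter for each method. It is an assumption-laden illustration and says nothing about WordSmith's internals; both function names are hypothetical, and for simplicity only the letters A to Z count as word characters.

```python
import re
from collections import Counter


def word_list_underscore(text):
    """Method 1: the underscore counts as a valid word character."""
    return Counter(re.findall(r"[A-Za-z_]+", text))


def word_list_tagged(text):
    """Method 2: each <mwu>...</mwu> span counts as one item."""
    # The first alternative captures a whole tagged cluster, the second
    # captures an ordinary single word.
    pairs = re.findall(r"<mwu>(.*?)</mwu>|([A-Za-z]+)", text)
    return Counter(cluster or word for cluster, word in pairs)


print(word_list_underscore("Los_Angeles is big. Los_Angeles is sunny."))
print(word_list_tagged("<mwu>a lot</mwu> of people like <mwu>a lot</mwu> of things"))
```

Either way, the cluster ends up as one entry in the list with its own frequency, next to the ordinary single words.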