Text Converter: converting the BNC XML version

 

The British National Corpus is a valuable resource, but it has certain problems as it comes straight off the CD-ROM:

 

it is in Unix format

it has entities like &eacute; to represent characters like é

its folder structure is opaque and the file-names mean nothing

 

You will find it much easier to use if you

 

convert it to Unicode

filter the files to make a useful structure

 

as explained at http://lexically.net/wordsmith/Handling_BNC/index.html

 

The easiest way to do that is in two stages.

 

Conversion:

 

[Screenshot: BNC_XML_conversion_choosing_texts]

After choosing the texts,

 

[Screenshot: BNC_XML_conversion]

and when you press OK you'll be asked something like this

[Screenshot: BNC_XML_conversion_confirm]

After the work is done, you will see the BNC texts copied to a similar folder structure (in our case stemming from j:\temp).

 

[Screenshot: BNC_texts_copied_1]

[Screenshot: BNC_texts_copied_2]

[Screenshot: BNC_texts_copied_3]
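
For readers who want to see what the conversion stage amounts to, here is a minimal Python sketch. It is not the Text Converter's own code. It assumes the source files are UTF-8 with Unix line endings and that "Unicode" means UTF-16 output; the function names convert_text and convert_tree are made up for illustration. Entities that XML itself needs (&amp;, &lt; and so on) are deliberately left alone so the markup stays well-formed.

```python
import html
import re
from pathlib import Path

# Matches named or numeric character entities such as &eacute; or &#233;
ENTITY = re.compile(r"&#?\w+;")
# Characters that must stay escaped for the XML to remain well-formed
RESERVED_CHARS = set("&<>\"'")


def _decode(match):
    """Decode one entity unless it stands for an XML-reserved character."""
    decoded = html.unescape(match.group(0))
    return match.group(0) if decoded in RESERVED_CHARS else decoded


def convert_text(text):
    """Replace entities with real characters and use Windows line endings."""
    text = ENTITY.sub(_decode, text)
    # Normalise first so existing CR+LF pairs are not doubled
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def convert_tree(src_root, dst_root, src_encoding="utf-8"):
    """Copy every .xml file under src_root to a matching path under dst_root.

    Assumption: output is written as UTF-16 (hypothetical choice for this sketch).
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for path in src_root.rglob("*.xml"):
        target = dst_root / path.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        # newline="" stops Python from translating line endings itself
        with open(path, encoding=src_encoding, newline="") as src:
            converted = convert_text(src.read())
        with open(target, "w", encoding="utf-16", newline="") as dst:
            dst.write(converted)
```

Calling convert_tree with the CD-ROM folder and a destination such as j:\temp would give a copy with the same sub-folder layout, which is the "similar structure" described above.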

 

Filtering:

 

Choose the converted texts in the first window:

 

[Screenshot: BNC_XML_filter_choosing_texts]

de-activate conversion,

[Screenshot: BNC_XML_filter_deactivate conversion]

and choose filtering like this:

 

[Screenshot: BNC_XML_filter_settings]

Eventually you should get folder structures like this:

 

[Screenshot: BNC_XML_filtered]
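
The idea behind the filtering stage, sorting files into folders whose names mean something, can be sketched in Python as follows. This is only an illustration under stated assumptions: the actual filter settings are the ones chosen in the Text Converter window and are not reproduced here. The sketch assumes each BNC XML file contains a wtext (written) or stext (spoken) element carrying a type attribute, and that the converted files are UTF-16; text_category and filter_tree are hypothetical names.

```python
import re
import shutil
from pathlib import Path

# Assumption: the text element looks like <wtext type="..."> or <stext type="...">
TEXT_TYPE = re.compile(r'<(wtext|stext)\s+type="([^"]+)"')


def text_category(xml_text):
    """Return (medium, type) for a BNC text, or a fallback pair if none is found."""
    match = TEXT_TYPE.search(xml_text)
    if match is None:
        return ("unclassified", "unknown")
    medium = "written" if match.group(1) == "wtext" else "spoken"
    return (medium, match.group(2))


def filter_tree(src_root, dst_root, encoding="utf-16"):
    """Copy each .xml file into dst_root/<medium>/<type>/ based on its markup."""
    for path in Path(src_root).rglob("*.xml"):
        medium, text_type = text_category(path.read_text(encoding=encoding))
        target_dir = Path(dst_root, medium, text_type)
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target_dir / path.name)
```

Copying rather than moving keeps the converted files intact, so the filter can be re-run with different criteria.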

 

 

 
