For computing collocation strength, we can use

• the joint frequency of two words: how often they co-occur. This assumes we have an idea of how far away counts as "neighbours". (If you live in London, does a person in Liverpool count as a neighbour? From the perspective of Tokyo, maybe they do. If not, is a person in Oxford? Heathrow?)

• the frequency of word 1 altogether in the corpus

• the frequency of word 2 altogether in the corpus

• the span or horizons within which we consider words to be neighbours

• the total number of running words in our corpus: total tokens

Mutual Information

Log to base 2 of (A divided by (B times C))

where

A = joint frequency divided by total tokens

B = frequency of word 1 divided by total tokens

C = frequency of word 2 divided by total tokens
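
As a rough illustration, the Mutual Information calculation can be sketched in Python. The function and parameter names are our own, chosen for this example, and the counts are invented.

```python
import math

def mutual_information(joint, f1, f2, total_tokens):
    """Log to base 2 of (A / (B * C)), with A, B, C as relative frequencies."""
    a = joint / total_tokens   # A = joint frequency divided by total tokens
    b = f1 / total_tokens      # B = frequency of word 1 divided by total tokens
    c = f2 / total_tokens      # C = frequency of word 2 divided by total tokens
    return math.log2(a / (b * c))

# Invented counts: the pair co-occurs 8 times in a 1,024-token corpus.
print(mutual_information(joint=8, f1=16, f2=32, total_tokens=1024))  # 4.0
```

A joint frequency of zero has no logarithm, so a pair that never co-occurs would need to be filtered out before calling this.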

MI3

Log to base 2 of ((J cubed) times E divided by B)

where

J = joint frequency

F1 = frequency of word 1

F2 = frequency of word 2

E = J + (total tokens-F1) + (total tokens-F2) + (total tokens-F1-F2)

B = (J + (total tokens-F1)) times (J + (total tokens-F2))
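
The MI3 definitions above can be transcribed directly. This is only a sketch that follows the formula exactly as it is written here; the function and parameter names are our own and the counts are invented.

```python
import math

def mi3(joint, f1, f2, total_tokens):
    """Log to base 2 of ((J cubed) * E / B), with E and B as defined above."""
    n = total_tokens
    # E = J + (total tokens-F1) + (total tokens-F2) + (total tokens-F1-F2)
    e = joint + (n - f1) + (n - f2) + (n - f1 - f2)
    # B = (J + (total tokens-F1)) times (J + (total tokens-F2))
    b = (joint + (n - f1)) * (joint + (n - f2))
    return math.log2(joint ** 3 * e / b)

# Invented counts for a tiny 10-token corpus.
print(mi3(joint=2, f1=4, f2=4, total_tokens=10))  # 1.0
```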

T Score

(J - (X divided by total tokens)) divided by (square root of (J))

where

J = joint frequency

F1 = frequency of word 1

F2 = frequency of word 2

X = F1 times F2
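
A minimal Python sketch of the T Score follows. It assumes the conventional form of the statistic, in which the expected joint frequency (X divided by total tokens) is subtracted from J and the result is divided by the square root of J. The function and parameter names are our own and the counts are invented.

```python
import math

def t_score(joint, f1, f2, total_tokens):
    """(J - X / total tokens) / sqrt(J), where X = F1 * F2 (assumed form)."""
    x = f1 * f2                    # X = F1 times F2
    expected = x / total_tokens    # joint frequency expected by chance
    return (joint - expected) / math.sqrt(joint)

# Invented counts: 4 co-occurrences where chance alone would predict 2.
print(t_score(joint=4, f1=10, f2=20, total_tokens=100))  # 1.0
```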

Z Score

(J - E) divided by the square root of (E times (1-P))

where

J = joint frequency

S = collocational span

F1 = frequency of word 1

F2 = frequency of word 2

P = F2 divided by (total tokens - F1)

E = P times F1 times S
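
The Z Score steps above can be sketched like this. The function and parameter names are our own and the counts are invented. The span is passed in as a plain number, with no assumption made about how it is counted.

```python
import math

def z_score(joint, f1, f2, total_tokens, span):
    """(J - E) / sqrt(E * (1 - P)), with P and E as defined above."""
    p = f2 / (total_tokens - f1)   # P = F2 divided by (total tokens - F1)
    e = p * f1 * span              # E = P times F1 times S
    return (joint - e) / math.sqrt(e * (1 - p))

# Invented counts: P works out to 0.25 and E to 12.
print(z_score(joint=18, f1=24, f2=25, total_tokens=124, span=2))  # 2.0
```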

Dice Coefficient

(J times 2) divided by (F1 + F2)

where

J = joint frequency

F1 = frequency of word 1 or corpus 1 word count

F2 = frequency of word 2 or corpus 2 word count

Ranges between 0 and 1.
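
A short Python sketch of the Dice Coefficient, with function and parameter names of our own and invented counts:

```python
def dice_coefficient(joint, f1, f2):
    """(J * 2) / (F1 + F2)."""
    return (joint * 2) / (f1 + f2)

# Two words that always occur together score 1; half the time, 0.5.
print(dice_coefficient(joint=10, f1=10, f2=10))  # 1.0
print(dice_coefficient(joint=5, f1=10, f2=10))   # 0.5
```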

Log Likelihood

Based on Oakes, pp. 170-2.

2 times (

a Ln a + b Ln b + c Ln c + d Ln d

- (a+b) Ln (a+b)

- (a+c) Ln (a+c)

- (b+d) Ln (b+d)

- (c+d) Ln (c+d)

+ (a+b+c+d) Ln (a+b+c+d)

)

where

a = joint frequency

b = frequency of word 1

c = frequency of word 2

d = frequency of pairs involving neither word 1 nor word 2

and "Ln" means Natural Logarithm
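
The Log Likelihood sum can be sketched in Python as below. The function takes the four counts a, b, c and d as already worked out by the caller. The helper name `x_ln_x` and the convention that a zero count contributes nothing to the sum are assumptions of this sketch, made so that empty cells do not cause a logarithm error. The counts in the example are invented.

```python
import math

def x_ln_x(x):
    """x * Ln x, treating a zero count as contributing 0 (assumed convention)."""
    return x * math.log(x) if x > 0 else 0.0

def log_likelihood(a, b, c, d):
    """2 * (sum of x Ln x terms), following the layout of the formula above."""
    total = (
        x_ln_x(a) + x_ln_x(b) + x_ln_x(c) + x_ln_x(d)
        - x_ln_x(a + b)
        - x_ln_x(a + c)
        - x_ln_x(b + d)
        - x_ln_x(c + d)
        + x_ln_x(a + b + c + d)
    )
    return 2 * total

# Invented counts; four equal counts give a score of zero (no association).
print(log_likelihood(10, 0, 0, 10))
print(log_likelihood(1, 1, 1, 1))
```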

See also: this link from Lancaster University, Mutual Information
