Formulae
For computing collocation strength, we can use:
Mutual Information
Log to base 2 of (A divided by (B times C))
where
A = joint frequency divided by total tokens
B = frequency of word 1 divided by total tokens
C = frequency of word 2 divided by total tokens
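As an illustration, the Mutual Information formula above can be sketched in Python. The function name and parameter names are hypothetical, chosen only for this example; it is a minimal sketch of the arithmetic, not the program's own code.

```python
import math

def mutual_information(joint_freq, freq1, freq2, total_tokens):
    """Mutual Information as described above (hypothetical helper name)."""
    a = joint_freq / total_tokens   # A = joint frequency / total tokens
    b = freq1 / total_tokens        # B = frequency of word 1 / total tokens
    c = freq2 / total_tokens        # C = frequency of word 2 / total tokens
    return math.log2(a / (b * c))

# Made-up counts: two words occurring 16 times each, 8 times together,
# in a text of 1024 tokens.
print(mutual_information(8, 16, 16, 1024))  # prints 5.0
```

With these made-up counts the pair occurs 32 times more often than chance would predict, and log to base 2 of 32 is 5.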
MI3
Log to base 2 of ((J cubed) times E divided by B)
where
J = joint frequency
F1 = frequency of word 1
F2 = frequency of word 2
E = J + (total tokens-F1) + (total tokens-F2) + (total tokens-F1-F2)
B = (J + (total tokens-F1)) times (J + (total tokens-F2))
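The MI3 formula above can be transcribed term by term as follows. This is only a sketch: the function and parameter names are hypothetical, and the expression is read as (J cubed times E) divided by B.

```python
import math

def mi3(joint_freq, freq1, freq2, total_tokens):
    """MI3 exactly as worded above (hypothetical helper name)."""
    j = joint_freq
    n = total_tokens
    # E = J + (total tokens-F1) + (total tokens-F2) + (total tokens-F1-F2)
    e = j + (n - freq1) + (n - freq2) + (n - freq1 - freq2)
    # B = (J + (total tokens-F1)) times (J + (total tokens-F2))
    b = (j + (n - freq1)) * (j + (n - freq2))
    return math.log2((j ** 3) * e / b)

# Made-up counts for illustration only.
score = mi3(joint_freq=10, freq1=100, freq2=50, total_tokens=100000)
print(round(score, 3))
```

Because J is cubed, the joint frequency carries more weight here than in plain Mutual Information.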
Z Score
(J - E) divided by the square root of (E times (1-P))
where
J = joint frequency
S = collocational span
F1 = frequency of word 1
F2 = frequency of word 2
P = F2 divided by (total tokens - F1)
E = P times F1 times S
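A minimal Python sketch of the Z Score calculation above follows. The function and parameter names are hypothetical, and the span value in the example (10, for instance 5 words to the left plus 5 to the right) is an assumption made only for illustration.

```python
import math

def z_score(joint_freq, freq1, freq2, total_tokens, span):
    """Z Score as described above (hypothetical helper name)."""
    p = freq2 / (total_tokens - freq1)   # P = F2 / (total tokens - F1)
    e = p * freq1 * span                 # E = P times F1 times S
    return (joint_freq - e) / math.sqrt(e * (1 - p))

# Made-up counts; the span of 10 is an assumed setting.
z = z_score(joint_freq=10, freq1=100, freq2=50, total_tokens=100000, span=10)
print(round(z, 2))
```

Here E is the number of co-occurrences expected by chance within the span, so a large positive result means the pair occurs together far more often than expected.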
Log Likelihood
Based on Oakes p. 170-2.
2 times ( a Ln a + b Ln b + c Ln c + d Ln d - (a+b) Ln (a+b) - (a+c) Ln (a+c) - (b+d) Ln (b+d) - (c+d) Ln (c+d) + (a+b+c+d) Ln (a+b+c+d) )
where
a = joint frequency
b = frequency of word 1
c = frequency of word 2
d = frequency of pairs involving neither w1 nor w2
and "Ln" means natural logarithm.
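The Log Likelihood expression can be sketched in Python as below. The function names are hypothetical, and the code simply takes the four values a, b, c and d as defined above. Treating "0 Ln 0" as 0 is an assumption added here so that zero counts do not raise an error; the text above does not say how they are handled.

```python
import math

def x_ln_x(x):
    """x * Ln x, with 0 * Ln 0 taken as 0 (an assumption, see lead-in)."""
    return x * math.log(x) if x > 0 else 0.0

def log_likelihood(a, b, c, d):
    """Log Likelihood from the four values a, b, c, d (hypothetical helper name)."""
    return 2 * (
        x_ln_x(a) + x_ln_x(b) + x_ln_x(c) + x_ln_x(d)
        - x_ln_x(a + b) - x_ln_x(a + c)
        - x_ln_x(b + d) - x_ln_x(c + d)
        + x_ln_x(a + b + c + d)
    )

# Made-up values for illustration only.
print(round(log_likelihood(10, 1, 1, 10), 3))
```

When the four values are perfectly balanced (for example 1, 1, 1, 1) the result is 0 up to rounding error, which indicates no association between the two words.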
See also: this link from Lancaster University, Mutual Information