@@ -988,18 +988,46 @@ sets of samples (see also the note in {meth}`~TreeSequence.divergence`).
988988##### One-way
989989
990990The two-locus summary functions all take haplotype counts and sample set size
991- as input. Each of our summary functions has the signature
991+ as input. Suppose that at the first site there are alleles
992+ {math}` (a_1, a_2, ...) ` , and at the second site there are alleles
993+ {math}` (b_1, b_2, ...) ` . For a pair of focal alleles {math}` a_i ` and
994+ {math}` b_j ` , we define two-locus counts
995+ {math}` (n(a_i,b_j), n(a_i,\sim b_j), n(\sim a_i, b_j)) ` , where
996+ {math}` n(a_i,b_j) ` is the number of two-locus haplotypes in the sample set that
997+ carry both alleles {math}` a_i ` and {math}` b_j ` ,
998+ {math}` n(a_i,\sim b_j) ` is the number that carry the allele {math}` a_i `
999+ and do not carry the allele {math}` b_j ` , and
1000+ {math}` n(\sim a_i, b_j) ` is the number that carry the allele {math}` b_j `
1001+ and do not carry the allele {math}` a_i ` . That is,
1002+ {math}` n(\sim a_i, b_j) = \sum_{k\not=i} n(a_k, b_j) ` , and
1003+ {math}` n(a_i, \sim b_j) = \sum_{l\not=j} n(a_i, b_l) ` .
1004+
1005+ We informally refer to focal alleles as {math}` A,B ` and the above sets of
1006+ haplotypes as {math}` (AB, Ab, aB) ` , so that {math}` Ab ` refers to the set
1007+ of all haplotypes {math}` (a_i, \sim b_j) ` and {math}` aB ` refers to
1008+ {math}` (\sim a_i, b_j) ` .
1009+ Their counts are labeled similarly: {math}` n_{AB} = n(A,B) ` ,
1010+ {math}` n_{Ab} = n(A, \sim B) ` , and {math}` n_{aB} = n(\sim A, B) ` .
1011+ Each of our summary functions has the signature
9921012{math}` f(n_{AB}, n_{Ab}, n_{aB}, n) ` , converting to haplotype frequencies
993- {math}` \{p_{AB}, p_{Ab}, p_{aB}\} ` by dividing by {math}` n ` . Below,
1013+ {math}` \{p_{AB}, p_{Ab}, p_{aB}\} ` by dividing by the number {math}` n ` of
1014+ samples in the sample set. Then
9941015{math}` n_{ab} = n - n_{AB} - n_{Ab} - n_{aB} ` , {math}` n_A = n_{AB} + n_{Ab} `
9951016and {math}` n_B = n_{AB} + n_{aB} ` , with frequencies {math}` p ` found by dividing
9961017by {math}` n ` .
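
As a minimal sketch of the definitions above, the counts can be tallied from a list of two-locus haplotypes. The helper `two_locus_counts` and the example haplotypes are hypothetical, written only to illustrate the notation; they are not part of the library.

```python
from collections import Counter


def two_locus_counts(haplotypes, a, b):
    """Return (n_AB, n_Ab, n_aB, n) for focal alleles `a` and `b`.

    `haplotypes` holds one (allele at site 1, allele at site 2) pair
    per sample in the sample set. Hypothetical helper, for illustration.
    """
    counts = Counter(haplotypes)
    n = len(haplotypes)
    # Haplotypes carrying both focal alleles.
    n_AB = counts[(a, b)]
    # Haplotypes carrying a but any allele other than b.
    n_Ab = sum(c for (x, y), c in counts.items() if x == a and y != b)
    # Haplotypes carrying b but any allele other than a.
    n_aB = sum(c for (x, y), c in counts.items() if x != a and y == b)
    return n_AB, n_Ab, n_aB, n


# Made-up sample set: three alleles at the first site, two at the second.
haplotypes = [("A", "G"), ("A", "G"), ("A", "T"), ("C", "G"), ("C", "T"), ("T", "T")]
n_AB, n_Ab, n_aB, n = two_locus_counts(haplotypes, "A", "G")

# Derived quantities, as defined in the text.
n_ab = n - n_AB - n_Ab - n_aB
n_A = n_AB + n_Ab
n_B = n_AB + n_aB
```

Here the focal alleles are `"A"` at the first site and `"G"` at the second, so `"C"` and `"T"` at the first site are pooled into the non-focal class.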
9971018
998- Our convention is to use {math}` A,B ` to denote derived alleles, and {math}` a,b `
999- ancestral alleles (or other alleles, if the site is multi-allelic). For
1000- polarised statistics, we average statistics over all non-ancestral alleles. For
1001- unpolarised statistics, the labeling is arbitrary as we average over all
1002- alleles (derived and ancestral).
1019+ For polarised statistics, we compute the statistic using all pairs of
1020+ non-ancestral alleles as focal alleles; that is, we do not evaluate the summary
1021+ function on haplotype counts for which the focal allele at either of the two
1022+ loci is the ancestral allele.
1023+ For unpolarised statistics, we compute the summary function over all
1024+ pairs of alleles. Thus, for polarised statistics, the summary function is
1025+ called {math}` (n_1-1)\times(n_2-1) ` times, where {math}` n_1 ` and {math}` n_2 `
1026+ are the total numbers of alleles at the first and second locus, respectively.
1027+ For unpolarised statistics, the summary function is called {math}` n_1 n_2 `
1028+ times. The final value is the average of the results computed for
1029+ each pair of focal alleles, using the weighting approach specified for the
1030+ given summary function.
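
The number of summary-function calls described above can be checked with a short sketch that enumerates the focal allele pairs. The function `focal_allele_pairs` is hypothetical, and it assumes (for illustration only) that the ancestral allele is listed first at each site.

```python
from itertools import product


def focal_allele_pairs(alleles_1, alleles_2, polarised):
    """List the (a_i, b_j) pairs used as focal alleles.

    Assumption for this sketch: the ancestral allele is the first
    entry of each allele list.
    """
    if polarised:
        # Skip the ancestral allele at both sites.
        alleles_1 = alleles_1[1:]
        alleles_2 = alleles_2[1:]
    return list(product(alleles_1, alleles_2))


alleles_1 = ["A", "C", "T"]  # three alleles at the first site
alleles_2 = ["G", "T"]       # two alleles at the second site

polarised_pairs = focal_allele_pairs(alleles_1, alleles_2, polarised=True)
unpolarised_pairs = focal_allele_pairs(alleles_1, alleles_2, polarised=False)
```

With three and two alleles, the polarised case yields two pairs and the unpolarised case yields six. How the per-pair results are weighted before averaging depends on the summary function and is not shown here.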
10031031
10041032` D `
10051033: {math}` f(n_{AB}, n_{Ab}, n_{aB}, n) = p_{AB}p_{ab} - p_{Ab}p_{aB} \, (=p_{AB} - p_A p_B) `
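
The two forms of `D` given above can be compared numerically. This is an illustrative sketch using hypothetical function names `D` and `D_alt`, not the library's implementation.

```python
def D(n_AB, n_Ab, n_aB, n):
    """D computed as p_AB * p_ab - p_Ab * p_aB."""
    n_ab = n - n_AB - n_Ab - n_aB
    p_AB, p_Ab, p_aB, p_ab = (x / n for x in (n_AB, n_Ab, n_aB, n_ab))
    return p_AB * p_ab - p_Ab * p_aB


def D_alt(n_AB, n_Ab, n_aB, n):
    """D computed as p_AB - p_A * p_B, using the marginal frequencies."""
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    return p_AB - p_A * p_B


# Made-up counts: n_AB = 2, n_Ab = 1, n_aB = 1 in a sample set of size 6.
d = D(2, 1, 1, 6)
d_alt = D_alt(2, 1, 1, 6)
```

Both expressions give 1/12 for these counts, up to floating-point rounding.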