Gene history correlations

Correlations between gene histories along the human genome determine patterns of genetic variation (haplotype structure) and are crucial for understanding the genetic factors underlying common diseases. We derive closed analytical expressions for the correlation of gene histories in established demographic models of genetic evolution, and show how to extend the analysis to more realistic (but more complicated) models of demographic structure.
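As an illustration of the kind of closed expression meant here, the sketch below treats the simplest established model: the standard neutral coalescent (one panmictic population of constant size), with two sequences sampled at two loci a and b. This is a classical two-locus calculation, not the derivation used in this work. Time is measured in units of 2N generations and rho = 4Nr, where r is the per-generation recombination probability between the loci; both conventions are assumptions of the sketch, and the function names are hypothetical. The expected product of the coalescence times T_a and T_b obeys a small linear system over three ancestral configurations, and solving it exactly reproduces the known closed form Corr(T_a, T_b) = (rho + 18)/(rho^2 + 13 rho + 18).

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a . x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix of Fractions, so the result is exact.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose a row with a non-zero entry in this column as pivot.
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def tmrca_correlation(rho):
    """Corr(T_a, T_b) for a sample of two under the standard coalescent.

    Pairwise coalescence rate is 1; a lineage carrying both loci recombines
    at rate rho/2. States: 0 = (ab, ab), 1 = (ab, a, b), 2 = (a, a, b, b).
    Unknowns: f_i = E[T_a * T_b | process starts in state i].
    """
    rho = Fraction(rho)
    a = [
        [1 + rho, -rho, 0],               # state 0: coalesce (rate 1) or recombine (rate rho)
        [-1, 3 + rho / 2, -rho / 2],      # state 1: three coalescing pairs or one recombination
        [0, Fraction(-2, 3), 1],          # state 2: six pairs, four of them rejoin a and b
    ]
    b = [2, 2, Fraction(1, 3)]
    f0 = solve(a, b)[0]
    # Each marginal T is Exp(1) with mean 1 and variance 1, so Corr = E[T_a T_b] - 1.
    return f0 - 1


def closed_form(rho):
    rho = Fraction(rho)
    return (rho + 18) / (rho**2 + 13 * rho + 18)


for rho in [0, 1, 10, 100]:
    # rho = 0: fully linked loci share one genealogy, correlation exactly 1.
    print(rho, tmrca_correlation(rho), closed_form(rho))
```

In this unstructured model the correlation falls off roughly as 1/rho for large rho, so only nearby loci have strongly correlated histories.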

We identify two contributions to the correlation of gene histories in divergent populations: linkage disequilibrium and differences in the demographic history of the individuals in the sample. These two factors act on different length scales: the former at short genomic distances, the latter at long ones.

a In DNA, genetic information is encoded by the sequence of the four nucleotide bases adenine (A), thymine (T), guanine (G), and cytosine (C). Three polymorphic sites are shown in a sample of three individuals. b The most common kind of variation is a difference at a single position (single-nucleotide polymorphism, or SNP), caused by a single mutation. The three mutations in panel a are shown as filled circles on a genealogy of the three individuals (blue). Mutation 4 does not cause a polymorphism in the sample, since all individuals in the sample inherit it from their common ancestor. c In recombination, a chromosome passed on by a parent is a mosaic: part of it is copied from one of the parent's two copies of that chromosome and the rest from the other. A sample gene history with one recombination event is shown for two loci (a and b).
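The logic of panels a and b can be sketched in a few lines. Each sampled individual inherits every mutation on the path from its leaf up to the root of the genealogy, and a site is polymorphic in the sample only if some, but not all, individuals carry the mutation. The tree shape, the node names (ind1, x, mrca, stem) and the branch carrying each of mutations 1 to 3 are hypothetical choices for illustration; only the placement of mutation 4 above the common ancestor follows the caption.

```python
# Hypothetical genealogy of three sampled individuals, stored as child -> parent.
# "x" is the ancestor shared by ind1 and ind2, "mrca" the most recent common
# ancestor of the whole sample, and "stem" the lineage above the MRCA.
PARENT = {
    "ind1": "x",
    "ind2": "x",
    "x": "mrca",
    "ind3": "mrca",
    "mrca": "stem",
}

# Hypothetical mutation placement: each mutation sits on the branch leading
# into the named node. Mutation 4 lies above the MRCA, as in the caption.
MUTATION_BRANCH = {1: "ind1", 2: "x", 3: "ind3", 4: "mrca"}

SAMPLES = ["ind1", "ind2", "ind3"]


def lineage(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def carriers(mutation):
    """Sampled individuals that inherit the given mutation."""
    branch = MUTATION_BRANCH[mutation]
    return {s for s in SAMPLES if branch in lineage(s)}


def is_polymorphic(mutation):
    """A site segregates in the sample only if some, but not all, carry it."""
    return 0 < len(carriers(mutation)) < len(SAMPLES)


# Derived (1) versus ancestral (0) state of each individual at mutations 1-4.
for s in SAMPLES:
    print(s, "".join("1" if s in carriers(m) else "0" for m in sorted(MUTATION_BRANCH)))
print("polymorphic:", [m for m in sorted(MUTATION_BRANCH) if is_polymorphic(m)])
# -> polymorphic: [1, 2, 3]
```

Mutation 4 shows up as a column of 1s: it is present in every sequence, so it is invisible as a polymorphism.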

