Genetic Drift

Author

Shane E. Ridoux

Abstract
What is genetic drift? Essentially, it is the change (or drift) in allele frequencies within a population over time due to random chance alone. Assume there are 100 people (N = 100), and each copy of some gene we care about is either allele A or allele B. Since humans are diploid, each person has two copies of this gene, which gives us 200 (2N) alleles to track within the population. Let’s look at what happens to allele frequency over 100 generations in 1000 populations of this size, all starting with the same initial allele frequency of 50% (p = 0.5).
# ---------------------- inputs ------------------------
set.seed(123) # for reproducibility
N <- 100 # number of people in each population
generations <- 100 # number of generations
pops <- 1000 # number of populations
p <- 0.5 # starting allele frequency

# ------------- genetic drift simulation ---------------
freq <- matrix(nrow = generations+1, ncol = pops) # rows = generations (row 1 is generation 0), columns = populations
freq[1,] <- rep(p, pops) # each population has allele frequency of 0.5 to start

for (i in 2:nrow(freq)) {
  # For each population, draw the next generation's allele count from a
  # binomial distribution: 2N trials (one per allele copy), with success
  # probability equal to the allele frequency in the prior generation.
  # Dividing by 2N converts the count back to a frequency.
  freq[i,] <- rbinom(pops, 2*N, freq[i-1,])/(2*N)
}

# --------------------- results --------------------
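# plot a random sample of 10 populations
library(RColorBrewer) # provides brewer.pal()
colors <- c(brewer.pal(8, "Dark2"),brewer.pal(3, "Set1"))
rs <- sample(pops, 10, replace = FALSE) # sample population (column) indices, not generation rows
matplot(0:generations,
        freq[,rs],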
        type = "l",
        lty = 1,
        col = colors,
        xlab = "Generation",
        ylab = "Allele Frequency",
        main = "Random sample of 10 Populations")
abline(h = p, lwd = 2, lty = "dashed")

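Note 1: An allele is fixed when its frequency reaches 1 (meaning every gene copy in the population is that allele) and lost when its frequency reaches 0. In this simulation, which has no mutation or migration, a population that reaches either value stays there.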
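
To make Note 1 concrete, here is a minimal sketch that re-runs the same simulation and counts how many populations end at fixation or loss. The names `final`, `n_fixed`, `n_lost`, `n_segregating`, and `absorbed` are illustrative and not part of the original analysis; the exact counts depend on the seed, so none are quoted here.

```r
# Self-contained sketch: repeat the drift simulation with the same inputs,
# then summarize the state of each population in the last generation.
set.seed(123)
N <- 100            # people per population
generations <- 100  # number of generations
pops <- 1000        # number of populations
p <- 0.5            # starting allele frequency

freq <- matrix(nrow = generations + 1, ncol = pops)
freq[1, ] <- p
for (i in 2:nrow(freq)) {
  freq[i, ] <- rbinom(pops, 2 * N, freq[i - 1, ]) / (2 * N)
}

final <- freq[nrow(freq), ]           # allele frequency in the last generation
n_fixed <- sum(final == 1)            # allele present in every gene copy
n_lost <- sum(final == 0)             # allele absent from the population
n_segregating <- pops - n_fixed - n_lost
print(c(fixed = n_fixed, lost = n_lost, segregating = n_segregating))

# Check that 0 and 1 are absorbing: once a population hits either value,
# every later generation has the same value.
absorbed <- apply(freq, 2, function(x) {
  hit <- which(x == 0 | x == 1)
  length(hit) == 0 || all(x[hit[1]:length(x)] == x[hit[1]])
})
print(all(absorbed))
```

The absorbing check works because `rbinom()` with success probability 0 always returns 0 and with probability 1 always returns the full 2N, so a fixed or lost population can never re-enter the interior.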

Note 2: Although the mean allele frequency across populations stays near the initial value of 0.5, the variance among populations increases with each generation. This reflects the randomness of genetic drift, which causes different populations to diverge over time.
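
Note 2 can be checked numerically. The sketch below re-runs the simulation and compares the observed mean and variance across populations with the standard Wright-Fisher expectation, Var(p_t) = p(1 - p)[1 - (1 - 1/(2N))^t]. That formula is a textbook result brought in for comparison, not something derived in this document, and the names `obs_mean`, `obs_var`, and `exp_var` are illustrative.

```r
# Self-contained sketch: repeat the drift simulation with the same inputs,
# then track the mean and variance of allele frequency across populations.
set.seed(123)
N <- 100            # people per population
generations <- 100  # number of generations
pops <- 1000        # number of populations
p <- 0.5            # starting allele frequency

freq <- matrix(nrow = generations + 1, ncol = pops)
freq[1, ] <- p
for (i in 2:nrow(freq)) {
  freq[i, ] <- rbinom(pops, 2 * N, freq[i - 1, ]) / (2 * N)
}

gen <- 0:generations
obs_mean <- rowMeans(freq)        # mean frequency across populations, per generation
obs_var <- apply(freq, 1, var)    # variance across populations, per generation

# Expected variance under the Wright-Fisher model (assumed here for comparison)
exp_var <- p * (1 - p) * (1 - (1 - 1 / (2 * N))^gen)

# Show a few generations side by side
rows <- c(1, 11, 51, 101)         # generations 0, 10, 50, 100
print(round(cbind(generation = gen, obs_mean, obs_var, exp_var)[rows, ], 4))
```

The expected variance rises toward p(1 - p) = 0.25, the value reached once every population is either fixed or lost, while the observed variance follows it with some sampling noise because only 1000 populations are simulated.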

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.0

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Denver
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-3 DT_0.33            tibble_3.3.0      

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.5         knitr_1.50        rlang_1.1.6      
 [5] xfun_0.52         jsonlite_2.0.0    glue_1.8.0        htmltools_0.5.8.1
 [9] sass_0.4.10       rmarkdown_2.29    crosstalk_1.2.1   evaluate_1.0.4   
[13] jquerylib_0.1.4   fastmap_1.2.0     yaml_2.3.10       lifecycle_1.0.4  
[17] compiler_4.3.1    htmlwidgets_1.6.4 pkgconfig_2.0.3   rstudioapi_0.16.0
[21] digest_0.6.37     R6_2.6.1          pillar_1.10.2     magrittr_2.0.3   
[25] bslib_0.9.0       tools_4.3.1       cachem_1.1.0