# ---------------------- inputs ------------------------
set.seed(123) # for reproducibility
N <- 100 # number of people in each population
generations <- 100 # number of generations
pops <- 1000 # number of populations
p <- 0.5 # starting allele frequency
# ------------- genetic drift simulation ---------------
freq <- matrix(nrow = generations+1, ncol = pops) # initialize matrix for generations
freq[1,] <- rep(p, pops) # each population has allele frequency of 0.5 to start
for (i in 2:nrow(freq)) {
  # sample from a binomial distribution where the number of trials is the
  # number of alleles in the population (2N) and the probability of success is
  # the allele frequency from the prior generation; dividing by the total
  # number of alleles converts the sampled count back to a frequency
  freq[i,] <- rbinom(pops, 2*N, freq[i-1,])/(2*N)
}
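# --------------------- results --------------------
# plot
library(RColorBrewer) # provides brewer.pal()
colors <- c(brewer.pal(8, "Dark2"),brewer.pal(3, "Set1"))
# pick 10 populations (columns of freq) at random
rs <- sample(seq_len(ncol(freq)), 10, replace = FALSE)
matplot(0:100,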
freq[,rs],type = "l",
lty = 1,
col = colors,
xlab = "Generation",
ylab = "Allele Frequency",
main = "Random sample of 10 Populations")
abline(h = p, lwd = 2, lty = "dashed")
Genetic Drift
Abstract
What is genetic drift? Essentially, it is the concept that allele frequencies within populations change (or drift) over time due to random chance alone. Assume a population of 100 people (N = 100) and a gene we care about that has two alleles, A and B. Since humans are diploid, each person carries two copies of this gene, which gives us 200 (2N) alleles to track within the population. Let’s look at what happens to the allele frequency over 100 generations in 1000 populations, all starting with the same initial allele frequency of 50% (p = 0.5).
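Note 1: An allele becomes fixed when its frequency reaches 1 (meaning every copy of the gene in the population is that allele) and is lost completely when its frequency reaches 0. In this simulation both states are permanent, because sampling from a frequency of 0 or 1 can only return the same value.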
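One way to see fixation and loss in the simulated data is to count how many populations sit at exactly 0 or 1. The sketch below reruns the same binomial sampling scheme with the parameters used above; `simulate_drift`, `absorbed`, and `fixed_by_gen` are names introduced here for illustration and are not part of the original analysis.

```r
# Minimal sketch: rerun the drift simulation and count fixed/lost populations.
# simulate_drift() is a hypothetical helper that wraps the simulation loop.
simulate_drift <- function(N, generations, pops, p) {
  freq <- matrix(NA_real_, nrow = generations + 1, ncol = pops)
  freq[1, ] <- p
  for (i in 2:nrow(freq)) {
    # draw next generation's allele counts, then convert counts to frequencies
    freq[i, ] <- rbinom(pops, 2 * N, freq[i - 1, ]) / (2 * N)
  }
  freq
}

set.seed(123)
freq <- simulate_drift(N = 100, generations = 100, pops = 1000, p = 0.5)

# a population has stopped drifting once its frequency is exactly 0 or 1
absorbed <- freq == 0 | freq == 1
fixed_by_gen <- rowSums(absorbed) # populations at 0 or 1, per generation

final <- freq[nrow(freq), ]
cat("fixed (frequency 1):", sum(final == 1), "\n")
cat("lost (frequency 0):", sum(final == 0), "\n")
cat("still segregating:", sum(final > 0 & final < 1), "\n")
```

Because 0 and 1 are absorbing states, `fixed_by_gen` can only stay flat or grow from one generation to the next, which makes it a quick sanity check on the simulation.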
Note 2: Although the average allele frequency across populations stays near the initial value of 0.5, the variance in allele frequency among populations increases with each generation. This reflects the randomness of genetic drift, which causes different populations to diverge over time.
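Both parts of Note 2 can be checked numerically. The sketch below reruns the same sampling scheme and compares the variance among populations at each generation with the standard Wright-Fisher expectation, p(1 - p)[1 - (1 - 1/(2N))^t]. That formula is brought in here as a reference point and does not appear in the original write-up; `mean_by_gen`, `var_by_gen`, `expected_var`, and `summary_tab` are illustrative names.

```r
# Minimal sketch: mean and variance of allele frequency across populations.
set.seed(123)
N <- 100           # people per population
generations <- 100 # number of generations
pops <- 1000       # number of populations
p <- 0.5           # starting allele frequency

freq <- matrix(NA_real_, nrow = generations + 1, ncol = pops)
freq[1, ] <- p
for (i in 2:nrow(freq)) {
  freq[i, ] <- rbinom(pops, 2 * N, freq[i - 1, ]) / (2 * N)
}

t <- 0:generations
mean_by_gen <- rowMeans(freq)     # average frequency across populations
var_by_gen <- apply(freq, 1, var) # variance among populations

# assumed reference: Wright-Fisher expectation for the among-population variance
expected_var <- p * (1 - p) * (1 - (1 - 1 / (2 * N))^t)

summary_tab <- data.frame(
  generation = t,
  mean = mean_by_gen,
  observed_var = var_by_gen,
  expected_var = expected_var
)
# show generations 0, 10, 50, and 100
print(summary_tab[c(1, 11, 51, 101), ], row.names = FALSE)
```

The observed variance fluctuates from run to run, so it tracks the expected curve only approximately; with 1000 populations the two should be close but not identical.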
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.0
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Denver
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-3 DT_0.33 tibble_3.3.0
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.5 knitr_1.50 rlang_1.1.6
[5] xfun_0.52 jsonlite_2.0.0 glue_1.8.0 htmltools_0.5.8.1
[9] sass_0.4.10 rmarkdown_2.29 crosstalk_1.2.1 evaluate_1.0.4
[13] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
[17] compiler_4.3.1 htmlwidgets_1.6.4 pkgconfig_2.0.3 rstudioapi_0.16.0
[21] digest_0.6.37 R6_2.6.1 pillar_1.10.2 magrittr_2.0.3
[25] bslib_0.9.0 tools_4.3.1 cachem_1.1.0