of Massachusetts, Boston:
Invasive Species and Plant Databases:
Bioinformatics Molecular Evolution Tools:
Education Resources :
Population Genetic and Evolution Tools :
General Interest Web Sites:
The Genetic Software Forum http://rannala.org/gsf
Provides a forum for posting questions and answers about the use of software for genetic analysis including (but not limited to) PAML, MrBayes, Immanc, BayesAss, etc. We have recently made improvements to the site by implementing the phpbb forum interface. Postings from the old forum have been preserved at http://rannala.org/gsf/oldgsf/gsf.
Bruce Rannala email@example.com
Phylogenetic Software Web site: 194 packages listed by method, computer system etc.
Simulations in Population Genetics:
PopG version 3.0 simulates evolution of a single locus in the presence of natural selection, mutation, migration, and genetic drift. The program is free with executables and its source code and compilation support files available. No special permissions or licenses are needed to run multiple copies of the program in a class.
(note that this is an ftp: rather than an http: link).
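The kind of single-locus simulation described for PopG can be sketched in a few lines of Python. This is a minimal illustration only, not PopG's code: the function names, parameter names, and the order of steps (selection, then mutation, then migration, then drift) are assumptions chosen for this example.

```python
import random

def next_generation(p, n_diploid, w_AA=1.0, w_Aa=1.0, w_aa=1.0,
                    mu=0.0, nu=0.0, m=0.0, p_migrant=0.0, rng=random):
    """Advance the frequency p of allele A by one generation (illustrative model)."""
    q = 1.0 - p
    # Selection: weight Hardy-Weinberg genotype frequencies by their fitnesses.
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    p = (p * p * w_AA + p * q * w_Aa) / mean_w
    # Mutation: A -> a at rate mu, a -> A at rate nu.
    p = p * (1.0 - mu) + (1.0 - p) * nu
    # Migration: a fraction m of genes arrives from a source with frequency p_migrant.
    p = (1.0 - m) * p + m * p_migrant
    # Genetic drift: draw 2N gene copies at random from the gene pool.
    copies = 2 * n_diploid
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies

def simulate(p0, n_diploid, generations, seed=None, **kwargs):
    """Return the allele-frequency trajectory, starting at p0."""
    rng = random.Random(seed)
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_generation(trajectory[-1], n_diploid, rng=rng, **kwargs))
    return trajectory

# Example: 100 generations of drift plus weak selection favouring allele A.
trajectory = simulate(0.2, n_diploid=100, generations=100, seed=1,
                      w_AA=1.05, w_Aa=1.02, w_aa=1.0)
```

Running several replicates with different seeds shows how drift makes trajectories diverge even when the selection coefficients are identical.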
Joe Felsenstein firstname.lastname@example.org
HETERO: A program to simulate the evolution of DNA on a four-taxon tree
The program allows the user to specify the lineage-specific nucleotide substitution models used in the simulation, together with information on the ancestral sequence, and the order and timing of the divergence events. Hetero has a simple user-interface and output, making it equally useful in the teaching and research of phylogenetics.
Hetero directs its output to two files and to the monitor. One file contains the alignments of simulated nucleotide sequences; they are stored in the sequential PHYLIP format, allowing them to be analyzed directly using the phylogenetic programs from the PHYLIP program package. The other file contains the details used in order to do the simulation, and information collected during the simulation that may be used to obtain a better understanding of molecular evolution as a dynamic process and to understand why phylogenetic methods do not always perform as well as expected.
A description of Hetero is published in Applied Bioinformatics 2:159-163.
Hetero is available for academic use from:
Dr Lars S Jermiin email@example.com
MrBayes is a program for the Bayesian estimation of phylogeny.
Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.
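The idea of approximating posterior probabilities by MCMC can be illustrated with a toy Metropolis sampler. This sketch is not MrBayes code: it samples among three hypothetical "trees" whose priors and likelihoods are made-up numbers, and all names are assumptions for illustration. Real tree space is far too large to enumerate, which is why sampling is needed.

```python
import random

def metropolis(posterior_weights, n_steps, seed=0):
    """Visit hypothesis indices in proportion to their unnormalized posterior weights."""
    rng = random.Random(seed)
    k = len(posterior_weights)
    current = 0
    visits = [0] * k
    for _ in range(n_steps):
        # Symmetric proposal: choose one of the other hypotheses uniformly.
        proposal = rng.randrange(k - 1)
        if proposal >= current:
            proposal += 1
        # Accept with probability min(1, ratio of unnormalized posteriors).
        ratio = posterior_weights[proposal] / posterior_weights[current]
        if rng.random() < ratio:
            current = proposal
        visits[current] += 1
    return [v / n_steps for v in visits]

# Bayes's theorem: posterior is proportional to prior times likelihood.
priors = [1 / 3, 1 / 3, 1 / 3]          # made-up flat prior over three trees
likelihoods = [2e-5, 3e-5, 5e-5]        # made-up likelihoods of the data
weights = [p * l for p, l in zip(priors, likelihoods)]

estimate = metropolis(weights, n_steps=50000)
exact = [w / sum(weights) for w in weights]   # feasible only in this tiny example
```

The sampler never needs the normalizing constant, only ratios of prior times likelihood, which is what makes the approach usable when the full posterior cannot be computed analytically.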
The program takes as input a character matrix in a NEXUS file format. The output is several files with the parameters that were sampled by the MCMC algorithm. MrBayes can summarize the information in these files for the user. The program features include:
* Ability to analyze nucleotide, amino acid, restriction site, and morphological data;
* Mixing of data types, such as molecular and morphological characters, in a single analysis;
* A general method for assigning parameters across data partitions;
* An abundance of evolutionary models;
* Estimation of positively selected sites in a fully Bayesian framework;
* Linking or unlinking model parameters such as state frequencies, gamma shape, overall rate, topology, and branch lengths across partitions;
* Estimation of model likelihoods for Bayesian model choice;
* The ability to spread jobs over a cluster of computers using MPI.
The program is available free, for academic use only.
John Huelsenbeck firstname.lastname@example.org
Microsatellite Analyzer 3.0
The new version fixes some bugs of earlier versions:
1) Problems with the distance calculations D1, delta mu squared, and Dad. Previous versions incorrectly estimated these distances from PCR product size rather than repeat length.
2) Dc per locus was calculated incorrectly.
New features:
1) Estimates of the sampling variance of gene diversity.
2) Calculation of theta from gene diversity, assuming stepwise mutation behavior.
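As a rough illustration of the second new feature, theta can be derived from gene diversity with the classical equilibrium relation for a strict stepwise mutation model, H = 1 - 1/sqrt(1 + 2*theta) (Ohta and Kimura 1973). Whether Microsatellite Analyzer uses exactly this estimator is an assumption here, and the function names are hypothetical.

```python
def gene_diversity(allele_counts):
    """Sample-size-corrected gene diversity (expected heterozygosity) at one locus."""
    n = sum(allele_counts)                      # number of gene copies sampled
    sum_sq = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_sq)

def theta_stepwise(h):
    """Theta (4*Ne*mu) implied by gene diversity h under a strict stepwise model.

    Obtained by inverting H = 1 - 1 / sqrt(1 + 2 * theta).
    """
    return 0.5 * (1.0 / (1.0 - h) ** 2 - 1.0)

# Example: a locus with three alleles observed 12, 6 and 2 times.
h = gene_diversity([12, 6, 2])
theta = theta_stepwise(h)
```

The estimate grows very quickly as h approaches 1, so highly variable loci give imprecise values of theta.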
Mesquite 1.0 has been released formally at
Mesquite is open-source software for evolutionary analysis. It has modules for phylogenetic interpretations of character evolution (likelihood, parsimony, comparative methods), simulations (sequence evolution, coalescence, speciation), multivariate analysis (PCA, CVA), and other analyses (e.g., parametric bootstrap, compositional bias, tree comparisons, randomizations, cluster analysis). Molecular, morphological and continuous-valued data can be edited and analyzed. Its interface is graphical and interactive, with charts and phylogenetic visualizations. Mesquite operates on Windows,
Linux/Unix, and the Mac OS (9 and X). An outline of features (with screenshots) is given at
We welcome your feedback about how Mesquite can be improved for use in both research and teaching. If you might be interested in building your own modules for Mesquite, or contributing to the coding of the core Mesquite classes, please contact us.
Other sites of interest:
Tree of Life: http://tolweb.org
Wayne Maddison email@example.com
David R. Maddison firstname.lastname@example.org
Hickory, a software package
Uses Bayesian methods for analysis of geographical structure in genetic
data. The major changes in this release are:
* Analyses of co-dominant marker data with an arbitrary number of alleles per locus
* In addition to the GUI version that was distributed last time, we are including pre-compiled command-line executables that will be useful for those who need to do large analyses in the background or who want to do exploratory simulations.
* Documentation has been expanded and describes the use of Spiegelhalter et al.'s (2002) Deviance Information Criterion as a method of choosing among alternative models.
* Bayes factor calculations are no longer reported. We were not able to find a way to make them numerically stable.
We are releasing executables and source code for Hickory under terms of the GNU General Public License, version 2.0.
Kent E. Holsinger email@example.com
Sequence alignments and haplotype
I have been looking for a small program or web service that takes an alignment or set of sequences, compares them all, and outputs the "haplotypes" often used in population studies. I found only Collapse 1.1 for Mac (by D. Posada), which does not help non-Mac users. Hence I wrote a small service myself; it is available at the following address:
Palle Villesen, | www.birc.dk - www.palle.ninja.dk
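The collapsing step described above can be sketched as follows. This is a minimal illustration, not the code behind the web service: it treats sequences as identical only if they match exactly (after upper-casing), and it does not handle gaps or ambiguity codes. The function names are hypothetical.

```python
def collapse_haplotypes(sequences):
    """Group identical aligned sequences.

    sequences: iterable of (name, sequence) pairs of equal length.
    Returns a list of (haplotype_sequence, [names]) in order of first appearance.
    """
    groups = {}
    for name, seq in sequences:
        groups.setdefault(seq.upper(), []).append(name)
    return list(groups.items())

def variable_sites(haplotypes):
    """Return the 0-based alignment columns at which the haplotypes differ."""
    seqs = [seq for seq, _ in haplotypes]
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

alignment = [("ind1", "ACGTAC"), ("ind2", "ACGTAA"), ("ind3", "acgtac")]
haplotypes = collapse_haplotypes(alignment)
sites = variable_sites(haplotypes)
```

Here ind1 and ind3 collapse into one haplotype, and the only variable column is the last one.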
Three programs for testing Hardy Weinberg and other population parameters.
They require MS Windows 95+ (or a good emulator). They do not require setup, do not change the Windows registry, and are simple to install (just unzip the files to a directory of your choice).
ABOestimator : a program to estimate ABO allele frequencies
The program uses several methods (heuristic and maximum likelihood) and tests the Hardy-Weinberg assumption. Some example data sets are provided, and you can also use your own data. It comes with a help file.
HWpower : a program to help study the power of chi-square tests --
Pearson's X2 and G (log-likelihood ratio) only -- of the Hardy-Weinberg hypothesis for an autosomal gene with two alleles. Wahlund, inbreeding and selection (as well as Hardy-Weinberg itself) can be set as the true hypothesis. Comes with a simple help file and some
X2Calculator : calculates the CDF and its complement, inverse CDF, and
PDF of the chi-square distribution.
May be useful to replace standard tables.
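For readers who want to see what such a test involves, here is a small sketch of Pearson's chi-square test of Hardy-Weinberg proportions for one autosomal locus with two alleles. It is not taken from the programs above; the function names are hypothetical. With one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function, so no table is needed.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of Hardy-Weinberg proportions (two alleles).

    Assumes the locus is polymorphic, so that all expected counts are positive.
    Returns (chi-square statistic, p-value).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes minus one, minus one estimated parameter: 1 df.
    return chi2, chi2_sf_1df(chi2)

chi2, p_value = hardy_weinberg_chi2(30, 40, 30)
```

In this example the heterozygote deficit (40 observed against 50 expected) gives a chi-square of 4.0.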
Pedro J.N. Silva Pedro.Silva@fc.ul.pt
GIMLET is a Windows-based program for analysing individual identification data obtained from microsatellite genotypes. Different tasks are available: identification and pooling of genotypes, construction of consensus genotypes and estimation of error rates, estimation of allele frequencies and heterozygosities, calculation of probabilities of identity, and estimation of population size from genotypes.
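As an illustration of one of these tasks, the probability of identity (the chance that two unrelated individuals drawn at random share the same genotype) can be computed from allele frequencies with the standard formula PI = sum(p_i^4) + sum over i<j of (2 p_i p_j)^2. This is a sketch under the assumption of random mating and independent loci; it is not GIMLET's code, and GIMLET may apply sample-size or sibship corrections that are not shown here.

```python
from itertools import combinations

def probability_of_identity(freqs):
    """Single-locus probability of identity from allele frequencies."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous

def multilocus_pi(loci):
    """Multiply single-locus values, assuming the loci are independent."""
    result = 1.0
    for freqs in loci:
        result *= probability_of_identity(freqs)
    return result

# Example with two hypothetical loci.
pi_one = probability_of_identity([0.5, 0.5])
pi_both = multilocus_pi([[0.5, 0.5], [0.5, 0.5]])
```

Adding loci drives the multilocus value down quickly, which is why a modest panel of variable microsatellites is usually enough to tell individuals apart.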
A new version of GIMLET (version 1.3.2) is now available.
Valiere et al. GIMLET, a computer program for analysing genetic individual identification data. Molecular Ecology Notes (2002) 2:377-379.
GEMINI, a Windows-based program that simulates population studies using microsatellite genotyping and helps determine the best strategy to adopt, especially when genotyping errors are introduced, is still available free at
Valiere et al. Molecular Ecology Notes (2002) 2:83-86.
Nathaniel Valiere firstname.lastname@example.org
DnaSP, version 4.0 DNA Sequence Polymorphism,
A software package for PC-Windows that performs extensive population genetics analyses from DNA sequence data.
MAIN NEW FEATURES
Possibility to define sets of sequences and store that information in NEXUS files.
Increased maximum length of the DNA sequence to be analyzed (up to ~5 Mb)
Extensive Coalescent Simulation analyses
Estimation of Gene Flow between and among populations
Testing for Genetic Differentiation (Permutation test)
Analysis of preferred and unpreferred synonymous codons. Possibility to define synonymous codon preference tables. Store codon preferences information in NEXUS
Estimation of new test statistics (Fay and Wu's H; Rozas et al.'s ZZ; Ramos-Onsins and Rozas R2; and more). Analysis of Fay and Wu's H by sliding window
Analysis of Pi(a)/Pi(s) and Ka/Ks ratios by the sliding window method
New Predefined Genetic Codes.
Windows 98, NT, 2000, XP, Macintosh, Linux
INPUT Data Files: NBRF/PIR, MEGA, NEXUS, FASTA, PHYLIP
For academic uses, DnaSP is distributed free of charge.
Rozas, J. et al. (2003). DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-2497.
Julio Rozas email@example.com
FSTAT a program for the analysis of population structure
I have just uploaded on the web a new version of FSTAT
(126.96.36.199, Feb 2002) with a small bug fix and new features:
- A bug was fixed in the pairwise test of differentiation procedure.
- Added relatedness, Ho and Hs as potential statistics to test in the BiasDisp menu.
- Vertical and horizontal scrollbars were added, allowing the user to scroll through the FSTAT window when the display resolution is lower than 1024x768.
- Moved the progress bar (which appears when randomisations are carried out) to the top of the FSTAT window.
DATAMONKEY: Detecting selection on individual amino acid sites
Interested in looking for positive selection in sequence data? Like maximum likelihood, but tired of waiting? Worried about synonymous variation in your dataset? Worry no more! We are proud to announce the launch of DataMonkey, a heavily modified and efficiently implemented version of the method of Suzuki and Gojobori (Molecular Biology And Evolution, 1999, 16:1315-1328), which detects positive and negative selection on individual amino acid sites. In brief, the method involves counting the number of nonsynonymous and synonymous substitutions that have occurred at each site over its evolutionary history, based on a maximum likelihood ancestral reconstruction obtained using a codon-based substitution model.
DataMonkey is a web-based interface to
HyPhy, a multi-platform, free phylogenetic analysis program.
Sergei L. Kosakovsky Pond & Simon D.W. Frost
MEGA (Molecular Evolutionary Genetics Analysis) software project: useful methods of comparative sequence analysis. The first version of the MEGA computer program was released in 1993 and has become widely used in diverse areas of molecular biology and evolution. We have made a special effort to design the user interface to retain the ease of use that researchers have come to identify with MEGA. The user interface is objective-driven, with two common threads running through all program and user-interface design: (1) to ask the user for required information on a need-to-know basis, and (2) to eliminate the need to save data subsets or results to files unless the user specifically needs them. For this reason we have provided input data and output result explorers. This has resulted in a clutter-free work environment. Another important decision was to resist the temptation to over-populate MEGA2 with statistical methods. If we were to attempt that, MEGA2 would not have seen daylight for many more years. In the future, we plan to build on the solid base of analysis methods included in this version and the selection of input data to further enhance the versatility of MEGA2. http://www.megasoftware.net/text/downloads.sht
MEGA2 comes with on-line help showing how to use different aspects of the MEGA2 user-interface. Extensive details of statistical and computational methods available in MEGA2 are presented in the book "Molecular Evolution and Phylogenetics" (Nei and Kumar, Oxford University Press, 2000). This book explains various statistical methods for analyzing molecular data and shows how to interpret the results obtained by various computer programs. The book's website (http://lifesciences.asu.edu/mep/) for use in research and teaching.
DAMBE: Data Analysis in Molecular Biology and Evolution introduces biologists to DAMBE, a proprietary, user-friendly computer program for molecular data analysis. The unique combination of this book and software will allow biologists not only to understand the rationale behind a variety of computational tools in molecular biology and evolution, but also to gain instant access to these tools for use in their laboratories. Data Analysis in Molecular Biology and Evolution serves as an excellent resource for advanced level undergraduates or graduates as well as for professionals working in the field.
Xuhua Xia firstname.lastname@example.org
BAPS (Bayesian Analysis of Population Structure) is a Windows-based program for Bayesian inference of population genetic structure.
BAPS treats both the allele frequencies of the molecular markers and the number of populations as random variables. The posterior distributions of the population structure and the relative allele frequencies are estimated jointly. The program performs an exact
Bayesian analysis by enumerative calculation when the number of populations is 10 or less. For more than 10 populations, a Markov chain Monte Carlo algorithm is used. Based on the posterior distribution of the structure parameters, a measure of uncertainty regarding the specified populations is obtained for all pair-wise comparisons (i.e. the probability that the gene frequencies of two populations are the same). BAPS also prints summary statistics of the FST parameter to text files. Details of the methods can (soon) be found in Corander, J., Waldmann, P. and M.J. Sillanpää. 2003. Bayesian analysis of
genetic differentiation between populations. Genetics, in press.
Correspondence to: email@example.com
Patrik Waldmann <Patrik.Waldmann@djingis.se>
TREEFINDER now offers a more realistic model for protein-coding sequences, which assumes independent rates at the three codon positions. The relative codon position rates
can be estimated from the data. Codon position heterogeneity can be combined
orthogonally with the usual Gamma heterogeneity and all the substitution models
up to the GTR. TREEFINDER can now perform ML bootstrap analysis to assess the confidence of inferred relationships. As a special feature, it has a consensus routine that
produces trees with edge lengths. The possibility of specifying topological constraints lets one incorporate ideas about the prior distribution of phylogenetic trees to direct the search. TREEFINDER is available for Windows, Linux and Mac.
Gangolf Jobb firstname.lastname@example.org
Structure (version 2.1) for analyzing genetic population structure. The new version
includes some updates to the basic algorithm, as well as a user-friendly graphical front-end. Structure is designed for studying population structure using multilocus genotype data from each of a sample of individuals. The program implements a model-based clustering method for identifying the presence of distinct populations represented in the sample. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Applications include detecting the presence of population structure, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and association mapping in the presence of population structure.
http://pritch.bsd.uchicago.edu (click on "software").
Distruct: Users may also be interested in the program 'distruct' written by Noah
Rosenberg for plotting Structure results (see Science 298:2381).
The details are in a paper by Falush, Stephens and Pritchard (Genetics, in press; available from the Structure website).
Jonathan Pritchard email@example.com
A suite of parentage analysis software
PrDM Software. This program calculates the probability of detecting a multiple mating (fertilization) in a sample of offspring. The model assumes single-sex
multiple mating (i.e., polygyny or polyandry) and therefore all offspring in a brood are either full-sibs or half-sibs. The criterion used to detect the multiple mating is three or more paternal alleles in the set of offspring. Reference: Neff and Pitcher. Assessing the statistical power of genetic analyses to detect multiple mating in fish. Journal of Fish Biology 61: 739-750.
FMM Software. This program calculates the frequency of multiple mating (fmm) in a population based on a sample of nests. It places the algorithm developed by Neff and Pitcher for calculating the probability of detecting a multiple mating (PrDM) into a Bayesian framework. The program outputs the probability distribution associated with each fmm from 0 to 100% in increments of 1%. From the distribution the expected fmm is calculated as well as any desired confidence interval. The model assumes single-sex
multiple mating (i.e., polygyny or polyandry). The criterion used to detect the multiple mating is three or more paternal alleles in the set of offspring. Reference: Neff et al. A Bayesian model for assessing the frequency of multiple mating in nature. Journal of Heredity 93: 406-414.
Two-Sex Paternity Software This program calculates the paternity (or maternity) of a putative parent to a sample of next generation individuals (NGIs) following the equations
outlined in Neff, Repka and Gross (2000a,b) and Neff (2001). The program is run by double clicking the "2sexpat.exe" file (from windows explorer, for example). The program requires three input files outlined below, and outputs the most-likely paternity, the expected paternity and the 95% confidence interval in the paternity estimate.
Molecular Ecology 9: 515-528, 9: 529-539 Parentage analysis with incomplete sampling of candidate parents and offspring Bryan D. Neff et al.
Bryan D. Neff firstname.lastname@example.org
Trevor Pitcher email@example.com
Estimating outcrossing rates. Kermit Ritland's MLTR is the standard in the field. MLTR accommodates codominant data. Kermit has another program, MLDT, which is intended for dominant data. Both programs are apparently nearly identical, except for the data type. They can be downloaded from:
Nested Clade Analysis (NCA) for characterizing phylogeographic patterns. A new inference key associated with the nested clade analysis (NCA) is now available at the GeoDis 2.0 website. The explanations associated with the new key are discussed in the upcoming publication: Templeton, A. R. (2004). "Statistical phylogeography: methods of evaluating and minimizing inference errors." Molecular Ecology In press. This paper will be published as part of a special issue on phylogeography soon to be published in Molecular Ecology.
Keith A. Crandall <firstname.lastname@example.org>
Nei's 1975 book "Molecular Population Genetics and Evolution,"
North Holland and Elsevier, can now be printed from his web site
Click the "Books" section. You will then find it.
NeEstimator software estimates effective population sizes (Ne) from allele frequency data. The user can estimate Ne using any of the three internal methods or three third-party programs. Genotypes from a sample of the population are used as input. The user provides this data in GENEPOP, ARLEQUIN or simple column format (e.g., saved as a tab-delimited text file from Microsoft Excel). The three internal methods are as follows.
* A point estimation method using linkage/gametic disequilibrium (Hill, 1981).
* A point estimation method using heterozygote excess (Pudovkin et al., 1996).
* A temporal method using moments-based F-statistics (Krimbas and Tsakas, 1971; Nei and Tajima, 1981; Pollock, 1983 or Waples, 1989). The elapsed number of generations between temporal samples is required.
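To make the temporal method concrete, here is a minimal sketch for a single locus. It uses the Nei and Tajima (1981) estimator Fc and the Waples (1989) correction for sampling error when individuals are sampled without replacement before reproduction. This is an illustration under those assumptions, not NeEstimator's code; the function and parameter names are hypothetical.

```python
def nei_tajima_fc(freqs_t0, freqs_t1):
    """Standardized variance in allele frequency change, Fc, at one locus.

    freqs_t0 and freqs_t1 hold the frequencies of the same alleles in the
    first and second temporal samples.
    """
    terms = []
    for x, y in zip(freqs_t0, freqs_t1):
        denom = (x + y) / 2.0 - x * y
        if denom > 0:                      # skip alleles absent from both samples
            terms.append((x - y) ** 2 / denom)
    return sum(terms) / len(terms)

def temporal_ne(fc, generations, s0, st):
    """Ne from Fc; s0 and st are the numbers of diploid individuals sampled."""
    drift = fc - 1.0 / (2 * s0) - 1.0 / (2 * st)   # remove sampling variance
    if drift <= 0:
        return float("inf")                # no drift signal beyond sampling noise
    return generations / (2.0 * drift)

fc = nei_tajima_fc([0.5, 0.5], [0.6, 0.4])
ne = temporal_ne(fc, generations=2, s0=50, st=50)
```

In practice Fc is averaged over many loci before Ne is computed, and the elapsed number of generations must be known, as noted above.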
The three third party programs that NeEstimator is able to utilise are as follows.
* A temporal method using a Bayesian based approach called TM3 (http://www.rubic.rdg.ac.uk/~mab/software.html).
* A temporal method using a maximum likelihood based approach called MCLEEPS (http://www.stat.washington.edu/thompson/Genepi/Mcleeps.shtml).
* A temporal method using a pseudo likelihood approach called MLNE (http://www.zoo.cam.ac.uk/ioz/people/wang.htm).
Jenny Ovenden email@example.com
MICRO-CHECKER is a Windows-based program that tests the genotyping of microsatellite data from diploid populations. The program aids the identification of various genotyping errors, and can also detect typographic errors. MICRO-CHECKER
estimates the frequency of null alleles at a locus using a series of algorithms. If null alleles are detected, MICRO-CHECKER can also adjust allele and genotype frequencies of the amplified alleles, which allows the data to be used in further population genetic analysis, for instance with GenePop, Arlequin or Fstat.
Cock van Oosterhout C.van-Oosterhout@hull.ac.uk
Tracer v1.0 a graphical program for analysing the output of Bayesian MCMC software including our program BEAST and the popular MrBayes.
BEAST - http://evolve.zoo.ox.ac.uk/beast/
MrBayes - http://morphbank.ebc.uu.se/mrbayes/
It can plot the traces, estimate autocorrelation, plot posterior densities and give confidence intervals. It can also compare and combine output from multiple runs.
It can be used to look for convergence, select burn-ins and check for adequate chain length. For each parameter it can estimate the Effective Sample Size (ESS) - the
number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to. It can produce publication-quality output either as SVG graphics or by printing to PDF if available (i.e., Mac OS X).
Note that this program analyses the continuous parameters of the models (i.e., not the trees) - for MrBayes this means the '.p' files and for BEAST the '.log' files.
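The Effective Sample Size idea can be illustrated with a simple estimator: ESS = N / (1 + 2 * sum of autocorrelations), where the sum here is truncated at the first non-positive autocorrelation. This is a sketch with an assumed truncation rule; Tracer's own estimator may differ in detail, and the function name is hypothetical.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS of one continuous parameter trace from its autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:          # assumed truncation rule: stop at first non-positive lag
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# Compare an uncorrelated trace with a strongly autocorrelated one.
rng = random.Random(1)
independent = [rng.gauss(0, 1) for _ in range(2000)]
correlated, x = [], 0.0
for _ in range(2000):
    x = 0.9 * x + rng.gauss(0, 1)     # AR(1) chain, standing in for a sticky MCMC run
    correlated.append(x)

ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
```

Both traces contain 2000 values, but the autocorrelated one carries far less independent information, which is the situation a low ESS warns about.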
Versions of this software are available for Mac OS X, Linux, Unix and Windows:
Andrew Rambaut & Alexei Drummond: firstname.lastname@example.org
Software programs that can simulate changes in the level of population genetic diversity (allelic diversity, heterozygosity).
1) See example Bohonak, et al. 2001. Invasion genetics of New World medflies: testing alternative colonization scenarios. Biological Invasions 3:103-111 and software: http://www.bio.sdsu.edu/pub/andy/ESP.html
2) METAPOP program. It simulates the evolution of the genetic diversity in a metapopulation. You can specify the number of populations, population size, migration rate, mating system, number of loci etc. See e.g. Le Corre, Machon, Petit & Kremer. 1997. Colonization with long distance seed dispersal and genetic structure of maternally inherited genes in forest trees: a simulation study. Genet. Res. 69:117-125
3) VORTEX by Bob Lacy. Population Viability Analysis
Related Links for conservation biology, risk analysis and inbreeding analyses can be obtained from Bob Lacy's web site at VORTEX
4) A population genetics program that can do this is available at:
Depending on your exact needs, you might wish to use a slightly modified version. One option might be a version where you enter the seeds for the random number generator manually, so that you can replicate simulations with the same seeds but different number of generations. Alternatively you might wish to use a version which provides genotypes at different generations. Note that depending on the complexity of your problem it might also be solved analytically.
5) Here's something that might fit your need (DGS-9D.EXE): http://www.ntnu.no/~jmork/jmork/software/softE.html
6) Populus (D. Alstad) and Popgen ( J. Aspi & J. Lumme).
7) Geneloss (England, P.R. & Osler, G.H.R. (2001). GENELOSS: a computer program for simulating the effects of population bottlenecks on genetic diversity. Molecular Ecology Notes 1, 111-113).
Software that can detect clonally related individuals in a population sample (of otherwise sexual individuals) and estimate their frequencies. Preferably usable for AFLP dominant markers, and with possibilities for adjusting for scoring errors. I need the software for a study of populations of Poa pratensis with both sexual and apomictic reproduction.
1) Software FaMoz can calculate the identity probability (for codominant or dominant loci), which gives you, according to the number of loci considered, the number of identical pairs of individuals. It does not directly answer your question, but if you have many clonally propagated individuals, the identity probabilities are expected to be high!
2) Using co-dominant markers (microsatellites), MLGsim (described in Mol
Ecol Notes a couple of issues back) can estimate the probability that two identical genotypes have not been produced sexually. Africa Gomez (Gomez & Carvalho Mol Ecol 2000) did something along similar lines - the software is not published,
3) By hand in an Excel file, with each allele at each locus in its own column and each individual on its own row. The sum of the allele scores is equal for identical genotypes, so sorting by these sums should reveal the repeated genotypes. There is also a program called Multilocus (Agapow and Burt); it computes the multilocus genotypic diversity (among many other things), but crashes your machine when more than one sample is analysed at a time (each sample must be analysed one by one).
4) You might take a look at Stenberg et al 2003a,b. Molecular Ecology Notes 3: 329-331; Molecular Biology and Evolution 20: 1626-1632. http://www.molbiol.umu.se/forskning/saura/software.htm
They describe a program that might be of use, or they might have some ideas to help (their email address is in the article). Another approach would be simulation (such as described by Ceplitis 2001. Evolution 55:1581-1591), although I think this would just give you frequencies (or probabilities) and not point to individual cases of clonal reproduction.
5) When we've analyzed clonal structure, we just did it the hard way by identifying identical genotypes and then calculating the likelihood that they'd share a multi-locus genotype by chance. One possible complication would be the degree of self-fertilization (or other inbreeding), as this would make the production of identical genotypes via sexual reproduction more likely. Burke et al. 2000. Evolution 54(1): 137-144; Douhovnikoff and Dodd (2003) Theor. Appl. Genet. 106: 1307-1315.
Calculating relatedness from a pedigree, not from molecular markers.
1) Many responses referred me to software that estimates relatedness using markers. Goodnight software packages (MAC) at http://www.gsoftnet.us/GSoft.html
2) A group of programs by Bob Lacy and collaborators, available through
Not particularly easy to use if the data aren't in standard zoo (studbook) format.
3) SAS PROC INBREED, which also looks useful.
4) There are several packages developed for animal breeders and the like
that look as though they are also likely to be useful in this context.
The relevant URLs that I have are:
a) PEDSYS: http://www.sfbr.org/sfbr/public/software/software.html
b) 10 programs for pedigree analysis http://dga.jouy.inra.fr/sgqa/diffusions/pedig/pedigE.htm
c) Lineage: http://www.ansci.cornell.edu/lineage/download.html
Identify pedigree errors. http://qtl.well.ox.ac.uk/GRR/
d) What did I actually use? Joe Felsenstein was kind enough to send me a short C program that does almost exactly what I wanted. Note that this is © 2003 by J. Felsenstein. Permission is granted to copy and use this code, but it may not be resold or
incorporated in commercial packages without permission.
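For readers who want to see what a pedigree-based calculation involves, the standard recursive definition of the kinship coefficient can be sketched in Python. This is not the C program mentioned above, and the data layout and function names are assumptions for illustration. The sketch assumes founders (individuals with unknown parents) are unrelated and non-inbred.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """Build a kinship function from a pedigree.

    pedigree maps each individual to a (sire, dam) tuple; None marks an unknown parent.
    """
    @lru_cache(maxsize=None)
    def depth(ind):
        # Founders have depth 0; descendants are always deeper than their ancestors.
        if ind is None:
            return -1
        sire, dam = pedigree.get(ind, (None, None))
        return 1 + max(depth(sire), depth(dam))

    @lru_cache(maxsize=None)
    def kin(a, b):
        if a is None or b is None:
            return 0.0                       # unknown parents treated as unrelated
        if a == b:
            sire, dam = pedigree.get(a, (None, None))
            return 0.5 * (1.0 + kin(sire, dam))
        if depth(a) < depth(b):
            a, b = b, a                      # always expand the deeper individual
        sire, dam = pedigree.get(a, (None, None))
        return 0.5 * (kin(sire, b) + kin(dam, b))

    return kin

pedigree = {
    "father": (None, None), "mother": (None, None),
    "sib1": ("father", "mother"), "sib2": ("father", "mother"),
    "child": ("sib1", "sib2"),               # offspring of a full-sib mating
}
kin = make_kinship(pedigree)
relatedness_sibs = 2 * kin("sib1", "sib2")   # r = 2 * kinship for non-inbred pairs
inbreeding_child = kin("sib1", "sib2")       # F of an individual = kinship of its parents
```

Expanding the deeper individual guarantees that the one being replaced by its parents is never an ancestor of the other, which keeps the recursion correct.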
PHASE (2.0) - a software package for estimating haplotypes from population genotype data: http://www.stat.washington.edu/stephens/software.html .
Several improvements on the previous version of PHASE, including: a) the introduction of a new computational approach, resulting in much faster haplotype resolution; b) the introduction of a new model that allows for recombination and decay of Linkage Disequilibrium (LD) with distance, which results in more accurate haplotype estimates. This model also allows the user to estimate recombination rates, and identify recombination hotspots from population genotype data; c) the facility to perform a test for haplotype frequency differences between cases and controls; d) more extensive output summarising the results. Matthew Stephens email@example.com
MultiLocus - analysis of multi-locus population genetic data. Multilocus is a program written to facilitate analysis of multi-locus population genetic data. In particular, it allows calculation of various genotypic diversity indices, linkage disequilibrium indices,
measures of population differentiation, and allows one to search for subpopulations which do not share polymorphisms (and thus might be reproductively isolated). It includes a modification of the IA (Index of Association) metric that corrects the scaling effect seen when one applies the metric to different numbers of loci. In addition, there are
randomization routines which allow one to test various null hypotheses.
Dr Paul-Michael Agapow firstname.lastname@example.org
WinPop 2.0 , http://evol.biology.mcmaster.ca/paulo/soft.php
Paulo Nuin email@example.com
Users of Daniel Huson's excellent program SplitsTree may be interested in a Mac OS X port of the latest version (3.2) of the program. The port provides the same graphical user interface available to Windows and Linux users, but it runs natively under Mac
OS X. The port is available from http://darwin.zoology.gla.ac.uk/~rpage/macosx/xsplits/
and requires Tcl/Tk Aqua to be installed on your machine (the web site listed
above has details on how to do this).
TREEFINDER now has a utility to compute rate profiles along sequence
alignments, which might be useful if one is investigating selective forces on DNA sequences. Another new tool can calibrate phylogenetic trees in time using Michael
J. Sanderson's nonparametric rate smoothing method (NPRS). It outputs a chronogram and also a ratogram with estimates of absolute evolutionary rates. Finally, the new version comes with more informative documentation on the TL programming interface.
Gangolf Jobb firstname.lastname@example.org
Ape (analysis of phylogenetics and evolution). Ape is written in R, a language and environment for statistics and graphics. Ape provides functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, analyses of diversification and macroevolution, computing distances from allelic and nucleotide data, reading nucleotide sequences, and several tools such as Mantel's test, computation of minimum spanning tree, or the population parameter theta based on various approaches.
Emmanuel Paradis: email@example.com
Seq-Gen is a program that will simulate the evolution of nucleotide sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. Nucleotide frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally
tractable) models of DNA sequence evolution.
Andrew Rambaut, EMAIL - firstname.lastname@example.org
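To give a feel for what simulating sequences along a phylogeny involves, here is a minimal Python sketch. It is not Seq-Gen's code or interface: it assumes the simplest substitution model (Jukes-Cantor, JC69), no rate heterogeneity, and a hypothetical two-taxon tree whose branch lengths were picked only for illustration.

```python
import math
import random

BASES = "ACGT"

def jc69_same_probability(branch_length):
    """Probability that a site shows the same base after `branch_length`
    expected substitutions per site under the Jukes-Cantor (JC69) model."""
    return 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)

def evolve(sequence, branch_length, rng):
    """Return a descendant of `sequence` at the end of one branch."""
    p_same = jc69_same_probability(branch_length)
    out = []
    for base in sequence:
        if rng.random() < p_same:
            out.append(base)  # site ends the branch in its starting state
        else:
            # otherwise one of the three other bases, equally likely under JC69
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)

rng = random.Random(1)
ancestor = "".join(rng.choice(BASES) for _ in range(4000))

# Hypothetical tree: two tips, each 0.05 substitutions per site from the root.
tip_a = evolve(ancestor, 0.05, rng)
tip_b = evolve(ancestor, 0.05, rng)

observed = sum(a != b for a, b in zip(tip_a, tip_b)) / len(ancestor)
expected = 1.0 - jc69_same_probability(0.10)  # tips are 0.10 apart in total
print(f"observed p-distance {observed:.3f}, JC69 expectation {expected:.3f}")
```

The proportion of differing sites between the two tips should come out close to the JC69 expectation for their total separation; a real simulator adds richer models, rate variation among sites and arbitrary trees on top of this idea.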
Software to analyse relatedness/kinship.
- FSTAT http://www2.unil.ch/izea/softwares/fstat.html
- MaRQ http://www.genetics.forestry.ubc.ca/ritland/programs.html
- IDENTIX http://www.Univ-montp2.fr/~genetix/identix01.zip
- SPAGEDI http://www.ulb.ac.be/sciences/lagev/spagedi.html
- DELRIOUS http://www.zoo.utoronto.ca/stone/DELRIOUS/delrious.htm
- MER http://www.zoo.cam.ac.uk/ioz/software.htm
Blouin, M.S. (2003) DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. TREE 18(10):503-511.
QTL Express : http://qtl.cap.ed.ac.uk/
A user-friendly, web-based QTL mapping package, to which new modules have been added. We now cater for the following population structures:
Comments to: email@example.com
PHYLOGENETIC INDEPENDENCE version 2.0 is a program written to conduct the Test For Serial Independence (TFSI) on continuously valued characters and the Runs Test on discretely valued characters. The TFSI and Runs Test are designed to test the assumption of phylogenetic independence within a set of comparative data, i.e., to test whether a trait is significantly associated with its phylogenetic history.
Ehab Abouheif, (1999: Evolutionary Ecology Research 1: 895-909)
Binary characteristics in comparative studies.
To compare genome size (a continuous variable) with life history and breeding system (both coded as binary variables), I used the computer programs
1) Continuous by Mark Pagel: http://sapc34.rdg.ac.uk/meade/Mark/
which is based on the generalised least squares method, and
2) CAIC (Comparative Analysis using Independent Contrasts) by Andy Purvis: http://www.bio.ic.ac.uk/evolve/software/caic/
which is a program using the independent contrast method. I strongly recommend comparing different approaches, because both methods make assumptions that cannot easily be met. Both authors were very helpful in answering questions about their programs.
The BRUNCH option in CAIC can do this: http://www.bio.ic.ac.uk/evolve/software/caic/index.html
3) COMPARE www.compare.bio.indiana.edu
4) Felsenstein, J. 1985. Phylogenies and the comparative method. American Naturalist 125:1-15.
5) PDAP http://www.biology.ucr.edu/people/faculty/Garland/PDAP.html
Relatedness & Kinship (Mac only) allows subgroups to be explored and relatedness to be calculated using either the allele frequencies of the studied (sub-)population or other imposed allele frequencies, for example those of the complete population.
Identix 1.1 (for PC)
The « paper » is there:
Software to estimate relatedness with three methods: Queller & Goodnight's (1989), Lynch & Ritland's (1999) and Identity. This program implements a resampling procedure (allelic or genotypic) in order to test the mean relatedness coefficient of a population and its variance against the null hypothesis of panmixia. However, this software can only use the allele frequencies of the studied sample itself (there is no option to supply reference allele frequencies to weight the calculation).
Interesting Questions and Answers off the Web.
How to calculate a Mantel test when you have empty cells in the matrices. As requested, I've compiled the advice I received:
PASSAGE : The program PASSAGE is available for free download from its web site, where you can find contact information for the author, Dr. Michael S. Rosenberg. He may be able to address the issue of missing data if the user's manual does not.
EXCEL add-in : The PopTools add-in for Excel, by Greg Hood, will also do Mantel's test and is available for free download at www.cse.csiro.au/poptools
If the help menu isn't helpful regarding missing data, you might email the author at firstname.lastname@example.org
GenAlEx : A program that is easy to use for PCA, AMOVA, genetic distances and other simple population genetic work is GenAlEx (Genetic Analysis in Excel). It is free to download: www.anu.edu.au/BoZo/GenAlEx
ECOSIM : http://homepages.together.net/~gentsmin/ecosim.htm
You can transform your matrices into vectors (as if they were regular variables), run a regression analysis and look at the correlation coefficients tab. This program will perform a random permutation of your data (like a Mantel program would), but it will not care whether there are empty cells in the matrices (provided that both matrices have the same number of non-missing cells).
NTSYS : can handle missing data during matrix comparison. Maybe this can help. You can put a certain value (e.g. 9999) as a code for missing data. Also with NTSYS it is possible to do normalization as well as permutations.
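For anyone who would rather script it, the general idea in the advice above (skip the empty cells, then permute) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what any of the programs listed actually do: empty cells are marked with None, both matrices are full symmetric matrices, the test is one-sided for a positive correlation, and the coordinates and distances are made up. Unlike the vector-shuffling trick, it relabels whole rows and columns together, as a standard Mantel test does.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 3:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def usable_pairs(a, b, order):
    """Lower-triangle cells that are non-missing in both matrices.
    `order` relabels the rows and columns of b together."""
    xs, ys = [], []
    for i in range(1, len(a)):
        for j in range(i):
            x, y = a[i][j], b[order[i]][order[j]]
            if x is not None and y is not None:
                xs.append(x)
                ys.append(y)
    return xs, ys

def mantel(a, b, permutations=999, seed=0):
    """One-sided Mantel test; returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    labels = list(range(len(a)))
    r_obs = pearson(*usable_pairs(a, b, labels))
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(labels)
        if pearson(*usable_pairs(a, b, labels)) >= r_obs - 1e-12:
            at_least += 1
    return r_obs, at_least / (permutations + 1)

# Hypothetical example: five populations along a line, one empty cell.
coords = [0.0, 1.0, 3.0, 6.0, 10.0]
n = len(coords)
geo = [[abs(coords[i] - coords[j]) for j in range(n)] for i in range(n)]
gen = [[0.01 * geo[i][j] + 0.001 * ((i + j) % 2) for j in range(n)]
       for i in range(n)]
gen[3][1] = gen[1][3] = None  # the missing genetic distance

r_obs, p_value = mantel(geo, gen)
print(f"Mantel r = {r_obs:.3f}, p = {p_value:.3f}")
```

A cell is dropped whenever it is empty in either matrix, so the two matrices do not need the same pattern of missing values. With many empty cells the number of usable pairs can change from one permutation to the next, which is worth keeping in mind when reading the p-value.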
Should you use Fst/(1-Fst) or Fst for a Mantel test when looking for isolation by distance?
Using Fst/(1-Fst) is based on a formal population genetics model, under which you actually expect a relationship with distance (see Rousset Genetics 145: 1219-28; and Handbook of Statistical Genetics, ed. by Balding et al. 2001).
Fst is distributed between 0 and 1, whereas Fst/(1-Fst) potentially varies between 0 and infinity. When you are doing correlations, upper or lower bounds are a nuisance; this way you at least get rid of the upper bound, and you are regressing against geographic distance, which also goes to infinity.
The term isolation by distance (ibd) is used in two ways, loosely and strictly. In many recent papers, I see people loosely call ibd any correlation between genetic and geographic distance. In these cases, a Mantel test is OK. Strictly speaking, though (which means in the sense of Malécot-Morton and Kimura-Weiss), ibd is the product of genetic drift leading populations to diverge, and short-range gene flow leading them to converge genetically. In models where populations are distributed either continuously (Malécot-Morton) or discontinuously (Kimura-Weiss), that combination of drift and gene flow causes an asymptotic decline of genetic resemblance with distance. In other words, kinship between populations decreases up to a certain distance, and then it goes to zero.
In practice, the two different definitions do make a difference, because (1) under the former, but not under the latter, any genetic gradient or cline (including those determined by long-range gene flow, range expansion and the like) is taken as evidence of ibd, and (2) over an area much greater than the dispersal distance of the species you are studying, drift and gene flow may generate local clines but not a general correlation between genetic and geographic distances. The relevant literature is wide, but among the oldies see Cavalli-Sforza and Wijsman (1984) Annu. Rev. Ecol. Syst. 15:279-301, Kimura & Weiss (1964) Genetics 49:561-576 and Slatkin (1989) Genome 31:196-202.
Specify whether your populations are in 1-dimensional arrays or 2-d arrays. If they are essentially 1-dimensional (like along a river) you should regress against geographic distance, but if you consider your populations to be in 2-d arrays, you should use its log transform. Rousset covers this as well; see also Slatkin 1993 (Evolution 47: 264-79) and 1985 (Ann Rev Ecol Syst), and Hutchison, D.W. and A.R. Templeton. 1999, Evolution 53: 1898-914.
Examine scatterplots and try a variety of metrics, log-transformed or not. For the IBD slope, the most important consideration is that the plot meets the standard assumptions of regression (the relationship is linear, residuals are not skewed, etc.).
See software at http://www.bio.sdsu.edu/pub/andy/IBD.html
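The advice in this thread (linearise Fst, then regress on distance for 1-d habitats or on log distance for 2-d habitats) can be written down as a short Python sketch. The function names and the pairwise values are hypothetical and chosen only for illustration, and the fit is plain ordinary least squares; it is not the code of the IBD software linked above.

```python
import math

def linearized_fst(fst):
    """Fst/(1-Fst), which is unbounded above, unlike Fst itself."""
    if fst >= 1.0:
        raise ValueError("Fst must be below 1")
    return fst / (1.0 - fst)

def ibd_regression(pairs, two_dimensional):
    """Least-squares fit of Fst/(1-Fst) on distance (1-d habitat) or on
    ln(distance) (2-d habitat). `pairs` holds (Fst, distance) tuples,
    one per pair of populations. Returns (slope, intercept)."""
    xs = [math.log(d) if two_dimensional else d for _, d in pairs]
    ys = [linearized_fst(f) for f, _ in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical pairwise (Fst, distance in km) values.
pairs = [(0.02, 5.0), (0.05, 12.0), (0.08, 20.0), (0.11, 31.0), (0.15, 44.0)]

slope_1d, intercept_1d = ibd_regression(pairs, two_dimensional=False)
slope_2d, intercept_2d = ibd_regression(pairs, two_dimensional=True)
print(f"1-d: slope {slope_1d:.5f} per km, intercept {intercept_1d:.4f}")
print(f"2-d: slope {slope_2d:.5f} per ln(km), intercept {intercept_2d:.4f}")
```

Because the pairwise points are not independent of one another, the slope's significance is normally judged with a Mantel-type permutation test and not with the usual regression p-value.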
How to test if the diversity values are significantly different?
Dr. Mike Weale (University College London) has just developed a program, "test_h_diff", for testing differences in h: http://www.ucl.ac.uk/tcga/software/index.html
1) I did a t-test using Microsoft Excel to test the statistical significance of differences in genetic diversity parameters between populations. The SAS program can also do a significance test for you.
2) To compare means, Nei (1987) suggests simply the paired t-test. For microsats you could perhaps do it straight on but e.g. for allozymes it is good to consider arcsine transformation. Tapio et al. (2003) Comparison of microsatellite and blood protein diversity in sheep: inconsistencies in fragmented breeds. Mol. Ecol. 12, 2045-2056.
3) The basic technique consists in a Wilcoxon paired (by locus) test, but if you have four populations this will make 6 comparisons with a corrected level of significance of 0.008 (very conservative but necessary). Depending on the question you might pool some samples to compare two groups of interest. If not then you can do a Friedman analysis of variance (for three entries.
4) This is a problem I have grappled with for some time. It is remarkably difficult to find a robust test for the difference in two h values for the following reasons.
a) Permutation tests are not valid. These operate under the null hypothesis of no difference in haplotype frequencies, which is not the same as no difference in h values.
b) Bootstrapping methods are biased. Under this method, you resample assuming that observed haplotype frequencies exactly equate to real population haplotype frequencies. This leads to a bias of n/(n-1) between the mean h of the bootstrap samples and the expected h of the real population. If h is close to 1, this bias difference, even for moderately large n (20<n<100), can be so large that the expected real h of the population falls outside of the 95% confidence interval on the bootstrapped h.
c) The sampling distribution of h is skewed, especially for values close to 1. Thus, although unbiased formulae exist for the mean and variance of this distribution, Z-tests based on these formulae may fall foul of the departure from assumed Normality.
Thomas MG et al (2002) Founding mothers of Jewish communities: geographically separated Jewish groups were independently founded by very few female ancestors. Am. J. Hum. Genet. 70:1411-1420.
5) In regard to allelic richness, I refer you to the following paper: Leberg, PL (2002) Estimating allelic richness: effects of sample size and bottlenecks. Molecular Ecology 11:2445-2449. SAS routine that you can use to perform statistical tests.
6) If it's DNA sequence diversity that you want to test for significant differences, take a look at Innan, H. and F. Tajima, 2002 A statistical test for the difference in amounts of DNA variation between two populations. Genetical Research (Cambridge) 80: 15-25
7) If you have the variances of your estimates, you can do an ANOVA.
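The bootstrap bias described in point 4b is easy to see numerically. The Python sketch below assumes that h is Nei's (1987) unbiased gene (haplotype) diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies), and uses a made-up sample of 20 haplotypes. It illustrates the bias only; it is not a test for differences in h and is unrelated to the test_h_diff program.

```python
import random
from collections import Counter

def gene_diversity(haplotypes):
    """Unbiased gene diversity h = n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    sum_p2 = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    return n / (n - 1) * (1.0 - sum_p2)

def bootstrap_mean_h(haplotypes, replicates=2000, seed=0):
    """Mean h over bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(haplotypes)
    total = 0.0
    for _ in range(replicates):
        resample = [rng.choice(haplotypes) for _ in range(n)]
        total += gene_diversity(resample)
    return total / replicates

# Hypothetical sample: 20 individuals, 16 haplotypes, h close to 1.
sample = list("AAABBCCDEFGHIJKLMNOP")
n = len(sample)

h_obs = gene_diversity(sample)
h_boot = bootstrap_mean_h(sample)
print(f"h from the sample:     {h_obs:.4f}")
print(f"mean h over bootstrap: {h_boot:.4f}")
print(f"ratio {h_obs / h_boot:.4f} versus n/(n-1) = {n / (n - 1):.4f}")
```

Resampling treats the observed frequencies as if they were the true population frequencies, so the bootstrap mean settles near 1 minus the sum of squared observed frequencies, which is smaller than the sample's h by a factor of about n/(n-1).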
Bonferroni correction for multiple tests. I have been doing association studies with the most polymorphic expressed genetic system, the HLA loci. Here is a webpage on statistical analysis of HLA association studies, with particular emphasis on the multiple comparisons issue, an extensive discussion of this issue and a lot of references. http://home.att.net/~dorak/stat.html
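As a quick worked example of the arithmetic, the corrected significance level of 0.008 quoted earlier for four populations is simply 0.05 divided by the six pairwise comparisons. The Python sketch below uses hypothetical helper names and assumes the plain Bonferroni rule (alpha divided by the number of tests).

```python
def pairwise_comparisons(groups):
    """Number of distinct pairs among `groups` populations."""
    return groups * (groups - 1) // 2

def bonferroni_alpha(alpha, tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / tests

tests = pairwise_comparisons(4)
corrected = bonferroni_alpha(0.05, tests)
print(f"{tests} comparisons, per-test level {corrected:.3f}")
# prints "6 comparisons, per-test level 0.008"
```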
Software to develop primers.
1) LASERGENE - program called "DNA star".
2) Primer3 at http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi
3) Oligoperfect Designer, http://www.invitrogen.com/content.cfm?pageid=9716
4) OligoAnalyzer at http://biotools.idtdna.com/analyzer/
5) Oligo (a bit pricy) http://www.oligo.net/
6) Amplify (mac only): http://engels.genetics.wisc.edu/amplify/
7) GeneRunner program http://www.generunner.com/
8) Oligowiz server at: http://www.cbs.dtu.dk/services/OligoWiz/
Developed in the context of micro array probes, but used for other oligo design.
9) Primo primer design: I used that for degenerate primer design.
Note that you must install a new version of clustalw to get it to work through BioEdit: http://www.mbio.ncsu.edu/BioEdit/bioedit.html
(clustalw for Windows XP - copy it to the bioedit/apps folder:
10) readseq on the web - I use it a lot; a sequence conversion service
- - - - - - - - - - - - - -