May 10th, 2013.
A new method, PRIME (an abbreviation for PRoperty Informed Model of Evolution), can ask the question:
which biochemical properties are driving substitutions at a given site?
For example, if a site is positively selected, which properties are being selected for or against?
This analysis is also a demonstration of the redesigned Datamonkey interface, which makes extensive use of standard libraries and tools
for formatting and visualization in the browser.
February 19th, 2013.
The FUBAR method paper has now been published in Mol Biol and Evol.
Give this method a try: it is much faster (it can process 1000 sequences in under 10 minutes) and statistically more robust than REL (or PAML). You can also see
the papers that have already cited it.
Welcome to the free public server for comparative analysis of sequence alignments using state-of-the-art statistical models.
This service is brought to you by the viral evolution group at the School of Medicine of the University of California, San Diego.
Over its lifetime Datamonkey.org has processed 312509
analyses at a rate of 257.467 jobs/day (over the last 30 days).
Use our recommended method,
to look for evidence of both diversifying and, importantly, episodic selection at individual sites.
Four different codon-based maximum likelihood methods,
can be used to estimate the dN/dS (also known as Ka/Ks or ω) ratio at every codon in the alignment. An exhaustive discussion of each approach can be found in the
The codon-based maximum likelihood IFEL method
can investigate whether sequences sampled from a population (e.g. viral sequences from different hosts) have been subject to selective pressure at the
population level (i.e. along internal branches). A discussion of the method and its application can be found
All six methods can also take recombination into account.
This is done by screening the sequences for recombination breakpoints, identifying non-recombinant regions, and allowing each region to have its own phylogenetic tree.
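The partitioning step can be illustrated with a minimal sketch. It assumes breakpoints are already known and given as 0-based column indices at which a new segment starts; the function name `split_at_breakpoints` and the toy alignment are hypothetical and are not part of Datamonkey. Inferring the breakpoints and fitting a tree to each segment are not shown.

```python
from typing import Dict, List


def split_at_breakpoints(alignment: Dict[str, str],
                         breakpoints: List[int]) -> List[Dict[str, str]]:
    """Split an alignment into contiguous segments at the given columns.

    Assumption: each breakpoint is the 0-based index of the first column
    of a new segment, and all sequences have the same length.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    if any(bp <= 0 or bp >= length for bp in breakpoints):
        raise ValueError("breakpoints must fall strictly inside the alignment")

    # Segment boundaries: start of alignment, each breakpoint, end of alignment.
    bounds = [0] + sorted(set(breakpoints)) + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(bounds, bounds[1:])
    ]


# Toy alignment (made-up sequences) with one hypothetical breakpoint at column 6.
alignment = {
    "seq1": "ATGAAACCCGGG",
    "seq2": "ATGAAGCCCGGA",
    "seq3": "ATGAAACCTGGG",
}
segments = split_at_breakpoints(alignment, [6])
```

Each element of `segments` is a smaller alignment that could then be given its own tree. For codon data, one would likely also want segment boundaries to respect the reading frame, which this sketch does not enforce.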
Protein sequences can be screened for evidence of directional selection using the
DEPS method, described
here, useful when one wants to detect convergent evolution or selective sweeps.
For coding sequences, the
TOGGLE model, developed by
Wayne Delport and colleagues,
can detect selection-driven changes that result in amino-acid toggling. A canonical example of this can be found in immune-driven evolution of HIV-1 (escape and reversion).
to look for site-specific amino-acid properties (e.g. charge, polarity) which are being preserved or modified by the evolutionary process.
For example, when a site is positively selected, evolution may be working to change side-chain volume, while maintaining polarity.
Using the modeling framework,
which allows efficient estimation with models that permit dN/dS variation across
both sites and lineages, Datamonkey implements a test for finding lineages subject to episodic diversifying selection (EDS). The Branch-site REL method identifies those branches where a proportion of sites evolves under EDS.
If you are primarily interested in finding which lineages (but don't care about which sites) have experienced EDS, use this method.
Deprecated in favor of Branch-site REL. The codon-based genetic algorithm GABranch method
can automatically partition all branches of the phylogeny describing non-recombinant data into groups according to dN/dS. Robust multi-model inference is used to collate results from all
models examined during the run to provide confidence intervals on dN/dS for each branch and guard against model misspecification and overfitting.
The PARRIS method, developed by Konrad Scheffler and colleagues, extends traditional codon-based likelihood ratio tests to detect
if a proportion of sites in the alignment evolve with dN/dS>1. The method takes recombination and synonymous rate variation into account.
The ESD method, described in a 2010 paper, fits a versatile
general discrete bivariate model of site-by-site selective force variation to partition all sites into selective classes, and obtains an approximate posterior distribution of this partitioning.
The resulting "noisy" distribution of selective regimes is the evolutionary fingerprint of a gene. The EVF (evolutionary fingerprinting) module implements this procedure, and can also infer which individual sites appear to be
positively selected while accounting for parameter estimation error (analogous to the BEB methodology of the PAML package).
A Bayesian graphical model is deduced from reconstructed substitutions at each branch/site combination to infer conditional evolutionary
dependencies of sites in the alignment, i.e. whether a site is more or less likely to experience a non-synonymous substitution at a branch
when certain other sites do (or do not) experience non-synonymous substitutions at the same branch.
The SPIDERMONKEY method was introduced in the evolutionary context in our
on the evolution of the phenotypically important and highly variable V3 loop of the envelope glycoprotein in HIV-1.
Recombination leaves an imprint on sequence alignments: different segments of the alignment may be
described by different phylogenetic trees, a phenomenon called phylogenetic discordance. Datamonkey.org implements two methods: SBP, suitable for answering the question "Is there evidence of
recombination in the alignment?", and GARD, which attempts to find all the recombination
breakpoints. Both methods are described in this paper. The output of GARD is accepted by most other analyses, and because recombination
can mislead phylogenetic analyses that do not account for it, we strongly urge that recombination testing be done on any alignment that is going to be
analyzed for positive selection.
You can also submit a collection of HIV-1 sequences for recombination screening by a specialized recombination detection algorithm, SCUEAL, described in this paper.
For each type of data (nucleotide, amino-acid and codon), Datamonkey implements a separate model selection procedure. For nucleotide data, an exhaustive search is performed over all possible (Markov, time-reversible) models of nucleotide
evolution. For protein data, a collection of published empirical models is fitted to the alignment and the best one is selected using AICc. Finally, for coding data, a sophisticated genetic-algorithm
procedure described in our recent paper is used to examine thousands of potential models and report the best one, along with various metrics based on the set of credible models. This feature is implemented in the CMS module.
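As an illustration of selection by AICc, the sketch below scores candidate models with the standard small-sample corrected Akaike criterion, AICc = 2k − 2 ln L + 2k(k+1)/(n − k − 1), and picks the lowest score. The model names, log-likelihoods and parameter counts are made-up placeholders, and treating the sample size n as the number of alignment sites is an assumption; this is not Datamonkey's own code.

```python
def aicc(log_likelihood: float, n_params: int, sample_size: int) -> float:
    """Small-sample corrected Akaike Information Criterion (lower is better)."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless sample_size > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = 2 * n_params * (n_params + 1) / (sample_size - n_params - 1)
    return aic + correction


# Hypothetical fits: model name -> (maximized log-likelihood, number of parameters).
candidates = {
    "model_A": (-1234.5, 20),
    "model_B": (-1230.1, 45),
}
sample_size = 300  # assumption: number of sites in the alignment

scores = {
    name: aicc(log_l, k, sample_size)
    for name, (log_l, k) in candidates.items()
}
best_model = min(scores, key=scores.get)
```

In this toy case the second model has a slightly higher likelihood but many more parameters, so the penalty terms leave the first model with the lower AICc.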
The ASR module implements three different approaches to reconstructing ancestral sequences from simple or partitioned alignments: joint, marginal and sampled. See
this paper for
a description and original methodology attribution.