How does TOGGLE infer selection?
Phase 1: Nucleotide model maximum likelihood (ML) fit
A nucleotide model (any model from the time-reversible class can be chosen) is fitted by maximum likelihood to the data and the tree, which is either a neighbor-joining (NJ) tree or one supplied by the user, to obtain branch lengths and substitution rates. If the input alignment contains multiple segments, base frequencies and substitution rates are inferred jointly from the entire alignment, while branch lengths are fitted to each segment separately.
The "best-fitting" model can be determined automatically by a model selection procedure or chosen by the user.
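As a minimal illustration of what a maximum likelihood branch-length fit involves, the sketch below estimates one branch length for a pair of aligned sequences under JC69, the simplest time-reversible model. It maximizes the likelihood numerically and checks the answer against the known closed-form JC69 distance. This is a toy under stated assumptions: TOGGLE fits a whole tree, possibly with a richer model and several segments, and the function names here (`jc69_log_likelihood`, `ml_branch_length`, etc.) are hypothetical.

```python
import math

def jc69_log_likelihood(d, n_same, n_diff):
    """Pairwise log-likelihood under JC69 at distance d (expected substitutions
    per site), omitting the constant log(1/4) contributed by each site."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # P(same base at both ends of the branch)
    p_diff = 0.25 - 0.25 * e  # P(change to one specific other base)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:  # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:        # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0

def ml_branch_length(n_same, n_diff):
    """Numerical ML estimate of the branch length separating two sequences."""
    return golden_section_max(lambda d: jc69_log_likelihood(d, n_same, n_diff), 1e-8, 10.0)

def jc69_closed_form(n_same, n_diff):
    """Analytical JC69 ML distance, d = -3/4 ln(1 - 4p/3)."""
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical data: 100 aligned sites, 10 of which differ.
print(ml_branch_length(90, 10), jc69_closed_form(90, 10))
```

The numerical and analytical estimates should agree; the numerical route is the one that generalizes to trees and to models without closed-form solutions.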
Phase 2: Codon model ML fit
Holding branch lengths and substitution rate parameters constant at the values estimated in Phase 1, a codon model obtained by crossing MG94 with the nucleotide model of Phase 1 is fitted to the data to obtain a global ω = dN/dS ratio.
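To make "crossing MG94 with the nucleotide model" concrete, the sketch below builds a codon rate matrix in the usual MG94-by-nucleotide form: two sense codons that differ at exactly one position interchange at rate θ_xy π_y (the Phase 1 exchangeability times the frequency of the target nucleotide), multiplied by ω when the change is non-synonymous; codons that differ at more than one position cannot interchange instantaneously. Assumed simplifications: a single set of nucleotide frequencies instead of position-specific ones, no rescaling of the matrix to branch-length units, and hypothetical example parameter values.

```python
import itertools

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(itertools.product(BASES, repeat=3))}
SENSE = [c for c in CODE if CODE[c] != "*"]  # 61 sense codons; stop codons excluded

def mg94_rate_matrix(theta, pi, omega):
    """MG94-style codon rate matrix crossed with a reversible nucleotide model.

    theta: dict frozenset({x, y}) -> nucleotide exchangeability (as from Phase 1)
    pi:    dict nucleotide -> equilibrium frequency (simplification: not position-specific)
    omega: dN/dS multiplier applied to non-synonymous changes
    """
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(SENSE):
        for j, dst in enumerate(SENSE):
            diffs = [k for k in range(3) if src[k] != dst[k]]
            if len(diffs) != 1:
                continue  # only single-nucleotide changes are instantaneous
            x, y = src[diffs[0]], dst[diffs[0]]
            rate = theta[frozenset((x, y))] * pi[y]
            if CODE[src] != CODE[dst]:
                rate *= omega  # non-synonymous change
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])  # each row of a rate matrix sums to zero
    return Q

# Hypothetical values: transitions 4x faster than transversions, equal base frequencies.
theta = {frozenset(p): 1.0 for p in itertools.combinations(BASES, 2)}
theta[frozenset("CT")] = theta[frozenset("AG")] = 4.0
pi = {b: 0.25 for b in BASES}
Q = mg94_rate_matrix(theta, pi, omega=0.5)
```

In Phase 2 only ω would be optimized, with θ and the branch lengths held at their Phase 1 values.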
Phase 3: ML ancestral sequence reconstruction using SLAC
Using the parameter estimates from Phases 1 and 2, ancestral codon sequences are reconstructed site by site by maximum likelihood, choosing, over all possible ancestral character states, the combination that maximizes the likelihood of the data at that site. At this stage you can select the sites to be tested for toggling, either manually or by using predefined site-specific statistics. Invariant codon sites are excluded, as are sites with only synonymous substitutions. A maximum of 50 sites is permitted per analysis.
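Maximizing the likelihood jointly over all ancestral states at a site can be done without enumerating every combination. The sketch below uses the dynamic-programming approach to joint reconstruction described by Pupko and colleagues (2000); we assume it here for illustration and do not claim it is how SLAC is implemented. To stay short it works on nucleotides under JC69 rather than codons, uses a hypothetical four-taxon tree, and multiplies raw probabilities (a real implementation would work in log space to avoid underflow).

```python
import math

STATES = "ACGT"

def jc69_p(t):
    """4x4 JC69 transition-probability matrix for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def joint_ml_reconstruction(tree, root, leaf_states, pi=(0.25, 0.25, 0.25, 0.25)):
    """Joint ML ancestral states at one site (Pupko-style dynamic programming).

    tree:        dict internal node -> list of (child, branch_length)
    leaf_states: dict leaf name -> observed state (one of STATES)
    Returns (dict internal node -> state, maximal joint likelihood).
    """
    L, C = {}, {}  # L[x][i]: best subtree likelihood given parent state i; C[x][i]: argmax

    def visit(x, t):
        P = jc69_p(t)
        if x in leaf_states:
            j = STATES.index(leaf_states[x])
            L[x], C[x] = [P[i][j] for i in range(4)], [j] * 4
            return
        for child, bl in tree[x]:
            visit(child, bl)
        sub = [math.prod(L[c][j] for c, _ in tree[x]) for j in range(4)]
        C[x] = [max(range(4), key=lambda j: P[i][j] * sub[j]) for i in range(4)]
        L[x] = [P[i][C[x][i]] * sub[C[x][i]] for i in range(4)]

    for child, bl in tree[root]:
        visit(child, bl)
    root_scores = [pi[j] * math.prod(L[c][j] for c, _ in tree[root]) for j in range(4)]
    best_root = max(range(4), key=lambda j: root_scores[j])

    # Traceback: each child takes the state that was optimal given its parent's state.
    states, stack = {root: best_root}, [root]
    while stack:
        x = stack.pop()
        for child, _ in tree.get(x, []):
            states[child] = C[child][states[x]]
            stack.append(child)
    ancestral = {x: STATES[s] for x, s in states.items() if x not in leaf_states}
    return ancestral, root_scores[best_root]

# Hypothetical tree ((s1:0.1,s2:0.1)n1:0.05,(s3:0.1,s4:0.2)n2:0.2)root
tree = {"root": [("n1", 0.05), ("n2", 0.2)],
        "n1": [("s1", 0.1), ("s2", 0.1)],
        "n2": [("s3", 0.1), ("s4", 0.2)]}
leaf_states = {"s1": "A", "s2": "A", "s3": "G", "s4": "G"}
print(joint_ml_reconstruction(tree, "root", leaf_states))
```

Each node is visited once and does work proportional to the square of the alphabet size, so the same idea scales to 61 sense codons, whereas brute-force enumeration grows exponentially with the number of ancestral nodes.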
Phase 4: Inference of toggling at each site
The test of toggling is conducted for each site selected in Phase 3. Briefly, for each of the twenty potential wild-type amino acids, a test of escape from and reversion to the wild type is conducted. Toggling sites are reported together with summary statistics: the likelihood ratio test statistic, the p-value, the toggling rate, and the proportion of time each site "spends" in the wild-type, single-step escape, and multiple-step escape amino acid states. See the methodology paper for more information.
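The likelihood ratio test statistic and p-value reported in Phase 4 come from comparing the maximized log-likelihoods of nested models. The sketch below shows that arithmetic only. The reference distribution is our assumption: a chi-square with 1 degree of freedom, with an optional 50:50 mixture of a point mass at zero and chi-square(1) for a null that fixes a non-negative rate at its boundary. The distribution TOGGLE actually uses is described in the methodology paper, and the log-likelihood values below are hypothetical.

```python
import math

def likelihood_ratio_test(lnl_null, lnl_alt, boundary=False):
    """Return (LR statistic, p-value) for nested models.

    Assumes a chi-square reference distribution with 1 degree of freedom. With
    boundary=True, uses the 50:50 mixture of a point mass at 0 and chi-square(1).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp small negative optimizer noise
    p = math.erfc(math.sqrt(stat / 2.0))         # P(chi-square(1) >= stat)
    if boundary and stat > 0.0:
        p *= 0.5
    return stat, p

# Hypothetical log-likelihoods for one site and one candidate wild-type residue.
print(likelihood_ratio_test(-1234.5, -1230.1))
```

Because a test is run for each of the twenty candidate wild-type residues, the individual p-values should be interpreted with that multiplicity in mind.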
Selecting which sites to check for toggling
We have previously found that sites not detected by standard diversifying selection methods (SLAC, FEL, REL) are good targets for toggling inference. This is because the method was developed to detect selection associated with low amino acid diversity at a site, which is what we would expect from directional selection involving a few target residues. We have therefore provided some scripts to reduce the number of sites being tested. For the counts and proportions of branches with non-synonymous substitutions, the threshold is a lower bound, whereas for amino acid diversity it is an upper bound. Ideal toggling sites may therefore be those with several branches carrying non-synonymous substitutions that nonetheless involve only a few target residues. Alternatively, you can select sites to test based on known interactions between the pathogen and the host immune system.
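The filtering rules above can be sketched as follows. The record type `SiteSummary`, its field names, the default thresholds, and the ranking used when more than 50 sites pass are all hypothetical; they illustrate the logic (lower bounds on non-synonymous branch counts and proportions, an upper bound on amino acid diversity, the exclusion of invariant and synonymous-only sites, and the 50-site limit), not the actual scripts.

```python
from dataclasses import dataclass

MAX_SITES = 50  # per-analysis limit stated in Phase 3

@dataclass
class SiteSummary:
    """Hypothetical per-site summary (field names are illustrative, not TOGGLE's)."""
    site: int
    nonsyn_branches: int   # branches with at least one inferred non-synonymous substitution
    total_branches: int
    aa_diversity: float    # e.g. number of distinct amino acids observed at the site

def select_candidates(sites, min_count=1, min_prop=0.0, max_diversity=3.0):
    """Lower thresholds on non-synonymous branch counts/proportions, upper threshold
    on amino acid diversity. Requiring at least one non-synonymous branch also drops
    invariant and synonymous-only sites."""
    kept = [s for s in sites
            if s.nonsyn_branches >= max(1, min_count)
            and s.nonsyn_branches / s.total_branches >= min_prop
            and s.aa_diversity <= max_diversity]
    # Hypothetical ranking if more than MAX_SITES pass: most non-synonymous branches
    # first, then lowest diversity.
    kept.sort(key=lambda s: (-s.nonsyn_branches, s.aa_diversity))
    return [s.site for s in kept[:MAX_SITES]]
```

Sites with many non-synonymous branches but low diversity rise to the top, matching the profile of an ideal toggling site described above.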
UCSD Viral Evolution Group 2004-2014