Lset

This command sets the parameters of the likelihood model. The likelihood function is the probability of observing the data conditional on the phylogenetic model. In order to calculate the likelihood, you must assume a model of character change. This command lets you tailor the biological assumptions made in the phylogenetic model. The correct usage is

lset <parameter>=<option> ... <parameter>=<option>

For example, "lset nst=6 rates=gamma" would set the model to a general model of DNA substitution (the GTR model) with gamma-distributed rate variation across sites.

Options:

Applyto -- This option allows you to apply the lset settings to specific partitions. It should be the first option listed in the lset command, and it only makes sense to use it if the data have been partitioned. A default partition is set when a matrix is executed. If the data are homogeneous (i.e., all of the same data type), this partition does not subdivide the characters. Up to 30 other partitions can be defined, and you can switch among them using "set partition=<partition name>". You may want to specify different models for different partitions of the data, and applyto allows you to do this. For example, suppose you have partitioned the data by codon position and you want to apply an nst=2 model to the first two partitions and an nst=6 model to the third. This can be done with two uses of lset:

lset applyto=(1,2) nst=2

lset applyto=(3) nst=6

The first applies the parameters following "applyto" to the first and second partitions; the second applies nst=6 to the third partition. You can also use applyto=(all), which attempts to apply the parameter settings to all of the data partitions. Importantly, if an option is not consistent with the data in a partition, the program will not apply that option to the partition.

Nucmodel -- This specifies the general form of the nucleotide substitution model. The options are "4by4" [the standard model of DNA substitution, in which there are only four states (A, C, G, T/U)], "doublet" (a model appropriate for the stem regions of ribosomal genes, where the state space is the 16 doublets of nucleotides), and "codon" (the substitution model is expanded to triplets of nucleotides, i.e., codons).

Nst -- Sets the number of substitution types: "1" constrains all of the rates to be the same (e.g., a JC69 or F81 model); "2" allows transitions and transversions to have potentially different rates (e.g., a K80 or HKY85 model); "6" allows all rates to be different, subject to the constraint of time-reversibility (e.g., a GTR model).
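To make the three nst settings concrete, the Python sketch below builds a time-reversible 4x4 rate matrix from six exchangeabilities and base frequencies, tying the exchangeabilities together as nst=1, 2 or 6 would. The function names and the kappa/gtr arguments are hypothetical, and the sketch is not MrBayes's implementation. Note that nst only controls how the exchangeabilities are tied; whether nst=1 corresponds to JC69 or F81 (and nst=2 to K80 or HKY85) depends on whether the base frequencies are equal.

```python
# Nucleotide order is A, C, G, T; each exchangeability belongs to one pair.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # AC, AG, AT, CG, CT, GT
TRANSITIONS = {(0, 2), (1, 3)}  # A<->G and C<->T


def exchangeabilities(nst, kappa=1.0, gtr=None):
    """Return the six exchangeabilities implied by nst (hypothetical helper)."""
    if nst == 1:
        return [1.0] * 6  # all substitution types share one rate
    if nst == 2:
        # transitions get rate kappa, transversions get rate 1
        return [kappa if pair in TRANSITIONS else 1.0 for pair in PAIRS]
    if nst == 6:
        if gtr is None or len(gtr) != 6:
            raise ValueError("nst=6 needs six exchangeabilities")
        return list(gtr)
    raise ValueError("nst must be 1, 2 or 6")


def rate_matrix(nst, freqs, kappa=1.0, gtr=None):
    """Build a reversible rate matrix Q with q_ij = r_ij * pi_j, mean rate 1."""
    r = exchangeabilities(nst, kappa, gtr)
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), rij in zip(PAIRS, r):
        q[i][j] = rij * freqs[j]
        q[j][i] = rij * freqs[i]  # same r_ij in both directions: reversibility
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)  # rows sum to zero
    # Rescale so the expected number of substitutions per unit time is 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]
```

With equal base frequencies, rate_matrix(1, [0.25] * 4) gives every off-diagonal entry 1/3, which is the JC69 matrix.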

Code -- Enforces the use of a particular genetic code. The default is the universal code. Other options include "vertmt" for vertebrate mitochondrial DNA, "mycoplasma", "yeast", "ciliates", and "metmt" (for metazoan mitochondrial DNA except vertebrates).
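Together, nucmodel and code determine how many states the substitution model has. The Python sketch below counts them, assuming that stop codons are excluded from the codon state space (as is usual for codon models); it lists stop codons only for the universal and vertebrate mitochondrial codes, and the dictionary and function names are hypothetical.

```python
from itertools import product

# Stop codons (DNA alphabet) for two of the codes listed above.
STOP_CODONS = {
    "universal": {"TAA", "TAG", "TGA"},
    "vertmt": {"TAA", "TAG", "AGA", "AGG"},
}


def state_space_size(nucmodel, code="universal"):
    """Number of model states for a nucmodel option (hypothetical helper)."""
    if nucmodel == "4by4":
        return 4  # A, C, G, T/U
    if nucmodel == "doublet":
        return 4 * 4  # ordered pairs of paired stem nucleotides
    if nucmodel == "codon":
        codons = ("".join(c) for c in product("ACGT", repeat=3))
        # Keep only sense codons; stop codons are not states of the model.
        return sum(1 for c in codons if c not in STOP_CODONS[code])
    raise ValueError("unknown nucmodel: " + nucmodel)
```

Under these assumptions the universal code gives 61 codon states and the vertebrate mitochondrial code gives 60, because TGA codes for tryptophan in vertebrate mitochondria while AGA and AGG act as stops.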

Ploidy -- Specifies the ploidy of the organism. Options are "Haploid" or "Diploid". This option is used when a coalescence prior is placed on the tree.

Rates -- Sets the model for among-site rate variation. In general, the rate at a site is considered to be an unknown random variable.

The valid options are:

* equal -- No rate variation across sites.
* gamma -- Gamma-distributed rates across sites. The rate at a site is drawn from a gamma distribution. The gamma distribution has a single parameter that describes how much rates vary.
* adgamma -- Autocorrelated rates across sites. The marginal rate distribution is gamma, but adjacent sites have correlated rates.
* propinv -- A proportion of the sites are invariable.
* invgamma -- A proportion of the sites are invariable, while the rates for the remaining sites are drawn from a gamma distribution.
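The site-rate options above can be read as mixtures. As a hedged sketch, using notation introduced here rather than taken from MrBayes ($L_i(r)$ is the likelihood of site $i$ with all branch lengths multiplied by rate $r$, $p_{\mathrm{inv}}$ is the proportion of invariable sites, $\pi_x$ is the stationary frequency of state $x$, and $r_1, \dots, r_K$ are the $K$ = ngammacat discrete gamma rates), the invgamma likelihood of a site is roughly:

```latex
L_i = p_{\mathrm{inv}}\, I_i + \left(1 - p_{\mathrm{inv}}\right) \frac{1}{K} \sum_{k=1}^{K} L_i(r_k),
\qquad
I_i =
\begin{cases}
  \pi_x & \text{if site } i \text{ shows only state } x,\\
  0     & \text{otherwise.}
\end{cases}
```

Setting $p_{\mathrm{inv}} = 0$ recovers gamma, and $K = 1$ with $r_1 = 1$ recovers propinv. The sketch ignores ambiguous characters and any rescaling of the variable-site rates that a program may apply to keep the mean rate at 1.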

Note that MrBayes versions 2.0 and earlier supported options that allowed site-specific rates (e.g., ssgamma). In versions 3.0 and later, site-specific rates are still allowed, but they are set using the "prset ratepr" command for each partition.

Ngammacat -- Sets the number of rate categories for the gamma distribution. The gamma distribution is continuous, but calculating likelihoods under the continuous gamma distribution is computationally impractical. Hence, an approximation to the continuous gamma is used: the gamma distribution is broken into ngammacat categories of equal weight (1/ngammacat), and the mean rate of each category represents the rate for the entire category. This option allows you to specify how many rate categories to use when approximating the gamma. The approximation improves as ngammacat is increased. In practice, "ngammacat=4" does a reasonable job of approximating the continuous gamma.
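The mean-of-category discretization described above can be sketched in Python as follows. The sketch assumes a gamma distribution with shape alpha and mean 1 (rate parameter equal to alpha), finds the category boundaries by bisection, and uses the identity that the integral of x f(x) over a category equals the probability that a Gamma(alpha + 1, rate alpha) variable falls in the same interval. Function names are illustrative, and this is not MrBayes's own code.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion; converges quickly for x < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(10000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p, shape):
    """Quantile of a mean-one gamma (rate = shape), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, shape * hi) < p:
        hi *= 2.0  # widen the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, shape * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(shape, ncat):
    """Mean rate of each of ncat equal-probability gamma categories."""
    cuts = [gamma_quantile(k / ncat, shape) for k in range(1, ncat)]
    # Partial means via the Gamma(shape + 1, rate shape) CDF at each cut.
    bounds = [0.0] + [reg_lower_gamma(shape + 1.0, shape * c) for c in cuts] + [1.0]
    # Dividing each partial mean by the category weight 1/ncat gives its mean.
    return [ncat * (bounds[k + 1] - bounds[k]) for k in range(ncat)]
```

Because each category carries weight 1/ngammacat, the returned rates always average to 1 whatever the shape; small shapes give a few very slow categories and one fast one, while large shapes give rates close to 1.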

Nbetacat -- Sets the number of categories used to approximate the beta distribution. A symmetric beta distribution is used to model the stationary frequencies when morphological data are used. This option specifies how finely the beta distribution is discretized; more categories give a better approximation.

Omegavar -- Allows the nonsynonymous/synonymous rate ratio (omega) to vary across codons. The Ny98 model assumes three classes of sites with potentially different omega values (omega1, omega2, omega3), constrained so that 0 < omega1 < 1, omega2 = 1, and omega3 > 1. Like the Ny98 model, the M3 model has three omega classes; however, their values are less constrained, requiring only omega1 < omega2 < omega3. The default (omegavar=equal) allows no variation in omega across sites.
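Both Ny98 and M3 treat each codon as belonging to one of the three omega classes with unknown probability, so the likelihood of a codon site is a weighted average over the classes. A sketch of that mixture, where the class proportions $p_j$ and $L_i(\omega)$ (the likelihood of codon site $i$ given rate ratio $\omega$) are notation introduced here:

```latex
L_i = \sum_{j=1}^{3} p_j \, L_i(\omega_j), \qquad p_1 + p_2 + p_3 = 1,
\qquad
\begin{cases}
  0 < \omega_1 < 1,\ \omega_2 = 1,\ \omega_3 > 1 & \text{(Ny98)}\\
  \omega_1 < \omega_2 < \omega_3 & \text{(M3)}
\end{cases}
```

With omegavar=equal there is a single class, so $L_i = L_i(\omega)$ for one shared $\omega$.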

Covarion -- This forces the use of a covarion-like model of substitution for nucleotide or amino acid data. The valid options are "yes" and "no". The covarion model allows the rate at a site to change over its evolutionary history. Specifically, at any point in time the site is either on or off. When it is off, no substitutions are possible. When it is on, substitutions occur according to the substitution model specified using the other lset options.
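The on/off process can be pictured by doubling the state space: every character state has an "on" copy, which substitutes according to the ordinary rate matrix, and an "off" copy, which can only switch back on. The Python sketch below builds such a matrix from a base rate matrix. The switching-rate names s_off and s_on are hypothetical, and the sketch does not reproduce how MrBayes parameterizes or scales its covarion model.

```python
def covarion_matrix(q_on, s_off, s_on):
    """Build a 2n x 2n covarion-style rate matrix.

    States 0..n-1 are the 'on' copies of the n character states and
    states n..2n-1 are the 'off' copies. q_on is the n x n rate matrix
    used while a site is on; s_off is the on -> off switching rate and
    s_on is the off -> on switching rate (hypothetical parameter names).
    """
    n = len(q_on)
    size = 2 * n
    q = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = q_on[i][j]  # substitutions happen only while on
        q[i][n + i] = s_off  # same character state, switch off
        q[n + i][i] = s_on   # same character state, switch back on
    # Off copies have no substitution entries, so they can only switch on.
    for i in range(size):
        q[i][i] = -sum(q[i][j] for j in range(size) if j != i)
    return q
```

For DNA the result is an 8x8 matrix; an "off" row has a single nonzero off-diagonal entry, the rate of switching back to the "on" copy of the same nucleotide.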

Coding -- This specifies how characters were sampled. If all site patterns had the possibility of being sampled, then "all" should be specified (the default). Otherwise, specify "variable" (only variable characters had the possibility of being sampled), "noabsence" (characters for which all taxa were coded as absent were not sampled), or "nopresence" (characters for which all taxa were coded as present were not sampled). "All" works for all data types. However, the others only work for morphological (all/variable) or restriction site (all/variable/noabsence/nopresence) data.
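One common way to account for such a sampling scheme is to condition each site's likelihood on the site being observable, dividing by one minus the total probability of the patterns that could not have been sampled (for example, all constant patterns under "variable"). The Python sketch below applies that correction to a list of per-site likelihoods. It is a sketch of the general ascertainment-bias idea under that assumption, not a description of MrBayes internals, and computing prob_unobservable from the model is left to the caller.

```python
import math


def conditioned_log_likelihood(site_likelihoods, prob_unobservable):
    """Log-likelihood of the sampled sites, conditioned on observability.

    site_likelihoods: unconditional likelihood of each sampled site.
    prob_unobservable: total model probability of the site patterns that
    could not have been sampled (e.g. all constant patterns for 'variable',
    the all-absent pattern for 'noabsence'). Hypothetical helper.
    """
    if not 0.0 <= prob_unobservable < 1.0:
        raise ValueError("prob_unobservable must be in [0, 1)")
    log_l = sum(math.log(l) for l in site_likelihoods)
    # Each site contributes L_i / (1 - prob_unobservable).
    return log_l - len(site_likelihoods) * math.log(1.0 - prob_unobservable)
```

With prob_unobservable = 0 (coding=all) the correction vanishes and the ordinary log-likelihood is returned.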

Parsmodel -- This forces calculation under the so-called parsimony model described by Tuffley and Steel (1998). The options are "yes" or "no". Note that the biological assumptions of this model are anything but parsimonious. In fact, this model assumes many more parameters than the next most complicated model implemented in this program. If you really believe that the parsimony model makes the biological assumptions described by Tuffley and Steel, then the parsimony method is misnamed.

Default model settings:

Parameter    Options                               Current Setting
------------------------------------------------------------------
Nucmodel     4by4/Doublet/Codon                    4by4
Nst          1/2/6                                 1
Code         Universal/Vertmt/Mycoplasma/
             Yeast/Ciliates/Metmt                  Universal
Rates        Equal/Gamma/Propinv/Invgamma/Adgamma  Equal
Ngammacat    <number>                              4
Nbetacat     <number>                              5
Omegavar     Equal/Ny98/M3                         Equal
Covarion     No/Yes                                No
Coding       All/Variable/Noabsencesites/
             Nopresencesites                       All
Parsmodel    No/Yes                                No
------------------------------------------------------------------
