This command is used in a data block to define the format of the character matrix. The correct usage is:
format datatype=<name> ... <parameter>=<option>
The format command must be the second command in a data block. The following provides an example of the proper use of this command:
begin data;
dimensions ntax=4 nchar=10;
format datatype=dna gap=-;
matrix
taxon_1 AACGATTCGT
taxon_2 AAGGAT--CA
taxon_3 AACGACTCCT
taxon_4 AAGGATTCCT
;
end;
Here, the format command tells MrBayes to expect a matrix with DNA characters and with gaps coded as "-".
The following are valid options for format:
Datatype -- This parameter MUST BE INCLUDED in the format command. Moreover, it must be the first parameter in the line. The datatype parameter specifies what type of characters are in the matrix. The following are valid options:
Datatype = Dna: DNA states (A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N)
Datatype = Rna: RNA states (A,C,G,U,R,Y,M,K,S,W,H,B,V,D,N)
Datatype = Protein: Amino acid states (A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V)
Datatype = Restriction: Restriction site (0,1) states
Datatype = Standard: Morphological (0,1) states
Datatype = Continuous: Real number valued states
Datatype = Mixed(<type>:<range>,...,<type>:<range>): A mixture of the above datatypes. For example, "datatype=mixed(dna:1-100,protein:101-200)" would specify a mixture of DNA and amino acid characters with the DNA characters occupying the first 100 sites and the amino acid characters occupying the last 100 sites.
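To make the structure of a mixed specification concrete, the following Python sketch splits such a string into its partitions. The function name parse_mixed_datatype is hypothetical and is not part of MrBayes, and the sketch assumes every range is written as <first>-<last>, as in the example above; it only illustrates how the string reads.

```python
import re

def parse_mixed_datatype(spec):
    """Split 'mixed(<type>:<range>,...)' into (type, first_site, last_site) tuples.

    Assumption: every range has the form <first>-<last>.
    """
    match = re.fullmatch(r"mixed\((.*)\)", spec.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a mixed datatype specification: {spec!r}")
    partitions = []
    for part in match.group(1).split(","):
        name, _, site_range = part.partition(":")
        first, _, last = site_range.partition("-")
        partitions.append((name.strip().lower(), int(first), int(last)))
    return partitions

print(parse_mixed_datatype("mixed(dna:1-100,protein:101-200)"))
# prints [('dna', 1, 100), ('protein', 101, 200)]
```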
Interleave -- This parameter specifies whether the data matrix is in interleaved format. The valid options are "Yes" or "No", with "No" as the default. An interleaved matrix looks like:
format datatype=dna gap=- interleave=yes;
matrix
taxon_1 AACGATTCGT
taxon_2 AAGGAT--CA
taxon_3 AACGACTCCT
taxon_4 AAGGATTCCT
taxon_1 CCTGGTAC
taxon_2 CCTGGTAC
taxon_3 ---GGTAG
taxon_4 ---GGTAG
;
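As an illustration of what interleaving means, the Python sketch below joins the blocks of the matrix above into one full sequence per taxon. The helper join_interleaved is hypothetical and is not how MrBayes reads the file; it assumes each matrix line holds exactly a taxon name and a sequence fragment separated by whitespace.

```python
def join_interleaved(lines):
    """Concatenate the sequence fragments of each taxon in order of appearance."""
    sequences = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not "name fragment"
        name, fragment = fields
        sequences[name] = sequences.get(name, "") + fragment
    return sequences

block = """
taxon_1 AACGATTCGT
taxon_2 AAGGAT--CA
taxon_3 AACGACTCCT
taxon_4 AAGGATTCCT
taxon_1 CCTGGTAC
taxon_2 CCTGGTAC
taxon_3 ---GGTAG
taxon_4 ---GGTAG
""".splitlines()

for name, sequence in join_interleaved(block).items():
    print(name, sequence)
```

Each taxon ends up with 18 characters here: 10 from the first block and 8 from the second.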
Gap -- This parameter specifies the format for gaps. Note that the gap character can only be a single character and that it cannot correspond to a standard state (e.g., A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N for nucleotide data).
Missing -- This parameter specifies the format for missing data. Note that the missing character can only be a single character and that it cannot correspond to a standard state (e.g., A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N for nucleotide data). This parameter is often unnecessary because many data types, such as nucleotide or amino acid, already have a missing character specified. However, for morphological or restriction site data, "missing=?" is often used to specify ambiguity or unobserved data.
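The two restrictions on gap and missing characters can be written as a simple check. The Python sketch below is illustrative only: is_valid_symbol is a hypothetical helper, and comparing against the state list without regard to case is an assumption of the sketch, not documented MrBayes behavior.

```python
# Nucleotide states listed above for the Gap and Missing parameters.
NUCLEOTIDE_STATES = set("ACGTRYMKSWHBVDN")

def is_valid_symbol(symbol, states=NUCLEOTIDE_STATES):
    """True if symbol is a single character that is not one of the given states.

    Assumption: the comparison ignores case.
    """
    return len(symbol) == 1 and symbol.upper() not in states

for candidate in ["-", "?", "N", "--"]:
    print(repr(candidate), is_valid_symbol(candidate))
```

With the nucleotide states, "-" and "?" pass the check, while "N" (a standard state) and "--" (two characters) do not.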
Matchchar -- This parameter specifies the matching character for the matrix. For example,
format datatype=dna gap=- matchchar=.;
matrix
taxon_1 AACGATTCGT
taxon_2 ..G...--CA
taxon_3 .....C..C.
taxon_4 ..G.....C.
;
is equivalent to
format datatype=dna gap=-;
matrix
taxon_1 AACGATTCGT
taxon_2 AAGGAT--CA
taxon_3 AACGACTCCT
taxon_4 AAGGATTCCT
;
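The equivalence of the two matrices above can be reproduced with a short Python sketch. It assumes that the matching character refers to the state of the first taxon at the same site, which is consistent with the example, and that all rows have the same length; expand_matchchar is a hypothetical helper, not a MrBayes function.

```python
def expand_matchchar(rows, matchchar="."):
    """Replace each matchchar with the first row's character at the same site.

    rows is a list of (taxon_name, sequence) pairs of equal sequence length.
    """
    reference = rows[0][1]
    expanded = [rows[0]]
    for name, sequence in rows[1:]:
        full = "".join(
            ref if char == matchchar else char
            for char, ref in zip(sequence, reference)
        )
        expanded.append((name, full))
    return expanded

matrix = [
    ("taxon_1", "AACGATTCGT"),
    ("taxon_2", "..G...--CA"),
    ("taxon_3", ".....C..C."),
    ("taxon_4", "..G.....C."),
]

for name, sequence in expand_matchchar(matrix):
    print(name, sequence)
```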
The only non-standard NEXUS format options are the "mixed", "restriction", "standard" and "continuous" datatypes. Hence, if you use any of these datatype specifiers, a program like PAUP* or MacClade will report an error (as it should, because MrBayes is not strictly NEXUS compliant).