This command specifies the actual data for the phylogenetic analysis. The character matrix should follow the dimensions and format commands in a data block. The matrix can have all of the characters for a taxon on a single line:
   begin data;
      dimensions ntax=4 nchar=10;
      format datatype=dna gap=-;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT
      ;
   end;
or be in "interleaved" format:
   begin data;
      dimensions ntax=4 nchar=20;
      format datatype=dna gap=- interleave=yes;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT

      taxon_1 TTTTCGAAGC
      taxon_2 TTTTCGGAGC
      taxon_3 TTTTTGATGC
      taxon_4 TTTTCGGAGC
      ;
   end;
Note that taxon names must not contain spaces. If you want to indicate a space in a taxon name (perhaps between a genus and species name), use an underscore ("_") instead. At least one space must follow the taxon name, separating it from the data on that line. Spaces may also appear between the characters.
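The rules above can be illustrated with a short Python sketch that reads the rows of a matrix command and joins the pieces belonging to each taxon. This is only an illustration of the layout, not how MrBayes itself parses the file; the function name parse_matrix is hypothetical, and the sketch assumes the rows have already been extracted from the data block and contain no bracketed comments.

```python
def parse_matrix(rows, ntax, nchar):
    """Collect one sequence per taxon from the rows of a matrix command.

    Works for both the single-line and the interleaved layout, because
    repeated taxon names simply have their data appended.
    """
    sequences = {}
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # blank line between interleaved blocks
        # The first field is the taxon name (no spaces allowed in it);
        # everything after it is data, possibly with spaces between characters.
        name, chunk = fields[0], "".join(fields[1:])
        sequences[name] = sequences.get(name, "") + chunk

    if len(sequences) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(sequences)}")
    for name, seq in sequences.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, found {len(seq)}")
    return sequences


rows = [
    "taxon_1 AACGATTCGT",
    "taxon_2 AAGGAT--CA",
    "taxon_3 AACGACTCCT",
    "taxon_4 AAGGATTCCT",
    "",
    "taxon_1 TTTTCGAAGC",
    "taxon_2 TTTTCGGAGC",
    "taxon_3 TTTTTGATGC",
    "taxon_4 TTTTCGGAGC",
]
matrix = parse_matrix(rows, ntax=4, nchar=20)
print(matrix["taxon_2"])
```

Checking the joined sequences against ntax and nchar before running an analysis is a cheap way to catch a missing row or a miscounted alignment length.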
If you have mixed data, then you specify all of the data in the same matrix. Here is an example that includes two different data types:
   begin data;
      dimensions ntax=4 nchar=20;
      format datatype=mixed(dna:1-10,standard:11-20) interleave=yes;
      matrix
      taxon_1 AACGATTCGT
      taxon_2 AAGGAT--CA
      taxon_3 AACGACTCCT
      taxon_4 AAGGATTCCT

      taxon_1 0001111111
      taxon_2 0111110000
      taxon_3 1110000000
      taxon_4 1000001111
      ;
   end;
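When writing a mixed datatype by hand, it is easy to give character ranges that do not match nchar. The following Python sketch checks this before the file is used. It rests on two assumptions that are not stated in this text: that every range is written as first-last, and that the ranges are meant to cover characters 1 through nchar exactly once. The function name check_mixed_ranges is hypothetical and is not part of MrBayes.

```python
import re


def check_mixed_ranges(datatype, nchar):
    """Return True if the ranges in a mixed(...) spec cover 1..nchar exactly once."""
    match = re.fullmatch(r"mixed\((.*)\)", datatype.strip())
    if match is None:
        raise ValueError("not a mixed datatype specification")
    covered = []
    for part in match.group(1).split(","):
        # Each part looks like "dna:1-10" (assumed form: type:first-last).
        _type_name, span = part.split(":")
        first, last = (int(x) for x in span.split("-"))
        covered.extend(range(first, last + 1))
    # Gaps, overlaps, and positions beyond nchar all make this comparison fail.
    return sorted(covered) == list(range(1, nchar + 1))


ok = check_mixed_ranges("mixed(dna:1-10,standard:11-20)", nchar=20)
bad = check_mixed_ranges("mixed(dna:1-10,standard:21-30)", nchar=20)
print(ok, bad)
```

With nchar=20, the first specification passes, while the second fails because positions 11-20 are not assigned and 21-30 lie beyond the end of the matrix.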
The matrix command is terminated by a semicolon.
Finally, just a note on data presentation. It is much easier for others to (1) understand your data and (2) repeat your analyses if you keep your data clean, comment it liberally (using square brackets), and embed the commands you used for a publication in the mrbayes block. Remember that the data took a long time for you to collect. You might as well spend a little time making the data file look nice and clear to anyone who may later request the data for further analysis.