frestml
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages.
Function
Restriction site maximum Likelihood method
Description
Estimation of phylogenies by maximum likelihood using restriction sites data (not restriction fragments but presence/absence of individual sites). It employs the Jukes-Cantor symmetrical model of nucleotide change, which does not allow for differences of rate between transitions and transversions. This program is very slow.
Algorithm
This program implements a maximum likelihood method for restriction sites data (not restriction fragment data). This program is one of the slowest programs in this package, and can be very tedious to run. It is possible to have the program search for the maximum likelihood tree. It will be more practical for some users (those that do not have fast machines) to use the U (User Tree) option, which takes less run time, optimizing branch lengths and computing likelihoods for particular tree topologies suggested by the user.
The model used here is essentially identical to that used by Smouse and Li (1987), who give explicit expressions for computing the likelihood for three-species trees. It does not place prior probabilities on trees as they do. The present program extends their approach to multiple species by a technique which, while it does not give explicit expressions for likelihoods, does enable their computation and the iterative improvement of branch lengths. It also allows for multiple restriction enzymes. The algorithm has been described in a paper (Felsenstein, 1992). Another relevant paper is that of DeBry and Slade (1985).
The assumptions of the present model are:
- Each restriction site evolves independently.
- Different lineages evolve independently.
- Each site undergoes substitution at an expected rate which we specify.
- Substitutions consist of replacement of a nucleotide by one of the other three nucleotides, chosen at random.
Note that if the existing base is, say, an A, the chance of it being replaced by a G is 1/3, and so is the chance that it is replaced by a T. This means that there can be no difference in the (expected) rate of transitions and transversions. Users who are upset at this might ponder the fact that a version allowing different rates of transitions and transversions would run an estimated 16 times slower. If it also allowed for unequal frequencies of the four bases, it would run about 300,000 times slower! For the moment, until a better method is available, I guess I'll stick with this one!
Subject to these assumptions, the program is an approximately correct maximum likelihood method.
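The substitution assumptions above are those of the Jukes-Cantor model, and its per-base transition probabilities can be sketched in a few lines. This is only an illustration, not the program's likelihood computation: the function names are hypothetical, the branch length t is assumed to be measured in expected substitutions per base, and site_unchanged ignores the possibility that a site is lost and later regained.

```python
import math


def jc_same(t: float) -> float:
    """Probability that a base is in the same state after branch length t
    under the Jukes-Cantor model (t assumed to be in expected substitutions per base)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc_change_to(t: float) -> float:
    """Probability that a base ends up as one specific different base.
    It is the same for all three alternatives, so transitions and
    transversions cannot differ in rate."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)


def site_unchanged(t: float, site_length: int = 6) -> float:
    """Probability that every base of a recognition site is unchanged,
    treating the bases as independent (a simplification for illustration)."""
    return jc_same(t) ** site_length
```

Because the three alternative bases share one probability, jc_same(t) + 3 * jc_change_to(t) equals 1 for any t, which is the symmetry the note above describes.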
Usage
Here is a sample session with frestml:
% frestml
Restriction site maximum Likelihood method
Input file: restml.dat
Phylip tree file (optional):
Phylip restml program output file [restml.frestml]:

numseqs: 1

Adding species:
   1. Alpha
   2. Beta
   3. Gamma
   4. Delta
   5. Epsilon

Output written to file "restml.frestml"

Tree also written onto file "restml.treefile"

Done.
Command line arguments
Restriction site maximum Likelihood method
Version: EMBOSS:6.3.0

   Standard (Mandatory) qualifiers:
  [-data]              discretestates File containing one or more sets of restriction data
  [-intreefile]        tree       Phylip tree file (optional)
  [-outfile]           outfile    [*.frestml] Phylip restml program output file

   Additional (Optional) qualifiers (* if not always prompted):
   -weights            properties Weights file
   -njumble            integer    [0] Number of times to randomise (Integer 0 or more)
*  -seed               integer    [1] Random number seed between 1 and 32767 (must be odd) (Integer from 1 to 32767)
   -outgrno            integer    [0] Species number to use as outgroup (Integer 0 or more)
   -[no]allsites       boolean    [Y] All sites detected
*  -lengths            boolean    [N] Use lengths from user trees
   -sitelength         integer    [6] Site length (Integer from 1 to 8)
*  -global             boolean    [N] Global rearrangements
*  -[no]rough          boolean    [Y] Speedier but rougher analysis
   -[no]trout          toggle     [Y] Write out trees to tree file
*  -outtreefile        outfile    [*.frestml] Phylip tree output file
   -printdata          boolean    [N] Print data at start of run
   -[no]progress       boolean    [Y] Print indications of progress of run
   -[no]treeprint      boolean    [Y] Print out tree

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   "-outtreefile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-data] (Parameter 1) | discretestates | File containing one or more sets of restriction data | Discrete states file | |
[-intreefile] (Parameter 2) | tree | Phylip tree file (optional) | Phylogenetic tree | |
[-outfile] (Parameter 3) | outfile | Phylip restml program output file | Output file | <*>.frestml |
Additional (Optional) qualifiers | ||||
-weights | properties | Weights file | Property value(s) | |
-njumble | integer | Number of times to randomise | Integer 0 or more | 0 |
-seed | integer | Random number seed between 1 and 32767 (must be odd) | Integer from 1 to 32767 | 1 |
-outgrno | integer | Species number to use as outgroup | Integer 0 or more | 0 |
-[no]allsites | boolean | All sites detected | Boolean value Yes/No | Yes |
-lengths | boolean | Use lengths from user trees | Boolean value Yes/No | No |
-sitelength | integer | Site length | Integer from 1 to 8 | 6 |
-global | boolean | Global rearrangements | Boolean value Yes/No | No |
-[no]rough | boolean | Speedier but rougher analysis | Boolean value Yes/No | Yes |
-[no]trout | toggle | Write out trees to tree file | Toggle value Yes/No | Yes |
-outtreefile | outfile | Phylip tree output file | Output file | <*>.frestml |
-printdata | boolean | Print data at start of run | Boolean value Yes/No | No |
-[no]progress | boolean | Print indications of progress of run | Boolean value Yes/No | Yes |
-[no]treeprint | boolean | Print out tree | Boolean value Yes/No | Yes |
Advanced (Unprompted) qualifiers | ||||
(none) | ||||
Associated qualifiers | ||||
"-outfile" associated outfile qualifiers | ||||
-odirectory3 -odirectory_outfile | string | Output directory | Any string | |
"-outtreefile" associated outfile qualifiers | ||||
-odirectory | string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
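For scripted runs, the qualifiers in the table above can be assembled into an argument list. The helper below is a hypothetical sketch and not part of EMBOSS: it builds the command without running it, enforces the ranges listed in the table, and assumes that under -auto the optional tree file can simply be left out and that -seed only matters when -njumble is greater than 0.

```python
from typing import List, Optional


def build_frestml_command(
    data: str,
    outfile: str,
    intreefile: Optional[str] = None,
    njumble: int = 0,
    seed: int = 1,
    sitelength: int = 6,
) -> List[str]:
    """Build an argument list for a non-interactive frestml run.

    Range checks follow the qualifier table: njumble is 0 or more,
    seed is an odd integer from 1 to 32767, sitelength is from 1 to 8.
    """
    if njumble < 0:
        raise ValueError("njumble must be 0 or more")
    if not 1 <= seed <= 32767 or seed % 2 == 0:
        raise ValueError("seed must be an odd integer from 1 to 32767")
    if not 1 <= sitelength <= 8:
        raise ValueError("sitelength must be from 1 to 8")

    cmd = ["frestml", "-data", data, "-outfile", outfile,
           "-sitelength", str(sitelength), "-auto"]
    if intreefile is not None:
        # Assumption: the optional user tree is passed only when supplied.
        cmd += ["-intreefile", intreefile]
    if njumble > 0:
        # Assumption: the seed is only relevant when input order is randomised.
        cmd += ["-njumble", str(njumble), "-seed", str(seed)]
    return cmd
```

On a machine where frestml is installed, the returned list could be handed to subprocess.run(cmd, check=True).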
Input file format
frestml input is fairly standard, with one addition. As usual the first line of the file gives the number of species and the number of sites, but there is also a third number, which is the number of different restriction enzymes that were used to detect the restriction sites. Thus a data set with 10 species and 35 different sites, representing digestion with 4 different enzymes, would have the first line of the data file look like this:
10 35 4
The first line of the data file will also contain a letter W following these numbers (and separated from them by a space) if the Weights option is being used. As with all programs using the weights option, a line or lines must then follow, before the data, with the weights for each site.
The site data are in standard form. Each species starts with a species name whose maximum length is given by the constant "nmlngth" (whose value in the program as distributed is 10 characters). The name should, as usual, be padded out to that length with blanks if necessary. The sites data then follows, one character per site (any blanks will be skipped and ignored). Like the DNA and protein sequence data, the restriction sites data may be either in the "interleaved" form or the "sequential" form. Note that if you are analyzing restriction sites data with the programs DOLLOP or MIX or other discrete character programs, at the moment those programs do not use the "aligned" or "interleaved" data format. Therefore you may want to avoid that format when you have restriction sites data that you will want to feed into those programs.
The presence of a site is indicated by a "+" and the absence by a "-". I have also allowed the use of "1" and "0" as synonyms for "+" and "-", for compatibility with MIX and DOLLOP which do not allow "+" and "-". If the presence of the site is unknown (for example, if the DNA containing it has been deleted so that one does not know whether it would have contained the site) then the state "?" can be used to indicate that the state of this site is unknown.
User-defined trees may follow the data in the usual way. The trees must be unrooted, which means that at their base they must have a trifurcation.
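The layout described above can be read with a short parser. This is a minimal sketch under stated assumptions, not the program's own reader: the function name is hypothetical, only the sequential form is handled, and headers carrying the W (Weights) option, as well as any user trees after the data, are not supported.

```python
from typing import Dict, Tuple

NAME_LENGTH = 10                   # value of "nmlngth" in the program as distributed
SYNONYMS = {"1": "+", "0": "-"}    # "1"/"0" are accepted as synonyms for "+"/"-"
VALID_STATES = set("+-?")


def parse_restriction_sites(text: str) -> Tuple[int, int, int, Dict[str, str]]:
    """Parse sequential-format restriction site data.

    Returns (n_species, n_sites, n_enzymes, {species name: site string}).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if len(header) != 3:
        raise ValueError("expected exactly three numbers on the first line")
    n_species, n_sites, n_enzymes = (int(x) for x in header)

    species: Dict[str, str] = {}
    pos = 1
    for _ in range(n_species):
        if pos >= len(lines):
            raise ValueError("fewer species than declared")
        line = lines[pos]
        pos += 1
        name = line[:NAME_LENGTH].strip()
        # Blanks inside the site data are skipped and ignored.
        chars = [c for c in line[NAME_LENGTH:] if not c.isspace()]
        # Sequential form: a species may continue on the following lines.
        while len(chars) < n_sites:
            if pos >= len(lines):
                raise ValueError(f"too few sites for species {name!r}")
            chars.extend(c for c in lines[pos] if not c.isspace())
            pos += 1
        states = "".join(SYNONYMS.get(c, c) for c in chars)
        if len(states) != n_sites or set(states) - VALID_STATES:
            raise ValueError(f"bad site data for species {name!r}")
        species[name] = states
    return n_species, n_sites, n_enzymes, species
```

The parser normalises "1" and "0" to "+" and "-", so the same data can be compared regardless of which convention a file uses.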
Input files for usage example
File: restml.dat
   5   13   2
Alpha     ++-+-++--+++-
Beta      ++++--+--+++-
Gamma     -+--+-++-+-++
Delta     ++-+----++---
Epsilon   ++++----++---
Output file format
frestml writes a text report to the output file (-outfile). Unless tree output is switched off with -notrout, it also writes the estimated tree to the tree file (-outtreefile).
Output files for usage example
File: restml.frestml
Restriction site Maximum Likelihood method, version 3.69

  Recognition sequences all 6 bases long

Sites absent from all species are assumed to have been omitted


  +----Gamma
  |
  |  +Beta
  1--2
  |  |  +Epsilon
  |  +--3
  |     +Delta
  |
  +Alpha


remember: this is an unrooted tree!

Ln Likelihood =   -40.47082

 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------
   1          Gamma             0.10794     (  0.01144,     0.21872) **
   1             2              0.01244     (     zero,     0.04712)
   2          Beta              0.00100     (     zero,    infinity)
   2             3              0.05878     (     zero,     0.12675) **
   3          Epsilon           0.00022     (     zero,    infinity)
   3          Delta             0.01451     (     zero,     0.04459) **
   1          Alpha             0.01244     (     zero,     0.04717)

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01
File: restml.treefile
(Gamma:0.10794,(Beta:0.00100,(Epsilon:0.00022,Delta:0.01451):0.05878):0.01244,Alpha:0.01244);
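The terminal branch lengths in a tree file like the one above can be pulled out with a regular expression when a full tree parser is not needed. This is a rough sketch with a hypothetical function name: it assumes species names start with a letter and contain only letters, digits, underscores, or dots, and it deliberately skips internal branches, which follow a closing parenthesis and have no name.

```python
import re
from typing import Dict

# A leaf is written as name:length; internal branches appear as "):length".
LEAF_PATTERN = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*):([0-9]+(?:\.[0-9]+)?)")


def leaf_branch_lengths(newick: str) -> Dict[str, float]:
    """Map each species name in the tree string to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}
```

For the example tree this yields one entry per species, matching the "Length" column of the rows in restml.frestml that end at a named species.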
Data files
None
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
See also
Program name | Description |
---|---|
distmat | Create a distance matrix from a multiple sequence alignment |
ednacomp | DNA compatibility algorithm |
ednadist | Nucleic acid sequence Distance Matrix program |
ednainvar | Nucleic acid sequence Invariants method |
ednaml | Phylogenies from nucleic acid Maximum Likelihood |
ednamlk | Phylogenies from nucleic acid Maximum Likelihood with clock |
ednapars | DNA parsimony algorithm |
ednapenny | Penny algorithm for DNA |
eprotdist | Protein distance algorithm |
eprotpars | Protein parsimony algorithm |
erestml | Restriction site Maximum Likelihood method |
eseqboot | Bootstrapped sequences algorithm |
fdiscboot | Bootstrapped discrete sites algorithm |
fdnacomp | DNA compatibility algorithm |
fdnadist | Nucleic acid sequence Distance Matrix program |
fdnainvar | Nucleic acid sequence Invariants method |
fdnaml | Estimates nucleotide phylogeny by maximum likelihood |
fdnamlk | Estimates nucleotide phylogeny by maximum likelihood |
fdnamove | Interactive DNA parsimony |
fdnapars | DNA parsimony algorithm |
fdnapenny | Penny algorithm for DNA |
fdolmove | Interactive Dollo or Polymorphism Parsimony |
ffreqboot | Bootstrapped genetic frequencies algorithm |
fproml | Protein phylogeny by maximum likelihood |
fpromlk | Protein phylogeny by maximum likelihood |
fprotdist | Protein distance algorithm |
fprotpars | Protein parsimony algorithm |
frestboot | Bootstrapped restriction sites algorithm |
frestdist | Distance matrix from restriction sites or fragments |
fseqboot | Bootstrapped sequences algorithm |
fseqbootall | Bootstrapped sequences algorithm |
Author(s)
This program is an EMBOSS conversion of a program written by Joe Felsenstein as part of his PHYLIP package.
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
History
Written (2004) - Joe Felsenstein, University of Washington.
Converted (August 2004) to an EMBASSY program by the EMBOSS team.