ffactor |
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.Please help by correcting and extending the Wiki pages.
Function
Multistate to binary recoding programDescription
Takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1). Written by Christopher Meacham. This program was formerly used to accomodate multistate characters in MIX, but this is less necessary now that PARS is available.Algorithm
This program factors a data set that contains multistate characters, creating a data set consisting entirely of binary (0,1) characters that, in turn, can be used as input to any of the other discrete character programs in this package, except for PARS. Besides this primary function, FACTOR also provides an easy way of deleting characters from a data set. The input format for FACTOR is very similar to the input format for the other discrete character programs except for the addition of character-state tree descriptions.Note that this program has no way of converting an unordered multistate character into binary characters. Fortunately, PARS has joined the package, and it enables unordered multistate characters, in which any state can change to any other in one step, to be analyzed with parsimony.
FACTOR is really for a different case, that in which there are multiple states related on a "character state tree", which specifies for each state which other states it can change to. That graph of states is assumed to be a tree, with no loops in it.
Usage
Here is a sample session with ffactor
% ffactor Multistate to binary recoding program Phylip factor program input file: factor.dat Phylip factor program output file [factor.ffactor]: Data matrix written on file "factor.ffactor" Done. |
Go to the input files for this example
Go to the output files for this example
Command line arguments
Multistate to binary recoding program Version: EMBOSS:6.3.0 Standard (Mandatory) qualifiers: [-infile] infile Phylip factor program input file [-outfile] outfile [*.ffactor] Phylip factor program output file Additional (Optional) qualifiers: -anc boolean [N] Put ancestral states in output file -factors boolean [N] Put factors information in output file -[no]progress boolean [Y] Print indications of progress of run Advanced (Unprompted) qualifiers: -outfactorfile outfile [*.ffactor] Phylip factor data output file (optional) -outancfile outfile [*.ffactor] Phylip ancestor data output file (optional) Associated qualifiers: "-outfile" associated qualifiers -odirectory2 string Output directory "-outfactorfile" associated qualifiers -odirectory string Output directory "-outancfile" associated qualifiers -odirectory string Output directory General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages -version boolean Report version number and exit |
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-infile] (Parameter 1) |
infile | Phylip factor program input file | Input file | Required |
[-outfile] (Parameter 2) |
outfile | Phylip factor program output file | Output file | <*>.ffactor |
Additional (Optional) qualifiers | ||||
-anc | boolean | Put ancestral states in output file | Boolean value Yes/No | No |
-factors | boolean | Put factors information in output file | Boolean value Yes/No | No |
-[no]progress | boolean | Print indications of progress of run | Boolean value Yes/No | Yes |
Advanced (Unprompted) qualifiers | ||||
-outfactorfile | outfile | Phylip factor data output file (optional) | Output file | <*>.ffactor |
-outancfile | outfile | Phylip ancestor data output file (optional) | Output file | <*>.ffactor |
Associated qualifiers | ||||
"-outfile" associated outfile qualifiers | ||||
-odirectory2 -odirectory_outfile |
string | Output directory | Any string | |
"-outfactorfile" associated outfile qualifiers | ||||
-odirectory | string | Output directory | Any string | |
"-outancfile" associated outfile qualifiers | ||||
-odirectory | string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
Input file format
ffactor reads character state tree data.This program factors a data set that contains multistate characters, creating a data set consisting entirely of binary (0,1) characters that, in turn, can be used as input to any of the other discrete character programs in this package, except for PARS. Besides this primary function, FACTOR also provides an easy way of deleting characters from a data set. The input format for FACTOR is very similar to the input format for the other discrete character programs except for the addition of character-state tree descriptions.
Note that this program has no way of converting an unordered multistate character into binary characters. Fortunately, PARS has joined the package, and it enables unordered multistate characters, in which any state can change to any other in one step, to be analyzed with parsimony.
FACTOR is really for a different case, that in which there are multiple states related on a "character state tree", which specifies for each state which other states it can change to. That graph of states is assumed to be a tree, with no loops in it.
First line
The first line of the input file should contain the number of species and the number of multistate characters. This first line is followed by the lines describing the character-state trees, one description per line. The species information constitutes the last part of the file. Any number of lines may be used for a single species.
The first line is free format with the number of species first, separated by at least one blank (space) from the number of multistate characters, which in turn is separated by at least one blank from the options, if present.
Character-state tree descriptions
The character-state trees are described in free format. The character number of the multistate character is given first followed by the description of the tree itself. Each description must be completed on a single line. Each character that is to be factored must have a description, and the characters must be described in the order that they occur in the input, that is, in numerical order.
The tree is described by listing the pairs of character states that are adjacent to each other in the character-state tree. The two character states in each adjacent pair are separated by a colon (":"). If character fifteen has this character state tree for possible states "A", "B", "C", and "D":
A ---- B ---- C | | | D
then the character-state tree description would be
15 A:B B:C D:B
Note that either symbol may appear first. The ancestral state is identified, if desired, by putting it "adjacent" to a period. If we wanted to root character fifteen at state C:
A <--- B <--- C | | V D
we could write
15 B:D A:B C:B .:C
Both the order in which the pairs are listed and the order of the symbols in each pair are arbitrary. However, each pair may only appear once in the list. Any symbols may be used for a character state in the input except the character that signals the connection between two states (in the distribution copy this is set to ":"), ".", and, of course, a blank. Blanks are ignored completely in the tree description so that even B:DA:BC:B.:C or B : DA : BC : B. : C would be equivalent to the above example. However, at least one blank must separate the character number from the tree description.
Deleting characters from a data set
If no description line appears in the input for a particular character, then that character will be omitted from the output. If the character number is given on the line, but no character-state tree is provided, then the symbol for the character in the input will be copied directly to the output without change. This is useful for characters that are already coded "0" and "1". Characters can be deleted from a data set simply by listing only those that are to appear in the output.
Terminating the list of tree descriptions
The last character-state tree description should be followed by a line containing the number "999". This terminates processing of the trees and indicates the beginning of the species information.Species information
The format for the species information is basically identical to the other discrete character programs. The first ten character positions are allotted to the species name (this value may be changed by altering the value of the constant nmlngth at the beginning of the program). The character states follow and may be continued to as many lines as desired. There is no current method for indicating polymorphisms. It is possible to either put blanks between characters or not.
There is a method for indicating uncertainty about states. There is one character value that stands for "unknown". If this appears in the input data then "?" is written out in all the corresponding positions in the output file. The character value that designates "unknown" is given in the constant unkchar at the beginning of the program, and can be changed by changing that constant. It is set to "?" in the distribution copy.
Input files for usage example
File: factor.dat
4 6 1 A:B B:C 2 A:B B:. 4 5 0:1 1:2 .:0 6 .:# #:$ #:% 999 Alpha CAW00# Beta BBX01% Gamma ABY12# Epsilon CAZ01$ |
Output file format
The first line of ffactor output will contain the number of species and the number of binary characters in the factored data set followed by the letter "A" if the A option was specified in the input. If option F was specified, the next line will begin "FACTORS". If option A was specified, the line describing the ancestor will follow next. Finally, the factored characters will be written for each species in the format required for input by the other discrete programs in the package. The maximum length of the output lines is 80 characters, but this maximum length can be changed prior to compilation.In fact, the format of the output file for the A and F options is not correct for the current release of PHYLIP. We need to change their output to write a factors file and an ancestors file instead of putting the Factors and Ancestors information into the data file.
Output files for usage example
File: factor.ffactor
4 5 Alpha CA00# Beta BB01% Gamma AB12# Epsilon CA01$ |
File: factor.factor
|
File: factor.ancestor
|
Data files
NoneNotes
None.References
None.Warnings
None.Diagnostic Error Messages
The output should be checked for error messages. Errors will occur in the character-state tree descriptions if the format is incorrect (colons in the wrong place, etc.), if more than one root is specified, if the tree contains loops (and hence is not a tree), and if the tree is not connected, e.g.
A:B B:C D:E
describes
A ---- B ---- C D ---- E
This "tree" is in two unconnected pieces. An error will also occur if a symbol appears in the data set that is not in the tree description for that character. Blanks at the end of lines when the species information is continued to a new line will cause this kind of error.
Exit status
It always exits with status 0.Known bugs
None.See also
Program name | Description |
---|---|
eclique | Largest clique program |
edollop | Dollo and polymorphism parsimony algorithm |
edolpenny | Penny algorithm Dollo or polymorphism |
efactor | Multistate to binary recoding program |
emix | Mixed parsimony algorithm |
epenny | Penny algorithm, branch-and-bound |
fclique | Largest clique program |
fdollop | Dollo and polymorphism parsimony algorithm |
fdolpenny | Penny algorithm Dollo or polymorphism |
fmix | Mixed parsimony algorithm |
fmove | Interactive mixed method parsimony |
fpars | Discrete character parsimony |
fpenny | Penny algorithm, branch-and-bound |
Author(s)
This program is an EMBOSS conversion of a program written by Joe Felsenstein as part of his PHYLIP package.Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
History
Written (2004) - Joe Felsenstein, University of Washington.Converted (August 2004) to an EMBASSY program by the EMBOSS team.