dbxtax
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages.
Function
Index NCBI taxonomy using b+tree indices
Description
dbxtax indexes NCBI taxonomy files and builds EMBOSS B+tree format index files. These indexes allow access to flat files larger than 2 GB.
Usage
Here is a sample session with dbxtax
    % dbxtax
    Index NCBI taxonomy using b+tree indices
    Basename for index files [taxon]:
    Resource name [taxresource]:
    Database directory [.]: taxonomy
            id : ID
           acc : Synonym
           tax : Scientific name
           rnk : Rank
            up : Parent
            gc : Genetics code
           mgc : Mitochondrial genetic code
    Index fields [*]:
    Compressed index files [Y]:
    General log output file [outfile.dbxtax]:
Go to the output files for this example
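The interactive session above can also be run without prompts by supplying the qualifiers on the command line. The following is a minimal sketch, not part of EMBOSS: `build_dbxtax_command` is a hypothetical helper that only assembles an argument list from qualifiers documented on this page (`-dbname`, `-dbresource`, `-directory`, `-fields`, `-[no]compressed`, `-auto`). It does not run the program, and the `fields` value is passed through unchanged.

```python
import shlex


def build_dbxtax_command(dbname="taxon", dbresource="taxresource",
                         directory=".", fields="*", compressed=True):
    """Assemble an argv list for a prompt-free dbxtax run (hypothetical helper)."""
    return [
        "dbxtax",
        "-dbname", dbname,          # basename for index files
        "-dbresource", dbresource,  # resource name
        "-directory", directory,    # directory holding the NCBI taxonomy dump files
        "-fields", fields,          # index fields; "*" is the documented default
        "-compressed" if compressed else "-nocompressed",
        "-auto",                    # general qualifier: turn off prompts
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_dbxtax_command(directory="taxonomy")))
```

Building a list rather than a single string avoids shell quoting problems with the `*` default for `-fields`.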
Command line arguments
    Index NCBI taxonomy using b+tree indices
    Version: EMBOSS:6.4.0.0

       Standard (Mandatory) qualifiers:
      [-dbname]            string     [taxon] Basename for index files (Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/)
      [-dbresource]        string     [taxresource] Resource name (Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/)
       -directory          directory  [.] Database directory
       -fields             menu       [*] Index fields (Values: id (ID); acc (Synonym); tax (Scientific name); rnk (Rank); up (Parent); gc (Genetics code); mgc (Mitochondrial genetic code))
       -[no]compressed     boolean    [Y] Compressed index files
       -outfile            outfile    [*.dbxtax] General log output file

       Additional (Optional) qualifiers: (none)

       Advanced (Unprompted) qualifiers:
       -release            string     [0.0] Release number (Any string up to 9 characters)
       -date               string     [00/00/00] Index date (Date string dd/mm/yy)
       -indexoutdir        outdir     [.] Index file output directory

       Associated qualifiers:

       "-directory" associated qualifiers
       -extension          string     Default file extension

       "-indexoutdir" associated qualifiers
       -extension          string     Default file extension

       "-outfile" associated qualifiers
       -odirectory         string     Output directory

       General qualifiers:
       -auto               boolean    Turn off prompts
       -stdout             boolean    Write first file to standard output
       -filter             boolean    Read first file from standard input, write first file to standard output
       -options            boolean    Prompt for standard and additional values
       -debug              boolean    Write debug output to program.dbg
       -verbose            boolean    Report some/full command line options
       -help               boolean    Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose
       -warning            boolean    Report warnings
       -error              boolean    Report errors
       -fatal              boolean    Report fatal errors
       -die                boolean    Report dying program messages
       -version            boolean    Report version number and exit
Qualifier | Type | Description | Allowed values | Default
---|---|---|---|---
Standard (Mandatory) qualifiers | | | |
[-dbname] (Parameter 1) | string | Basename for index files | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | taxon
[-dbresource] (Parameter 2) | string | Resource name | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | taxresource
-directory | directory | Database directory | Directory | .
-fields | list | Index fields | id (ID); acc (Synonym); tax (Scientific name); rnk (Rank); up (Parent); gc (Genetics code); mgc (Mitochondrial genetic code) | *
-[no]compressed | boolean | Compressed index files | Boolean value Yes/No | Yes
-outfile | outfile | General log output file | Output file | <*>.dbxtax
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
-release | string | Release number | Any string up to 9 characters | 0.0
-date | string | Index date | Date string dd/mm/yy | 00/00/00
-indexoutdir | outdir | Index file output directory | Output directory | .
Associated qualifiers | | | |
"-directory" associated directory qualifiers | | | |
-extension | string | Default file extension | Any string |
"-indexoutdir" associated outdir qualifiers | | | |
-extension | string | Default file extension | Any string |
"-outfile" associated outfile qualifiers | | | |
-odirectory | string | Output directory | Any string |
General qualifiers | | | |
-auto | boolean | Turn off prompts | Boolean value Yes/No | N
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N
-warning | boolean | Report warnings | Boolean value Yes/No | Y
-error | boolean | Report errors | Boolean value Yes/No | Y
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y
-die | boolean | Report dying program messages | Boolean value Yes/No | Y
-version | boolean | Report version number and exit | Boolean value Yes/No | N

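The constraint on `-dbname` and `-dbresource` (2 to 19 characters, matching `/[A-z][A-z0-9_]+/`) can be checked before running the program. The sketch below uses a hypothetical helper, `is_valid_name`, and assumes the whole string must match the pattern; how EMBOSS itself anchors the match is not stated on this page. Note that the ASCII range `A-z` also covers the six characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick), so the pattern is more permissive than "letters only".

```python
import re

# Pattern quoted from the qualifier description. [A-z] is an ASCII range that
# includes the characters [ \ ] ^ _ ` as well as the letters.
NAME_PATTERN = re.compile(r"[A-z][A-z0-9_]+")


def is_valid_name(name: str) -> bool:
    """Check the documented length limit and pattern (hypothetical helper).

    Assumption: the entire name must match, so fullmatch() is used.
    """
    return 2 <= len(name) <= 19 and NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("taxon", "taxresource", "t", "1taxon", "tax-db"):
        print(candidate, is_valid_name(candidate))
```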
Input file format
dbxtax reads and indexes the NCBI taxonomy nodes.dmp, names.dmp, division.dmp, gencode.dmp, and merged.dmp files.
Output file format
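dbxtax creates one summary file for the database and two files for each field indexed. In the file names listed here, dbalias stands for the basename given with -dbname (taxon in the usage example).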
- dbalias.ent is the master file containing the names of the files that have been indexed. It is an ASCII file. This file also contains the database release and date information.
- dbalias.xid is the B+tree index file for the ID names. It is a binary file.
- dbalias.pxid is an ASCII file containing information regarding the structure of the ID name index.
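The naming scheme described above (one `.ent` summary file, plus an `.x<field>` binary index and a `.px<field>` parameter file per indexed field) can be sketched as follows. `expected_index_files` is a hypothetical helper, and the field names are taken from the `-fields` menu; the result is only a list of names to check for after a run, not something produced by EMBOSS.

```python
# Field names from the -fields menu on this page.
FIELDS = ("id", "acc", "tax", "rnk", "up", "gc", "mgc")


def expected_index_files(dbname="taxon", fields=FIELDS):
    """List the files a dbxtax run is expected to write (hypothetical helper)."""
    files = [f"{dbname}.ent"]               # ASCII master file
    for field in fields:
        files.append(f"{dbname}.x{field}")   # binary B+tree index
        files.append(f"{dbname}.px{field}")  # ASCII index parameters
    return files


if __name__ == "__main__":
    for name in expected_index_files():
        print(name)
```

With the default basename and all seven fields, this gives the 15 file names that appear in the usage example's output listing.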
Output files for usage example
File: outfile.dbxtax
    Processing directory: /homes/user/test/data/taxonomy/
    Processing file: nodes.dmp entries: 41 (41) time: 0.0s (0.0s)
    Total time: 0.0s
File: taxon.ent
    # Number of files: 1
    # Release: 0.0
    # Date: 00/00/00
    Dual filename database
    nodes.dmp names.dmp
File: taxon.pxac
    Type Identifier
    Compress Yes
    Pages 3
    Order 71
    Fill 46
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 101
    Count 1
    Fullcount 1
    Kwlimit 15
File: taxon.pxgc
    Type Secondary
    Compress Yes
    Pages 6
    Order 132
    Fill 65
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 169
    Count 1
    Fullcount 41
    Kwlimit 2
File: taxon.pxid
    Type Identifier
    Compress Yes
    Pages 3
    Order 99
    Fill 56
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 101
    Count 41
    Fullcount 41
    Kwlimit 7
File: taxon.pxmgc
    Type Secondary
    Compress Yes
    Pages 12
    Order 132
    Fill 65
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 169
    Count 3
    Fullcount 39
    Kwlimit 2
File: taxon.pxtax
    Type Identifier
    Compress Yes
    Pages 19
    Order 16
    Fill 14
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 101
    Count 86
    Fullcount 90
    Kwlimit 110
File: taxon.pxrnk
    Type Secondary
    Compress Yes
    Pages 54
    Order 68
    Fill 45
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 169
    Count 17
    Fullcount 41
    Kwlimit 16
File: taxon.pxup
    Type Identifier
    Compress Yes
    Pages 15
    Order 99
    Fill 56
    Pagesize 2048
    Level 0
    Cachesize 100
    Order2 99
    Fill2 101
    Count 33
    Fullcount 41
    Kwlimit 7
File: taxon.xac
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xgc
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xid
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xmgc
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xtax
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xrnk
This file contains non-printing characters and so cannot be displayed here.
File: taxon.xup
This file contains non-printing characters and so cannot be displayed here.
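The ASCII `.px*` parameter files in the listing above consist of name/value pairs, so they are easy to inspect from a script. The sketch below is an illustration only: `parse_index_params` is a hypothetical helper, and it assumes every value is a single whitespace-free token, which holds for the files shown in this example. The sample text is copied from the taxon.pxid listing.

```python
def parse_index_params(text: str) -> dict:
    """Parse alternating name/value tokens into a dict (hypothetical helper).

    Assumption: values contain no whitespace, as in the example files.
    """
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of whitespace-separated tokens")
    params = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        # Keep numeric settings as integers, everything else as text.
        params[key] = int(value) if value.isdigit() else value
    return params


# Contents of taxon.pxid from the usage example.
SAMPLE_PXID = """Type Identifier
Compress Yes
Pages 3
Order 99
Fill 56
Pagesize 2048
Level 0
Cachesize 100
Order2 99
Fill2 101
Count 41
Fullcount 41
Kwlimit 7
"""

if __name__ == "__main__":
    params = parse_index_params(SAMPLE_PXID)
    print(params["Type"], params["Pagesize"], params["Count"])
```

Splitting on any whitespace means the parser behaves the same whether the pairs are one per line or run together on a single line.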
Data files
None.
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
See also
Program name | Description
---|---
dbiblast | Index a BLAST database
dbifasta | Index a fasta file database
dbiflat | Index a flat file database
dbigcg | Index a GCG formatted database
dbxcompress | Compress an uncompressed dbx index
dbxedam | Index the EDAM ontology using b+tree indices
dbxfasta | Index a fasta file database using b+tree indices
dbxflat | Index a flat file database using b+tree indices
dbxgcg | Index a GCG formatted database using b+tree indices
dbxobo | Index an obo ontology using b+tree indices
dbxreport | Validate index and report internals for dbx databases
dbxresource | Index a data resource catalogue using b+tree indices
dbxstat | Dump statistics for dbx databases
dbxuncompress | Uncompress a compressed dbx index

Author(s)
Peter Rice
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org), not to the original author.
History
Target users
This program is intended to be used by administrators responsible for software and database installation and maintenance.
Comments
None