jaspextract |
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.Please help by correcting and extending the Wiki pages.
Function
Extract data from JASPARDescription
JASPAR is a collection of transcription factor DNA-binding preferences, modelled as matrices. These can be converted into Position Weight Matrices (PWMs or PSSMs), used for scanning genomic sequences.JASPAR is the only database with this scope where the data can be used with no restrictions (open-source).
This program copies the JASPAR distribution into its component matrix sets (e.g. JASPAR_CORE, JASPAR_PHYLOFACTS etc) and copies them into the EMBOSS data directories, performing any necessary conversions
The home page of JASPAR is: http://jaspar.genereg.net/
The EMBOSS program jaspscan will not work unless this program is run.
Running this program may be the job of your system manager.
Usage
Here is a sample session with jaspextract
% jaspextract Extract data from JASPAR JASPAR database directory [.]: jaspar |
Go to the output files for this example
Command line arguments
Extract data from JASPAR Version: EMBOSS:6.4.0.0 Standard (Mandatory) qualifiers: [-directory] directory The FlatFileDir directory containing the .pfm files and the matrix_list.txt file Additional (Optional) qualifiers: (none) Advanced (Unprompted) qualifiers: (none) Associated qualifiers: "-directory" associated qualifiers -extension1 string Default file extension General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages -version boolean Report version number and exit |
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-directory] (Parameter 1) |
directory | The FlatFileDir directory containing the .pfm files and the matrix_list.txt file | Directory | |
Additional (Optional) qualifiers | ||||
(none) | ||||
Advanced (Unprompted) qualifiers | ||||
(none) | ||||
Associated qualifiers | ||||
"-directory" associated directory qualifiers | ||||
-extension1 -extension_directory |
string | Default file extension | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
Input file format
The input files are part of the uncompressed and extracted Archive.zip file provided in the JASPAR html/DOWNLOAD directory of the JASPAR homepage (http://jaspar.genereg.net). After extracting the file you should specify the all_data/FlatFileDir directory when running jasparextract. It is advisable to first delete any old data files from your EMBOSS data file area e.g. from the /usr/local/emboss/share/EMBOSS/data/JASPAR_* directoriesOutput file format
The output file format is currently the same as the JASPAR distribution format, but with the matrix files separated into directories according to their type.Output files for usage example
Directory: JASPAR_CNE
This directory contains output files.
Directory: JASPAR_CORE
This directory contains output files, for example MA0070.1.pfm MA0071.1.pfm MA0072.1.pfm MA0073.1.pfm MA0074.1.pfm MA0075.1.pfm MA0076.1.pfm MA0077.1.pfm MA0078.1.pfm MA0079.1.pfm and matrix_list.txt.
File: JASPAR_CORE/MA0070.1.pfm
5 3 16 1 0 17 17 0 0 16 12 8 6 9 1 1 18 1 0 0 18 1 0 2 2 3 1 0 0 0 0 1 0 0 1 2 5 3 0 16 0 0 1 17 0 1 5 6 |
File: JASPAR_CORE/MA0071.1.pfm
15 9 6 11 21 0 0 0 0 25 1 1 12 2 0 0 0 0 25 0 2 0 4 5 4 25 25 0 0 0 7 15 3 7 0 0 0 25 0 0 |
File: JASPAR_CORE/MA0072.1.pfm
9 17 15 35 23 2 0 28 0 0 0 0 36 15 8 2 0 1 0 12 0 0 0 0 0 36 0 6 8 7 3 0 0 13 0 8 36 36 0 0 0 10 11 10 18 0 13 9 36 0 0 0 36 0 0 5 |
File: JASPAR_CORE/MA0073.1.pfm
3 1 3 0 7 9 8 4 0 11 4 1 3 4 2 4 4 4 1 4 8 10 8 11 4 2 3 6 11 0 7 10 8 6 9 5 5 6 7 4 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 3 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 1 1 0 1 |
File: JASPAR_CORE/MA0074.1.pfm
3 0 0 0 0 9 4 2 2 5 0 0 1 0 7 0 0 0 0 9 0 2 4 0 0 0 0 0 9 1 7 10 9 0 0 1 0 2 8 5 10 0 0 0 2 0 0 1 10 1 0 4 2 0 0 0 10 9 1 0 |
File: JASPAR_CORE/MA0075.1.pfm
52 59 0 0 58 2 0 0 0 0 4 0 1 0 1 1 0 58 59 0 |
File: JASPAR_CORE/MA0076.1.pfm
16 0 0 0 0 20 16 4 1 1 20 20 0 0 0 0 1 6 2 0 0 20 20 0 0 15 0 1 0 0 0 0 0 4 0 13 |
File: JASPAR_CORE/MA0077.1.pfm
24 54 59 0 65 71 4 24 9 7 6 4 72 4 2 0 6 9 31 7 0 2 0 1 1 38 55 14 9 13 2 7 2 71 8 3 |
File: JASPAR_CORE/MA0078.1.pfm
7 8 3 30 0 0 0 0 0 9 8 18 0 1 0 0 0 17 6 4 1 0 0 0 31 2 10 9 11 9 1 30 31 0 29 4 |
File: JASPAR_CORE/MA0079.1.pfm
1 2 0 0 0 2 0 0 1 2 1 1 0 0 5 0 1 0 1 0 4 4 8 8 2 4 5 6 6 0 2 1 0 0 1 2 2 2 0 6 |
File: JASPAR_CORE/matrix_list.txt
MA0072.1 17.4248426117905 RORA_2 Zinc-coordinating ; acc "NP_599022" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000048" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0075.1 9.06306510239134 Prrx2 Helix-Turn-Helix ; acc "Q06348" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7901837" ; pazar_tf_id "TF0000051" ; species "10090" ; tax_group "vertebrates" ; type "SELEX" MA0071.1 13.1897301896459 RORA_1 Zinc-coordinating ; acc "NP_599023" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000047" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0073.1 22.2782723704014 RREB1 Zinc-coordinating ; acc "Q92766" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "8816445" ; pazar_tf_id "TF0000049" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0070.1 14.6408952002356 PBX1 Helix-Turn-Helix ; acc "Q5T486" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7910944" ; pazar_tf_id "TF0000046" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0074.1 20.4511671987138 RXRA::VDR Zinc-coordinating ; acc "P19793,P11473" ; collection "CORE" ; comment "heterodimer between RXRA and VDR" ; family "Hormone-nuclear Receptor" ; medline "8674817" ; pazar_tf_id "TF0000050" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0078.1 10.5018372361999 Sox17 Other Alpha-Helix ; acc "Q61473" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "8636240" ; pazar_tf_id "TF0000054" ; species "10090" ; tax_group "vertebrates" ; type "SELEX" MA0079.1 9.7185757452318 SP1 Zinc-coordinating ; acc "P08047" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "2192357" ; pazar_tf_id "TF0000055" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0079.2 11.1288626921664 SP1 Zinc-coordinating ; acc "P08047" ; collection "CORE" ; comment "Annotations from PAZAR SP1 + SP1_MOUSE + SP1_HUMAN + SP1_RAT in the pleiades genes project (TF0000105, TF0000121, TF0000137, TF0000146)." ; family "BetaBetaAlpha-zinc finger" ; medline "17916232" ; pazar_tf_id "TF0000055" ; species "9606,10090,10116" ; tax_group "vertebrates" ; type "COMPILED" MA0077.1 9.07881462267178 SOX9 Other Alpha-Helix ; acc "P48436" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "9973626" ; pazar_tf_id "TF0000053" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" MA0076.1 14.123230134165 ELK4 Winged Helix-Turn-Helix ; acc "P28324" ; collection "CORE" ; comment "-" ; family "Ets" ; medline "8524663" ; pazar_tf_id "TF0000052" ; species "9606" ; tax_group "vertebrates" ; type "SELEX" |
Directory: JASPAR_FAM
This directory contains output files.
Directory: JASPAR_PBM
This directory contains output files.
Directory: JASPAR_PBM_HLH
This directory contains output files.
Directory: JASPAR_PBM_HOMEO
This directory contains output files.
Directory: JASPAR_PHYLOFACTS
This directory contains output files.
Directory: JASPAR_POLII
This directory contains output files.
Directory: JASPAR_SPLICE
This directory contains output files.
Data files
NoneNotes
The home page of JASPAR is: http://jaspar.genereg.net Running this program may be the job of your system manager.References
- DNA binding sites: representation and discovery Bioinformatics. 2000 Jan;16(1):16-23
- Applied bioinformatics for the identification of regulatory elements Nat Rev Genet. 2004 Apr;5(4):276-87
Warnings
None.Diagnostic Error Messages
None.Exit status
It always exits with status 0 unless an error is reportedKnown bugs
None.See also
Program name | Description |
---|---|
aaindexextract | Extract amino acid property data from AAINDEX |
cutgextract | Extract codon usage tables from CUTG database |
printsextract | Extract data from PRINTS database for use by pscan |
prosextract | Processes the PROSITE motif database for use by patmatmotifs |
rebaseextract | Process the REBASE database for use by restriction enzyme applications |
tfextract | Process TRANSFAC transcription factor database for use by tfscan |
Author(s)
Alan BleasbyEuropean Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.