pepcoil |
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.Please help by correcting and extending the Wiki pages.
Function
Predicts coiled coil regions in protein sequencesDescription
pepcoil calculates the probability of a coiled-coil structure for windows of (by default) 28 residues through a protein sequence using the method of Lupas A, van Dyke M & Stock J (1991); Science 252:1162-4. It reads one or more sequences and writes a standard EMBOSS report file including the location and score of the putative coiled-coil structures and their score for each input sequence. Optionally, coiled coil regions, non-coiled coil regions and coil frameshifts are reported.
Usage
Here is a sample session with pepcoil
% pepcoil Predicts coiled coil regions in protein sequences Input protein sequence(s): tsw:gcn4_yeast Window size [28]: Output report [gcn4_yeast.pepcoil]: |
Go to the input files for this example
Go to the output files for this example
Command line arguments
Predicts coiled coil regions in protein sequences Version: EMBOSS:6.4.0.0 Standard (Mandatory) qualifiers: [-sequence] seqall Protein sequence(s) filename and optional format, or reference (input USA) -window integer [28] Window size (Integer from 7 to 28) [-outfile] report [*.pepcoil] Output report file name (default -rformat motif) Additional (Optional) qualifiers: (none) Advanced (Unprompted) qualifiers: -[no]coil boolean [Y] Report coiled coil regions -frame boolean [Yes if -coil is true] Show coil frameshifts -other boolean [N] Report non coiled coil regions Associated qualifiers: "-sequence" associated qualifiers -sbegin1 integer Start of each sequence to be used -send1 integer End of each sequence to be used -sreverse1 boolean Reverse (if DNA) -sask1 boolean Ask for begin/end/reverse -snucleotide1 boolean Sequence is nucleotide -sprotein1 boolean Sequence is protein -slower1 boolean Make lower case -supper1 boolean Make upper case -sformat1 string Input sequence format -sdbname1 string Database name -sid1 string Entryname -ufo1 string UFO features -fformat1 string Features format -fopenfile1 string Features file name "-outfile" associated qualifiers -rformat2 string Report format -rname2 string Base file name -rextension2 string File name extension -rdirectory2 string Output directory -raccshow2 boolean Show accession number in the report -rdesshow2 boolean Show description in the report -rscoreshow2 boolean Show the score in the report -rstrandshow2 boolean Show the nucleotide strand in the report -rusashow2 boolean Show the full USA in the report -rmaxall2 integer Maximum total hits to report -rmaxseq2 integer Maximum hits to report for one sequence General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages -version boolean Report version number and exit |
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-sequence] (Parameter 1) |
seqall | Protein sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
-window | integer | Window size | Integer from 7 to 28 | 28 |
[-outfile] (Parameter 2) |
report | Output report file name | (default -rformat motif) | <*>.pepcoil |
Additional (Optional) qualifiers | ||||
(none) | ||||
Advanced (Unprompted) qualifiers | ||||
-[no]coil | boolean | Report coiled coil regions | Boolean value Yes/No | Yes |
-frame | boolean | Show coil frameshifts | Boolean value Yes/No | Yes if -coil is true |
-other | boolean | Report non coiled coil regions | Boolean value Yes/No | No |
Associated qualifiers | ||||
"-sequence" associated seqall qualifiers | ||||
-sbegin1 -sbegin_sequence |
integer | Start of each sequence to be used | Any integer value | 0 |
-send1 -send_sequence |
integer | End of each sequence to be used | Any integer value | 0 |
-sreverse1 -sreverse_sequence |
boolean | Reverse (if DNA) | Boolean value Yes/No | N |
-sask1 -sask_sequence |
boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
-snucleotide1 -snucleotide_sequence |
boolean | Sequence is nucleotide | Boolean value Yes/No | N |
-sprotein1 -sprotein_sequence |
boolean | Sequence is protein | Boolean value Yes/No | N |
-slower1 -slower_sequence |
boolean | Make lower case | Boolean value Yes/No | N |
-supper1 -supper_sequence |
boolean | Make upper case | Boolean value Yes/No | N |
-sformat1 -sformat_sequence |
string | Input sequence format | Any string | |
-sdbname1 -sdbname_sequence |
string | Database name | Any string | |
-sid1 -sid_sequence |
string | Entryname | Any string | |
-ufo1 -ufo_sequence |
string | UFO features | Any string | |
-fformat1 -fformat_sequence |
string | Features format | Any string | |
-fopenfile1 -fopenfile_sequence |
string | Features file name | Any string | |
"-outfile" associated report qualifiers | ||||
-rformat2 -rformat_outfile |
string | Report format | Any string | motif |
-rname2 -rname_outfile |
string | Base file name | Any string | |
-rextension2 -rextension_outfile |
string | File name extension | Any string | |
-rdirectory2 -rdirectory_outfile |
string | Output directory | Any string | |
-raccshow2 -raccshow_outfile |
boolean | Show accession number in the report | Boolean value Yes/No | N |
-rdesshow2 -rdesshow_outfile |
boolean | Show description in the report | Boolean value Yes/No | N |
-rscoreshow2 -rscoreshow_outfile |
boolean | Show the score in the report | Boolean value Yes/No | Y |
-rstrandshow2 -rstrandshow_outfile |
boolean | Show the nucleotide strand in the report | Boolean value Yes/No | Y |
-rusashow2 -rusashow_outfile |
boolean | Show the full USA in the report | Boolean value Yes/No | N |
-rmaxall2 -rmaxall_outfile |
integer | Maximum total hits to report | Any integer value | 0 |
-rmaxseq2 -rmaxseq_outfile |
integer | Maximum hits to report for one sequence | Any integer value | 0 |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
Input file format
pepcoil reads one or more protein sequences.
The input is a standard EMBOSS sequence query (also known as a 'USA').
Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl
Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
Input files for usage example
'tsw:gcn4_yeast' is a sequence entry in the example protein database 'tsw'
Database entry: tsw:gcn4_yeast
ID GCN4_YEAST Reviewed; 281 AA. AC P03069; P03068; Q70D88; Q70D91; Q70D96; Q70D99; Q70DA0; Q96UT3; DT 21-JUL-1986, integrated into UniProtKB/Swiss-Prot. DT 21-JUL-1986, sequence version 1. DT 15-JUN-2010, entry version 120. DE RecName: Full=General control protein GCN4; DE AltName: Full=Amino acid biosynthesis regulatory protein; GN Name=GCN4; Synonyms=AAS3, ARG9; OrderedLocusNames=YEL009C; OS Saccharomyces cerevisiae (Baker's yeast). OC Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomyceta; OC Saccharomycotina; Saccharomycetes; Saccharomycetales; OC Saccharomycetaceae; Saccharomyces. OX NCBI_TaxID=4932; RN [1] RP NUCLEOTIDE SEQUENCE [GENOMIC DNA]. RX MEDLINE=85038531; PubMed=6387704; DOI=10.1073/pnas.81.20.6442; RA Hinnebusch A.G.; RT "Evidence for translational regulation of the activator of general RT amino acid control in yeast."; RL Proc. Natl. Acad. Sci. U.S.A. 81:6442-6446(1984). RN [2] RP NUCLEOTIDE SEQUENCE [GENOMIC DNA]. RX MEDLINE=84298088; PubMed=6433345; DOI=10.1073/pnas.81.16.5096; RA Thireos G., Penn M.D., Greer H.; RT "5' untranslated sequences are required for the translational control RT of a yeast regulatory gene."; RL Proc. Natl. Acad. Sci. U.S.A. 81:5096-5100(1984). RN [3] RP NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND VARIANTS PRO-24; SER-62; RP ALA-82; ALA-91; ALA-125 AND GLU-196. RC STRAIN=CLIB 219, CLIB 382, CLIB 388, CLIB 410, CLIB 413, CLIB 556, RC CLIB 630, CLIB 95, K1, R12, R13, Sigma 1278B, YIIc12, and YIIc17; RX PubMed=15087486; DOI=10.1093/nar/gkh529; RA Leh-Louis V., Wirth B., Despons L., Wain-Hobson S., Potier S., RA Souciet J.-L.; RT "Differential evolution of the Saccharomyces cerevisiae DUP240 RT paralogs and implication of recombination in phylogeny."; RL Nucleic Acids Res. 32:2069-2078(2004). RN [4] RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. RC STRAIN=ATCC 204511 / S288c / AB972; RX MEDLINE=97313264; PubMed=9169868; RA Dietrich F.S., Mulligan J.T., Hennessy K.M., Yelton M.A., Allen E., RA Araujo R., Aviles E., Berno A., Brennan T., Carpenter J., Chen E., RA Cherry J.M., Chung E., Duncan M., Guzman E., Hartzell G., RA Hunicke-Smith S., Hyman R.W., Kayser A., Komp C., Lashkari D., Lew H., RA Lin D., Mosedale D., Nakahara K., Namath A., Norgren R., Oefner P., RA Oh C., Petel F.X., Roberts D., Sehl P., Schramm S., Shogren T., RA Smith V., Taylor P., Wei Y., Botstein D., Davis R.W.; RT "The nucleotide sequence of Saccharomyces cerevisiae chromosome V."; [Part of this file has been deleted for brevity] FT /FTId=PRO_0000076490. FT DOMAIN 253 274 Leucine-zipper. FT DNA_BIND 231 249 Basic motif. FT REGION 89 100 Required for transcriptional activation. FT REGION 106 125 Required for transcriptional activation. FT MOD_RES 17 17 Phosphoserine. FT MOD_RES 165 165 Phosphothreonine; by PHO85. FT MOD_RES 218 218 Phosphoserine. FT VARIANT 24 24 S -> P (in strain: CLIB 219). FT VARIANT 62 62 P -> S (in strain: CLIB 630 haplotype FT Ha2). FT VARIANT 82 82 T -> A (in strain: CLIB 556 haplotype FT Ha1). FT VARIANT 91 91 D -> A (in strain: CLIB 95, CLIB 219, FT CLIB 382, CLIB 388, CLIB 410, CLIB 413, FT CLIB 556, CLIB 630, K1, R12, R13 FT haplotype Ha2, Sigma 1278B haplotype Ha1, FT YIIc12 and YIIc17). FT VARIANT 125 125 D -> A (in strain: CLIB 556 haplotype FT Ha1). FT VARIANT 196 196 D -> E (in strain: CLIB 388, CLIB 410, FT CLIB 413, CLIB 630 haplotype Ha1, K1, FT YIIc12 haplotype Ha2 and YIIc17 haplotype FT Ha1). FT MUTAGEN 97 98 FF->AA: Reduces transcriptional FT activation activity; when associated with FT A-107; A-110; A-113; A-120; A-123 and A- FT 124. FT MUTAGEN 107 107 M->A: Reduces transcriptional activation FT activity; when associated with A-97; A- FT 98; A-110; A-113; A-120; A-123 and A-124. FT MUTAGEN 110 110 Y->A: Reduces transcriptional activation FT activity; when associated with A-97; A- FT 98; A-107; A-113; A-120; A-123 and A-124. FT MUTAGEN 113 113 L->A: Reduces transcriptional activation FT activity; when associated with A-97; A- FT 98; A-107; A-110; A-120; A-123 and A-124. FT MUTAGEN 120 124 WTSLF->ATSAA: Reduces transcriptional FT activation activity; when associated with FT A-97; A-98; A-107; A-110 and A-113. FT CONFLICT 239 281 ARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVG FT ER -> PGVLVRESCKE (in Ref. 2; AAA65521). FT HELIX 230 248 FT HELIX 251 280 SQ SEQUENCE 281 AA; 31310 MW; 2ED1B8E35D509578 CRC64; MSEYQPSLFA LNPMGFSPLD GSKSTNENVS ASTSTAKPMV GQLIFDKFIK TEEDPIIKQD TPSNLDFDFA LPQTATAPDA KTVLPIPELD DAVVESFFSS STDSTPMFEY ENLEDNSKEW TSLFDNDIPV TTDDVSLADK AIESTEEVSL VPSNLEVSTT SFLPTPVLED AKLTQTRKVK KPNSVVKKSH HVGKDDESRL DHLGVVAYNR KQRSIPLSPI VPESSDPAAL KRARNTEAAR RSRARKLQRM KQLEDKVEEL LSKNYHLENE VARLKKLVGE R // |
Output file format
The output is a standard EMBOSS report file.
The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, draw, restrict, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.
See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.
By default the report is in 'motif' format.
Output files for usage example
The SwissProt annotation marks the true leucine zipper motif as from 253 to 274. The leucine zipper is a special case of a coiled-coil region.File: gcn4_yeast.pepcoil
######################################## # Program: pepcoil # Rundate: Fri 15 Jul 2011 12:00:00 # Commandline: pepcoil # -sequence tsw:gcn4_yeast # Report_format: motif # Report_file: gcn4_yeast.pepcoil ######################################## #======================================= # # Sequence: GCN4_YEAST from: 1 to: 281 # HitCount: 1 # # Window size: 28 residues # #======================================= Max_coil_pos at "*" (1) Score 1.910 length 49 at residues 233->281 * Sequence: ARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER | | 233 281 Probability: 1.000 Predict: coiled Max_coil_pos: 249 #--------------------------------------- #--------------------------------------- |
Data files
None.Notes
Coiled coils are formed by two or three alpha helices in parallel and in register that cross at an angle of approximately 20 degrees, are strongly amphipathic and display a pattern of hydrophilic and hydrophobic residues that is repeated every seven residues. The seven positions of the heptad repeat are designated a through g, a and d being generally hydrophobic, while the others are hydrophilic.
The parallel two-stranded alpha-helical coiled coil is the most frequently encountered subunit-oligomerization motif in proteins.
References
- Lupas A, van Dyke M & Stock J; Predicting Coiled Coils from Protein Sequences. Science 252:1162-4 (1991)
Warnings
None.Diagnostic Error Messages
None.Exit status
It always exits with a status of 0.Known bugs
None.See also
Program name | Description |
---|---|
garnier | Predicts protein secondary structure using GOR method |
helixturnhelix | Identify nucleic acid-binding motifs in protein sequences |
pepnet | Draw a helical net for a protein sequence |
pepwheel | Draw a helical wheel diagram for a protein sequence |
Author(s)
Alan BleasbyEuropean Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
Original program "PEPCOIL" by Peter Rice (EGCG 1991)