makeprotseq |
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.Please help by correcting and extending the Wiki pages.
Function
Create random protein sequencesDescription
makeprotseq writes an output file with a set of random protein sequences. The sequence composition is defined from reading a pepstats output file of protein composition: the sequences are created with amino acid frequencies matching those given in the file. The number of sequences to create and length of each sequence is specified. Optionally, a user-defined string may be inserted into each output sequence at a specified position
Algorithm
Usage
Here is a sample session with makeprotseq
% makeprotseq Create random protein sequences Pepstats program output file (optional): Number of sequences created [100]: Length of each sequence [100]: protein output sequence(s) [makeseq.fasta]: |
Go to the output files for this example
Example 2
% makeprotseq Create random protein sequences Pepstats program output file (optional): ../pepstats-keep/laci_ecoli.pepstats Number of sequences created [100]: Length of each sequence [100]: protein output sequence(s) [makeseq.fasta]: |
Go to the input files for this example
Go to the output files for this example
Command line arguments
Create random protein sequences Version: EMBOSS:6.4.0.0 Standard (Mandatory) qualifiers (* if not always prompted): -pepstatsfile infile This file should be a pepstats output file. Protein sequences will be created with the composition in the pepstats output file. -amount integer [100] Number of sequences created (Integer 1 or more) -length integer [100] Length of each sequence (Integer 1 or more) * -insert string String that is inserted into sequence (Any string) * -start integer [1] Start point of inserted sequence (Integer 1 or more) [-outseq] seqoutall [ |
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
-pepstatsfile | infile | This file should be a pepstats output file. Protein sequences will be created with the composition in the pepstats output file. | Input file | Required |
-amount | integer | Number of sequences created | Integer 1 or more | 100 |
-length | integer | Length of each sequence | Integer 1 or more | 100 |
-insert | string | String that is inserted into sequence | Any string | |
-start | integer | Start point of inserted sequence | Integer 1 or more | 1 |
[-outseq] (Parameter 1) |
seqoutall | Protein sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | ||||
-useinsert | toggle | Do you want to make an insert | Toggle value Yes/No | No |
Advanced (Unprompted) qualifiers | ||||
(none) | ||||
Associated qualifiers | ||||
"-outseq" associated seqoutall qualifiers | ||||
-osformat1 -osformat_outseq |
string | Output seq format | Any string | |
-osextension1 -osextension_outseq |
string | File name extension | Any string | |
-osname1 -osname_outseq |
string | Base file name | Any string | |
-osdirectory1 -osdirectory_outseq |
string | Output directory | Any string | |
-osdbname1 -osdbname_outseq |
string | Database name to add | Any string | |
-ossingle1 -ossingle_outseq |
boolean | Separate file for each entry | Boolean value Yes/No | N |
-oufo1 -oufo_outseq |
string | UFO features | Any string | |
-offormat1 -offormat_outseq |
string | Features format | Any string | |
-ofname1 -ofname_outseq |
string | Features file name | Any string | |
-ofdirectory1 -ofdirectory_outseq |
string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
Input file format
makeprotseq reads a pepstats output file.Input files for usage example 2
File: ../pepstats-keep/laci_ecoli.pepstats
PEPSTATS of LACI_ECOLI from 1 to 360 Molecular weight = 38590.16 Residues = 360 Average Residue Weight = 107.195 Charge = 1.5 Isoelectric Point = 6.8820 A280 Molar Extinction Coefficients = 22920 (reduced) 23045 (cystine bridges) A280 Extinction Coefficients 1mg/ml = 0.594 (reduced) 0.597 (cystine bridges) Improbability of expression in inclusion bodies = 0.660 Residue Number Mole% DayhoffStat A = Ala 44 12.222 1.421 B = Asx 0 0.000 0.000 C = Cys 3 0.833 0.287 D = Asp 17 4.722 0.859 E = Glu 15 4.167 0.694 F = Phe 4 1.111 0.309 G = Gly 22 6.111 0.728 H = His 7 1.944 0.972 I = Ile 18 5.000 1.111 J = --- 0 0.000 0.000 K = Lys 11 3.056 0.463 L = Leu 41 11.389 1.539 M = Met 10 2.778 1.634 N = Asn 12 3.333 0.775 O = --- 0 0.000 0.000 P = Pro 14 3.889 0.748 Q = Gln 28 7.778 1.994 R = Arg 19 5.278 1.077 S = Ser 32 8.889 1.270 T = Thr 19 5.278 0.865 U = --- 0 0.000 0.000 V = Val 34 9.444 1.431 W = Trp 2 0.556 0.427 X = Xaa 0 0.000 0.000 Y = Tyr 8 2.222 0.654 Z = Glx 0 0.000 0.000 Property Residues Number Mole% Tiny (A+C+G+S+T) 120 33.333 Small (A+B+C+D+G+N+P+S+T+V) 197 54.722 Aliphatic (A+I+L+V) 137 38.056 Aromatic (F+H+W+Y) 21 5.833 Non-polar (A+C+F+G+I+L+M+P+V+W+Y) 200 55.556 Polar (D+E+H+K+N+Q+R+S+T+Z) 160 44.444 Charged (B+D+E+H+K+R+Z) 69 19.167 Basic (H+K+R) 37 10.278 Acidic (B+D+E+Z) 32 8.889 |
Output file format
The output is a standard EMBOSS sequence file.
The results can be output in one of several styles by using the command-line qualifier -osformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
Output files for usage example
File: makeseq.fasta
>EMBOSS_001 nrvlhpepnprtdniytpawirllygvwvwnrqachnkeerkryppklmmydsqfwcdfe wadccspkqgwhgnlvkvnrteemfgmqflpqvhpgkkvd >EMBOSS_002 vtvkddwhkdwwcrpamdylhywlkqrnhytdlslyyttstprwarmadtflapegndcv qtmywrwvndgdivclecqvcgrfdiymvqdsgqidkghs >EMBOSS_003 celpniypyweragingdwhetvtvrmhcnnddilwyqmnykppsshavhyivwrrnwcw nfidqgdgdnrncmnytsnapeqksqlkyghkrqftvvvr >EMBOSS_004 cshpdepancgridtykhvaydmtdtkaeyhgsspelqslrqkfsnqvwhnraviwwehp iqdcrlkhselrchskhlseikmpvevtmsdwlmytgyfm >EMBOSS_005 cpcqytiqygsdlfldsqmpkckkisvelvclvynaqsnlsyfiheaafmvfhpfsllci meecinwincriaiwppkfvqleidkmiwkvklqcknvcw >EMBOSS_006 ifawkitieywnktydldkmrklakdfgfppfdpwpihvgccnisnwfmepkfwaqmkcw mstltiedndwlmlnttefgeqllfywmhwmpcqdewqph >EMBOSS_007 eiviqqfmvshealkqlgnkwnsqqmhvndriydvkhlvdasnfihhplnkryfrenvns tacccvhtwsipclfqtidhivnldgaygpwyrvkyshsp >EMBOSS_008 hmcmmmshfyvgycffvsvrdqrqtceyphvlmhnilftqgralwvrskqpqcadnhqpk ghwwvawrlqsymkgpqykpqkdwwqgkkffghiwemvrc >EMBOSS_009 rwydtkfimsgkfaysarqyprqikegeatalrsgpqicpaewiaanypgasfekrqfmd mqwicgyeprehrwsekymshesvkkgyrhglkngveyqt >EMBOSS_010 yatmyyygtgmtmkwgepiyvaqflirneqepkvkhahghdascrpkirldlfleerpnp yksvsfnrfyaggkliigiitydttchkihahdrkeekar >EMBOSS_011 arakqhrkhmlvknsanwayqswdgkskllvtfghvmenmfkhwrkrsmncvrpinrhfe gpvmigvkadcqghgqidniqcawpnfedmhamrtvqqvm >EMBOSS_012 dkihtlrhnhypshmtmtewetvvgvfmipcsmariscpvwgnwphmqglcyppwsgtpn esqgctnnitfwmnwvspywlfpdlpnfatfmtlgnqrfh >EMBOSS_013 fkghwqakfphwlsydlkitkyftrqfhmnfiarmngasnfhgrsriawmahlwnqhara qaflmrlstheyewyfrnqapldqlfnecvlpvmsawhmw >EMBOSS_014 shhsqgkrnnectscqdqagfdcadnfttqvmekhwtheifnhivgisaittthyvytmt wcqsfsnmnnlwsragchwevdiagvrmmdicfsvyercf >EMBOSS_015 hknwdqldarqikalervvclpcenqvidtmvvglifkkkdlfmintwqtwkgisvysci hqfligfinkgwfpdaysvgvlmfdqdaienadahgdhhl >EMBOSS_016 gwylwhlgntdaqfeghstgnvhhedkathldfyhedwgchnrtqppvfgmanrwdvakk seygwmvgfhqcddtlgyfemfhawywgypgcdfhfnrpp >EMBOSS_017 lfqtfqvwnaymgcgryaclewysaisgcsvmqfgdfkdpifhfptdrlwggiesgdkev [Part of this file has been deleted for brevity] >EMBOSS_084 yscilavytfadkdtptkkapetdlnpdhermykmhdhsrtghwtrnnigcksssyisqv tciplripvnfrlvvawrcwmdlqddwkphmnmffmrray >EMBOSS_085 iivhagtkvksgdpaiglirectikwcemwpdsicdtfkivkyfiqskttmqsyyvnilr lsravaskdsqetdtcgdsmdmshvcpkchqnwhslgsdh >EMBOSS_086 frqhcqqnsdvfqyrhwktfpfryakchgihvwpkptmeccwlprhqwkktnrvnpvlnf ytdhhkviceqlelvvsykhskdrvthqmprrnvgegqdw >EMBOSS_087 ctedmshteihecsmfddlrdwhmkrdcsipnagrkkegwipkldmmrycvylqipktti fhdvsatcvasfmvhkpndfmlckrciyhykgdtwwrpwt >EMBOSS_088 ygscslkqtqlgsggkehqqqyleqayewavmisgspnciqsepwmmsmgkpkfwivryn yekcvmdgyacfqkwegpdhwvsvvhrkmydrggffylge >EMBOSS_089 hhqkiqscaitwtgiinqqwfmmnrrgwakkqdhqasafvitdnkrswtreyfnkmvism mkmhcvqtnvhcsaqrrsmhtppyntlitvprmgcpyfka >EMBOSS_090 akrdfsklilitwvlhficlrfgafaffgmemailrpsdqiywsnmqhnamfkptdhmlv pelycargfavhpadqgghwwlderteiphyrikdrshde >EMBOSS_091 pyrehgtmcpckvwqcplydgtadsrgagywpsfhneppqgvseiefeccmaciqqctka ffqgdamkkkylqnqevptpypdhildnmlqsvyqggqsq >EMBOSS_092 alhntshnerdmetlwgagcfivemkahfvcyelrkgcrcrrsfnvkfhahaqdwwffmd snptmgncgeqltgpkqhadilfidinqmmctmchchqye >EMBOSS_093 aflkktaalkrfrmmtvtmvnqlwcagatifmhilkdlsrsmilgrhcegmwillcfasy ralvkvdsgtkrwwhkdflelywwdeayrhysetqyasav >EMBOSS_094 rshvqnvdhpssidmkripdwtcqglqmqinlwllrragirravmykiwdkgemnvelah iwayhgfrsfikfgqpiqkmrvmyerilptdllyiedcnh >EMBOSS_095 yenmtynnavemahpeirnwlwrsgfdpmgvhchladnppvscatlvmklwitnfwaaqy pvwnqfltcyqmprqleflrglvgnchhdrwwqdalitrt >EMBOSS_096 rdhgrspnwmkftvgvdmtalmycwrdrlewliacfhftmkrgkciewmselicppptnf vqisnrcmsirgwmymradkeaqcnirwyiyahlkhkwgm >EMBOSS_097 aygfwfqqhhhfeptkqpeylawqtwtcvkkrcqsytachfeqvthvnkcpsqpiapced dpsptitsssscnierktgigltcysnhgalmritpkyci >EMBOSS_098 vaaiydmiaqwvviqtwfvttddqerclwwesrhqyydeyeffftistpvwcyprrrlia ghaqvikywawvdeslipsefqqapeipdpigpgkvqpeq >EMBOSS_099 vlmpqdsprpefivnvfntadpscecsckfqlfmcekffncikangvnvdmpchipkaam tecytlqiawppqnkltypgvhtksfgeewldhmpieqgg >EMBOSS_100 seffvlpnicymnwnramyvfshwwycfpcsymkdklfwdtrypqihpapklnmedacpg ytctrpqryqilarqtwyarkhygwvnwtdiascntmdgq |
Output files for usage example 2
File: makeseq.fasta
>EMBOSS_001 eqngievefrcplriaaeyahqsgatanpaedqllifgvlqsqaerggfgavpdkallvv nllvlargqkaikegpsaecpvlgttgqtgrdntekssnv >EMBOSS_002 npnsvlatsvmavqeyslmgtmagsqdriaavgagmapaaprcmydsllavglrvkevln qaranqnaelilialsvlqnltqvvsarpqlqtqilhtta >EMBOSS_003 llgersmqmmvdyksrivatvaapndgtveevligamdgemhrraqiyataipnqqrmln rlhldiltlrdevgrmpaeyevqgadghlkihcdkpappq >EMBOSS_004 laielvelevkdhvaasinlmlgavagylmttqaevgqqsqqglprdaairqyasmavtr sdlvcgsiqvgqvtasisplsssravapgqvassmatakr >EMBOSS_005 vevqaphdmtavsksvqdfrsvhssqavgalsalryqaegqmksivyllfaktrvpssli flvlsrmseldsysaeqsvaqglhvsginsagsdwhrnla >EMBOSS_006 ivynsspslmnfgamlglsgdhsygvktleelvrnrhtailvrsqeavsvrhvnldfgva fqagpsvlelngrgeaalvtlqsgkmmginsevqllndei >EMBOSS_007 vsaiqdkgaqillssqstrgaraqqftarvqhlvagtsnlyqrvhttegegdaldvepea allvvaipnqievgvqahvthargltlatealqpglatqr >EMBOSS_008 iglrssattmniawklaaaqlqqdpllmqingfifiglaqidysnnqasqrdvlleidqh kimmnyadsqalshkrqmsedglanqtghvkttsnvgnql >EMBOSS_009 qallaskifqtsklaqyqdaqqdssltvlpygcqieqslrlvaiyyrldtlpvvgddvsv rqasvtavdqliqaalgaraivqasgtmditggriplaqa >EMBOSS_010 myafmmakatgpssatvqianydkgsceldlegasiytkilypvqrsidslgvsvvqqre asaaavrqkmlktssihkisamvpavtssilivdgllslq >EMBOSS_011 lclgdiqsiggngfqyraymdaalksagggppvtiafleglstndsdagfvpqrifqikv idpssiaslllqtitdslridlyarrvvlrtygdpnddas >EMBOSS_012 lssiagdiriarqtspgavmlanpkpvgirlasyqsaleaatraeirdtglarraatpqe vqdtvaeesavasranpemasvqvgeeklavfpgirdqkt >EMBOSS_013 ksitaqyhvetngpmlghsasmvpqdktgrviydrrtyarkitdqdslagytgmrqiyql qwvggdsqatvalaatcrdlqgldskrvvagdpgqyaisn >EMBOSS_014 atiadkgdrrvlppvqldlkkllylrkaadagvstnptvsleiiaksqlspaptanapsp aldqkqrgffgaaqyivtnlplswkaqgsvsvvqpmvdvl >EMBOSS_015 igeavdgllddhslgldaplselvrqnhlasnntgskhshlsksheaadaagthaaacls tqksstkseskakrlyaqatagflvqlysvrlvlttltts >EMBOSS_016 inagntgtealydkvtiaatrattvvsypiglkaivlmivirqpqrqavtryednlawsg avmiasaikidlvvpsiaklrvtlmmakaekvlkitrdre >EMBOSS_017 gvqpvqaarlasiliqalvsvnmqysqtvqarqktlkhvesltlqplqgaktslptvgln [Part of this file has been deleted for brevity] >EMBOSS_084 mqvsgynaavlvslpepsgyevpvsreltvdsasrivtaqaitaadresivgppqmipda alsrgqhenrlqsaalmdwnsvsqvlasrifrgvkfddla >EMBOSS_085 hsailtasngatvelhtsiqlvpssallgaevqslvpvhhasakhqchppsqammarssd gqqyayphvadlplalklqrlsaiavqgliqfaiasiqli >EMBOSS_086 kqdtwdqrqlakqmqtasakrkqmygvtthinmehrpslvvasectqashprdpernsrv maliisnslvdglsaaqmhiasldaatdsecqeptvkqva >EMBOSS_087 valvgqtavitvlprkvvgdvatssqllqsqeltcssvtniegglrrqmvnagqhqsapi livnalplalavsptserllsgvsdvsmiastvpnnqeaa >EMBOSS_088 atalqgsqaqskattsvtqddagvdlavalashpkaerlsdaleargagthrslnsaqae mlgvpfviallldgmvievtanapntdhgmlditvvastv >EMBOSS_089 ttqsidalyhpmaiiirdqnvgfeqdknlgsdviqlqyvnsplegcqnpqlmkrgrasqr ssriwpdaraivqydqqagtpedmrpssparqftlellgy >EMBOSS_090 yhdltqsgissanngtkiwgqvklvylktglfyssdeqvqsmnqrgdielrvgepvtssn qvsmlldiklptrlvqktimasvvdalhriaqhhvqqilv >EMBOSS_091 eacvtkaglrvgpnqlegmltpwvadklkaarqvielerdiaavhlvlvlryvsqqvahy vkqtllggsgmgdrdvneprmdvisgvegsdqnlqtkqqd >EMBOSS_092 lgieaqtrvdvglpsniltlvsnvgsytkalmvgqgilqlqqpkraskiliyqvmntvgl cedagielilqsatqsqtylhsvilsrqrflpgvtliqav >EMBOSS_093 lvghhalyshdvqsgpnpsafqsnlytypivgihssvspqqssskdtlvtsmssswvyaa dygpgalptpsqantsltglsaamvlymdiaalaqayala >EMBOSS_094 qqindeavteaqhvrhqselapvqksdrqsrsnssqqlticdyarmsinvhivreavslt inymitvqakisvkdqsqssdagmvqigrpvggmsvvvft >EMBOSS_095 mlrgpaerypvgyievscrngaqqikvrrinilisyvreeaqvyagarsgasaelayydl eanrqkspvadrqqqslvgqkgairvtildnaqvlsspqp >EMBOSS_096 qvtiqqeensgvpakavgplggavaqlcslmgillvtkasgdtslivmsplghledrprv nqsardwgqsqkasarqlvgvldlricamhmytsgignts >EMBOSS_097 lmtkavqqtitkvqagdrlasyndamalaghqvdqaayvtkvdppinrhleaqqsyelvv leqrpspqaqalrsvdsptstgpvmariilggqspesals >EMBOSS_098 nlyhmvrswqaaasqpmvapalvqvcvganvqdtdmmvvmvtkkpsqpenavaddqqssl tiyqasslmlnnllagiqavlddlqvsdvrhtrtsndrld >EMBOSS_099 ngredlqedrvvsnrakrawveplvvqwhvdsvgllgvlelhglrkneavgrvihegyls pllaagdslnqrqrgsamekatpsqtklvnglifeildtt >EMBOSS_100 plklpgqfslafrnedlgmnkqtanmlkevqmsslggkavpdaeditelegseflvwldt mpvaqqddadhgldqanaydhiminpraplilqleasltq |
Data files
Notes
None.References
None.Warnings
None.Diagnostic Error Messages
None.Exit status
It always exits with status 0.Known bugs
None.See also
Program name | Description |
---|---|
aligncopy | Reads and writes alignments |
aligncopypair | Reads and writes pairs from alignments |
biosed | Replace or delete sequence sections |
codcopy | Copy and reformat a codon usage table |
cutseq | Removes a section from a sequence |
degapseq | Removes non-alphabetic (e.g. gap) characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Retrieves sequence entries from flatfile databases and files |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from sequence(s) |
extractseq | Extract regions from a sequence |
featcopy | Reads and writes a feature table |
featreport | Reads and writes a feature table |
feattext | Return a feature table original text |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Create random nucleotide sequences |
maskambignuc | Masks all ambiguity characters in nucleotide sequences with N |
maskambigprot | Masks all ambiguity characters in protein sequences with X |
maskfeat | Write a sequence with masked features |
maskseq | Write a sequence with masked regions |
newseq | Create a sequence file from a typed-in sequence |
nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file |
noreturn | Remove carriage return from ASCII files |
nospace | Remove whitespace from an ASCII text file |
notab | Replace tabs with spaces in an ASCII text file |
notseq | Write to file a subset of an input stream of sequences |
nthseq | Write to file a single sequence from an input stream of sequences |
nthseqset | Reads and writes (returns) one set of sequences from many |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a nucleotide sequence |
seqcount | Reads and counts sequences |
seqret | Reads and writes (returns) sequences |
seqretsetall | Reads and writes (returns) many sets of sequences |
seqretsplit | Reads sequences and writes them to individual files |
sizeseq | Sort sequences by size |
skipredundant | Remove redundant sequences from an input set |
skipseq | Reads and writes (returns) sequences, skipping first few |
splitsource | Split sequence(s) into original source sequences |
splitter | Split sequence(s) into smaller sequences |
trimest | Remove poly-A tails from nucleotide sequences |
trimseq | Remove unwanted characters from start and end of sequence(s) |
trimspace | Remove extra whitespace from an ASCII text file |
union | Concatenate multiple sequences into a single sequence |
vectorstrip | Removes vectors from the ends of nucleotide sequence(s) |
yank | Add a sequence reference (a full USA) to a list file |
Author(s)
This application was contributed by Henrikki Almusa, Medicel, Helsinki, FinlandPlease report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.