cai
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages.
Function
Calculate codon adaptation index
Description
cai calculates the Codon Adaptation Index (CAI) for a given nucleotide sequence, given a reference codon usage table. The CAI is a simple, effective measure of synonymous codon usage bias. It assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
Algorithm
The CAI index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon. A score for a gene sequence is calculated from the frequency of use of all codons in that gene sequence.
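As a rough illustration of the idea, the sketch below follows the definition in the Sharp and Li paper listed under References: each codon gets a relative adaptiveness w (its count in the reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. This is not taken from the cai source code. The function names, the toy reference counts, the 0.5 pseudocount for unused codons, and the choice to skip Met, Trp and stop codons are all assumptions made for the example.

```python
import math
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Return w for each codon: count / count of the best synonymous codon.

    Zero counts are replaced by a pseudocount (assumed value 0.5) so that
    a single unused codon cannot force the index to zero.
    """
    counts = {c: (n if n > 0 else pseudocount) for c, n in ref_counts.items()}
    best = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        best[aa] = max(best.get(aa, 0.0), n)
    return {codon: n / best[CODON_TO_AA[codon]] for codon, n in counts.items()}


def cai(seq, weights):
    """Geometric mean of w over the codons of seq, read in frame 1."""
    seq = seq.upper().replace("U", "T")
    log_sum, used = 0.0, 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        # Assumption: skip stops, single-codon amino acids and unknown codons.
        if aa in (None, "*", "M", "W") or codon not in weights:
            continue
        log_sum += math.log(weights[codon])
        used += 1
    return math.exp(log_sum / used) if used else 0.0


# Toy reference counts (hypothetical) for lysine and glutamate codons only.
toy_counts = {"AAA": 10, "AAG": 40, "GAA": 30, "GAG": 10}
w = relative_adaptiveness(toy_counts)
print(round(cai("AAAGAA", w), 3))  # 0.5
```

In the toy table AAA has w = 0.25 and GAA has w = 1, so the two-codon sequence scores the square root of 0.25, which is 0.5. A real run would use weights derived from a full reference table such as Eyeast_cai.cut.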
Usage
Here is a sample session with cai
% cai TEMBL:AB009602
Calculate codon adaptation index
Codon usage file [Eyeast_cai.cut]:
Output file [ab009602.cai]:
Command line arguments
Calculate codon adaptation index
Version: EMBOSS:6.4.0.0
   Standard (Mandatory) qualifiers:
  [-seqall]            seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -cfile              codon      [Eyeast_cai.cut] Codon usage table name
  [-outfile]           outfile    [*.cai] Output file name
   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:
   "-seqall" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name
   "-cfile" associated qualifiers
   -format             string     Data format
   "-outfile" associated qualifiers
   -odirectory2        string     Output directory
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-seqall] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
-cfile | codon | Codon usage table name | Codon usage file in EMBOSS data path | Eyeast_cai.cut |
[-outfile] (Parameter 2) | outfile | Output file name | Output file | <*>.cai |
Additional (Optional) qualifiers | ||||
(none) | ||||
Advanced (Unprompted) qualifiers | ||||
(none) | ||||
Associated qualifiers | ||||
"-seqall" associated seqall qualifiers | ||||
-sbegin1 -sbegin_seqall | integer | Start of each sequence to be used | Any integer value | 0 |
-send1 -send_seqall | integer | End of each sequence to be used | Any integer value | 0 |
-sreverse1 -sreverse_seqall | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
-sask1 -sask_seqall | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
-snucleotide1 -snucleotide_seqall | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
-sprotein1 -sprotein_seqall | boolean | Sequence is protein | Boolean value Yes/No | N |
-slower1 -slower_seqall | boolean | Make lower case | Boolean value Yes/No | N |
-supper1 -supper_seqall | boolean | Make upper case | Boolean value Yes/No | N |
-sformat1 -sformat_seqall | string | Input sequence format | Any string | |
-sdbname1 -sdbname_seqall | string | Database name | Any string | |
-sid1 -sid_seqall | string | Entryname | Any string | |
-ufo1 -ufo_seqall | string | UFO features | Any string | |
-fformat1 -fformat_seqall | string | Features format | Any string | |
-fopenfile1 -fopenfile_seqall | string | Features file name | Any string | |
"-cfile" associated codon qualifiers | ||||
-format | string | Data format | Any string | |
"-outfile" associated outfile qualifiers | ||||
-odirectory2 -odirectory_outfile | string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
Input file format
cai reads a nucleic acid sequence of a gene.
Input files for usage example
Database entry: TEMBL:AB009602
ID   AB009602; SV 1; linear; mRNA; STD; FUN; 561 BP.
XX
AC   AB009602;
XX
DT   15-DEC-1997 (Rel. 53, Created)
DT   14-APR-2005 (Rel. 83, Last updated, Version 2)
XX
DE   Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
XX
KW   MET1 homolog.
XX
OS   Schizosaccharomyces pombe (fission yeast)
OC   Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC   Schizosaccharomycetes; Schizosaccharomycetales; Schizosaccharomycetaceae;
OC   Schizosaccharomyces.
XX
RN   [1]
RP   1-561
RA   Kawamukai M.;
RT   ;
RL   Submitted (07-DEC-1997) to the EMBL/GenBank/DDBJ databases.
RL   Makoto Kawamukai, Shimane University, Life and Environmental Science; 1060
RL   Nishikawatsu, Matsue, Shimane 690, Japan
RL   (E-mail:kawamuka@life.shimane-u.ac.jp, Tel:0852-32-6587, Fax:0852-32-6499)
XX
RN   [2]
RP   1-561
RA   Kawamukai M.;
RT   "S.pmbe MET1 homolog";
RL   Unpublished.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..561
FT                   /organism="Schizosaccharomyces pombe"
FT                   /mol_type="mRNA"
FT                   /clone_lib="pGAD GH"
FT                   /db_xref="taxon:4896"
FT   CDS             <1..275
FT                   /codon_start=3
FT                   /transl_table=1
FT                   /product="MET1 homolog"
FT                   /db_xref="GENEDB:SPCC1739.06c"
FT                   /db_xref="GOA:O74468"
FT                   /db_xref="InterPro:IPR000878"
FT                   /db_xref="InterPro:IPR003043"
FT                   /db_xref="InterPro:IPR006366"
FT                   /db_xref="InterPro:IPR006367"
FT                   /db_xref="InterPro:IPR012066"
FT                   /db_xref="InterPro:IPR014776"
FT                   /db_xref="InterPro:IPR014777"
FT                   /db_xref="InterPro:IPR016040"
FT                   /db_xref="UniProtKB/Swiss-Prot:O74468"
FT                   /protein_id="BAA23999.1"
FT                   /translation="SMPKIPSFVPTQTTVFLMALHRLEILVQALIESGWPRVLPVCIAE
FT                   RVSCPDQRFIFSTLEDVVEEYNKYESLPPGLLITGYSCNTLRNTA"
XX
SQ   Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
     gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt        60
     tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac       120
     cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg       180
     aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg       240
     gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt       300
     tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac       360
     ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt       420
     ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt       480
     tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa       540
     aacaattcta atggtcaaaa a                                                 561
//
Output file format
cai writes the Codon Adaptation Index to the output file.
Output files for usage example
File: ab009602.cai
Sequence: AB009602 CAI: 0.188
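When cai is run over several sequences it can be handy to load the results into a script. The sketch below parses lines of the form shown in the example output. The function name and the regular expression are assumptions based only on that single example line, so the pattern may need adjusting for other EMBOSS versions.

```python
import re

# Pattern inferred from the one example line "Sequence: AB009602 CAI: 0.188".
CAI_LINE = re.compile(r"^Sequence:\s*(\S+)\s+CAI:\s*([0-9.]+)")


def parse_cai_output(text):
    """Map each sequence name to its CAI value; unrecognised lines are ignored."""
    results = {}
    for line in text.splitlines():
        match = CAI_LINE.match(line.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


print(parse_cai_output("Sequence: AB009602 CAI: 0.188\n"))
```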
Data files
cai requires a reference codon usage table prepared from a set of genes which are known to be highly expressed. This is specified by the -cfile option and must exist in the EMBOSS data directory. The default codon usage table Eyeast_cai.cut is the standard set of Saccharomyces cerevisiae highly expressed gene codon frequencies. Another table (Eschpo_cai.cut) was prepared from a set of Schizosaccharomyces pombe genes by Peter Rice for the S. pombe sequencing team at the Sanger Centre, and is available in the EMBOSS data directory. You should prepare your own codon usage table for your organism of interest.
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
The directories are searched in the following order:
- . (your current directory)
- .embossdata (under your current directory)
- ~/ (your home directory)
- ~/.embossdata
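The search order above can be mirrored in a small helper, for instance to check which copy of a codon usage table a run is likely to pick up. This is only a sketch of the four user directories listed here: the function names are hypothetical, and it does not model the fallback to the standard EMBOSS data directory.

```python
from pathlib import Path


def candidate_paths(filename, cwd=None, home=None):
    """Return the user-level locations for a data file, in the listed order."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    return [
        cwd / filename,                  # . (current directory)
        cwd / ".embossdata" / filename,  # .embossdata under the current directory
        home / filename,                 # ~/ (home directory)
        home / ".embossdata" / filename, # ~/.embossdata
    ]


def find_user_data_file(filename, cwd=None, home=None):
    """Return the first existing candidate, or None if no user copy exists."""
    for path in candidate_paths(filename, cwd, home):
        if path.is_file():
            return path
    return None


for path in candidate_paths("Eyeast_cai.cut"):
    print(path)
```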
Notes
A codon is a nucleotide triplet that encodes an amino acid residue in a polypeptide chain. There are four possible nucleotides in DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). There are therefore 64 possible triplets to encode the 20 amino acids plus the translation termination signal. The encoding is therefore redundant, with all but two amino acids coded for by more than one triplet. Organisms often have a particular preference for one of the possible codons for a given amino acid.
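The counts in this paragraph can be checked directly against the standard genetic code. The short sketch below builds the 64-codon table and counts how many codons encode each residue. The variable and function names are chosen for the example, and only the standard code (translation table 1) is considered.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_degeneracy(table=CODON_TABLE):
    """Count how many codons encode each amino acid ('*' marks a stop)."""
    return Counter(table.values())


degeneracy = codon_degeneracy()
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)
print(len(CODON_TABLE), single_codon)  # 64 ['M', 'W']
```

Methionine (M) and tryptophan (W) are the two amino acids with a single codon, so they carry no information about synonymous codon preference.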
Codon preferences reflect a balance between mutational bias and selection for efficiency of translation. In fast-growing microorganisms there are optimal codons that reflect the composition of the genomic tRNA pool and probably help achieve faster translation rates and higher accuracy. Such selection is expected to be strong in highly expressed genes, as is the case for Escherichia coli or Saccharomyces cerevisiae. In contrast, codon usage optimization is normally absent in organisms with slower growth rates such as Homo sapiens (human), where codon preferences are determined by mutational biases characteristic of a particular genome.
Various factors are thought to influence codon usage bias in bacteria, including the gene expression level already mentioned, %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, and optimal growth temperature.
Various methods have been used to analyze codon usage bias. CAI and methods such as the 'frequency of optimal codons' (Fop) are commonly used to predict gene expression levels. Others such as the 'effective number of codons' (Nc) and Shannon entropy are used to measure codon usage evenness, whereas multivariate statistical methods, including correspondence analysis and principal component analysis, may be used to analyze variations in codon usage between genes.
References
- Sharp PM., Li W-H. "The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications." Nucleic Acids Research 1987 vol 15, pp 1281-1295.
- Synonymous codon usage in bacteria. Curr Issues Mol Biol. 2001 Oct;3(4):91-7.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
See also
Program name | Description |
---|---|
chips | Calculates Nc codon usage statistic |
codcmp | Codon usage table comparison |
codcopy | Copy and reformat a codon usage table |
cusp | Create a codon usage table from nucleotide sequence(s) |
syco | Draw synonymous codon usage statistic plot for a nucleotide sequence |
Author(s)
Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.