emira
Wiki
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages.
Function
MIRA fragment assembly program
Description
**************** EDIT HERE ****************
Algorithm
**************** EDIT HERE ****************
Usage
Here is a sample session with emira
% emira -setparam fasta -project cjejuni_demo -genome accurate -mxti -rns tigr -orh
MIRA fragment assembly program
This is MIRA V2.8.3 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.

Mail questions, bug reports, ideas or suggestions to: bach@chevreux.org

Compiled in boundtracking mode.
Compiled in bugtracking mode.

Parsing parameters: -genomeaccurate -fasta -GE:project=cjejuni_demo -GE:mxti=yes -OUT:orh=yes -GE:rns=tigr

Using quickmode switch -genomeaccurate :
  -GE:uti=yes
  -AS:mrl=40:nop=4:sep=yes:rbl=4:sd=yes:sdlpo=yes:ugpf=yes
  -DP:ure=yes:rewl=30:rewme=2:feip=0;leip=0:tpae=no
  -CL:pvc=yes:pvcmla=18:qc=no:mbc=no:emlc=yes:mlcr=25:smlc=30
  -SK:bph=16:hss=4:pr=45:mhpr=200
  -AL:bip=20:bmin=25:bmax=130:mo=15:ms=30:mrs=65:egp=yes:egpl=low
  -CO:rodirs=25:mr=yes:asir=no:mrpg=2:emea=25 amgb=yes:amgbemc=yes:amgbnbs=yes
  -ED:ace=no

Using quickmode switch fasta : -GE:lj=fasta

Parameters parsed without error, perfect.

Used parameter settings:
  General (-GE):
    Project name (pro) : cjejuni_demo
    Load job (lj) : FASTA file (fasta)
    Filecheck only (fo) : No
    External quality (eq) : from SCF (scf)
    Ext. qual. override (eqo) : No
    Discard reads on e.q. error (droeqe): No
    Read naming scheme (rns) : TIGR (tigr)
    Merge with XML trace info (mxti) : Yes
    Use template information (uti) : Yes
    EST-assembly start step (ess) : 1

  Assembly options (-AS):
    Minimum read length (mrl) : 40
    Number of passes (nop) : 4
    Skim each pass (sep) : Yes
    Maximum number of RMB break loops (rbl) : 4
    Spoiler detection (sd) : Yes
      Last pass only (sdlpo) : Yes
    Base default quality (bdq) : Yes
    Use genomic pathfinder (ugpf) : Yes
    Use emergency search stop (uess) : Yes
      ESS partner depth (esspd) : 500
    Use emergency blacklist (uebl) : Yes
    Use max. contig build time (umcbt) : No
      Build time in seconds (bts) : 10000

  Strain and backbone options (-SB):
    Load straindata (lsd) : No
    Load backbone (lb) : No
    Start backbone usage in pass (sbuip): 3
    Backbone strain name (bsn) : (none)
    Backbone file type (bft) : FASTA file (fasta)
    Backbone rail length (brl) : 2500
    Backbone base quality (bbq) : 0
    Also build new contigs (abnc) : Yes

  Dataprocessing options (-DP):
    Use read extensions (ure) : Yes
    Read extension window length (rewl) : 30
    Read extension w. maxerrors (rewme) : 2
    First extension in pass (feip) : 0
    Last extension in pass (leip) : 0
    Tag poly A/T at ends (tpae) : No
    Polybase window length (pbwl) : 7
    Polybase window maxerrors (pbwme) : 2
    Polyb. window grace distance (pbwgc): 9

  Clipping options (-CL):
    Possible vector leftover clip (pvc) : Yes
      maximum len allowed (pvcmla) : 18
    Quality clip (qc) : No
      Minimum quality (qcmq) : 20
      Window length (qcwl) : 30
    Masked bases clip (mbc) : No
      Gap size (mbcgs) : 20
      Max front gap (mbcmfg) : 40
      Max end gap (mbcmeg) : 60
    Ensure minimum left clip (emlc) : Yes
      Minimum left clip req. (mlcr) : 25
      Set minimum left clip to (smlc) : 30

  Parameters for SKIM algorithm (-SK):
    Bases per hash (bph) : 16
    Hash save stepping (hss) : 4
    Percent required (pr) : 45
    Maximum hashes in memory (mhim) : 15000000
    Max hits per read (mhpr) : 200

  Align parameters for Smith-Waterman align (-AL):
    Bandwidth in percent (bip) : 20
    Bandwidth max (bmax) : 130
    Bandwidth min (bmin) : 25
    Minimum score (ms) : 30
    Minimum overlap (mo) : 15
    Minimum relative score in % (mrs) : 65
    Extra gap penalty (egp) : Yes
      extra gap penalty level (egpl) : low
      Max. egp in percent (megpp) : 100

  Contig parameters (-CO):
    Name prefix (np) : cjejuni_demo
    Error analysis (an) : SCF signal (signal)
    Reject on drop in relative alignment score (%) : 25
    Max. error rate in dangerous zones in % (dmer) : 1
    Mark repeats (mr) : Yes
    Assume SNP instead of repeats (asir) : No
    Minimum reads per group needed for tagging (mrpg) : 2
    Minimum neighbour quality needed for tagging (mnq) : 20
    Minimum Group Quality needed for RMB Tagging (mgqrt) : 30
    End-read Marking Exclusion Area in bases (emea) : 25
    Also mark gap bases (amgb) : Yes
    Also mark gap bases - even multicolumn (amgbemc) : Yes
    Also mark gap bases - need both strands (amgbnbs): Yes
    Default template insert size minimum (dismin) : 500
    Default template insert size maximum (dismax) : 5000

  Edit options (-ED):
    Automatic contig editing (ace) : No
    Strict editing mode (sem) : No
    Confirmation threshold in percent (ct): 50

  Directories (-DI):
    When loading EXP files:
    When loading SCF files:
    For writing log files : cjejuni_demo_log
    For writing gap4 DA res.: cjejuni_demo_out

  Input files (-FI):
    When loading EXP fofn : cjejuni_demo_in.fofn
    When loading project from PHD : cjejuni_demo_in.phd.1
    When loading project from CAF : cjejuni_demo_in.caf
    When loading sequences from FASTA : cjejuni_demo_in.fasta
    When loading qualities from FASTA quality: cjejuni_demo_in.fasta.qual
    When loading straindata : cjejuni_demo_straindata_in.txt
    When loading XML trace info files : cjejuni_demo_traceinfo_in.xml
    When loading backbone from CAF : cjejuni_demo_backbone_in.caf
    When loading backbone from GenBank : cjejuni_demo_backbone_in.gbf
    When loading backbone from FASTA : cjejuni_demo_backbone_in.fasta

  Output files (-OUTPUT/-OUT):
    Result files:
      Saved as CAF (orc): Yes
      Saved as FASTA (orf): Yes
      Saved as GAP4 (directed assembly) (org): Yes
      Saved as phrap ACE (ora): Yes
      Saved as HTML (orh): Yes
      Saved as Transposed Contig Summary (ors): Yes
      Saved as simple text format (ort): Yes
    Temporary result files:
      Saved as CAF (otc): No
      Saved as FASTA (otf): No
      Saved as GAP4 (directed assembly) (otg): No
      Saved as phrap ACE (ota): No
      Saved as HTML (oth): No
      Saved as Transposed Contig Summary(ots): No
      Saved as simple text format (ott): No
    Extended temporary result files:
      Saved as CAF (oetc): No
      Saved as FASTA (oetf): No
      Saved as GAP4 (directed assembly) (oetg): No
      Saved as phrap ACE (oeta): No
      Saved as HTML (oeth): No
      Save also singlets (oetas): No
    Alignment output customisation:
      TEXT characters per line (tcpl): 60
      HTML characters per line (hcpl): 60
      TEXT characters per line (tegfc): ' '
      HTML characters per line (hegfc): ' '
    File / directory names:
      CAF : cjejuni_demo_out.caf
      FASTA : cjejuni_demo_out.unpadded.fasta
      FASTA quality : cjejuni_demo_out.unpadded.fasta.qual
      FASTA (padded) : cjejuni_demo_out.padded.fasta
      FASTA qual.(pad): cjejuni_demo_out.padded.fasta.qual
      GAP4 (directory): cjejuni_demo_out.gap4da
      ACE : cjejuni_demo_out.ace
      HTML : cjejuni_demo_out.html
      Simple text : cjejuni_demo_out.txt
      TCS overview : cjejuni_demo_out.tcs

Creating directory cjejuni_demo_log ... done.
Creating directory cjejuni_demo_results ... done.
Creating directory cjejuni_demo_info ... done.

Localtime: Thu Jul 15 12:00:00 2010
Loading data normal (probably Sanger type) from FASTA file cjejuni_demo_in.fasta
Counting sequences in FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading sequence data from FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading quality data from FASTA quality file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
There haven been 544 reads given, 544 of which have quality accounted for.

Localtime: Thu Jul 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).

Localtime: Thu Jul 15 12:00:00 2010
Merging data from XML trace info file cjejuni_demo_traceinfo_in.xml ...Num reads: 496
Building hash table ... done.
Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Done merging XML data, matched 496 reads.

Localtime: Thu Jul 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).

Starting minimum left vector clip ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
  - 0 reads have not enough good bases for assembly.
  - 544 reads used for assembly.
  - 0 reads have no real quality (see miralog.noqualities).
  - mean length of good parts of used reads: 626

Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Thu Jul 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Thu Jul 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Thu Jul 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
  accepted: 4243
  possible: 4607
  permbans: 0
Hits chosen: 4243

Localtime: Thu Jul 15 12:00:00 2010
Pre-assembly alignment search for read extension and / or vector clipping:
Making alignments.
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.2 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]

Pre-assembly read extension:
Localtime: Thu Jul 15 12:00:00 2010
Searching possible read extensions:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Changed length of 258 sequences.
Mean length gained in these sequences: 73.2713 bases.

Pre-assembly vector clipping
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
  - 0 reads have not enough good bases for assembly.
  - 544 reads used for assembly.
  - 0 reads have no real quality (see miralog.noqualities).
  - mean length of good parts of used reads: 660

Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Thu Jul 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Thu Jul 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Thu Jul 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
  accepted: 4512
  possible: 4913
  permbans: 0
Hits chosen: 4512

Localtime: Thu Jul 15 12:00:00 2010
Pass: 1
Making alignments.
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++ [120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++ [178] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [238] ++++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++ [296] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [356] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [416] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [476] +++++a++++++++a+a++++++++++++++++a++++++++++++++++++++ RL1 [526] aaaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40028 Avg. contig coverage: 8.66 Consensus contains: A: 13590 C: 5845 G: 6941 T: 13404 N: 0 IUPAC: 24 Funny: 0 *: 224 Num reads: 526 Avg. read length: 659 Reads contain 343983 bases, 0 Ns and 2661 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 1 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering tags to readpool. The previously assembled contig had grave misassemblies, rebuilding contig 2 now. Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++ [120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++ [178] +++++++++++++++++++++++++++++++++++p+++p++++++++++++++++++++ [236] +++++++++a+++++a++++++++++++++++++++++++++++++++++++++++++++ [294] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [354] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [414] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [474] +++++++++a++++p+a+p+++++++++a+++++a+++++++++++++++++++++ RL1 [524] aaapThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342555 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. 
contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.1.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.1.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.1.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.1.txt Pass: 2 Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). 
- mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4512 possible: 4913 permbans: 0 Hits chosen: 4512 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. 
Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. 
contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.2.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.2.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.2.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.2.txt Pass: 3 Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. 
For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... 
[80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapa That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. 
Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged. Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. 
contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.3.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.3.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.3.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.3.txt Localtime: Thu Jul 15 12:00:00 2010 Hunting contig join spoiler ... done. Localtime: Thu Jul 15 12:00:00 2010 Pass: 4 Performing vector clipping ... done. Pool has 544 reads . 
Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... 
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapa That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. 
contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged. Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. 
Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.4.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.4.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.4.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.4.txt Assembly finished, saving final results. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_info/cjejuni_demo_info_contigstats.txt Localtime: Thu Jul 15 12:00:00 2010 Saving read tag list to file: cjejuni_demo_info/cjejuni_demo_info_readtaglist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contig tag list to file: cjejuni_demo_info/cjejuni_demo_info_consensustaglist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving project contig<->read list to file: cjejuni_demo_info/cjejuni_demo_info_contigreadlist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.caf Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to directory: cjejuni_demo_results/cjejuni_demo_out.gap4da (first deleting old directory) (now creating new directory) (saving contigs) Done. 
Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta Saving padded contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta Saving contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta.qual Saving padded contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta.qual Localtime: Thu Jul 15 12:00:00 2010 Saving contigs TCS to file: cjejuni_demo_results/cjejuni_demo_out.tcs Localtime: Thu Jul 15 12:00:00 2010 Saving SNP analysis to file: cjejuni_demo_info/cjejuni_demo_info_snpanalysis.txt Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.ace Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.html Localtime: Thu Jul 15 12:00:00 2010 End of assembly process, thank you for using MIRA.
Go to the output files for this example
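The "Contig statistics" blocks in the session log above follow a fixed layout, so they can be summarised with a short script. The following is a minimal sketch, not part of EMBOSS or MIRA: the function name `parse_contig_stats` is hypothetical, and the pattern assumes the field labels and order shown in this sample session (MIRA V2.8.3). Other versions may print them differently.

```python
import re

# One "Contig statistics" block as printed in the sample session.
# Assumption: labels and order match the log shown above.
CONTIG_RE = re.compile(
    r"Contig id:\s*(?P<id>\d+)\s+"
    r"Contig length:\s*(?P<length>\d+)\s+"
    r"Avg\. contig coverage:\s*(?P<coverage>[\d.]+)"
    r".*?Num reads:\s*(?P<reads>\d+)",
    re.DOTALL,
)


def parse_contig_stats(log_text):
    """Return one dict per contig statistics block found in log_text."""
    return [
        {
            "id": int(m.group("id")),
            "length": int(m.group("length")),
            "coverage": float(m.group("coverage")),
            "reads": int(m.group("reads")),
        }
        for m in CONTIG_RE.finditer(log_text)
    ]


# Excerpt copied from the session log (contigs 8 and 9).
sample = (
    "Contig id: 8 Contig length: 963 Avg. contig coverage: 1 "
    "Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 "
    "Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. "
    "Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 "
    "Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 "
    "Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps."
)

stats = parse_contig_stats(sample)
for contig in stats:
    print(contig["id"], contig["length"], contig["reads"])
```

In practice the text would come from the captured standard output of an emira run rather than from an inline string.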
Command line arguments
MIRA fragment assembly program
Version: EMBOSS:6.3.0
Standard (Mandatory) qualifiers:
-project string [mira] Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. (Any string)
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers:
-paramsfile infile Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a -params option appearing within a file that loads other parameter files.
-setparam menu [unspecified] Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file. (Values: unspecified (Unspecified); fasta (Fasta); phd (PHD); caf (CAF))
-expdir directory [.] Defines the directory where mira should search for experiment files (EXP).
-scfdir directory [.] Defines the directory where mira should search for SCF files.
-feifile infile [mira_in.fofn] Defines the file of filenames where the names of the EXP files of a project are located.
-fpifile infile [mira_in.fofn] Defines the file of filenames where the names of the PHD files of a project are located.
-pifile infile [mira_in.phd] Defines the PHD file to load sequences of a project from.
-faifile infile [mira_in.fasta] Defines the FASTA file to load sequences of a project from.
-fqifile infile [mira_in.fasta.qual] Defines the FASTA quality file to load base qualities of a project from. The order of reads in the quality file does not need to be the same as in the FASTA or fofn projects, although it saves a bit of time if it is.
-cifile infile [mira_in.caf] Defines the file to load a CAF project from. Filename must end with '.caf'.
-sdifile infile [mira_straindata_in.txt] Defines the file to load straindata from. Only used in EST projects (miraEST).
-xtiifile infile [mira_xmltraceinfo_in.xml] Defines the file to load a trace info file in XML format from. This can be used both when merging XML data with loaded files and when loading a project from an XML trace info file.
-genome menu [normal] Quality grades of de-novo genome assembly. Draft is quick-and-dirty, suited to get a first look at the approximate coverage of a running project. Should not be used for anything else. Normal is the default parameter set of mira that is able to tackle most genomes. A bit slower than the draft version, but includes such options as read extension and vector remnant clipping. Accurate is slower still than the normal mode but should be used for genomes that pose a problem to the normal mode. (Values: draft (Draft); normal (Normal); accurate (Accurate))
-mapping menu [normal] Works like the -genome switch, except that it is to be used when performing mapping assemblies against given backbone sequences. (Values: draft (Draft); normal (Normal); accurate (Accurate))
-clipping menu [medium] Three clipping grade modifiers, from light clipping when working with well preprocessed sequences to heavy clipping when the sequences that are being assembled had only sloppy or no preprocessing. Note 1 - the light version is already included in the -genome and -mapping switches. Note 2 - it is recommended that you perform a thorough preprocessing (clipping sequencing vector stretches, clipping of low quality bases, tagging standard repeats etc.) before assembling sequences. The clipping routines of mira are more optimised to cope with the last remnants of wrongly preprocessed sequences than with sequences having had no pre-processing at all. (Values: light (Light); medium (Medium); heavy (Heavy))
-highlyrepetitive boolean [N] A modifier switch for genome data that is deemed to be highly repetitive.
The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats.
-highqualitydata boolean [N] A modifier switch when the sequences that are used are of exceptional quality. mira will then bump up a few quality parameters which should lead to fewer false positives in the repeat and SNP detection routines.
-estmode boolean [N] Switches mira to a good initial preset for assembling EST data. Note that this is not needed (and even counterproductive) when used with miraEST.
-horrid boolean [N] Sets a number of parameters useful when dealing with really horrid data sets. Useful means that parameters are chosen so that time and memory consumption do not explode beyond all hope of the program returning. Note that MIRA will in most cases return useful assemblies with this switch, but these might not be as optimised as with normal operation. The definition of 'horrid' is a bit flexible, for example, (a) a genomic project with more than 2000 reads that all seem to align partly to each other but have different repetitive structures or (b) EST clusters with a few thousand almost similar reads.
-borg boolean [N] Sets several parameters to have mira try to assemble as many reads as possible. Will probably slow down the assembly process and use more memory. 'We are MIRA of borg. You will be assembled, resistance is futile!'
-lj menu [fofnexp] Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or eventually reassemble it. N.B. fofnphd is not currently available.
(Values: fofnexp (EXP files from a file of filenames); fasta (Load and assemble FASTA); caf (Load and assemble CAF); phd (Load and assemble PHD); fofnphd (PHD files from a file of filenames))
-fo boolean [N] If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files.
-mxti boolean [N] Some file formats above (FASTA, PHD or even CAF and EXP) possibly don't contain all the info necessary or useful for each read of an assembly. Should additional information, such as clipping positions etc., be available in an XML trace info file in NCBI format (see File formats), then set this option to 'Y' and it will be merged with the data loaded. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though.
-rns menu [sanger] Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B). (Values: sanger (Sanger); tigr (TIGR))
-eq menu [SCF] Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them). (Values: none (None); SCF (SCF))
-eqo boolean [N] Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project.
This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones.
-[no]droeqe boolean [Y] Defines whether a read is excluded from the assembly when there is a major mismatch between the external quality source and the sequence (e.g. the base sequence read from a SCF file does not match the originally read base sequence). If not, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source).
-[no]uti boolean [Y] Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitely help for contigs containing lots of repeats. Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using -dismin and -dismax.
-ess integer [1] Controls the starting step of the EST assembly and is therefore only useful in miraEST. EST assembly is a three step process, each with different settings to the assembly engine, with the result of each step being saved to disk. If results of previous steps are present in a directory, one can easily 'play around' with different settings for subsequent steps by reusing the results of the previous steps and directly starting with step two or three. (Integer from 1 to 4)
-[no]ps boolean [Y] Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking.
-lsd boolean [N] Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them.
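The straindata layout described for -lsd (one read per line, read name first, then the strain name) is simple enough to check before an assembly. Below is a minimal sketch of such a check. The function name `parse_straindata` and the strain names are hypothetical, and the assumption that the two fields are separated by whitespace is not stated in the text above.

```python
def parse_straindata(text):
    """Map read name -> strain name, one read per line.

    Assumption: the read name and the strain name are separated by
    whitespace; everything after the first field is the strain name.
    """
    strains = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected 'read strain', got: {line!r}")
        read_name, strain = parts
        strains[read_name] = strain
    return strains


# Read names reuse the TIGR-style examples given for -rns;
# the strain names are made up for illustration.
example = "GCPBN02TF strainA\nGCPDL68TABRPT103A58B strainB\n"
strain_of = parse_straindata(example)
print(strain_of)
```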
-lb boolean [N] A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read 'very closely related' as in 'only SNP mutations or short indels present'.
-sbuip integer [3] When assembling against backbones, this parameter defines the pass iteration (see -nop) from which the backbones are actually used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb - if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set -sbuip to 1 lower than -nop (example - nop 4 and sbuip 3). (Integer 1 or more)
-bsn string Defines the name of the strain that the backbone sequences have. (Any string)
-bft menu [fasta] Defines the filetype of the backbone file given. Currently (2.8.1) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly. (Values: fasta (Fasta); caf (CAF); gbf (GenBank))
-brl integer [2500] Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. (Integer from 1000 to 3000)
-bbq integer [-1] Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files).
A value of -1 causes mira to use the same default quality for backbones as for reads. (Integer from -1 to 100)
-[no]abnc boolean [Y] The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs.
-mrl integer [40] Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project. (Integer 20 or more)
-nop integer [3] Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. (Integer 1 or more)
-[no]sep boolean [Y] Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies.
-rbl integer [2] Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. (Integer 1 or more)
-[no]sd boolean [Y] Default is 'Y' for mira and 'N' for miraEST.
A spoiler can be either a chimeric read or a read with long parts of unclipped vector sequence still included (that was too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs being joined; MIRA will cut them back so that they present no more harm to the assembly. Recommended for mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect.
-[no]sdlpo boolean [Y] Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on.
-bdq integer [10] Defines the default base quality of reads that have no quality read from a file. (Integer 0 or more)
-[no]ugpf boolean [Y] MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps; a genomic and an EST pathfinder. The genomic looks a bit into the future of the assembly and tries to stay on safe grounds using a maximum of information already present in the contig that is being built. The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall down to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as 'Y' for genome projects and set to 'N' for EST projects.
-[no]uess boolean [Y] Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads.
This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy.
-esspd integer [500] Defines the number of potential partners a read must have for MIRA switching into emergency search stop mode for that read. (Integer 1 or more)
-umcbt boolean [N] Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences.
-bts integer [10000] Depending on -umcbt above, this number defines the time in seconds allotted to building one contig. (Integer 1 or more)
-[no]ure boolean [Y] Defines whether the read extension routines are used. These routines can bring bases that were clipped away back into use when there is enough evidence supporting their correctness (see -qc). The qualifiers -rewl, -rewme, -feip and -leip tune them.
-rewl integer [30] Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. (Integer 1 or more)
-rewme integer [2] Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between two alignments in the given window. (Integer 1 or more)
-feip integer [0] Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. (Integer 0 or more)
-leip integer [0] Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop).
This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass. (Integer 0 or more)
-tpae boolean [N] This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs.
-pbwl integer [7] Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. (Integer 1 or more)
-pbwme integer [2] Only takes effect when -tpae is set to 'Y'. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. (Integer 1 or more)
-pbwgd integer [9] Only takes effect when -tpae is set to 'Y'. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one. (Integer 1 or more)
-[no]pvc boolean [Y] Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results.
-pvcmla integer [18] The clipping of possible vector relicts option works quite well.
Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. (Integer 0 or more)
-qc boolean [N] Default is 'N', but is automatically set to 'Y' when using the setparam options 'fasta' or 'phd' (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on.
-qcmq integer [20] This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don't use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and higher than 35 are disallowed. (Integer from 15 to 35)
-qcwl integer [30] This is the length of a window in bases for the quality clip. (Integer 10 or more)
-[no]mbc boolean [Y] This will let mira perform a 'clipping' of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this.
But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arose for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be 'clipped'.
-mbcgs integer [20] While performing the clip of masked bases, mira will look if it can merge larger chunks of masked bases that are a maximum of -mbcgs apart. (Integer 0 or more)
-mbcmfg integer [40] While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. (Integer 0 or more)
-mbcmeg integer [60] While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. (Integer 0 or more)
-[no]emlc boolean [Y] If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc.
-mlcr integer [25] If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. (Integer 0 or more)
-smlc integer [30] If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. (Integer 0 or more)
-bph integer [14] Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. (Integer 1 or more)
-hss integer [4] This is a parameter controlling the stepping increments with which hashes are generated. This allows for a more fine-grained search as matches are now found with at least n+s (see -bph) equal bases instead of the SSAHA 2n. The higher the value the faster the search. The lower the value the more weak matches are found.
(Integer 1 or more)
-pr integer [50] Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). (Integer 1 or more)
-mhpr integer [200] Controls the maximum number of possible hits one read can transport to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects that contain extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly increases in passes (-nop), different combinations of possible hits will be checked, always the probably best ones first. So the accuracy of the assembly should only suffer when lowering this number too much. (Integer 1 or more)
-bip integer [15] The banded Smith-Waterman alignment uses this percentage number to compute the bandwidth it has to use when computing the alignment matrix. E.g. expected overlap is 150 bases, bip=10 -> the banded SW will compute a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts / deletes in the alignment. INCREASING AND DECREASING THIS NUMBER - increasing will find more non-optimal alignments but will also increase SW runtime between linear and quadratic, decreasing will work the other way round (it might miss a few bad alignments but gain speed). (Integer from 1 to 100)
-bmin integer [25] Minimum bandwidth in bases to each side.
(Integer 1 or more)
-bmax integer [50] Maximum bandwidth in bases to each side. (Integer 1 or more)
-mo integer [15] Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. (Integer 1 or more)
-ms integer [15] Describes the minimum score of an overlap to be taken into account for assembly. mira uses a default scoring scheme for SW align. Each match counts 1, a match with an N counts 0, each mismatch with a non-N base -1 and each gap -2. Use a bigger score to weed out a number of chance matches, a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory). (Integer 1 or more)
-mrs integer [65] Describes the minimum percentage of matching between two reads to be considered for assembly. Increasing this number will save memory but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing below 55 will probably make memory and time consumption explode. (Integer from 1 to 100)
-egp boolean [N] Defines whether or not to increase penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely disturbing when assembling very long reads containing multiple long indels in the called base sequence ... although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, set it to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to have -amgb and -amgbemc both set to 'Y'.
-egpl menu [low] Has no effect if extra_gap_penalty is off. Defines an extra penalty applied to 'long' gaps. There are these predefined levels - 1. low - use this if you expect your base caller frequently misses two or more bases. 2.
medium - use this if your base caller is expected to frequently miss one to two bases. 3. high - use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used. (Values: low (Low); medium (Medium); high (High); est (EST split splices))
-megpp integer [100] Has no effect if extra_gap_penalty is off. Defines the maximum extra penalty in percent applied to 'long' gaps. (Integer from 1 to 100)
-np string [mira] Contigs will have this string prepended to their names. (Any string)
-an menu [signal] When adding reads to a contig, dangerous regions can get an extra integrity check. none = no extra check. text = check is only text-based. signal = check is signal based, if the SCF trace is not available, fallback is 'text'. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. (Values: none (None); text (Text); signal (Signal))
-rodirs integer [15] When adding reads to a contig, reject the reads if the drop in the quality of the consensus is > the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance. (Integer from 1 to 100)
-dmer integer [1] When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. (Integer from 1 to 100)
-[no]mr boolean [Y] One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs.
-asir boolean [N] Only takes effect when -mr is set to 'Y'; the effect also depends on whether strain data (see -lsd) is present or not. Usually, mira will mark bases that differentiate between repeats, when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains they are marked as SNP. However, if this switch is set to 'Y', then conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly. -mrpg integer [2] Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP). (Integer 2 or more) -mgqrt integer [30] Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. (Integer 25 or more) -emea integer [15] Only takes effect when -mr is set to 'Y'. 
Using the end of sequences of Sanger type shotgun sequencing is always a bit risky, as wrongly called bases tend to crowd there or some sequencing vector relicts hang around. It is even more risky to use these stretches for detecting possible repeats, so one can define an exclusion area where the bases are not used when determining whether a mismatch is due to repeats or not. (Integer 0 or more) -[no]amgb boolean [Y] Determines whether columns containing gap bases (indels) are also tagged. -[no]amgbemc boolean [Y] Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. -[no]amgbnbs boolean [Y] Only takes effect when -amgb is set to 'Y'. Determines whether, when tagging columns containing gap bases, both strands need to have a gap. Setting this to 'N' is not recommended except when working in desperately low coverage situations. -dismin integer [500] The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. (Integer 0 or more) -dismax integer [5000] The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. (Integer 0 or more) -ace boolean [N] Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt will try to resolve discrepancies in the contig by performing trace analysis and correct even hard to resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Notice: the current development version has a memory leak in the editor, therefore the option is not automatically turned on. -[no]sem boolean [Y] If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. 
-ct integer [50] The higher this value, the more strict the automatic editor will apply its internal rule set. Going below 40 is not recommended. (Integer from 1 to 100) -[no]orc boolean [Y] Output CAF results -[no]org boolean [Y] Output GAP4 results -[no]orf boolean [Y] Output FASTA results -ora boolean [N] Output ACE results -[no]ort boolean [Y] Output TXT results -[no]ors boolean [Y] Output TCS results -orh boolean [N] Output HTML results -otc boolean [N] Output temporary CAF results -otg boolean [N] Output temporary GAP4 results -otf boolean [N] Output temporary FASTA results -ota boolean [N] Output temporary ACE results -ott boolean [N] Output temporary TXT results -ots boolean [N] Output temporary TCS results -oth boolean [N] Output temporary HTML results -oetc boolean [N] Output extra temporary CAF results -oetg boolean [N] Output extra temporary GAP4 results -oetf boolean [N] Output extra temporary FASTA results -oeta boolean [N] Output extra temporary ACE results -oett boolean [N] Output extra temporary TXT results -oeth boolean [N] Output extra temporary HTML results -tcpl integer [60] When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. (Integer 1 or more) -hcpl integer [60] When producing an output in text format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. (Integer 1 or more) -gapfda string [gap4da] Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be |
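
As a rough illustration of the default Smith-Waterman scoring described under -ms in the help text above (match +1, match with an N 0, mismatch -1, gap -2), the following Python sketch scores an already aligned pair of reads. It is not MIRA's code: the function names `overlap_score` and `passes_min_score`, the use of '-' as the gap character, the rule that any column involving an N scores 0, and the choice to treat the -ms threshold as inclusive are all assumptions made for this example only.

```python
def overlap_score(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Score two aligned sequences of equal length.

    Scoring follows the scheme quoted for -ms:
    match = +1, column with an N = 0, mismatch = -1, gap = -2.
    The gap character and the handling of N are assumptions of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    score = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            score -= 2          # each gap column costs 2
        elif x == "N" or y == "N":
            score += 0          # a column involving N is neutral
        elif x == y:
            score += 1          # exact match
        else:
            score -= 1          # mismatch between two non-N bases
    return score


def passes_min_score(score: int, ms: int = 15) -> bool:
    """Return True if the overlap score reaches the -ms threshold.

    The default of 15 mirrors the documented default of -ms; treating the
    threshold as inclusive is an assumption.
    """
    return score >= ms


if __name__ == "__main__":
    s = overlap_score("ACGTACGTACGTACGTAC-T", "ACGTACGTACGTACGTACGT")
    print(s, passes_min_score(s))
```

With a higher -ms value, short chance overlaps such as the 20-column example above would be rejected, while a lower value would let them through at the cost of more alignments to evaluate.
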
Qualifier | Type | Description | Allowed values | Default | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||||||||||||
-project | string | Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. | Any string | mira | ||||||||||
Additional (Optional) qualifiers | ||||||||||||||
(none) | ||||||||||||||
Advanced (Unprompted) qualifiers | ||||||||||||||
-paramsfile | infile | Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a -params option appearing within a file that loads other parameter files | Input file | Required | ||||||||||
-setparam | list | Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file. |
|
unspecified | ||||||||||
-expdir | directory | Defines the directory where mira should search for experiment files (EXP). | Directory | . | ||||||||||
-scfdir | directory | Defines the directory where mira should search for SCF files | Directory | . | ||||||||||
-feifile | infile | Defines the file of filenames where the names of the EXP files of a project are located. | Input file | mira_in.fofn | ||||||||||
-fpifile | infile | Defines the file of filenames where the names of the PHD files of a project are located. | Input file | mira_in.fofn | ||||||||||
-pifile | infile | Defines the PHD file to load sequences of a project from. | Input file | mira_in.phd | ||||||||||
-faifile | infile | Defines the FASTA file to load sequences of a project from. | Input file | mira_in.fasta | ||||||||||
-fqifile | infile | Defines the FASTA quality file to load the base qualities of a project from. The order of reads in the quality file does not need to be the same as in the FASTA or fofn projects, although it saves a bit of time if it is. | Input file | mira_in.fasta.qual | ||||||||||
-cifile | infile | Defines the file to load a CAF project from. Filename must end with '.caf'. | Input file | mira_in.caf | ||||||||||
-sdifile | infile | Defines the file to load straindata from. Only used in EST projects (miraEST). | Input file | mira_straindata_in.txt | ||||||||||
-xtiifile | infile | Defines the file to load a trace info file in XML format from. This can be used both when merging XML data to loaded files or when loading a project from an XML trace info file. | Input file | mira_xmltraceinfo_in.xml | ||||||||||
-genome | list | Quality grades of de-novo genome assembly. Draft is quick-and-dirty, suited to get a first look on approximate coverage of a running project. Should not be used for anything else. Normal is the default parameter set of mira that is able to tackle most genomes. A bit slower than the draft version, but includes such options as read extension and vector remnant clipping. Accurate is still slower than the normal mode but should be used for genomes that pose a problem to the normal mode. |
|
normal | ||||||||||
-mapping | list | Work like the -genome switches except they are to be used when performing mapping assemblies against given backbone sequences. |
|
normal | ||||||||||
-clipping | list | Three clipping grade modifiers, from light clipping when working with well preprocessed sequences to heavy clipping when the sequences that are being assembled had only sloppy or no preprocessing. Note 1 - the light version is already included in the -genome and -mapping switches. Note 2 - it is recommended that you perform a thorough preprocessing (clipping sequencing vector stretches, clipping of low quality bases, tagging standard repeats etc.) before assembling sequences. The clipping routines of mira are more optimised to cope with the last remnants of wrongly preprocessed sequences than with sequences having had no pre-processing at all. |
|
medium | ||||||||||
-highlyrepetitive | boolean | A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats. | Boolean value Yes/No | No | ||||||||||
-highqualitydata | boolean | A modifier switch when the sequences that are used are of exceptional quality. mira will then bump up a few quality parameters which should lead to less false positives in the repeat and SNP detection routines. | Boolean value Yes/No | No | ||||||||||
-estmode | boolean | Switches mira to a good initial preset for assembling EST data. Note that this is not needed (and even counterproductive) when used with miraEST. | Boolean value Yes/No | No | ||||||||||
-horrid | boolean | Sets a number of parameters useful when dealing with really horrid data sets. Useful means that parameters are chosen so that time and memory consumption do not explode beyond all hope of the program returning. Note that MIRA will in most cases return useful assemblies with this switch, but these might not be as optimised as with normal operation. The definition of 'horrid' is a bit flexible, for example, (a) a genomic project with more than 2,000 reads that all seem to align partly to each other but have different repetitive structures or (b) EST clusters with a few thousand almost similar reads. | Boolean value Yes/No | No | ||||||||||
-borg | boolean | Sets several parameters to have mira try to assemble as many reads as possible. Will probably slow down the assembly process and use more memory. 'We are MIRA of borg. You will be assembled, resistance is futile!' | Boolean value Yes/No | No | ||||||||||
-lj | list | Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or eventually reassemble it. N.B. fofnphd is not currently available. |
|
fofnexp | ||||||||||
-fo | boolean | If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files. | Boolean value Yes/No | No | ||||||||||
-mxti | boolean | Some file formats above (FASTA, PHD or even CAF and EXP) possibly don't contain all the info necessary or useful for each read of an assembly. Should additional information, such as clipping positions, be available in an XML trace info file in NCBI format (see File formats), then set this option to 'Y' and it will be merged into the data loaded. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though. | Boolean value Yes/No | No | ||||||||||
-rns | list | Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B). |
|
sanger | ||||||||||
-eq | list | Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them). |
|
SCF | ||||||||||
-eqo | boolean | Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones. | Boolean value Yes/No | No | ||||||||||
-[no]droeqe | boolean | Defines whether a read is excluded from assembly when there is a major mismatch between the external quality source and the sequence (e.g. the base sequence read from an SCF file does not match the originally read base sequence). If the read is not excluded, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source). | Boolean value Yes/No | Yes | ||||||||||
-[no]uti | boolean | Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitely help for contigs containing lots of repeats. Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using -dismin and -dismax. | Boolean value Yes/No | Yes | ||||||||||
-ess | integer | Controls the starting step of the EST assembly and is therefore only useful in miraEST. EST assembly is a three-step process, each step using different settings for the assembly engine, with the result of each step being saved to disk. If results of previous steps are present in a directory, one can easily 'play around' with different settings for subsequent steps by reusing the results of the previous steps and directly starting with step two or three. | Integer from 1 to 4 | 1 | ||||||||||
-[no]ps | boolean | Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking. | Boolean value Yes/No | Yes | ||||||||||
-lsd | boolean | Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them. | Boolean value Yes/No | No | ||||||||||
-lb | boolean | A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read 'very closely related' as in 'only SNP mutations or short indels present'. | Boolean value Yes/No | No | ||||||||||
-sbuip | integer | When assembling against backbones, this parameter defines the pass iteration (see -nop) from which the backbones will actually be used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb - if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set sbuip to 1 lower than nop (example - nop 4 and sbuip 3). | Integer 1 or more | 3 | ||||||||||
-bsn | string | Defines the name of the strain that the backbone sequences have. | Any string | |||||||||||
-bft | list | Defines the filetype of the backbone file given. Currently (2.8.1) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly. |
|
fasta | ||||||||||
-brl | integer | Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. | Integer from 1000 to 3000 | 2500 | ||||||||||
-bbq | integer | Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads. | Integer from -1 to 100 | -1 | ||||||||||
-[no]abnc | boolean | The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs. | Boolean value Yes/No | Yes | ||||||||||
-mrl | integer | Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project. | Integer 20 or more | 40 | ||||||||||
-nop | integer | Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. | Integer 1 or more | 3 | ||||||||||
-[no]sep | boolean | Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies. | Boolean value Yes/No | Yes | ||||||||||
-rbl | integer | Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. | Integer 1 or more | 2 | ||||||||||
-[no]sd | boolean | Default is 'Y' for mira and 'N' for miraEST. A spoiler is either a chimeric read or a read with long parts of unclipped vector sequence still included (too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs from being joined; MIRA will cut spoilers back so that they no longer harm the assembly. Recommended for mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect. | Boolean value Yes/No | Yes | ||||||||||
-[no]sdlpo | boolean | Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on. | Boolean value Yes/No | Yes | ||||||||||
-bdq | integer | Defines the default base quality of reads that have no quality read from a file. | Integer 0 or more | 10 | ||||||||||
-[no]ugpf | boolean | MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps; a genomic and an EST pathfinder. The genomic pathfinder looks a bit into the future of the assembly and tries to stay on safe ground, using a maximum of information already present in the contig that is being built. The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall back to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as 'Y' for genome projects and set to 'N' for EST projects. | Boolean value Yes/No | Yes | ||||||||||
-[no]uess | boolean | Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy. | Boolean value Yes/No | Yes | ||||||||||
-esspd | integer | Defines the number of potential partners a read must have for MIRA switching into emergency search stop mode for that read. | Integer 1 or more | 500 | ||||||||||
-umcbt | boolean | Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. | Boolean value Yes/No | No | ||||||||||
-bts | integer | Only takes effect when -umcbt is set to 'Y'. This number defines the time in seconds allotted to building one contig. | Integer 1 or more | 10000 | ||||||||||
-[no]ure | boolean | Defines whether the read extension routines are used. When set to 'Y', mira tries to extend reads into regions that were clipped away, using a sliding window approach on Smith-Waterman alignments to check that there is enough evidence supporting the correctness of the clipped bases. See -rewl, -rewme, -feip and -leip for fine tuning. | Boolean value Yes/No | Yes | ||||||||||
-rewl | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. | Integer 1 or more | 30 | ||||||||||
-rewme | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between two alignments in the given window. | Integer 1 or more | 2 | ||||||||||
-feip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||
-leip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||
-tpae | boolean | This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs. | Boolean value Yes/No | No | ||||||||||
-pbwl | integer | Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. | Integer 1 or more | 7 | ||||||||||
-pbwme | integer | Only takes effect when -tpae is set to 'Y'. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. | Integer 1 or more | 2 | ||||||||||
-pbwgd | integer | Only takes effect when -tpae is set to 'Y'. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one. | Integer 1 or more | 9 | ||||||||||
-[no]pvc | boolean | Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results. | Boolean value Yes/No | Yes | ||||||||||
-pvcmla | integer | The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. | Integer 0 or more | 18 | ||||||||||
-qc | boolean | Default is 'N', but is automatically set to 'Y' when using the setparam options 'fasta' or 'phd' (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on. | Boolean value Yes/No | No | ||||||||||
-qcmq | integer | This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don't use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and higher than 35 are disallowed. | Integer from 15 to 35 | 20 | ||||||||||
-qcwl | integer | This is the length of a window in bases for the quality clip. | Integer 10 or more | 30 | ||||||||||
-[no]mbc | boolean | This will let mira perform a 'clipping' of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this. But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arose for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be 'clipped'. | Boolean value Yes/No | Yes | ||||||||||
-mbcgs | integer | While performing the clip of masked bases, mira will look if it can merge larger chunks of masked bases that are a maximum of -mbcgs apart. | Integer 0 or more | 20 | ||||||||||
-mbcmfg | integer | While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. | Integer 0 or more | 40 | ||||||||||
-mbcmeg | integer | While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. | Integer 0 or more | 60 | ||||||||||
-[no]emlc | boolean | If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc | Boolean value Yes/No | Yes | ||||||||||
-mlcr | integer | If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. | Integer 0 or more | 25 | ||||||||||
-smlc | integer | If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. | Integer 0 or more | 30 | ||||||||||
-bph | integer | Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. | Integer 1 or more | 14 | ||||||||||
-hss | integer | This is a parameter controlling the stepping increments with which hashes are generated. This allows for a more fine-grained search as matches are now found with at least n+s (see -bph) equal bases instead of the SSAHA 2n. The higher the value the faster the search. The lower the value the more weak matches are found. | Integer 1 or more | 4 | ||||||||||
-pr | integer | Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). | Integer 1 or more | 50 | ||||||||||
-mhpr | integer | Controls the maximum number of possible hits one read can transport to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects that contain extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly increases in passes (-nop), different combinations of possible hits will be checked, always the probably best ones first. So the accuracy of the assembly should only suffer when lowering this number too much. | Integer 1 or more | 200 | ||||||||||
-bip | integer | The banded Smith-Waterman alignment uses this percentage number to compute the bandwidth it has to use when computing the alignment matrix. E.g. expected overlap is 150 bases, bip=10 -> the banded SW will compute a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts / deletes in the alignment. Increasing this number will find more non-optimal alignments but will also increase SW runtime somewhere between linearly and quadratically; decreasing it works the other way round (it might miss a few bad alignments but gain speed). | Integer from 1 to 100 | 15 | ||||||||||
-bmin | integer | Minimum bandwidth in bases to each side. | Integer 1 or more | 25 | ||||||||||
-bmax | integer | Maximum bandwidth in bases to each side. | Integer 1 or more | 50 | ||||||||||
-mo | integer | Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. | Integer 1 or more | 15 | ||||||||||
-ms | integer | Describes the minimum score of an overlap to be taken into account for assembly. mira uses a default scoring scheme for SW alignment: each match counts 1, a match with an N counts 0, each mismatch with a non-N base counts -1 and each gap counts -2. Use a bigger score to weed out a number of chance matches, a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory). | Integer 1 or more | 15 | ||||||||||
-mrs | integer | Describes the min percentage of matching between two reads to be considered for assembly. Increasing this number will save memory but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing below 55 will probably make memory and time consumption explode. | Integer from 1 to 100 | 65 | ||||||||||
-egp | boolean | Defines whether or not to increase penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely disturbing when assembling very long reads containing multiple long indels in the called base sequence ... although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, set it to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to have -amgb and -amgbemc both set to 'Y'. | Boolean value Yes/No | No | ||||||||||
-egpl | list | Has no effect if extra_gap_penalty is off. Defines an extra penalty applied to 'long' gaps. There are these predefined levels - 1. low - use this if you expect your base caller frequently misses two or more bases. 2. medium - use this if your base caller is expected to frequently miss one to two bases. 3. high - use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used. | low, medium, high | low | ||||||||||
-megpp | integer | Has no effect if extra_gap_penalty is off. Defines the maximum extra penalty in percent applied to 'long' gaps. | Integer from 1 to 100 | 100 | ||||||||||
-np | string | Contigs will have this string prepended to their names. | Any string | mira | ||||||||||
-an | list | When adding reads to a contig, dangerous regions can get an extra integrity check. none = no extra check. text = check is only text-based. signal = check is signal based, if the SCF trace is not available, fallback is 'text'. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | none, text, signal | signal | ||||||||||
-rodirs | integer | When adding reads to a contig, reject the reads if the drop in the quality of the consensus is > the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance. | Integer from 1 to 100 | 15 | ||||||||||
-dmer | integer | When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | Integer from 1 to 100 | 1 | ||||||||||
-[no]mr | boolean | One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs. | Boolean value Yes/No | Yes | ||||||||||
-asir | boolean | Only takes effect when -mr is set to 'Y'; the effect also depends on whether strain data (see -lsd) is present or not. Usually, mira will mark bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains, they are marked as SNP. However, if this switch is set to 'Y', then conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly. | Boolean value Yes/No | No | ||||||||||
-mrpg | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP). | Integer 2 or more | 2 | ||||||||||
-mgqrt | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. | Integer 25 or more | 30 | ||||||||||
-emea | integer | Only takes effect when -mr is set to 'Y'. Using the end of sequences of Sanger type shotgun sequencing is always a bit risky, as wrongly called bases tend to crowd there or some sequencing vector relicts hang around. It is even more risky to use these stretches for detecting possible repeats, so one can define an exclusion area where the bases are not used when determining whether a mismatch is due to repeats or not. | Integer 0 or more | 15 | ||||||||||
-[no]amgb | boolean | Determines whether columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||
-[no]amgbemc | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||
-[no]amgbnbs | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether, when tagging columns containing gap bases, both strands need to have a gap. Setting this to 'N' is not recommended except when working in desperately low coverage situations. | Boolean value Yes/No | Yes | ||||||||||
-dismin | integer | The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. | Integer 0 or more | 500 | ||||||||||
-dismax | integer | The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. | Integer 0 or more | 5000 | ||||||||||
-ace | boolean | Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt will try to resolve discrepancies in the contig by performing trace analysis and can correct even hard-to-resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Notice: the current development version has a memory leak in the editor, therefore the option is not automatically turned on. | Boolean value Yes/No | No | ||||||||||
-[no]sem | boolean | If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. | Boolean value Yes/No | Yes | ||||||||||
-ct | integer | The higher this value, the more strict the automatic editor will apply its internal rule set. Going below 40 is not recommended. | Integer from 1 to 100 | 50 | ||||||||||
-[no]orc | boolean | Output CAF results | Boolean value Yes/No | Yes | ||||||||||
-[no]org | boolean | Output GAP4 results | Boolean value Yes/No | Yes | ||||||||||
-[no]orf | boolean | Output FASTA results | Boolean value Yes/No | Yes | ||||||||||
-ora | boolean | Output ACE results | Boolean value Yes/No | No | ||||||||||
-[no]ort | boolean | Output TXT results | Boolean value Yes/No | Yes | ||||||||||
-[no]ors | boolean | Output TCS results | Boolean value Yes/No | Yes | ||||||||||
-orh | boolean | Output HTML results | Boolean value Yes/No | No | ||||||||||
-otc | boolean | Output temporary CAF results | Boolean value Yes/No | No | ||||||||||
-otg | boolean | Output temporary GAP4 results | Boolean value Yes/No | No | ||||||||||
-otf | boolean | Output temporary FASTA results | Boolean value Yes/No | No | ||||||||||
-ota | boolean | Output temporary ACE results | Boolean value Yes/No | No | ||||||||||
-ott | boolean | Output temporary TXT results | Boolean value Yes/No | No | ||||||||||
-ots | boolean | Output temporary TCS results | Boolean value Yes/No | No | ||||||||||
-oth | boolean | Output temporary HTML results | Boolean value Yes/No | No | ||||||||||
-oetc | boolean | Output extra temporary CAF results | Boolean value Yes/No | No | ||||||||||
-oetg | boolean | Output extra temporary GAP4 results | Boolean value Yes/No | No | ||||||||||
-oetf | boolean | Output extra temporary FASTA results | Boolean value Yes/No | No | ||||||||||
-oeta | boolean | Output extra temporary ACE results | Boolean value Yes/No | No | ||||||||||
-oett | boolean | Output extra temporary TXT results | Boolean value Yes/No | No | ||||||||||
-oeth | boolean | Output extra temporary HTML results | Boolean value Yes/No | No | ||||||||||
-tcpl | integer | When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||
-hcpl | integer | When producing an output in HTML format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||
-gapfda | string | Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be <projectname>_.<extension> | Any string | gap4da | ||||||||||
-log | string | Defines the directory where mira will write some log files to. Note that the name of the actual project will be prepended. | Any string | miralog | ||||||||||
-co | string | Defines the file in CAF format to save an assembled project to. Filename must end with '.caf'. | Any string | mira_out.caf | ||||||||||
Associated qualifiers | ||||||||||||||
"-expdir" associated directory qualifiers | ||||||||||||||
-extension | string | Default file extension | Any string | |||||||||||
"-scfdir" associated directory qualifiers | ||||||||||||||
-extension | string | Default file extension | Any string | |||||||||||
General qualifiers | ||||||||||||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N | ||||||||||
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | ||||||||||
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | ||||||||||
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | ||||||||||
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | ||||||||||
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | ||||||||||
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | ||||||||||
-warning | boolean | Report warnings | Boolean value Yes/No | Y | ||||||||||
-error | boolean | Report errors | Boolean value Yes/No | Y | ||||||||||
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | ||||||||||
-die | boolean | Report dying program messages | Boolean value Yes/No | Y | ||||||||||
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
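Several rows in the qualifier table above state small arithmetic rules: the band computed from -bip, the scoring scheme behind -ms, and the 10% margins on -dismin and -dismax. The Python sketch below restates those rules so they can be tried with concrete numbers. It is not MIRA code. The function names are hypothetical, and three points are assumptions rather than documented behaviour: that the band is clamped to the -bmin/-bmax range, that percentages are rounded down to whole bases, and that the gap penalty applies once per gap column.

```python
def raw_band(expected_overlap, bip):
    """Band (bases to each side of the diagonal) as a percentage of the overlap.

    Assumption: the result is rounded down to a whole number of bases.
    """
    return expected_overlap * bip // 100


def clamped_band(expected_overlap, bip=15, bmin=25, bmax=50):
    """Assumption: -bmin and -bmax act as lower and upper bounds on the band."""
    return max(bmin, min(bmax, raw_band(expected_overlap, bip)))


def overlap_score(read_a, read_b):
    """Score two already-aligned, equal-length strings ('-' marks a gap).

    Uses the scheme given for -ms: match +1, match with N 0,
    mismatch between non-N bases -1, gap -2 (assumed per gap column).
    """
    if len(read_a) != len(read_b):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for x, y in zip(read_a.upper(), read_b.upper()):
        if x == "-" or y == "-":
            score -= 2
        elif x == "N" or y == "N":
            score += 0
        elif x == y:
            score += 1
        else:
            score -= 1
    return score


def pair_distance_limits(dismin=500, dismax=5000, margin_percent=10):
    """Apply the error margin described for -dismin and -dismax.

    Assumption: the margin is a percentage of each value itself.
    """
    low = dismin - dismin * margin_percent // 100
    high = dismax + dismax * margin_percent // 100
    return low, high


if __name__ == "__main__":
    print(raw_band(150, 10))          # the -bip example from the table
    print(clamped_band(150, 10))      # same overlap with default -bmin/-bmax
    print(overlap_score("ACGTNA-", "ACCTAAG"))
    print(pair_distance_limits())
```

With the table's own example (overlap 150, bip=10), `raw_band` gives the 15 bases quoted in the -bip description. If the clamping assumption holds, the default -bmin of 25 would widen that band to 25 bases.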
Input file format
emira reads any normal sequence USAs.
Output file format
emira outputs a graph to the specified graphics device. outputs a report format file. The default format is ...
Output files for usage example
File: EdIt.log
Directory: cjejuni_demo_info
This directory contains output files.
Directory: cjejuni_demo_log
This directory contains output files.
Directory: cjejuni_demo_results
This directory contains output files.
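Because the Exit status section states that emira always exits with status 0, a calling script cannot use the return code to tell whether a run produced anything. One option is to check for the output directories listed above. The Python sketch below does that. The helper names are hypothetical, and the `<project>_info`, `<project>_log` and `<project>_results` naming pattern is inferred only from the cjejuni_demo example, so it may not hold for other settings.

```python
from pathlib import Path

# Suffixes inferred from the usage example; treat them as an assumption.
OUTPUT_SUFFIXES = ("info", "log", "results")


def expected_output_dirs(project, base=Path(".")):
    """Map each suffix to the directory emira is expected to create."""
    return {suffix: Path(base) / f"{project}_{suffix}" for suffix in OUTPUT_SUFFIXES}


def missing_output_dirs(project, base=Path(".")):
    """Return the expected directories that do not exist under base."""
    dirs = expected_output_dirs(project, base)
    return [str(path) for path in dirs.values() if not path.is_dir()]


if __name__ == "__main__":
    missing = missing_output_dirs("cjejuni_demo")
    if missing:
        print("Missing output directories:", ", ".join(missing))
    else:
        print("All expected output directories are present.")
```

An empty list from `missing_output_dirs` only shows that the directories exist; it does not confirm that the assembly itself succeeded.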
Data files
**************** EDIT HERE ****************
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
See also
Program name | Description |
---|---|
emiraest | MIRAest fragment assembly program |
Author(s)
This program is an EMBOSS wrapper for a program written by Bastien Chevreux as part of the MIRA package.
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org), not to the original author.