




Copy_reads
NAME
copy_reads -- copies overlapping reads from a source database to a destination database
SYNOPSIS
Usage:
copy_reads
[-win]
[-source_trace_dir directory of source traces]
[-contigs_from file of contigs in source database]
[-min_contig_len minimum contig length]
[-min_average_qual minimum average read quality]
[-contigs_to file of contigs in destination database]
[-mask masking mode]
[-tag_types list of tag types]
[-word_length word length]
[-min_overlap minimum overlap]
[-max_pmismatch maximum percentage mismatch]
[-min_match minimum match]
[-band use banding algorithm]
[-display_cons display consensus alignments]
[-align_max_mism maximum percent mismatch]
[-display_seq display reading alignments]
source database
destination database
DESCRIPTION
During large-scale sequencing projects where the genome is cloned into, e.g.,
BACs prior to being subcloned into sequencing vectors, it is generally
the case that the ends of the DNA from one BAC will overlap those of two other
BACs. Unless it is being used for quality control, it is a waste of time to
sequence the overlapping regions twice, and so most labs transfer the relevant
data between the adjacent gap4 databases. This is the function of copy_reads,
which copies readings from a "source" database to a "destination" database.
The consensus sequences for user-selected contigs in each of the two databases are compared in both orientations. If an overlapping region is found, readings of sufficient quality are automatically assembled into the destination database. Readings in the source database that have been added to the destination database are tagged with a "LENT" tag, and the equivalent readings in the destination database are tagged with a "BORO" (borrowed) tag.
OPTIONS
-win
- Bring up a dialogue window
-source_trace_dir
directory of source traces- The location of the source database traces can either be given as a directory name or, if no directory is specified, be determined from the rawdata note (see section Trace File Location) held within the database. The program adds the location of the source traces to the rawdata note of the destination database. If the environment variable RAWDATA is set, its value is taken to be the location of the destination database traces and is also added to the rawdata note of the destination database. If there are no traces for the source database, no rawdata note is created.
-contigs_from
file of contigs in source database- One or more contigs from the source database can be compared. These are selected by providing a file containing a list of contig names (any reading name from within that contig, typically the first reading name). If no file is specified, all contigs will be compared.
-min_contig_len
minimum contig length- Only contigs in the source database over a user defined length will be used. The default is 2000 bases.
-min_average_qual
minimum average read quality- A minimum reading quality can be set so that only readings with an average quality over the specified amount will be entered into the destination database. The default is 30.0.
-contigs_to
file of contigs in destination database- One or more contigs from the destination database can be compared. These are selected by providing a file containing a list of contig names (any reading name from within that contig, typically the first reading name). If no file is specified, all contigs will be compared.
-mask
masking mode- The consensus sequence is determined for each contig in both databases using either the standard consensus algorithm (none) or "Mask active tags" (mask). Masking the active tags means that all segments covered by tags that are "active" will not be used by the matching algorithms. A typical use of this mode is to avoid finding matches in segments covered by tags of type ALUS (i.e. segments thought to be Alu sequence) or REPT (i.e. segments that are known to be repeated elsewhere in the data) (see section Tag types). The default is none.
-tag_types
list of tag types- A list of tag types to be used when the -mask option (above) is specified to be in "mask" mode. The list is delimited by "".
-word_length
word length- The consensus searching parameters are equivalent to those found in the find internal joins algorithm (see section Find Internal Joins). The search algorithm first finds matching words of length Word length. Possible values are 4 or 8. The default is 8.
-min_overlap
minimum overlap- The search algorithm only considers overlaps of length at least Minimum overlap. The default is 20.
-max_pmismatch
maximum percentage mismatch- Only alignments better than Maximum percent mismatch will be reported. The default is 30.0.
-min_match
minimum match- The algorithm considers in its initial phase only matching segments of length Minimum initial match length. However it does a dynamic programming alignment of all the chunks between the matching segments, and so produces an optimal alignment. The default is 15.
-band
use banding algorithm- A banded dynamic programming algorithm can be selected, but as this only applies to the chunks between matching segments, which for good alignments will be very short, it should make little difference to the speed. Possible values are 0 (no) or 1 (yes). The default is 1.
-display_cons
display consensus alignments- This allows the alignments between the consensus sequences to be displayed.
-align_max_mism
maximum percent mismatch- If a match between two consensus sequences is found, the readings in that overlap are assembled into the destination database using the "directed assembly" function (see section Directed Assembly). Only readings for which the maximum percent mismatch is not exceeded, and which have an average reading quality higher than the specified minimum, will be entered into the database. The default value is 10.0.
-display_seq
display reading alignments- This allows the alignments between the source database readings and the destination consensus to be displayed.
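As a rough illustration of how the search options above can be combined, the following sketch tightens the consensus search and the reading acceptance threshold. The values are illustrative only, and source_db and destination_db are placeholder database names. The command is run only if copy_reads is found on the PATH; otherwise the command line is just printed.

```shell
# Search settings; the values are illustrative, not recommendations.
word_length=8        # documented choices are 4 or 8
min_overlap=50       # documented default is 20
max_pmismatch=10.0   # documented default is 30.0
align_max_mism=5.0   # documented default is 10.0

# Build the command line from the options described above.
cmd="copy_reads -word_length $word_length -min_overlap $min_overlap \
-max_pmismatch $max_pmismatch -align_max_mism $align_max_mism \
-display_cons source_db destination_db"

# Run the command only when the program is installed; otherwise show it.
if command -v copy_reads >/dev/null 2>&1; then
    $cmd
else
    echo "$cmd"
fi
```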
EXAMPLE
To copy readings from `source_db' to `destination_db' and display the consensus match:
copy_reads -display_cons source_db destination_db
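A further sketch restricts the comparison to chosen contigs and raises the length and quality thresholds. It assumes that a contig list file holds one reading name per line (the file layout is not stated on this page), and the reading names and file names are hypothetical. The command is run only if copy_reads is found on the PATH; otherwise the command line is just printed.

```shell
# Hypothetical reading names; any reading in a contig identifies that contig.
# One name per line is an assumption about the list file layout.
printf '%s\n' xb54a3.s1 xb56b6.s1 > source_contigs.txt
printf '%s\n' xc04a1.s1 > dest_contigs.txt

# Options as documented above; the values are illustrative only.
cmd="copy_reads -contigs_from source_contigs.txt -contigs_to dest_contigs.txt \
-min_contig_len 5000 -min_average_qual 35.0 -display_seq \
source_db destination_db"

# Run the command only when the program is installed; otherwise show it.
if command -v copy_reads >/dev/null 2>&1; then
    $cmd
else
    echo "$cmd"
fi
```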





This page is maintained by staden-package. Last generated on 25 April 2003.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_3.html