




Vector_clip
NAME
vector_clip -- finds and marks vector segments in sequence readings
SYNOPSIS
vector_clip
-
[schr
]
[-w
word_length (4)] [-n
num_diags (7)]
[-d
diagonal_score (0.35)] [-l
minimum_match (20/70%)]
[-m
minimum_5'_position] [-t
] [-p
passed_fofn] [-f
failed_fofn] input_fofn
DESCRIPTION
vector_clip
finds and marks vector segments in sequence readings stored
in experiment file format. For sequencing vectors it can be used to find the
5' primer and, for short inserts, the sequence to the 3' side of the cloning
site. It can also be used to find 3' primer sequences. A further option can do
a final check for any vector rearrangements that could be missed by the more
specific searches around the cloning site. For cloning vectors it will search
both orientations of the sequence and mark any segments found. The vector
sequences must be stored as simple text files. For cloning vector
searches the reading's experiment file must contain the name of the
cloning vector file. For sequencing vector searches, either the experiment
file for each reading must contain the information about the vector
sequence (the file name, cloning site and primer offset) or
vector-primer files must be used. Vector-primer files contain sets of
sequences from around cloning sites, and vector_clip can use these to
find the vector that matches each reading best. If the match is above
the cutoff score the reading is clipped. Vector-primer files are the
simplest method of providing vector_clip with the data it needs for
finding sequencing vectors. More information is available elsewhere
(see section Screening Against Vector Sequences).
The program processes batches of readings by the use of file of file names: one is used for input and two for output. The input file lists the names of all the readings to process, one name per line. One output file contains the names of all the readings that pass the screening and the other contains the names of those that fail.
OPTIONS
-s
- Mark sequencing vector. Searches for 5' primer, 3' running into vector.
-c
- Mark cloning vector. Searches both strands for cloning vector.
-h
- Hgmp primer. Searches 3' end for a primer.
-i vector_primer filename
- Mark transposon data.
-r
- Vector rearrangements. Searches for sequencing vector rearrangements.
-t
- Test only. Does not change the experiment files, displays hits.
-L
minimum percentage match 5' end (60)- sequencing vector searches and transposon search
-R
minimum percentage match 3' end (80)- sequencing vector searches and transposon search
-m
minimum 5' position- allows a minimum 5' end cutoff to be set if a sufficiently good match is not found (i.e. it is really a default 5' cutoff position). If a value of -1 is used the program will set the cutoff to be the distance between the primer and the cloning site.
-v
vector-primer-pair filename- sequencing vector search using vector-primer-pair file
-V
vector_primer length- the length of the sequence stored in the vector_primer file to use for the 5' search
-w
word_length (4)- cloning vector search hash length
-P
probability- cloning vector search, (a score less likely than P is a match)
-n
num_diags (7)- cloning vector search, old score based algorithm: number of diagonals to combine
-d
diagonal score (0.35)- cloning vector search, old score based algorithm
-l
minimum match (20)- sequencing vector rearrangements and transposon search minimum match length
-M
maximum vector length (100000)- all algorithms, reset for vectors >100000 bases
-p
passed fofn- file of file names for passed files
-f
failed fofn- file of file names for failed files
- input fofn ...
- input file of file names
EXAMPLES
Usage: vector_clip [options] file_of_filenames Where options are: [-s mark sequencing vector] [-c mark cloning vector] [-h hgmp primer] [-r vector rearrangements] [-w word_length (4)] [-n num_diags (7)] [-d diagonal score (0.35)] [-l minimum match (20)] [-L minimum % 5' match (60)] [-R minimum % 3' match (80)] [-m default 5' position] [-t test only] [-M Max vector length (100000)] [-P max Probability] [-v vector_primer filename] [-i vector_primer filename] [-V vector_primer length] [-p passed fofn] [-f failed fofn]
Screen for sequencing vector using 5' cutoff of 70%, a 3' cutoff of 90% and default 5' primer position of 30. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -s -L70 -R90 -m30 -pfiles.pass -f files.fail files.in
Screen for sequencing vector using 5' cutoff of 60%, a 3' cutoff of 80% and default 5' primer position of 30. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail. This shows that the default search is for sequencing vector.
vector_clip -m30 -pfiles.pass -f files.fail files.in
Screen for sequencing vector using 5' cutoff of 60%, a 3' cutoff of 80% and a vector-primer-pair file called vector_primer_file. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -v vector_primer_file -pfiles.pass -f files.fail files.in
Screen transposon data using 5' cutoff of 80%, a 3' cutoff of 85%, a match length of 10 and a vector-primer-pair file called vector_primer_file. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -i vector_primer_file -L 80 -R 85 -l 10 -pfiles.pass \
-f files.fail files.in
Screen for cloning vector using the old algorithm with a word length of 4, summing 7 diagonals and diagonal cutoff score of 0.4. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -c -w4 -n7 -d0.4 -pfiles.pass -f files.fail files.in
Screen for cloning vector using the probability based algorithm with a word length of 4 and probability cutoff of 1.0e-13. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -c -P 1.0e-13 -pfiles.pass -f files.fail files.in
Screen for 3' primer using a cutoff of 75%. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -h -R75 -pfiles.pass -f files.fail files.in
Screen for sequencing vector rearrangements using a cutoff of 20 bases. The batch of files to process are named in files.in, the names of the passed files are written to files.pass and the names of those that fail to files.fail.
vector_clip -r -l20 -pfiles.pass -f files.fail files.in
NOTES
The following error messages can be generated.
- Error: could not open experiment file
- Error: no sequence in experiment file
- Error: sequence too short
- Error: missing vector file name
- Error: missing cloning site
- Error: missing primer site
- Error: could not open vector file
- Error: could not write to experiment file
- Error: could not read vector file
- Error: missing primer sequence
- Error: hashing problem
- Error: alignment problem
- Error: invalid cloning site
- Warning: sequence now too short (no message)
- Warning: sequence entirely cloning vector (no message)
- Warning: possible vector rearrangement (no message)
- Warning: error parsing vector_primer file
- Warning: primer pair mismatch!
- Aborting: more than X entries in vector_primer file
SL, SR, CL, CR, CS, PS, PR and SF records are written to the experiment files.
SEE ALSO
See section Experiment File.
For notes on defining the cloning and primer sites,
See section Defining the Positions of Cloning and Primer Sites for Vector_Clip.
See section scf(4).





This page is maintained by staden-package. Last generated on 25 April 2003.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_16.html