




Database I/O in C
Introduction and Overview
[General notes to go somewhere: It is better to test for the success return code than for failure codes. Failure values vary between functions (-1, 1, >0, etc.), but most functions return 0 for success.]
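As a minimal sketch of the note above, the following compares each result against the success value rather than against any particular failure value. The functions `write_a`, `write_b` and `write_c` are hypothetical stand-ins invented for this illustration; they are not Gap4 functions.

```c
#include <stdio.h>

/* Hypothetical stand-ins for I/O calls (not part of Gap4). Each returns 0
 * on success but reports failure differently: -1, 1, or a positive count. */
static int write_a(int ok) { return ok ? 0 : -1; }
static int write_b(int ok) { return ok ? 0 : 1; }
static int write_c(int ok) { return ok ? 0 : 3; }

/* Returns 1 if every call succeeded, 0 otherwise. Comparing against the
 * success code (0) handles all three failure conventions uniformly. */
int all_writes_ok(int ok_a, int ok_b, int ok_c)
{
    if (write_a(ok_a) != 0) {
        fprintf(stderr, "write_a failed\n");
        return 0;
    }
    if (write_b(ok_b) != 0) {
        fprintf(stderr, "write_b failed\n");
        return 0;
    }
    if (write_c(ok_c) != 0) {
        fprintf(stderr, "write_c failed\n");
        return 0;
    }
    return 1;
}
```

A test such as `if (write_a(x) == -1)` would silently miss the failures reported by the other two functions, which is the problem the note warns about.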
The Gap4 I/O access from within C consists of several layers. These layers break the tasks down into discrete methods and hide most of the implementation details. For the programmer wishing to extend Gap4, only the higher layers are of interest. Hence the lowest levels are described only briefly.
"g" Level - Raw Database Access
At the lowest level of any I/O is the actual code to read and write information to the disk. In Gap4 this is handled through a library named "g". This contains code for reading, writing, locking and updating of the physical database. It does not describe the structures contained in the gap database format itself, but rather provides functions to read and write arbitrary blocks of data. Don't delve into this unless you're feeling brave!
The code for this library is contained within the `src/g' directory. No documentation is currently available on these functions.
"Communication" Level - Interfaces to the "g" Level
This level of code deals with describing the real Gap4 data structures and the interfacing with the g library. Generally this code should not be used.
This code is contained within the `src/gap4' directory and breaks down as follows:
- `gap-if.c'
- `gap-local.c'
- `gap-remote.c'
- Interface functions with the g library. These provide support for a local (i.e. compiled in) or remote (unimplemented) database server.
- `gap-io.c'
- Contains GAP_READ and GAP_WRITE functions in byte-swapping and non-byte-swapping forms (depending on the system architecture). The gap_io_init() function automatically determines the machine's endianness and sets up function pointers to call the correct functions.
- `gap-error.c'
- Definitions of the GAP_ERROR and GAP_ERROR_FATAL functions.
- `gap-dbstruct.c'
- `gap-create.c'
- Functions for creation, initialisation, and copying of database files.
- `gap-dbstruct.h'
- VERY USEFUL! The definitions of the gap structures that are stored in the database.
- `gap-init.c'
- Initialises communication with the "g" database server by use of the gap_init(), gap_open_server() and gap_shutdown_server() functions.
No documentation is currently available on these functions.
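The idea described for `gap-io.c', detecting the machine byte order once and then calling through a function pointer, can be sketched roughly as follows. This is an illustration only and not the Gap4 code: the names `demo_io_init`, `demo_to_disk32` and `swap32` are invented here, and the on-disk byte order is passed in as a parameter because the text above does not state which order the database uses.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
int machine_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Pass-through used when host and disk byte orders already agree. */
static uint32_t copy32(uint32_t v) { return v; }

/* Function pointer selected once by demo_io_init(). */
static uint32_t (*to_disk32)(uint32_t) = copy32;

/* Hypothetical initialiser: picks the swapping or non-swapping routine.
 * disk_is_big_endian is an assumption supplied by the caller. */
void demo_io_init(int disk_is_big_endian)
{
    int host_is_big_endian = !machine_is_little_endian();
    to_disk32 = (host_is_big_endian == disk_is_big_endian) ? copy32 : swap32;
}

/* Converts a host-order value to disk order via the chosen routine. */
uint32_t demo_to_disk32(uint32_t v) { return to_disk32(v); }
```

Choosing the routine once at start-up keeps the byte order test out of every individual read and write.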
Basic Gap4 I/O
This level contains the basic functions for reading, writing, creation and
deletion of the Gap4 structures, such as readings and templates as well as
higher level functions built on top of these. It is this level of code that
should generally be used by the programmer. The implementation of this level
has function code and prototypes spread over a variety of files, but the
programmer should only #include
the `IO.h' file.
The primary functions are:
- `IO.c'
-
open_db
close_db
del_db
- Opening/creation, closing and deletion of databases.
GT_Read, GT_Write, GT_Write_cached
TextRead, TextAllocRead, TextWrite
DataRead, DataWrite
ArrayRead, ArrayWrite
BitmapRead, BitmapWrite
- The basic IO calls. Note that the GT ones are for handling structures (eg GReadings) and the others for data of the associated type.
io_init_contig
io_init_annotations
io_init_reading
- Some functions for initialising new data structures. These in turn call the allocate() function to create new database records.
io_read_seq
io_write_seq
- Reads and writes sequence information.
io_read_rd
- Fetches the trace type and name values for a reading.
io_read_annotation
io_write_annotation
- Reading and writing of annotations (also known as tags).
allocate
deallocate
io_deallocate_reading
- Allocation and deallocation of records.
flush2t
- Flushes changes back to disk. The various write commands write their data to disk, but that data is not committed as the up-to-date copy until a flush occurs.
- `io_handle.c'
-
io_handle
handle_io
- Converts between a C GapIO pointer and an integer handle which can be passed around in Tcl and Fortran. The integer handle is used in the Tcl scripting language.
- `io_utils.[ch]'
-
get_gel_num, lget_gel_num
get_contig_num, lget_contig_num
- Converts single or lists of reading identifiers into reading or contig numbers (with start and end ranges).
to_contigs_only
- Converts a list of reading identifiers to contig numbers.
get_read_name
get_contig_name
get_vector_name
get_template_name
get_clone_name
- Converts a structure number into its textual name.
chain_left
- Finds the left most reading number in a contig from a given reading number.
rnumtocnum
- Converts from a reading number into a contig number.
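The pointer-to-integer conversion described for `io_handle.c' is commonly done with a small lookup table. The following is a sketch of that general technique under stated assumptions, not the Gap4 implementation: `DemoIO` stands in for the real GapIO structure, and the function names, the table size and the choice of 0 as the "invalid handle" value are all invented for illustration.

```c
#include <stddef.h>

#define MAX_HANDLES 16           /* arbitrary size chosen for this sketch */

/* Stand-in for the real GapIO structure (hypothetical). */
typedef struct DemoIO { int dummy; } DemoIO;

/* Slot i holds the pointer for handle i + 1; NULL marks a free slot. */
static DemoIO *handle_table[MAX_HANDLES];

/* Stores the pointer and returns an integer handle in 1..MAX_HANDLES,
 * or 0 if the pointer is NULL or the table is full. */
int demo_get_handle(DemoIO *io)
{
    int i;
    if (io == NULL)
        return 0;
    for (i = 0; i < MAX_HANDLES; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = io;
            return i + 1;
        }
    }
    return 0;
}

/* Maps a handle back to its pointer; returns NULL for unknown handles. */
DemoIO *demo_lookup_handle(int handle)
{
    if (handle < 1 || handle > MAX_HANDLES)
        return NULL;
    return handle_table[handle - 1];
}

/* Frees the slot so the handle value can be reused. */
void demo_release_handle(int handle)
{
    if (handle >= 1 && handle <= MAX_HANDLES)
        handle_table[handle - 1] = NULL;
}
```

An integer handle of this kind can be passed through a scripting language as an ordinary number, which a raw C pointer cannot be in a portable way.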
Other I/O Functions
Still more I/O functions exist that aren't listed under the "Basic Gap4 I/O" header. The reason for this is primarily due to code structure rather than any particular grouping based on functionality. Specifically, these functions cannot be easily linked into "external" applications without a considerable amount of effort.
The file breakdown is as follows.
- `IO2.c'
-
io_complement_seq
- Complements, in memory, a sequence and associated structures.
io_insert_seq
io_delete_seq
io_replace_seq
- Modifies in memory sequence details.
io_insert_base
io_modify_base
io_delete_base
- Modifies a single base in a sequence on the disk.
pad_consensus
- Inserts pads to the consensus sequence and all the readings at that point.
io_delete_contig
- Removes a contig structure.
- `IO3.c'
-
get_read_info
get_vector_info
get_clone_info
- Fetches miscellaneous information for reads (primers, insert size, etc), vectors and clones.
io_get_extension
- Returns the right cutoff of a reading. Found by checking the cut points and any vector tags.
io_mod_extension
- Modifies the cutoffs of readings.
write_rname
- Updates a reading name in memory and on disk.
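To illustrate what complementing a sequence in memory involves, here is a minimal sketch. It assumes that "complement" means the reverse complement, as is usual in sequence assembly, and that `*' is used as the pad character and is left unchanged; the text above states neither. The function `demo_complement_seq` is invented for this example and, unlike io_complement_seq, does not touch any associated structures.

```c
#include <stddef.h>

/* Complements one base; anything unrecognised (pads, N, -) is unchanged. */
static char complement_base(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    case 'a': return 't';
    case 'c': return 'g';
    case 'g': return 'c';
    case 't': return 'a';
    default:  return b;
    }
}

/* Reverse-complements seq[0..len-1] in place. */
void demo_complement_seq(char *seq, size_t len)
{
    size_t i, j;

    if (len == 0)
        return;

    /* Swap from both ends towards the middle, complementing as we go. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = complement_base(seq[i]);
        seq[i] = complement_base(seq[j]);
        seq[j] = tmp;
    }

    /* For an odd length the middle base stays in place but is complemented. */
    if (i == j)
        seq[i] = complement_base(seq[i]);
}
```

A real implementation would also need to reverse any per-base arrays that run in parallel with the sequence so that they stay aligned with the bases.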
Compiling and Linking with Other Programs
To use the Gap4 I/O functions in a program other than Gap4 itself, you need to adjust how you compile and link, both to pick up the function prototypes and to add the Gap4 functions to your binary. At present, the object files required for database access are not packaged as a library.
The compiler include search path needs adjusting to add the `$STADENROOT/src/gap4' directory and possibly the `$STADENROOT/src/g' directory. Once your own object files are compiled, they need to be linked with the following gap4 object files.
$STADENROOT/src/gap4/$MACHINE-binaries/actf.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-create.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-dbstruct.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-error.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-if.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-init.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-io.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-local.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-remote.o
$STADENROOT/src/gap4/$MACHINE-binaries/IO.o
$STADENROOT/src/gap4/$MACHINE-binaries/io_handle.o
$STADENROOT/src/gap4/$MACHINE-binaries/io-reg.o
$STADENROOT/src/gap4/$MACHINE-binaries/io_utils.o
$STADENROOT/src/gap4/$MACHINE-binaries/text-io-reg.o
Finally, a library search path of `$STADENROOT/lib/$MACHINE-binaries' should be used to link the -lg -ltext_utils -lmisc libraries.
All of the above definitions have been added to a single Makefile held in `$STADENROOT/src/mk/gap4_defs.mk' as the GAPDB_EXT_INC, GAPDB_EXT_OBJS and GAPDB_EXT_LIBS variables. When possible, these should be used in preference to hard coding the object filenames, as this provides protection against future coding changes.
So for example, if we have a program held in the file `demo.c' we could have a simple Makefile as follows.
SRCROOT=$(STADENROOT)/src
include $(SRCROOT)/mk/global.mk
include $(SRCROOT)/mk/$(MACHINE).mk

OBJS = $(O)/demo.o

LIBS = $(MISC_LIB)

$(O)/demo: $(OBJS)
	$(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)
If we now extend this program so that it requires the Gap4 I/O routines, the Makefile should be modified to:
SRCROOT=$(STADENROOT)/src
include $(SRCROOT)/mk/global.mk
include $(SRCROOT)/mk/$(MACHINE).mk
include $(SRCROOT)/mk/gap4_defs.mk

INCLUDES_E += $(GAPDB_EXT_INC)

OBJS = $(O)/demo.o $(GAPDB_EXT_OBJS)

LIBS = $(MISC_LIB) $(GAPDB_EXT_LIBS)

$(O)/demo: $(OBJS)
	$(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)
If you require an example of a program that utilises the Gap4 I/O functions, see the convert program in `$STADENROOT/src/convert/'.





This page is maintained by staden-package. Last generated on 25 April 2003.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/scripting_111.html