Motivation: We have developed software to assist in the computation of gene expression from data obtained in competitive reverse transcription-polymerase chain reaction (RT-PCR). This report describes the mathematical basis of competitive RT-PCR and discusses the criteria which must be met to permit accurate estimations of gene expression to be obtained using this technique.
Results: The software assists both in the assessment of assay performance (specifically in establishing that the native and competitor templates amplify with equal efficiency) and in the routine analysis of data obtained when quantitating gene expression by competitive RT-PCR. The software is a 100 kb module which functions as a Microsoft Excel add-in. It is compatible with the Windows and Mac versions of Excel 5, and with Excel 7 on the Windows 95 platform, and employs the spreadsheet, statistical and graphing capabilities incorporated into Excel.
Availability: The software can be downloaded from http://www.grad.ttuhsc.edu/archive/. A brief summary in both HTML and Microsoft Word 6 format of the installation and use of the software is also located at this website.
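The equal-efficiency criterion can be illustrated with a standard regression check. This is a minimal sketch, not the add-in's implementation. It assumes exponential amplification, N = N0(1 + E)^n, so that with equal efficiencies the product ratio equals the initial template ratio. A plot of log10(native/competitor signal) against log10(competitor added) should then be a straight line with slope -1, and the point where the ratio equals 1 estimates the native template amount. The function names and the input layout are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_native_amount(competitor_added, native_signal, competitor_signal):
    """Return (slope, equivalence point) for a competitor dilution series.

    A slope close to -1 is consistent with equal amplification efficiency
    (assumption: exponential amplification of both templates).
    """
    xs = [math.log10(c) for c in competitor_added]
    ys = [math.log10(n / c) for n, c in zip(native_signal, competitor_signal)]
    slope, intercept = fit_line(xs, ys)
    # The log ratio is zero where native and competitor amounts are equal.
    equivalence = 10 ** (-intercept / slope)
    return slope, equivalence
```

A slope that departs clearly from -1 would suggest that the two templates do not amplify with the same efficiency, in which case the equivalence point should not be read as the native amount.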
Motivation: We have previously reported an algorithm for discovering patterns conserved in sets of related unaligned protein sequences. The algorithm was implemented in a program called Pratt. Pratt allows the user to define a class of patterns (e.g. the degree of ambiguity allowed and the length and number of gaps), and is then guaranteed to find the conserved patterns in this class scoring highest according to a defined fitness measure. In many cases, this version of Pratt was very efficient, but in other cases it was too time consuming to be applied. Hence, a more efficient algorithm was needed.
Results: In this paper, we describe a new and improved search strategy that has two main advantages over the old strategy. First, it allows for easier integration with programs for multiple sequence alignment and database search. Second, it makes it possible to use branch-and-bound search, and heuristics, to speed up the search. The new search strategy has been implemented in a new version of the Pratt program.
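To make the notion of a pattern class concrete, the sketch below checks how many sequences contain a pattern with ambiguous positions and variable-length gaps. It illustrates only pattern matching, not Pratt's search strategy or fitness measure. The PROSITE-style notation (`x(2,4)` for a gap of 2 to 4 residues, `[DE]` for an ambiguous position) and the function names are assumptions made for this example.

```python
import re

def pattern_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'C-x(2,4)-[DE]-x-H' to a regex."""
    parts = []
    for element in pattern.split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if gap:
            low, high = gap.group(1), gap.group(2)
            if low is None:
                parts.append(".")                          # x: any one residue
            elif high is None:
                parts.append(".{%s}" % low)                # x(n): fixed gap
            else:
                parts.append(".{%s,%s}" % (low, high))     # x(n,m): flexible gap
        else:
            parts.append(element)  # a single residue or a bracketed set
    return re.compile("".join(parts))

def count_matching(pattern, sequences):
    """Count the sequences that contain at least one match of the pattern."""
    regex = pattern_to_regex(pattern)
    return sum(1 for seq in sequences if regex.search(seq))
```

For example, `count_matching("C-x(2,4)-[DE]-x-H", ["AACGGDKHA", "CAAAAEQH", "CADH"])` counts the first two sequences only, because the gap in the third is too short.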
Motivation: The aims were to: enable users to deposit complex search profiles against the sequence databases; interface with an independent Sequence Retrieval System (SRS) server over the network to run these searches daily against the previous day's updates of these databases; and e-mail users the reformatted search results, which can be used locally when loaded into a WWW browser.
Results: The deposition of one or more search profiles by the user leads to a daily search of the EMBL and SWISSPROT databases. Each search is restricted to entries that were deposited during the last 24 h by using the SRS query manager to combine search sets. If the search is successful, the relative URLs in the resulting HTML page are converted to absolute ones, so that the page can be used locally when loaded from disk. The results are sent to the user by e-mail.
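The URL rewriting step can be sketched as follows. This is a minimal illustration rather than the authors' code: it uses a simple regular expression over `href` and `src` attributes, whereas a full HTML parser would be more robust, and the base URL in the usage note is hypothetical.

```python
import re
from urllib.parse import urljoin

def absolutize_links(html, base_url):
    """Rewrite quoted href/src attribute values as absolute URLs."""
    def fix(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves URLs that are already absolute unchanged.
        return f"{attr}={quote}{urljoin(base_url, url)}{quote}"
    return re.sub(r'(href|src)=(["\'])(.*?)\2', fix, html, flags=re.IGNORECASE)
```

Calling `absolutize_links(page, "http://srs.example.org/cgi/wgetz")` on a saved result page would make its links resolvable when the file is opened from disk or from an e-mail attachment.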
Motivation: International sequencing efforts are creating huge nucleotide databases, which are used in searching applications to locate sequences homologous to a query sequence. In such applications, it is desirable that databases are stored compactly, that sequences can be accessed independently of the order in which they were stored, and that data can be rapidly retrieved from secondary storage, since disk costs are often the bottleneck in searching.
Results: We present a purpose-built direct coding scheme for fast retrieval and compression of genomic nucleotide data. The scheme is lossless, readily integrated with sequence search tools, and does not require a model. Direct coding gives good compression and allows faster retrieval than with either uncompressed data or data compressed by other methods, thus yielding significant improvements in search times for high-speed homology search tools.
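The idea of coding nucleotides directly, without a statistical model, can be sketched by packing each base into two bits. This is an illustrative assumption about what a direct code might look like, not the scheme described in the paper: the abstract does not give the encoding details, and this sketch handles only A, C, G and T, with no support for wildcard symbols such as N.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Pack a string over ACGT into bytes, four bases per byte.

    Returns (length, data); the length is needed to drop padding on decode.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return len(seq), bytes(out)

def unpack(length, data):
    """Recover the original sequence from (length, data)."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

Because decoding needs only shifts and masks, and each sequence can be decoded on its own, a code of this kind is lossless and supports access to individual sequences independently of storage order.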
Motivation: To provide data management tools to maintain and query efficiently experimental and derived protein data with the goal of providing new insights into structure-function relationships. The tools should be portable, extensible, and accessible locally, or via the World Wide Web, providing data that would not otherwise be available.
Results: The initial phase of the work, the data representation and query of all available macromolecular structure data, including real-time access to complex property patterns based on the amino acid sequence, is reported. Protein structure data taken from the Protein Data Bank (PDB) are decomposed into native and derived elementary properties, and represented as compact indexed objects, minimizing storage requirements and query time for select types of query. In addition, collections of indices representing a particular property are maintained and can be queried for specific property patterns found across the whole database. The approach is proving applicable to a wide variety of data available on specific protein families.
Motivation: The approaches usually used for building large genetic maps consist of dividing the marker set into linkage groups and providing local orders that can be tested by multi-point linkage analysis. To deal with the limitations of these approaches, a strategy taking the marker set into account globally is defined.
Results: The paper presents a new approach called 'Bi-Dimensional Scaling Map' (BDS-Map) for inferring marker orders and distances in genetic maps, based on the use of an additional dimension, orthogonal to the map, into which markers are projected. Dynamical forces based on a two-point analysis are applied to optimize the marker locations in space. The efficiency of the approach is exemplified on real data (16 and 70 markers on chromosomes 6 and 2, respectively) and simulated data (50 maps of 70 markers).
Motivation: To meet the demands of large-scale sequencing, thousands of clones must be fingerprinted and assembled into contigs. To determine the order of clones, a typical experiment is to digest the clones with one or more restriction enzymes and measure the resulting fragments. The probability of two clones overlapping is based on the similarity of their fragments. A contig contains two or more overlapping clones and a minimal tiling path of clones is selected to be sequenced. Interactive software with algorithmic support is necessary to assemble the clones into contigs quickly.
Results: FPC (fingerprinted contigs) is an interactive program for building contigs from restriction fingerprinted clones. FPC uses an algorithm to cluster clones into contigs based on their probability of coincidence score. For each contig, it builds a consensus band (CB) map, which is similar to a restriction map but does not try to resolve all the errors. The CB map is used to assign coordinates to the clones based on their alignment to the map and to provide a detailed visualization of the clone overlap. FPC has editing facilities for the user to refine the coordinates and to remove poorly fingerprinted clones. Functions are available for updating an FPC database with new clones. Contigs can easily be merged, split or deleted. Markers can be added to clones and are displayed with the appropriate contig. Sequence-ready clones can be selected and their sequencing status displayed. As such, FPC is an integrated program for the assembly of sequence-ready clones for large-scale sequencing projects.
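A probability of coincidence score can be sketched as the chance that two unrelated clones share at least the observed number of bands. The abstract does not give the formula, so the binomial form below is an assumption made for illustration and should not be read as FPC's exact computation. The band-matching rule (a simple greedy match within a tolerance) and all function and parameter names are also hypothetical.

```python
from math import comb

def count_shared_bands(bands_a, bands_b, tolerance):
    """Greedily count bands that agree within the tolerance; each band is used once."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared

def coincidence_probability(n_low, n_high, shared, tolerance, gel_length):
    """Probability of at least `shared` chance matches (assumed binomial model).

    n_low and n_high are the band counts of the two clones (n_low <= n_high).
    """
    # Chance that one band of the smaller clone falls within the tolerance
    # of any of the n_high bands of the other clone.
    p = 1 - (1 - 2 * tolerance / gel_length) ** n_high
    return sum(
        comb(n_low, m) * p ** m * (1 - p) ** (n_low - m)
        for m in range(shared, n_low + 1)
    )
```

Under this model, a lower score means the shared bands are less likely to be coincidental, so a pair of clones scoring below a chosen cutoff would be treated as overlapping.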