Motivation: Group contribution methods are frequently used for estimating physical properties of compounds from their molecular structures. An algorithm for estimating Gibbs energies of formation through group contribution methods has been automated in an object-oriented framework. The algorithm decomposes compound structures according to a basis set of groups. It permits the use of wildcards and is able to distinguish between ring groups and chain groups that use similar search structures. Past methods relied on manual decomposition of compounds into constituent groups.
Results: The software is written in Common LISP and requires less than 2 min to estimate Gibbs energies of formation for a database of 780 species of varying size and complexity. The software can be extended rapidly to incorporate different basis sets and to estimate a variety of other physical properties.
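The additive step behind a group contribution estimate can be sketched as follows. This is a minimal illustration in Python (not the Common LISP software described above), and it assumes the compound has already been decomposed into groups. The group names, the contribution values, and the `origin` term are hypothetical placeholders, not data from the paper.

```python
from collections import Counter

# Hypothetical contribution values (kJ/mol); placeholders for illustration only.
CONTRIBUTIONS = {"CH3": -10.0, "CH2": 5.0, "OH": -150.0}


def estimate_property(groups, contributions, origin=0.0):
    """Sum the contribution of each group, weighted by how often it occurs."""
    counts = Counter(groups)
    missing = sorted(g for g in counts if g not in contributions)
    if missing:
        # Refuse to estimate if the basis set does not cover the compound.
        raise KeyError(f"no contribution for groups: {missing}")
    return origin + sum(n * contributions[g] for g, n in counts.items())


# A compound already decomposed into one CH3, one CH2 and one OH group.
decomposed = ["CH3", "CH2", "OH"]
estimate = estimate_property(decomposed, CONTRIBUTIONS)
```

Swapping in a different `contributions` table is all that is needed to estimate another property from the same decomposition, which is one way to read the extensibility claim above.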
Motivation: Since the protein structure database has been growing very rapidly in recent years, the development of efficient methods for searching for similar structures is very important.
Results: This paper presents a novel method for searching for similar fragments of proteins. In this method, a hash vector (a vector of real numbers) is associated with each fixed-length fragment of three-dimensional protein structure. Each vector consists of low-frequency components of the Fourier-like spectrum for the distances between C alpha atoms and the centroid. Then, we can analyze the similarity between fragments by evaluating the difference between hash vectors. The novel aspect of the method is that the following property is proved theoretically: if the root mean square distance between two fragments is small, then the distance between the hash vectors is small. Several variants of this method were compared with a naive method and a previous method using PDB data. The results show that the fastest one among the variants is 18-80 times faster than the naive method, and 3-10 times faster than the previous method.
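The hash-vector idea can be sketched as follows, under stated assumptions. The abstract does not specify the exact Fourier-like transform, so this sketch substitutes a discrete cosine transform of the distances between the C alpha atoms and the centroid and keeps the first `k` coefficients. The function names, the value of `k`, and the example coordinates are all hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def hash_vector(fragment, k=4):
    """First k cosine-transform coefficients of the atom-to-centroid distances.

    Assumption: a DCT stands in for the paper's 'Fourier-like spectrum'.
    """
    c = centroid(fragment)
    d = [math.dist(p, c) for p in fragment]
    n = len(d)
    return [
        sum(d[j] * math.cos(math.pi * m * (j + 0.5) / n) for j in range(n)) / n
        for m in range(k)
    ]


def hash_distance(u, v):
    """Euclidean distance between two hash vectors."""
    return math.dist(u, v)


# Made-up coordinates for a short fragment, plus a translated copy of it.
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.0, 4.0, 2.0)]
shifted = [(x + 10.0, y - 5.0, z + 2.5) for x, y, z in fragment]
difference = hash_distance(hash_vector(fragment), hash_vector(shifted))
```

Because the distances are measured from the centroid, a rigidly translated or rotated copy of a fragment maps to (numerically almost) the same hash vector, so comparing short vectors can serve as a cheap filter before a full root mean square distance calculation.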
Motivation: When protein sequences are analysed routinely, detailed analysis of database search results produced by BLAST and FASTA becomes exceedingly time-consuming and tedious, as the resulting file may contain a list of hundreds of potential homologies. These results are usually interpreted with a text editor, which is not a convenient tool for this analysis. In addition, the format of the data in BLAST and FASTA output files makes them difficult to read.
Results: To facilitate and accelerate this analysis, we present, for the first time, two easy-to-use programs designed for interactive analysis of full BLAST and FASTA output files containing protein sequence alignments. The programs, Visual BLAST and Visual FASTA, run under Microsoft Windows 95 or NT systems. They are based on the same intuitive graphical user interface (GUI) with extensive viewing, searching, editing, printing and multithreading capabilities. These programs improve the browsing of BLAST/FASTA results by offering a more convenient presentation of these results. They also implement several analytical tools that automate a manual methodology used for detailed analysis of BLAST and FASTA outputs. These tools include a pairwise sequence alignment viewer, a Hydrophobic Cluster Analysis plot alignment viewer and a tool displaying a graphical map of all database sequences aligned with the query sequence. In addition, Visual BLAST includes tools for multiple sequence alignment analysis (with an amino acid pattern search engine), and Visual FASTA provides a GUI to the FASTA program.
Motivation: Statistical methods that compare observed and expected distributions of experimental observables provide powerful tools for the quality control of protein structures. The distribution of backbone dihedral angles ('Ramachandran plot') has often been used for such quality control, but without a firm statistical foundation.
Results: A new and simple method is presented for judging the quality of a protein structure based on the distribution of backbone dihedral angles. Inputs to the method are 60 torsion angle distributions extracted from protein structures solved at high resolution, one for each combination of residue type and tri-state secondary structure. The output for a protein is a Ramachandran Z-score, expressing the quality of the Ramachandran plot relative to current state-of-the-art structures.
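One way such a Z-score could be computed is sketched below. The abstract does not give the scoring formula, so several choices here are assumptions: the 10 degree bin width, the raw score defined as the mean reference-distribution value over a protein's residues, and every name in the code. Only the keying of distributions by residue type and secondary structure state, and the comparison against reference structures, follow the description above.

```python
import statistics

BIN_WIDTH = 10  # degrees; hypothetical bin size for the (phi, psi) grid
BINS = 360 // BIN_WIDTH


def bin_of(phi, psi):
    """Map a (phi, psi) pair in degrees to a grid cell, wrapping at +/-180."""
    return (
        int((phi + 180) // BIN_WIDTH) % BINS,
        int((psi + 180) // BIN_WIDTH) % BINS,
    )


def raw_score(residues, distributions):
    """Mean reference-distribution value over all residues (assumed scoring).

    residues: list of (residue_type, secondary_structure, phi, psi).
    distributions: {(residue_type, secondary_structure): {bin: value}}.
    """
    values = [
        distributions[(res, ss)].get(bin_of(phi, psi), 0.0)
        for res, ss, phi, psi in residues
    ]
    return statistics.mean(values)


def z_score(score, reference_scores):
    """Standardise a raw score against scores of reference structures."""
    mean = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)
    return (score - mean) / sd


# Toy inputs: a single made-up distribution and two residues.
toy_distributions = {("ALA", "H"): {(11, 14): 0.5}}
toy_residues = [("ALA", "H", -65, -40), ("ALA", "H", 60, 60)]
toy_score = raw_score(toy_residues, toy_distributions)
```

In a full setting, `distributions` would hold all 60 residue type and secondary structure combinations, and `reference_scores` would come from scoring the high-resolution structures themselves.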