|
Abstract : |
We have developed a general system, QGB, for performing complex queries on the information in the DDBJ/EMBL/GenBank databases, including queries over the structural features of sequences implied in the FEATURE TABLE. Queries are formed in an SQL-like syntax with language extensions to support complex types (e.g., sets, ordered sets, and records) appropriate for representing and querying sequence data. A novel aspect of QGB is its ability to deduce missing features and infer relationships among features as a consequence of constructing a parse tree of sequence structure from information described in the FEATURE TABLE. The grammar for the parse tree is implemented in a customized form of the Definite Clause Grammar syntax of the logic programming language Prolog. The logic grammar formalism was chosen because it provides a perspicuous representation for features and constraints, and Prolog provides an execution model for the grammar rules. Construction of the parse tree also identifies inconsistencies and errors in the FEATURE TABLE, which can in some cases be automatically corrected and used to generate an augmented version of the table. |
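The idea of deducing missing features from sequence structure can be illustrated with a minimal sketch. It is not QGB's grammar, which is written in Prolog's Definite Clause Grammar syntax; it is a Python stand-in under one simplifying assumption: any gap between two consecutive annotated exons implies an unannotated intron, and overlapping exons are an inconsistency. The names `Feature` and `infer_introns` are hypothetical and introduced only for illustration.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A simplified feature-table entry (hypothetical, for illustration only)."""
    key: str    # feature key, e.g. "exon" or "intron"
    start: int  # 1-based inclusive start position
    end: int    # 1-based inclusive end position


def infer_introns(exons: List[Feature]) -> List[Feature]:
    """Return the exons in positional order, with an intron inserted in every gap.

    Assumption: a gap between consecutive exons is an intron. Overlapping
    exons are reported as an inconsistency by raising ValueError.
    """
    ordered = sorted(exons, key=lambda f: f.start)
    result: List[Feature] = []
    for i, exon in enumerate(ordered):
        if i > 0:
            prev = ordered[i - 1]
            if exon.start <= prev.end:
                # Inconsistent annotation: two exons share positions.
                raise ValueError(f"overlapping exons: {prev} and {exon}")
            if exon.start > prev.end + 1:
                # Deduce the missing feature that fills the gap.
                result.append(Feature("intron", prev.end + 1, exon.start - 1))
        result.append(exon)
    return result


if __name__ == "__main__":
    annotated = [Feature("exon", 201, 300), Feature("exon", 1, 100)]
    for feature in infer_introns(annotated):
        print(feature.key, feature.start, feature.end)
```

A grammar-based approach generalizes this: rather than one hard-coded rule, each feature and its constraints would be a grammar rule, and the missing or inconsistent entries would fall out of building the parse tree.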