An Index Structure for Substructure Searching In Chemical Databases  

Lee Hwangu (Hanyang University, Software Engineering)
Cha Jaehyuk (Hanyang University, Division of Information and Communications)
Abstract
The relationship between chemical structures and biological activities is actively studied in medicinal chemistry. As the basis of structure-based drug design, medicinal chemists developing a new drug search existing drugs for chemical structures similar to that of the target drug. An automatic system is therefore needed that selects the drug files containing chemical moieties that match a user-defined query moiety. Substructure searching is the process of identifying the set of chemical moieties that match a specific query moiety. Methods for substructure searching were first developed in the late 1950s. In graph-theoretical terms, the problem corresponds to determining which graphs in a set contain a subgraph isomorphic to a specified query graph. Testing for subgraph isomorphism has been proved, in the general case, to be an NP-complete problem, and several computational approaches have been proposed to overcome this difficulty. In the 1990s, a US patent was granted on an atom-centered indexing scheme used by the RS3 system; it has the virtue that the generated indexes can be searched by direct text comparison. The system is in commercial use (http://www.accelrys.com/rs3). We identify the drawbacks of the RS3 system and present a new indexing scheme. The RS3 system reduces substructure searching to substring matching by expressing chemical structures as predefined strings. However, its recall and precision are insufficient, because structures built from the same atoms and the same bonds cannot be indexed uniquely. To resolve this problem, we build a minimum-cost spanning tree rooted at each center atom and describe the structure by its paths at each level. Expressing a 2D chemical structure as a single 1D string has inherent limits, so we instead break the 2D structure into 1D structure fragments. The resulting index technique improves both recall and precision.
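The atom-centered, level-by-level description mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the following are assumptions made only for illustration: molecules are given as heavy-atom lists plus `(atom, atom, bond order)` triples, every bond has unit cost (so any spanning tree has minimum cost, and the sketch uses the breadth-first one), and the names `level_paths` and `fragment_index` are hypothetical. Each tree node contributes one 1D fragment, the string spelled out along the path from the center atom to that node.

```python
from collections import deque

# Assumed textual symbols for bond orders (illustrative, not from the paper).
BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}

def build_adjacency(n_atoms, bonds):
    """Map each atom index to a list of (neighbor, bond order) pairs."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj

def level_paths(atoms, bonds, center):
    """Return {level: sorted path strings} for a breadth-first spanning
    tree rooted at `center`. With unit bond costs (an assumption) this
    tree is a minimum-cost spanning tree."""
    adj = build_adjacency(len(atoms), bonds)
    depth = {center: 0}
    path_to = {center: atoms[center]}
    levels = {}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v, order in sorted(adj[u]):  # sorted for deterministic output
            if v not in depth:
                depth[v] = depth[u] + 1
                path_to[v] = path_to[u] + BOND_SYMBOL[order] + atoms[v]
                levels.setdefault(depth[v], []).append(path_to[v])
                queue.append(v)
    return {lvl: sorted(paths) for lvl, paths in levels.items()}

def fragment_index(atoms, bonds):
    """Collect (level, path string) fragments over every center atom."""
    index = set()
    for center in range(len(atoms)):
        for lvl, paths in level_paths(atoms, bonds, center).items():
            for path in paths:
                index.add((lvl, path))
    return index

# Heavy-atom skeletons: acetic acid CH3-C(=O)-OH and ethanol CH3-CH2-OH.
acetic_atoms = ["C", "C", "O", "O"]
acetic_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]

# Query moiety: a carbonyl group C=O.
query_index = fragment_index(["C", "O"], [(0, 1, 2)])

# Screening by direct comparison of text fragments.
print(query_index <= fragment_index(acetic_atoms, acetic_bonds))    # True
print(query_index <= fragment_index(ethanol_atoms, ethanol_bonds))  # False
```

The subset test here is only a screen based on direct text comparison. A ring closure in the stored molecule can shorten a path that is longer in the query, so a complete search would still need to handle such cases and confirm candidates with a subgraph isomorphism test.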
Keywords
Substructure Searching; Subgraph Isomorphism; RS3 System