Indexing methods for efficient protein 3D surface search
Sungchul Kim, Lee Sael, Hwanjo Yu
Data and Text Mining in Bioinformatics, 2012-10-29
DOI: 10.1145/2390068.2390078 (https://doi.org/10.1145/2390068.2390078)
Citations: 7
Abstract
This paper applies efficient indexing techniques to protein structure search in which each protein structure is represented as a vector by the 3D-Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein's tertiary structure as a vector, and this simplified representation accelerates structural search. However, further speedup is needed for scenarios in which multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly improve search speed. In addition, we introduce an extended approach to protein structure search, based on these indexing techniques, that exploits a characteristic of the 3DZD. In the extended approach, the index structure is constructed using only the first few numbers of each 3DZD. To find the top-k similar structures, the top 10 × k similar structures are first selected using the reduced index structure, and the top-k structures are then selected from these candidates using a similarity measure over their full 3DZDs. With the indexing techniques, search time was reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using extended iKernel.
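The two-stage search of the extended approach can be illustrated with a minimal sketch. It is not the authors' implementation: a plain linear scan stands in for the iDistance or iKernel index, Euclidean distance is assumed as the similarity measure, and the names `two_stage_top_k`, `prefix_len`, and `factor` are hypothetical. The vector length and prefix length in the demo are illustrative values, not figures from the paper.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_top_k(
    query: Vector,
    database: Sequence[Vector],
    k: int,
    prefix_len: int,
    factor: int = 10,
) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs via filter-and-refine.

    Stage 1 ranks every entry using only the first `prefix_len` numbers of
    each descriptor and keeps factor * k candidates. A real system would
    answer this stage from an index built on the reduced vectors; the
    linear scan here is a stand-in.
    Stage 2 re-ranks those candidates using the full descriptors.
    """
    q_prefix = query[:prefix_len]
    candidates = heapq.nsmallest(
        factor * k,
        range(len(database)),
        key=lambda i: euclidean(q_prefix, database[i][:prefix_len]),
    )
    scored = [(euclidean(query, database[i]), i) for i in candidates]
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative data: 1000 random vectors standing in for 3DZDs.
    db = [[rng.random() for _ in range(121)] for _ in range(1000)]
    for dist, idx in two_stage_top_k(db[0], db, k=5, prefix_len=20):
        print(idx, round(dist, 4))
```

Because stage 1 sees only a prefix of each vector, the result is approximate: a true neighbor that ranks poorly on the prefix can be filtered out before stage 2. Keeping 10 × k candidates, as the abstract describes, trades a little extra re-ranking work for a lower chance of such misses.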