Distance Profiles (DiP): A translationally and rotationally invariant 3D structure descriptor capturing steric properties of molecules
K. Baumann
Quantitative Structure-Activity Relationships, pp. 507-519, published 2002-11-01
DOI: 10.1002/1521-3838(200211)21:5<507::AID-QSAR507>3.0.CO;2-L
Citations: 16
Abstract
A novel translationally and rotationally invariant structure descriptor based on the distribution of 3D atom pairs is described. The new Distance Profiles (DiP) descriptor was applied to two data sets that were previously studied with various 3D-QSAR techniques. DiP compares favorably with the other descriptors on these two data sets and yields better models in both cases. Since DiP is used in combination with variable selection to achieve interpretability, special emphasis was placed on validating the derived models. Overfitting was avoided by constraining the maximum number of variables that may be selected and by using leave-50%-out cross-validation instead of leave-one-out cross-validation as the objective function in variable selection. Furthermore, the derived models were validated with a permutation test in which the entire variable selection procedure is repeated each time the response data are scrambled.
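
The invariance claim can be illustrated with a small sketch. The abstract does not give the exact DiP definition, so the code below assumes a much simplified stand-in: a plain histogram of all pairwise interatomic distances. The function names, the bin width, the distance cutoff, and the toy coordinates are all hypothetical and are not taken from the paper. Because only distances between atoms enter the histogram, rotating or translating the whole molecule should leave it unchanged.

```python
import math
from itertools import combinations


def distance_profile(coords, bin_width=1.0, max_dist=5.0):
    """Histogram of pairwise atom distances (simplified, assumed form).

    coords: list of (x, y, z) tuples, one per atom.
    Pairs farther apart than max_dist are ignored.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    profile = [0] * n_bins
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)          # Euclidean distance (Python 3.8+)
        idx = int(d / bin_width)     # index of the bin containing d
        if idx < n_bins:
            profile[idx] += 1
    return profile


def rotate_z(coords, angle):
    """Rotate all atoms about the z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift all atoms by the vector shift = (dx, dy, dz)."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


# Toy four-atom "molecule" (made-up coordinates, for illustration only).
molecule = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (1.5, 1.4, 0.0),
    (0.0, 1.4, 1.2),
]

original = distance_profile(molecule)
moved = distance_profile(translate(rotate_z(molecule, 0.7), (3.0, -2.0, 5.0)))

print(original)
print(moved)
```

Both printed profiles should be identical for this toy input. A hard-binned histogram like this one can flip a count when a distance sits exactly on a bin edge and rounding error nudges it across, so the toy coordinates were chosen to avoid edge cases.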