Pre-processing aspects for complexity reduction of the QSAR problem

L. Dumitriu, C. Segal, M. Craciun, A. Cocu
Published in: 2008 4th International IEEE Conference Intelligent Systems
Publication date: 2008-11-11
DOI: 10.1109/IS.2008.4670547 (https://doi.org/10.1109/IS.2008.4670547)
Citation count: 1

Abstract

Predictive Toxicology (PT) is one of the newest targets of the Knowledge Discovery in Databases (KDD) domain. Its goal is to describe the relationships between the chemical structure of compounds and biological and toxicological processes. Real PT problems raise one very important issue: the very large number of chemical descriptors. Irrelevant, redundant, noisy, and unreliable data have a negative impact, so one of the main goals in KDD is to detect these undesirable properties and to eliminate or correct them. This requires data cleaning, noise reduction, and feature selection, because the performance of the applied Machine Learning algorithms is strongly related to the quality of the data used. In this paper, we present some of the issues that can be taken into account when preparing data before the actual knowledge discovery is performed.
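
As an illustration of the kind of pre-processing the abstract refers to, the following minimal Python sketch removes irrelevant (near-constant) and redundant (highly correlated) descriptors. It is not the authors' method: the abstract names no specific algorithm, so the two filters, the thresholds, the function names, and the descriptor names and values below are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_descriptors(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of descriptors kept after two simple filters.

    columns: dict mapping descriptor name -> list of numeric values,
             one value per compound (a hypothetical data layout).
    """
    # Step 1: drop near-constant descriptors, which cannot discriminate compounds.
    informative = [name for name, values in columns.items()
                   if pvariance(values) > min_variance]

    # Step 2: greedily drop any descriptor almost collinear with one already kept.
    kept = []
    for name in informative:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: four compounds, four descriptors.
descriptors = {
    "mol_weight": [180.2, 46.1, 78.1, 151.2],
    "mol_weight_copy": [360.4, 92.2, 156.2, 302.4],  # 2x mol_weight: redundant
    "constant_flag": [1.0, 1.0, 1.0, 1.0],           # constant: irrelevant
    "logp": [1.2, -0.3, 2.1, 0.5],
}
selected = filter_descriptors(descriptors)
print(selected)  # ['mol_weight', 'logp']
```

The greedy correlation filter keeps whichever descriptor of a correlated pair comes first, so the result depends on column order. A real QSAR pipeline would likely choose between correlated descriptors using domain knowledge or relevance to the toxicity endpoint.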