Novelty and Similarity: Detection Using Data-Driven Soft Independent Modeling of Class Analogy

IF 2.3 | Zone 4 (Chemistry) | JCR Q1 SOCIAL WORK | Journal of Chemometrics | Pub Date: 2024-07-18 | DOI: 10.1002/cem.3587
O. Y. Rodionova, N. I. Kurysheva, G. A. Sharova, A. L. Pomerantsev
Journal of Chemometrics, volume 38, issue 10. Publication type: Journal Article.
Citations: 0

Abstract

Novelty and similarity are complex concepts that have numerous applications in various fields, including biology and medicine. Novelty detection is a technique used to determine whether a dataset differs from another dataset that is taken as a standard. Similarity detection is a technique used to determine whether two datasets belong to the same population. Novelty and similarity are closely related concepts; however, they are not complementary. Novelty is by far the more popular of the two, and there are many publications on it. Similarity is, in fact, a new concept that has not yet been explored in depth. Classical statistics offers a large number of tools suitable for the detection of similarity, mostly in the univariate case. At the same time, this topic has been insufficiently studied in the field of machine learning. This paper suggests several principles that are important for this research and also presents a method for the detection of both novelty and similarity. The method uses a one-class classifier known as Data-Driven Soft Independent Modeling of Class Analogy (DD-SIMCA). Three examples illustrate our approach. The first uses simulated data and demonstrates the performance of DD-SIMCA in the detection of novelty. The second uses real-world data and studies the similarity of two groups of patients participating in an evaluation of the effectiveness of treatment for primary angle-closure glaucoma. The third comes from medical diagnostics and uses real-world, publicly available data that are used for comparing various classification algorithms.
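The abstract does not give the algorithm, so the following is only a minimal sketch of how a PCA-based one-class classifier in the spirit of DD-SIMCA could flag novelty. It is not the authors' implementation. All function names, the simulated data, and the parameter values are assumptions made for illustration, and the chi-squared quantile is computed with the Wilson-Hilferty approximation purely to avoid a SciPy dependency.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty); an assumption
    made here only to keep the sketch free of SciPy."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3


def distances(model, X):
    """Score distance h (inside the PCA subspace) and orthogonal
    distance q (squared residual outside it) for each row of X."""
    Xc = X - model["mean"]
    T = Xc @ model["P"]                          # scores
    h = np.sum(T**2 / model["eigvals"], axis=1)
    E = Xc - T @ model["P"].T                    # residuals
    q = np.sum(E**2, axis=1)
    return h, q


def fit_one_class(X, n_components=2, alpha=0.05):
    """Fit PCA on the target class and derive an acceptance threshold."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {
        "mean": mean,
        "P": Vt[:n_components].T,                        # loadings
        "eigvals": s[:n_components] ** 2 / (len(X) - 1),  # component variances
    }
    h, q = distances(model, X)
    # Method-of-moments estimates: scale = mean, dof = 2 * mean^2 / variance.
    h0, q0 = float(h.mean()), float(q.mean())
    Nh = max(1, int(round(2 * h0**2 / float(h.var(ddof=1)))))
    Nq = max(1, int(round(2 * q0**2 / float(q.var(ddof=1)))))
    model.update(h0=h0, q0=q0, Nh=Nh, Nq=Nq,
                 threshold=chi2_quantile(1 - alpha, Nh + Nq))
    return model


def is_member(model, X):
    """True where the combined (full) distance is within the threshold."""
    h, q = distances(model, X)
    f = model["Nh"] * h / model["h0"] + model["Nq"] * q / model["q0"]
    return f <= model["threshold"]


# Simulated illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 5))                      # two latent factors, five variables
target = rng.normal(size=(200, 2)) @ W + 0.1 * rng.normal(size=(200, 5))
novel = rng.normal(size=(50, 2)) @ W + 0.1 * rng.normal(size=(50, 5)) + 3.0

model = fit_one_class(target, n_components=2, alpha=0.05)
print("target accepted:", is_member(model, target).mean())
print("novel accepted: ", is_member(model, novel).mean())
```

In this sketch the model is trained on the standard dataset only, and a second dataset is judged by the share of its samples that fall inside the acceptance area. How that share is turned into a decision about novelty or similarity is the subject of the paper and is not reproduced here.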

Source journal

Journal of Chemometrics (Chemistry: Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months

Journal description: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.