双约束一致性聚类及其在网上防伪中的应用

IF 2.5 4区 综合性期刊 Q2 CHEMISTRY, MULTIDISCIPLINARY Applied Sciences-Basel Pub Date : 2023-09-06 DOI:10.3390/app131810050
Claudio Carpineto, Giovanni Romano
{"title":"双约束一致性聚类及其在网上防伪中的应用","authors":"Claudio Carpineto, Giovanni Romano","doi":"10.3390/app131810050","DOIUrl":null,"url":null,"abstract":"Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.","PeriodicalId":48760,"journal":{"name":"Applied Sciences-Basel","volume":" ","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting\",\"authors\":\"Claudio Carpineto, Giovanni Romano\",\"doi\":\"10.3390/app131810050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.\",\"PeriodicalId\":48760,\"journal\":{\"name\":\"Applied Sciences-Basel\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Sciences-Basel\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.3390/app131810050\",\"RegionNum\":4,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences-Basel","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/app131810050","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

半监督一致性聚类是一种很有前途的策略,可以补偿聚类的主观性及其对设计因素的敏感性,最近提出了各种技术来集成领域知识和多个聚类分区。在本文中,我们提出了一种双重利用领域知识的新方法,即构建初始分区,以及将它们组合在一起。特别是,我们展示了如何将必须链接和不能链接的约束建模和集成到通用一致性聚类(CC)框架的目标函数中,该框架最大限度地提高了一致性分区和输入分区之间的相似性,而输入分区又被相同的约束所丰富。此外,借用函数依赖理论,集成框架利用演绎闭包和最小覆盖的概念,充分利用约束之间的逻辑含义。使用标准的UCI基准,我们发现所得到的算法(称为CCC双约束一致性聚类)在组合基本约束分区方面比普通CC更有效,平均性能提高了5.54%。然后我们认为CCC特别适合分析假冒电子商务网站,因为可以通过利用特定的领域功能来获得限制,并展示其检测联盟营销计划的潜力。总之,我们的实验表明,CCC使聚类过程更加稳健,能够承受聚类算法、数据集和特征的变化,平均性能显著提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting
Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Applied Sciences-Basel
Applied Sciences-Basel CHEMISTRY, MULTIDISCIPLINARYMATERIALS SCIE-MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore
5.30
自引率
11.10%
发文量
10882
期刊介绍: Applied Sciences (ISSN 2076-3417) provides an advanced forum on all aspects of applied natural sciences. It publishes reviews, research papers and communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files and software regarding the full details of the calculation or experimental procedure, if unable to be published in a normal way, can be deposited as supplementary electronic material.
期刊最新文献
Application of Digital Holographic Imaging to Monitor Real-Time Cardiomyocyte Hypertrophy Dynamics in Response to Norepinephrine Stimulation. Study on Shear Resistance and Structural Performance of Corrugated Steel–Concrete Composite Deck Clustering Analysis of Wind Turbine Alarm Sequences Based on Domain Knowledge-Fused Word2vec Unraveling Functional Dysphagia: A Game-Changing Automated Machine-Learning Diagnostic Approach Spatial Overlay Analysis of Geochemical Singularity Index α-Value of Porphyry Cu Deposit in Gangdese Metallogenic Belt, Tibet, Western China
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1