Automatically Design Distance Functions for Graph-Based Semi-Supervised Learning

Patricia Miquilini, R. G. Rossi, M. G. Quiles, V. V. D. Melo, M. Basgalupp
Published in: 2017 IEEE Trustcom/BigDataSE/ICESS
Publication date: 2017-08-01
DOI: 10.1109/Trustcom/BigDataSE/ICESS.2017.333 (https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.333)
Citations: 0

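Abstract

Automatic data classification is often performed by supervised learning algorithms, which produce a model to classify new instances. Because labeled instances are expensive to obtain, semi-supervised learning (SSL) methods are an alternative for data classification, since they require only a few labeled instances. Among the many SSL algorithms, graph-based ones have notable properties. In particular, graph-based models can identify classes with different distributions without prior knowledge of statistical model parameters. However, a drawback that may affect their classification performance lies in the construction of the graph, which requires measuring distances (or similarities) between instances. Since a particular distance function can improve performance on some data sets and degrade it on others, we introduce GEAD, a novel approach based on Grammatical Evolution for Automatically designing Distance functions for graph-based semi-supervised learning. We perform extensive experiments on 100 public data sets to assess the performance of our approach and compare it with traditional distance functions from the literature. The results show that GEAD can design distance functions that significantly outperform the manually designed baselines on different predictive measures, such as Micro-F1 and Macro-F1.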
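To make the role of the distance function concrete, the sketch below builds a k-nearest-neighbour graph from a pluggable distance function and then spreads the few known labels over that graph. It is an illustration only and is not GEAD: the abstract does not specify how the graph is built or which graph-based classifier is used, so the symmetric kNN construction, the simple neighbour-averaging propagation, and all function names (`knn_graph`, `propagate_labels`, `euclidean`, `manhattan`) are assumptions made for this example.

```python
import math


def euclidean(a, b):
    # Classic hand-designed distance: straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Another hand-designed distance: sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_graph(points, distance, k):
    """Build a symmetric k-nearest-neighbour graph as a list of adjacency sets.

    `distance` is any callable taking two instances; swapping it changes
    which instances end up connected. Assumes len(points) >= 2 and k >= 1.
    """
    n = len(points)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: distance(points[i], points[j]),
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def propagate_labels(neighbors, seeds, n_classes, iterations=50):
    """Average neighbour class scores repeatedly; labeled nodes stay fixed.

    `seeds` maps node index -> known class. Returns one predicted class
    per node (hypothetical, simplified propagation scheme).
    """
    n = len(neighbors)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(scores[i])
                continue
            updated.append([
                sum(scores[j][c] for j in neighbors[i]) / len(neighbors[i])
                for c in range(n_classes)
            ])
        scores = updated
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Toy data: two well-separated groups, one labeled instance in each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
seeds = {0: 0, 3: 1}

for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    graph = knn_graph(points, dist, k=2)
    print(name, propagate_labels(graph, seeds, n_classes=2))
```

On this toy data both distances yield the same graph, so the predictions agree. On real data sets different distance functions can connect different instances and therefore change the predictions, which is the sensitivity the abstract describes as the motivation for designing the distance function automatically.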