Distributed Heterogeneous Transfer Learning

IF 4.3 | Zone 3 (Materials Science) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC | ACS Applied Electronic Materials | Published: 2024-05-14 | DOI: 10.1016/j.bdr.2024.100456
Paolo Mignone, Gianvito Pio, Michelangelo Ceci
Citations: 0

Abstract

Transfer learning has proved effective for building predictive models even in complex conditions where little labeled data is available: a predictive model for a target domain is constructed by also exploiting knowledge from a separate domain, called the source domain. However, several existing transfer learning methods assume that the source and target domains share an identical feature space. This assumption limits their real-world applicability, since two separate, although related, domains may be described by totally different feature spaces. Heterogeneous transfer learning methods aim to overcome this limitation, but they usually i) make other assumptions on the features, such as requiring the same number of features, ii) are generally unable to distribute the workload over multiple computational nodes, iii) cannot work in the Positive-Unlabeled (PU) learning setting, which we also consider in this study, or iv) are limited to specific application domains, i.e., they are not general-purpose methods.

In this manuscript, we present a novel distributed heterogeneous transfer learning method, implemented in Apache Spark, that overcomes all the above-mentioned limitations. Specifically, it can also work in the PU learning setting by resorting to a clustering-based approach, and it can align totally heterogeneous feature spaces without exploiting peculiarities of specific application domains. Moreover, its distributed design allows it to process large source and target datasets.
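The abstract does not describe how the clustering step works in the PU setting. As a rough illustration only, the sketch below shows one generic clustering-based PU strategy: cluster the unlabeled examples with k-means and treat the clusters whose centroids lie farthest from the positive examples as "reliable negatives". This is an assumption for exposition, not the authors' algorithm and not a Spark implementation; the function names `kmeans` and `reliable_negatives` and the parameters `k` and `n_neg_clusters` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns the cluster index assigned to each point.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = _mean(members)
    return labels


def reliable_negatives(positives: List[Vector], unlabeled: List[Vector],
                       k: int = 3, n_neg_clusters: int = 1) -> List[int]:
    """Hypothetical helper: indices of unlabeled examples in the clusters
    farthest from the centroid of the positive examples."""
    labels = kmeans(unlabeled, k)
    pos_centroid = _mean(positives)
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    # Rank non-empty clusters by the distance of their centroid from the positive centroid.
    ranked = sorted(
        clusters,
        key=lambda lab: _dist(_mean([unlabeled[i] for i in clusters[lab]]), pos_centroid),
        reverse=True,
    )
    negatives: List[int] = []
    for lab in ranked[:n_neg_clusters]:
        negatives.extend(clusters[lab])
    return sorted(negatives)


if __name__ == "__main__":
    positives = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    unlabeled = [[0.1, 0.1], [0.0, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [2.5, 2.4]]
    print(reliable_negatives(positives, unlabeled, k=3, n_neg_clusters=1))  # [2, 3, 4]
```

In the toy run, the three unlabeled points near (5, 5) form the cluster farthest from the positives and are returned as reliable negatives, while the points near the positives are left unlabeled. A PU classifier could then be trained on the positives versus these negatives; how the actual method combines this with feature-space alignment and distributes it in Spark is not specified in the abstract.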

Our experimental evaluation was performed in three different application domains that can benefit from transfer learning: the reconstruction of the human gene regulatory network, the prediction of cerebral stroke in hospital patients, and the prediction of customer energy consumption in power grids. The results show that the proposed approach is able to outperform 4 state-of-the-art heterogeneous transfer learning approaches and 3 baselines, and exhibits ideal scalability.
