Machine Learning Pipeline for Reusing Pretrained Models

M. Alshehhi, Di Wang
DOI: 10.1145/3415958.3433054
Published in: Proceedings of the 12th International Conference on Management of Digital EcoSystems, 2020-11-02
Citations: 3

Abstract

Machine learning methods have proven effective at analyzing vast amounts of data in various formats to discover patterns, detect trends, gain insight, and predict outcomes from historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (domain adaptation) is a promising methodology for tackling this problem: it reuses the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, access to data from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost concerns (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse models trained on one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model's structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated on several criteria, namely accuracy, knowledge transfer, training time, and budget. In this paper, we start from the observation that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To address this, we propose a methodology that efficiently selects data from the target domain (minimizing consumption of target-domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
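The abstract does not detail the paper's actual selection criterion or adaptation procedure, but the general idea — adapting a pretrained model using only a small, carefully selected subset of target-domain data, with no access to the source dataset — can be illustrated with a minimal sketch. The sketch below is an assumption-laden stand-in: it uses a toy logistic-regression "pretrained model" (white-box weight access, which is stricter than the paper's black-box setting), uncertainty-based sampling as a hypothetical selection rule, and plain gradient descent for fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "pretrained" weights from an inaccessible source domain.
# The source data itself is never touched below.
w_source = np.array([1.0, -1.0])

# Target-domain data with a domain shift: the true decision
# boundary is rotated relative to what the source weights encode.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def select_uncertain(X, w, budget):
    """Pick the `budget` target samples the current model is least
    confident about (highest predictive entropy) — one plausible way
    to minimize target-data consumption."""
    p = sigmoid(X @ w)
    entropy = -(p * np.log(p + 1e-9) + (1 - p) * np.log(1 - p + 1e-9))
    return np.argsort(entropy)[-budget:]

def finetune(X, y, w, lr=0.5, steps=200):
    """Gradient descent on the logistic loss, warm-started from the
    source weights, using only the selected target samples."""
    w = w.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Adapt using only 40 of the 200 target samples, no source data.
idx = select_uncertain(X, w_source, budget=40)
w_adapted = finetune(X[idx], y[idx], w_source)

acc_before = np.mean((sigmoid(X @ w_source) > 0.5) == y)
acc_after = np.mean((sigmoid(X @ w_adapted) > 0.5) == y)
print(f"accuracy before adaptation: {acc_before:.2f}, after: {acc_after:.2f}")
```

Note that the paper's setting is stricter still (the model structure itself is restricted), so its actual method necessarily differs from this white-box sketch; the fragment only shows why budgeted target-data selection plus source-free adaptation can recover accuracy under domain shift.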