让机器学习报告更具可持续性和可信度

IF 2.8 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data Mining and Knowledge Discovery Pub Date : 2024-04-30 DOI:10.1007/s10618-024-01020-3
Raphael Fischer, Thomas Liebig, Katharina Morik
{"title":"让机器学习报告更具可持续性和可信度","authors":"Raphael Fischer, Thomas Liebig, Katharina Morik","doi":"10.1007/s10618-024-01020-3","DOIUrl":null,"url":null,"abstract":"<p>With machine learning (ML) becoming a popular tool across all domains, practitioners are in dire need of comprehensive reporting on the state-of-the-art. Benchmarks and open databases provide helpful insights for many tasks, however suffer from several phenomena: Firstly, they overly focus on prediction quality, which is problematic considering the demand for more sustainability in ML. Depending on the use case at hand, interested users might also face tight resource constraints and thus should be allowed to interact with reporting frameworks, in order to prioritize certain reported characteristics. Furthermore, as some practitioners might not yet be well-skilled in ML, it is important to convey information on a more abstract, comprehensible level. Usability and extendability are key for moving with the state-of-the-art and in order to be trustworthy, frameworks should explicitly address reproducibility. In this work, we analyze established reporting systems under consideration of the aforementioned issues. Afterwards, we propose STREP, our novel framework that aims at overcoming these shortcomings and paves the way towards more sustainable and trustworthy reporting. We use STREP’s (publicly available) implementation to investigate various existing report databases. Our experimental results unveil the need for making reporting more resource-aware and demonstrate our framework’s capabilities of overcoming current reporting limitations. With our work, we want to initiate a paradigm shift in reporting and help with making ML advances more considerate of sustainability and trustworthiness.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"8 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards more sustainable and trustworthy reporting in machine learning\",\"authors\":\"Raphael Fischer, Thomas Liebig, Katharina Morik\",\"doi\":\"10.1007/s10618-024-01020-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>With machine learning (ML) becoming a popular tool across all domains, practitioners are in dire need of comprehensive reporting on the state-of-the-art. Benchmarks and open databases provide helpful insights for many tasks, however suffer from several phenomena: Firstly, they overly focus on prediction quality, which is problematic considering the demand for more sustainability in ML. Depending on the use case at hand, interested users might also face tight resource constraints and thus should be allowed to interact with reporting frameworks, in order to prioritize certain reported characteristics. Furthermore, as some practitioners might not yet be well-skilled in ML, it is important to convey information on a more abstract, comprehensible level. Usability and extendability are key for moving with the state-of-the-art and in order to be trustworthy, frameworks should explicitly address reproducibility. In this work, we analyze established reporting systems under consideration of the aforementioned issues. Afterwards, we propose STREP, our novel framework that aims at overcoming these shortcomings and paves the way towards more sustainable and trustworthy reporting. We use STREP’s (publicly available) implementation to investigate various existing report databases. Our experimental results unveil the need for making reporting more resource-aware and demonstrate our framework’s capabilities of overcoming current reporting limitations. With our work, we want to initiate a paradigm shift in reporting and help with making ML advances more considerate of sustainability and trustworthiness.</p>\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01020-3\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01020-3","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

随着机器学习(ML)成为各个领域的流行工具,从业人员迫切需要有关最新技术的全面报告。基准和开放数据库为许多任务提供了有用的见解,但也存在一些问题:首先,它们过于关注预测质量,而考虑到对 ML 可持续性的要求,这是有问题的。根据当前的使用情况,感兴趣的用户也可能面临资源紧张的问题,因此应允许他们与报告框架进行交互,以便优先考虑某些报告特征。此外,由于一些从业人员可能尚未熟练掌握 ML,因此必须在更抽象、更易理解的层面上传达信息。可用性和可扩展性是与最新技术保持同步的关键,为了值得信赖,框架应明确解决可重复性问题。在这项工作中,我们根据上述问题对已有的报告系统进行了分析。随后,我们提出了 STREP -- 我们的新型框架,旨在克服这些不足,为实现更可持续、更可信的报告铺平道路。我们使用 STREP(公开可用)的实现来调查各种现有的报告数据库。我们的实验结果揭示了使报告更具资源感知能力的必要性,并展示了我们的框架克服当前报告局限性的能力。通过我们的工作,我们希望启动报告范式的转变,并帮助使 ML 的进步更加考虑可持续性和可信性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Towards more sustainable and trustworthy reporting in machine learning

With machine learning (ML) becoming a popular tool across all domains, practitioners are in dire need of comprehensive reporting on the state-of-the-art. Benchmarks and open databases provide helpful insights for many tasks, however suffer from several phenomena: Firstly, they overly focus on prediction quality, which is problematic considering the demand for more sustainability in ML. Depending on the use case at hand, interested users might also face tight resource constraints and thus should be allowed to interact with reporting frameworks, in order to prioritize certain reported characteristics. Furthermore, as some practitioners might not yet be well-skilled in ML, it is important to convey information on a more abstract, comprehensible level. Usability and extendability are key for moving with the state-of-the-art and in order to be trustworthy, frameworks should explicitly address reproducibility. In this work, we analyze established reporting systems under consideration of the aforementioned issues. Afterwards, we propose STREP, our novel framework that aims at overcoming these shortcomings and paves the way towards more sustainable and trustworthy reporting. We use STREP’s (publicly available) implementation to investigate various existing report databases. Our experimental results unveil the need for making reporting more resource-aware and demonstrate our framework’s capabilities of overcoming current reporting limitations. With our work, we want to initiate a paradigm shift in reporting and help with making ML advances more considerate of sustainability and trustworthiness.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Data Mining and Knowledge Discovery
Data Mining and Knowledge Discovery 工程技术-计算机:人工智能
CiteScore
10.40
自引率
4.20%
发文量
68
审稿时长
10 months
期刊介绍: Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.
期刊最新文献
FRUITS: feature extraction using iterated sums for time series classification Bounding the family-wise error rate in local causal discovery using Rademacher averages Evaluating the disclosure risk of anonymized documents via a machine learning-based re-identification attack Efficient learning with projected histograms Opinion dynamics in social networks incorporating higher-order interactions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1