
Proceedings of the 12th International Conference on Management of Digital EcoSystems: Latest Articles

A Novel Framework for Event Interpretation in a Heterogeneous Information System
Nabila Guennouni, C. Sallaberry, Sébastien Laborie, R. Chbeir
Over the last decade, the number of research and development projects on sensor network technology has grown exponentially. Event detection, which allows the environment to be monitored, is one of these research fields. To interpret detected events, combining sensor network data with document corpus data is essential, since document corpora provide large amounts of valuable information (e.g., technical data sheets, maintenance reports, customer sheets). However, most information systems in connected environments do not support the interconnection of sensor network and document corpus data. Users therefore have to look for an explanation themselves through multiple queries on both data sources, which is tedious, time-consuming, and requires a substantial compilation effort. In this paper, we show that recent research on 5W1H question answering ("What? Who? Where? When? Why? How?") offers a promising way to facilitate tunnelling through heterogeneous data sources (sensor networks and document corpora) and to identify the data relevant to explaining an event. Consequently, we propose ISEE (an Information System for Event Explanation), a framework for event interpretation based on (i) the semantic representation of a heterogeneous information system, (ii) the cross-analysis of both sensor network and document corpus data, and (iii) 5W1H question-answering techniques.
DOI: https://doi.org/10.1145/3415958.3433073. Published: 2020-11-02.
Citations: 1
Machine Learning Pipeline for Reusing Pretrained Models
M. Alshehhi, Di Wang
Machine learning methods have proven effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (domain adaptation) is a promising methodology to tackle this problem: it reuses the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, access to data from both the original (source) domain and the destination (target) domain is often a problem in the real world because of data privacy and cost (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated against several criteria, namely accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
DOI: https://doi.org/10.1145/3415958.3433054. Published: 2020-11-02.
Citations: 3
A Methodology for Non-Functional Property Evaluation of Machine Learning Models
M. Anisetti, C. Ardagna, E. Damiani, Paolo G. Panero
The pervasive diffusion of Machine Learning (ML) in many critical domains and application scenarios has revolutionized the implementation and operation of modern IT systems. The behavior of modern systems often depends on the behavior of ML models, which are treated as black boxes, thus making automated decisions based on inference unpredictable. In this context, there is an increasing need to verify the non-functional properties of ML models, such as fairness and privacy, with the aim of providing certified ML-based applications and services. In this paper, we propose a methodology based on Multi-Armed Bandit for evaluating non-functional properties of ML models. Our methodology adopts Thompson sampling, Monte Carlo simulation, and value remaining. An experimental evaluation in a real-world scenario is presented to demonstrate the applicability of our approach in evaluating the fairness of different ML models.
DOI: https://doi.org/10.1145/3415958.3433101. Published: 2020-11-02.
Citations: 6
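The abstract by Anisetti et al. names Thompson sampling, Monte Carlo simulation, and value remaining but gives no implementation details. As a rough illustration only, the sketch below assumes a Beta-Bernoulli bandit in which each arm is a candidate ML model and a reward of 1 means the model passed one check of the property under evaluation (e.g., fairness). The function names, the simulated pass rates, the 95th-percentile rule, and the 1% stopping threshold are all assumptions made for this example (the last two follow a common convention for value remaining); none of them is taken from the paper.

```python
import random


def posterior_summary(wins, losses, rng, draws=2000):
    """Monte Carlo summary of the Beta posteriors of all arms (hypothetical helper)."""
    arms = range(len(wins))
    # Each row is one joint draw of every arm's pass rate from its posterior.
    rows = [[rng.betavariate(1 + wins[i], 1 + losses[i]) for i in arms]
            for _ in range(draws)]
    # Probability that each arm is the best one, estimated from the draws.
    best_counts = [0] * len(wins)
    for row in rows:
        best_counts[max(arms, key=row.__getitem__)] += 1
    win_prob = [count / draws for count in best_counts]
    champion = max(arms, key=win_prob.__getitem__)
    # Value remaining: relative improvement over the champion that is still
    # plausible; the 95th percentile is used as the summary (assumed convention).
    gains = sorted((max(row) - row[champion]) / max(row[champion], 1e-12)
                   for row in rows)
    value_remaining = gains[int(0.95 * (draws - 1))]
    return win_prob, champion, value_remaining


def evaluate_models(pass_rates, max_rounds=3000, check_every=100,
                    threshold=0.01, seed=0):
    """Compare models by Thompson sampling; pass_rates are simulated, not real data."""
    rng = random.Random(seed)
    k = len(pass_rates)
    wins, losses = [0] * k, [0] * k
    rounds = 0
    win_prob, champion, value_remaining = posterior_summary(wins, losses, rng)
    while rounds < max_rounds:
        # Thompson sampling: draw once from each posterior, test the arm with
        # the largest draw.
        sampled = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(k)]
        arm = max(range(k), key=sampled.__getitem__)
        # Simulated property check for the chosen model.
        if rng.random() < pass_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        rounds += 1
        # Periodically re-estimate value remaining and stop once it is small.
        if rounds % check_every == 0:
            win_prob, champion, value_remaining = posterior_summary(wins, losses, rng)
            if value_remaining < threshold:
                break
    return {
        "champion": champion,
        "win_prob": win_prob,
        "value_remaining": value_remaining,
        "pulls": [wins[i] + losses[i] for i in range(k)],
        "rounds": rounds,
    }


if __name__ == "__main__":
    # Three hypothetical models with made-up pass rates.
    print(evaluate_models([0.55, 0.70, 0.90]))
```

In this sketch most checks end up being spent on the models that look strongest, which is the usual motivation for a bandit over a fixed, equal allocation of tests.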
Bot-Detective: An explainable Twitter bot detection service with crowdsourcing functionalities
Maria Kouvela, Ilias Dimitriadis, A. Vakali
Popular microblogging platforms (such as Twitter) offer fertile ground for open communication among humans; however, they also attract many bots and automated accounts "disguised" as human users. Typically, such accounts favor malicious activities such as phishing, public opinion manipulation, and the spreading of hate speech, to name a few. Although several AI-driven bot detection methods have been implemented, the justification of bot classification and characterization remains quite opaque, and AI decisions lack ethical responsibility. Most of these approaches operate with black-box AI algorithms, and their efficiency is often questionable. In this work we propose Bot-Detective, a web service that takes into account both the efficient detection of bot users and the interpretability of the results. Our main contributions are summarized as follows: i) we propose a novel explainable bot-detection approach which, to the best of the authors' knowledge, is the first to offer interpretable, responsible, and AI-driven bot identification on Twitter; ii) we deploy a publicly available bot detection web service which integrates an explainable ML framework along with user feedback functionality under an effective crowdsourcing mechanism; iii) we build the proposed service on a newly created annotated dataset by exploiting Twitter's rules and existing tools. This dataset is publicly shared for further use. In situ experimentation has shown that Bot-Detective produces comprehensive and accurate results, with promising service uptake at scale.
DOI: https://doi.org/10.1145/3415958.3433075. Published: 2020-11-02.
Citations: 16