Nabila Guennouni, C. Sallaberry, Sébastien Laborie, R. Chbeir
Over the last decade, the number of research and development projects on sensor network technology has grown exponentially. Event detection is one of these research fields; it enables monitoring of the environment. To build an interpretation of these events, combining sensor network and document corpus data is essential, since document corpora provide significant amounts of important and valuable information (e.g., technical data sheets, maintenance reports, customer sheets). However, most information systems in connected environments do not support the interconnection of sensor network and document corpus data; hence, users have to look for an explanation themselves through multiple queries on both data sources, which is tedious, time-consuming, and requires a huge compilation effort. In this paper, we show that recent research on 5W1H question answering ("What? Who? Where? When? Why? How?") offers an interesting way to facilitate tunnelling through heterogeneous data sources (sensor networks and document corpora) and to identify relevant data for the purpose of explaining an event. Consequently, we propose ISEE (an Information System for Event Explanation), a framework for event interpretation based on (i) the semantic representation of a heterogeneous information system, (ii) the cross-analysis of both sensor network and document corpus data, and (iii) 5W1H question-answering techniques.
{"title":"A Novel Framework for Event Interpretation in a Heterogeneous Information System","authors":"Nabila Guennouni, C. Sallaberry, Sébastien Laborie, R. Chbeir","doi":"10.1145/3415958.3433073","DOIUrl":"https://doi.org/10.1145/3415958.3433073","url":null,"abstract":"Over the last decade, the number of research and development projects on sensor network technology has grown exponentially. Events detection is among these research fields, it allows the monitoring of the environment. To build an interpretation to these events, the combination of sensor network and document corpus data is essential since document corpus provide significant amounts of important and valuable information (e.g., technical data sheets, maintenance reports, customer sheets). However, most information systems in connected environments do not support the interconnection of sensor network and document corpus data, hence, user has to look for an explanation by himself through multiple queries on both data sources which is indeed very tedious, time consuming and requires a huge compilation effort. In this paper, we show that recent researches on 5W1H question-answering (\"What? Who? Where? When? Why? How?\") are an interesting issue to facilitate tunnelling through heterogeneous data sources (sensor networks and document corpus) and the identification of relevant data for the purpose of explaining an event. 
Consequently, we propose ISEE (an Information System for Event Explanation), a framework for event interpretation based on (i) the semantic representation of a heterogeneous information system, (ii) the cross-analysis of both sensor network and document corpus data and (iii) 5W1H question-answering techniques.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"331 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115769998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning methods have proven effective at analyzing vast amounts of data in various formats to extract patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch for various real-world applications is costly in terms of both time and data consumption. Model adaptation (domain adaptation) is a promising methodology to tackle this problem: it can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, access to data from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost concerns (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated against several criteria, namely accuracy, knowledge transfer, training time, and budget. In this paper, we start from the observation that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target-domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
{"title":"Machine Learning Pipeline for Reusing Pretrained Models","authors":"M. Alshehhi, Di Wang","doi":"10.1145/3415958.3433054","DOIUrl":"https://doi.org/10.1145/3415958.3433054","url":null,"abstract":"Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. 
Our approach is designed for supervised and semi-supervised learning and extendable to unsupervised learning.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134164110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Anisetti, C. Ardagna, E. Damiani, Paolo G. Panero
The pervasive diffusion of Machine Learning (ML) across many critical domains and application scenarios has revolutionized the implementation and operation of modern IT systems. The behavior of modern systems often depends on the behavior of ML models, which are treated as black boxes, making automated decisions based on inference unpredictable. In this context, there is an increasing need to verify the non-functional properties of ML models, such as fairness and privacy, with the aim of providing certified ML-based applications and services. In this paper, we propose a methodology based on Multi-Armed Bandits for evaluating non-functional properties of ML models. Our methodology adopts Thompson sampling, Monte Carlo simulation, and Value Remaining. An experimental evaluation in a real-world scenario demonstrates the applicability of our approach in evaluating the fairness of different ML models.
{"title":"A Methodology for Non-Functional Property Evaluation of Machine Learning Models","authors":"M. Anisetti, C. Ardagna, E. Damiani, Paolo G. Panero","doi":"10.1145/3415958.3433101","DOIUrl":"https://doi.org/10.1145/3415958.3433101","url":null,"abstract":"The pervasive diffusion of Machine Learning (ML) in many critical domains and application scenarios has revolutionized implementation and working of modern IT systems. The behavior of modern systems often depends on the behavior of ML models, which are treated as black boxes, thus making automated decisions based on inference unpredictable. In this context, there is an increasing need of verifying the non-functional properties of ML models, such as, fairness and privacy, to the aim of providing certified ML-based applications and services. In this paper, we propose a methodology based on Multi-Armed Bandit for evaluating non-functional properties of ML models. Our methodology adopts Thompson sampling, Monte Carlo Simulation, and Value Remaining. An experimental evaluation in a real-world scenario is presented to prove the applicability of our approach in evaluating the fairness of different ML models.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Popular microblogging platforms (such as Twitter) offer a fertile ground for open communication among humans; however, they also attract many bots and automated accounts "disguised" as human users. Typically, such accounts favor malicious activities such as phishing, public-opinion manipulation, and hate-speech spreading, to name a few. Although several AI-driven bot detection methods have been implemented, the justification of bot classification and characterization remains quite opaque, and AI decisions lack ethical responsibility. Most of these approaches rely on black-boxed AI algorithms, and their efficiency is often questionable. In this work we propose Bot-Detective, a web service that accounts for both the efficient detection of bot users and the interpretability of the results. Our main contributions are summarized as follows: (i) we propose a novel explainable bot-detection approach which, to the best of the authors' knowledge, is the first to offer interpretable, responsible, AI-driven bot identification on Twitter; (ii) we deploy a publicly available bot detection web service that integrates an explainable ML framework along with user-feedback functionality under an effective crowdsourcing mechanism; (iii) we build the proposed service on a newly created annotated dataset that exploits Twitter's rules and existing tools, and we share this dataset publicly for further use. In situ experimentation has shown that Bot-Detective produces comprehensive and accurate results, with promising service take-up at scale.
{"title":"Bot-Detective: An explainable Twitter bot detection service with crowdsourcing functionalities","authors":"Maria Kouvela, Ilias Dimitriadis, A. Vakali","doi":"10.1145/3415958.3433075","DOIUrl":"https://doi.org/10.1145/3415958.3433075","url":null,"abstract":"Popular microblogging platforms (such as Twitter) offer a fertile ground for open communication among humans, however, they also attract many bots and automated accounts \"disguised\" as human users. Typically, such accounts favor malicious activities such as phishing, public opinion manipulation and hate speech spreading, to name a few. Although several AI driven bot detection methods have been implemented, the justification of bot classification and characterization remains quite opaque and AI decisions lack in ethical responsibility. Most of these approaches operate with AI black-boxed algorithms and their efficiency is often questionable. In this work we propose Bot-Detective, a web service that takes into account both the efficient detection of bot users and the interpretability of the results as well. Our main contributions are summarized as follows: i) we propose a novel explainable bot-detection approach, which, to the best of authors' knowledge, is the first one to offer interpretable, responsible, and AI driven bot identification in Twitter, ii) we deploy a publicly available bot detection Web service which integrates an explainable ML framework along with users feedback functionality under an effective crowdsourcing mechanism; iii) we build the proposed service under a newly created annotated dataset by exploiting Twitter's rules and existing tools. This dataset is publicly shared for further use. 
In situ experimentation has showcased that Bot-Detective produces comprehensive and accurate results, with a promising service take up at scale.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115640030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}