Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00046
Jumpei Ono, Miku Kawai, Takashi Ogata
This study argues for the need for explanation-centered story generation, with the objective of extending existing story generation methods. Explanation-centered story generation is a concept in which a story is generated from an explanation, offering a mechanism by which a story generation system can produce multiple forms of stories.
Title: Toward Explanation-Centered Story Generation. Published in: 2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE).
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00027
Agostino Forestiero
Discovering anomalous data or behaviors is fundamental to obtaining critical security information such as intrusions, faults and system failures. Limited resources, such as computing and storage, make conventional techniques for designing Intrusion Detection Systems (IDS) poorly suited to smart environments. This paper proposes a novel multiagent algorithm that leverages device activity footprints for intrusion detection in Internet of Things (IoT) environments. Smart objects are mapped to real-valued vectors obtained through the IoT2Vec model, a word embedding technique able to capture the semantic context of device activities and represent them as dense vectors. The vectors are assigned to agents, which are spread over a 2D virtual space, where they move following the rules of a bio-inspired model, the flocking model. A similarity function, applied to the associated vectors, drives the agents to apply the movement rules selectively. The outcome is the emergence of agent groups aggregated on the basis of the activities of their associated devices. It is thus possible to easily identify isolated agents (i.e., devices whose activity is dissimilar from that of all others), which represent potential intruders or devices with anomalous behavior to be monitored. Preliminary results confirm the validity of the approach.
Title: Intrusion detection algorithm in Smart Environments featuring activity footprints approach
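The idea of flagging devices whose activity vectors are dissimilar from all others can be illustrated with a deliberately simplified sketch. It does not implement the flocking model or IoT2Vec described in the abstract; it is a static stand-in that assumes the device vectors are already available and uses cosine similarity with a hypothetical threshold. The device names, vectors, and threshold are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def isolated_devices(vectors, threshold):
    """Return the devices whose best similarity to any other device is below threshold.

    `vectors` maps a (hypothetical) device name to its activity vector.
    This is a static approximation, not the agent-based flocking process.
    """
    isolated = []
    for name, vec in vectors.items():
        best = max(
            (cosine_similarity(vec, other)
             for other_name, other in vectors.items() if other_name != name),
            default=0.0,
        )
        if best < threshold:
            isolated.append(name)
    return sorted(isolated)


# Hypothetical activity vectors: two cameras behave alike, the lock does not.
DEVICE_VECTORS = {
    "cam1": [1.0, 0.0],
    "cam2": [0.9, 0.1],
    "lock": [0.0, 1.0],
}
```

In the approach described above, grouping emerges dynamically from agents moving in a 2D space; the sketch only shows the similarity test that would separate an outlying device from a group.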
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00037
N. Kharlamova, S. Hashemi, C. Træholt
Battery energy storage systems (BESSs) are becoming a crucial part of electric grids due to their important role in integrating renewable energy sources (RES) into energy systems. Cyber-secure operation of a BESS in renewable energy systems is important, since a BESS is susceptible to cyber threats and its potential failure may result in economic and physical damage to both the BESS and the system. However, there is a lack of comprehensive studies on attack detection methods for industrial BESSs. This paper reviews the state-of-the-art work in the area of BESS cyber threats and investigates how to detect cyberattacks in the operation stage. We address the problem of enhancing the integrity of communication channels by implementing blockchain in the design stage of the BESS, combined with applying artificial intelligence (AI) and machine learning (ML) methods for false data injection attack (FDIA) detection in the BESS operation stage. The focus is on the application of ML and AI methods for FDIA detection on different system layers. Based on our analysis, data-driven approaches such as clustering and artificial-neural-network-based state estimation (SE) forecasting are recommended for implementation in BESSs.
Title: The Cyber Security of Battery Energy Storage Systems and Adoption of Data-driven Methods
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00033
Kengo Kuwana, K. Tei, Y. Fukazawa, S. Honiden
Discrete controller synthesis is a method that uses game theory to automatically generate a controller for achieving a system goal. This method is used in artificial intelligence for planning self-adaptive systems, in which it is necessary to shorten the time taken to generate a plan. Discrete controller synthesis generates a controller from an environment model and a requirement model. The environment model represents the behavior of the system’s external environment as a finite state machine and is often constructed by parallel composition, which causes a state explosion. As a result, a controller cannot be synthesized within a realistic amount of memory or time. An on-the-fly method called directed controller synthesis (DCS) was developed by Daniel Ciolek. DCS partially expands and checks the environment model during exploration to avoid the state explosion caused by parallel composition. DCS uses a best-first search algorithm that maintains an open list; for large-scale problems the open list grows drastically, which lowers search efficiency. Therefore, we propose a method of applying the df-pn algorithm, which is used in computer shogi (Japanese chess), particularly for tsume-shogi (a type of shogi problem). This algorithm is an iterative deepening depth-first search algorithm that does not have an open list but uses a hash table to store the search history. In experiments comparing our method with DCS, our method achieved faster controller synthesis than DCS on large-scale problems.
Title: Method of Applying Df-pn Algorithm to On-the-fly Controller Synthesis
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00025
T. Koçak, Cagkan Ciloglu
Frames provided by cameras on mobile devices may be distorted because of camera defects and/or weather conditions such as rain and snow. These distortions affect image classifiers. This paper proposes using deep-learning architectures to restore quality distortions in real-time mobile video for image classifiers. An iOS-based app is developed using CoreML to show that deep convolutional auto-encoder (CAE) based methods can be used to restore picture quality.
Title: Real-time Restoration of Quality Distortions in Mobile Images using Deep Learning
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00021
Jihyeon Park, Munyeong Kang, Seong-je Cho, Hyoil Han, Kyoungwon Suh
With the increasing popularity of the Android platform, the number of malicious Android applications has recently grown rapidly. Because the heavy use of mobile applications such as games, email, and social network services has become a crucial part of daily life, we have become more vulnerable to malicious applications running on mobile devices. To alleviate this hostile environment for Android mobile applications, we propose a malware detection approach that (1) extracts both built-in permissions and custom permissions requested by Android apps from their Manifest.xml and (2) applies the permissions and a Random Forest classifier to Android applications to classify them as benign or malicious. The Random Forest classifier learns a model using the permissions to classify an input dataset of 45,311 Android applications. An optimal subset of permissions was identified from the learned model; using this subset, we achieved 94.23% accuracy in detecting malware.
Title: Analysis of Permission Selection Techniques in Machine Learning-based Malicious App Detection
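Step (1) of the approach above, extracting requested permissions from a manifest, might look roughly like the following sketch. It is not the authors' implementation: it assumes a decoded (plain-text) manifest, reads `uses-permission` elements only, and separates built-in from custom permissions with a simple prefix heuristic (`android.permission.`), which is an assumption of this example. The function names and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# The Android XML namespace used for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = f"{{{ANDROID_NS}}}name"


def extract_permissions(manifest_xml):
    """Return (builtin, custom) sorted lists of permissions requested in a manifest.

    Assumption: a permission is treated as built-in when its name starts with
    "android.permission."; everything else is treated as custom.
    """
    root = ET.fromstring(manifest_xml)
    builtin, custom = set(), set()
    for elem in root.iter("uses-permission"):
        name = elem.get(NAME_ATTR)
        if not name:
            continue  # skip malformed entries without android:name
        if name.startswith("android.permission."):
            builtin.add(name)
        else:
            custom.add(name)
    return sorted(builtin), sorted(custom)


def to_feature_vector(permissions, vocabulary):
    """Encode a permission list as a 0/1 vector over a fixed permission vocabulary."""
    present = set(permissions)
    return [1 if p in present else 0 for p in vocabulary]


# Hypothetical manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="com.example.vendor.permission.PUSH"/>
</manifest>"""
```

Binary vectors of this kind are one plausible input for step (2); they could be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`, although the abstract does not state which implementation or encoding was used.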
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00029
Negar Tajziyehchi, Mohammad Moshirpour, George Jergeas, F. Sadeghpour
The construction industry spends billions of dollars on large-scale projects annually. These projects typically experience cost overruns. To solve this issue, it is essential to identify the key factors that contribute to project cost growth. This study used data provided by the Construction Owners Association of Alberta (COAA) and the Construction Industry Institute (CII). The data show that Alberta’s average cost growth is much higher than that of similar projects in the United States, and it is therefore desirable to improve Alberta’s project performance. There are 139 samples for Alberta projects, and the data are high dimensional, making it difficult to extract useful information for cost growth prediction. Dimensionality reduction techniques, such as feature selection, help to identify the most important features that impact cost growth. This study identified 16 significant features out of 281, selected in two steps. Initially, 21 features were selected by LASSO. The R² score and RMSE were calculated for five different models under three train/test split configurations. Random forest had the highest score, using more than 80 percent of the data for training. The permutation importance of each feature was calculated using random forest, and 16 variables were extracted. These features were used as input to five machine learning algorithms to evaluate the variables’ predictive ability.
Title: A Predictive Model of Cost Growth in Construction Projects Using Feature Selection
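For reference, the two evaluation metrics named in the abstract above have the following standard definitions. The symbols are assumptions of this note, not taken from the paper: $y_i$ is the observed cost growth of project $i$, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean observed value, and $n$ the number of test samples.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

A lower RMSE and an $R^2$ closer to 1 both indicate a better fit; $R^2$ can be negative on held-out data when a model predicts worse than the mean.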
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00012
Hadi Ghahremannezhad, Hang Shi, Chengjun Liu
Real-time intelligent video-based traffic surveillance applications play an important role in intelligent transportation systems. To reduce false alarms and increase computational efficiency, robust road segmentation for automated Region of Interest (RoI) detection has become a popular focus in the research community. This paper presents a novel Adaptive Bidirectional Detection (ABD) of region-of-interest method that automatically segments roads with bidirectional traffic flows into two regions of interest. Specifically, a foreground segmentation method is first applied along with the flood-fill algorithm to estimate the road regions. Then the Lucas-Kanade optical flow algorithm is used to track and divide the estimated road into regions of interest in real time. Experimental results on a dataset of real traffic videos illustrate the feasibility of the proposed method for automatically determining the RoIs in real time.
Title: A New Adaptive Bidirectional Region-of-Interest Detection Method for Intelligent Traffic Video Analysis
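The flood-fill step mentioned in the abstract above can be sketched generically as follows. This is a textbook 4-connected flood fill, not the paper's pipeline: it assumes a binary mask of candidate road pixels is already available (how that mask is derived from foreground segmentation is not specified here), and the seed pixel and sample mask are hypothetical.

```python
from collections import deque


def flood_fill(mask, seed):
    """Return the set of (row, col) cells 4-connected to `seed` whose mask value is truthy.

    `mask` is a list of equal-length rows; an empty set is returned when the
    seed itself is not a candidate pixel.
    """
    rows, cols = len(mask), len(mask[0])
    seed_row, seed_col = seed
    if not mask[seed_row][seed_col]:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours (down, up, right, left).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and (nr, nc) not in region):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 3x4 mask: 1 marks a candidate road pixel.
SAMPLE_MASK = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]
```

An iterative queue is used rather than recursion so that large frames do not hit Python's recursion limit.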
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00011
J. S. Junior, J. Paulo, Jérôme Mendes, D. Alves, L. Ribeiro
Wildfire Decision Support Systems are critical tools for civil protection authorities in the management of all wildfire stages, including prevention. To support timely action and the preventive measures needed to reduce fire danger, many calibration studies of the Canadian Forest Fire Weather Index System (CFFWIS) have been proposed, but they rely mainly on techniques that still depend on manual and empirical analysis and are limited to a few regions. This paper proposes a methodology for automatic calibration of the CFFWIS to obtain a fire danger measurement that best suits the specific characteristics of a given region. The proposed methodology, applied to 769 regions in Europe, uses the k-means clustering technique to automatically identify patterns in data sets composed of elements of the CFFWIS and wildfire records. The results of the automatic calibration of the CFFWIS on each of the 769 regions reinforce the versatility of the proposed methodology, which can be adapted to different regions.
Title: Automatic Calibration of Forest Fire Weather Index For Independent Customizable Regions Based on Historical Records
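As a rough illustration of how k-means could separate index values into danger classes, the sketch below clusters one-dimensional values and derives class boundaries as midpoints between adjacent centroids. The abstract does not state which features are clustered or how class limits are derived, so the 1-D input, the deterministic initialisation, the midpoint thresholds, and the sample values are all assumptions of this example.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 1-D data with a deterministic initialisation.

    Initial centroids are picked at evenly spaced positions of the sorted data
    (an assumption made here for reproducibility). Returns (centroids, clusters).
    """
    ordered = sorted(values)
    if k == 1:
        centroids = [sum(ordered) / len(ordered)]
    else:
        centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda j: abs(value - centroids[j]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def class_thresholds(centroids):
    """Midpoints between adjacent sorted centroids, usable as class boundaries."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Hypothetical index values recorded on days with wildfire occurrences.
SAMPLE_INDEX_VALUES = [1, 2, 3, 10, 11, 12, 30, 31, 32]
```

Running the clustering separately on each region's records is what would make the resulting boundaries region-specific, which is the point of the calibration described above.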
Pub Date: 2020-12-01 DOI: 10.1109/AIKE48582.2020.00023
Javier Pastorino, A. Biswas
Machine learning (ML) algorithms are data-driven: given a goal task and a dataset of prior experience relevant to the task, one can attempt to solve the task using ML with the aim of achieving high accuracy. There is usually a big gap in understanding between ML experts and dataset providers due to limited cross-disciplinary expertise. Narrowing down a suitable set of problems to solve using ML is possibly the most ambiguous yet important item for data providers to consider before initiating collaborations with ML experts. We propose an ML-fueled pipeline to identify potential problems (i.e., the tasks) so that data providers can easily explore potential problem areas to investigate with ML. The autonomous pipeline integrates information theory and graph-based unsupervised learning paradigms to generate a ranked retrieval of the top-k problems for a given dataset, supporting a successful ML-based collaboration. We conducted experiments on diverse, well-known real-world datasets; from a supervised learning standpoint, the proposed pipeline achieved 72% top-5 task retrieval accuracy on average, which surpasses the retrieval performance for the same paradigm using popular exploratory data analysis tools. Detailed experimental results and our source code are available at: https://github.com/jpastorino/heyml.
Title: Hey ML, what can you do for me?