Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953613
Binh Dang, Dinh-Truong Do, Le-Minh Nguyen
Topic information has proven helpful for guiding semantics in text summarization. In this paper, we present a novel and efficient method, called tBART, that incorporates topic information into the BART model for abstractive summarization. The proposed model inherits the advantages of BART, learns latent topics, and maps the topic vectors of tokens into the context space through an alignment function. Experimental results show that the proposed method significantly outperforms previous methods on two benchmark datasets: XSum and CNN/Daily Mail.
{"title":"tBART: Abstractive summarization based on the joining of Topic modeling and BART","authors":"Binh Dang, Dinh-Truong Do, Le-Minh Nguyen","doi":"10.1109/KSE56063.2022.9953613","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953613","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"78 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133836897"}
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953795
Presents the copyright information for the conference. May include reprint permission information.
{"title":"Copyright Page","authors":"","doi":"10.1109/KSE56063.2022.9953795","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953795","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"9 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116891957"}
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953757
Duc Tran, Ha Nguyen, Hung Nguyen, Tin Nguyen
Advances in single-cell RNA sequencing (scRNA-seq) technologies have allowed us to study the heterogeneity of cell populations. The cell compositions of tissues from different hosts may vary greatly, indicating the condition of the hosts from which the samples are collected. However, the high sequencing cost and the lack of fresh tissues make single-cell approaches less appealing. In many cases, it is practically impossible to generate single-cell data for a large number of subjects, making it challenging to monitor changes in cell type compositions in various diseases. Here we introduce a novel approach, named Deconvolution using Weighted Elastic Net (DWEN), that allows researchers to accurately estimate cell type compositions from bulk data samples without the need to generate single-cell data. It also allows for the re-analysis of bulk data collected from rare conditions to extract more in-depth cell-type-level insights. The approach consists of two modules. The first module constructs the cell type signature matrix from single-cell data, while the second module estimates the cell type compositions of input bulk samples. In an extensive analysis using 20 datasets generated from scRNA-seq data of different human tissues, we demonstrate that DWEN outperforms current state-of-the-art methods in estimating the cell type compositions of bulk samples.
{"title":"DWEN: A novel method for accurate estimation of cell type compositions from bulk data samples","authors":"Duc Tran, Ha Nguyen, Hung Nguyen, Tin Nguyen","doi":"10.1109/KSE56063.2022.9953757","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953757","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"210 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114154589"}
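The second module described in the DWEN abstract above (estimating cell type compositions of a bulk sample from a signature matrix) can be illustrated with a small elastic-net regression. The following is only a sketch under stated assumptions: the abstract does not give the objective, the weighting scheme, or the solver, so the coordinate-descent routine, the non-negativity constraint, the optional per-gene `weights`, the final normalization to proportions, and all names (`deconvolve`, `signature`, `bulk`) are hypothetical choices made for illustration.

```python
def deconvolve(signature, bulk, alpha=1e-3, l1_ratio=0.5, weights=None, sweeps=200):
    """Estimate cell type proportions for one bulk sample.

    signature: list of rows, one per gene, each holding one value per cell type.
    bulk:      list of bulk expression values, one per gene.
    weights:   optional per-gene weights (hypothetical stand-in for the
               "weighted" part of the method; defaults to all ones).
    Minimizes (1/2n) * sum_i w_i * (bulk_i - signature_i . c)^2
              + alpha * (l1_ratio * |c|_1 + (1 - l1_ratio)/2 * |c|_2^2)
    subject to c >= 0, by cyclic coordinate descent.
    """
    n_genes, n_types = len(signature), len(signature[0])
    w = weights or [1.0] * n_genes
    coef = [0.0] * n_types
    for _ in range(sweeps):
        for j in range(n_types):
            rho = 0.0  # weighted correlation of column j with the partial residual
            z = 0.0    # weighted squared norm of column j
            for i in range(n_genes):
                partial = sum(signature[i][k] * coef[k]
                              for k in range(n_types) if k != j)
                rho += w[i] * signature[i][j] * (bulk[i] - partial)
                z += w[i] * signature[i][j] ** 2
            rho /= n_genes
            z /= n_genes
            # Soft-threshold, clip at zero, and shrink by the ridge term.
            coef[j] = max(0.0, rho - alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
    total = sum(coef)
    # Rescale so the coefficients read as proportions that sum to one.
    return [c / total for c in coef] if total > 0 else coef


# Toy signature matrix: 4 genes x 2 cell types (made-up numbers).
signature = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
# Bulk sample mixed as 70% of type 0 and 30% of type 1.
bulk = [0.7 * a + 0.3 * b for a, b in signature]
proportions = deconvolve(signature, bulk)
```

On this noise-free toy mixture the recovered proportions should be close to the 0.7 and 0.3 used to build the sample; real bulk data would need gene selection and tuning of `alpha` and `l1_ratio`.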
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953794
Hong Phuong Le, Thi Thuy Linh Nguyen, Minh Tu Pham, Thanh Hai Vu
This paper presents a multilingual natural language understanding model based on the BERT and ELECTRA neural networks. The model is pre-trained and fine-tuned on large datasets in four languages: Indonesian, Malaysian, Japanese and Vietnamese. Our fine-tuning method uses an attentional recurrent neural network instead of the common fine-tuning with linear layers. The proposed model is evaluated on several standard benchmark datasets covering intent classification, named entity recognition and sentiment analysis. For Indonesian and Malaysian, our model matches or exceeds the results of the existing state-of-the-art IndoNLU and Bahasa ELECTRA models for these languages. For Japanese, our model achieves promising results on sentiment analysis and two-layer named entity recognition. For Vietnamese, our model improves on the state-of-the-art results for two sequence labeling tasks: part-of-speech tagging and named entity recognition. The model has been deployed as a core component of the commercial FPT.AI conversational platform, effectively serving many clients in the Indonesian, Malaysian, Japanese and Vietnamese markets. The platform served 62 million API requests for chatbot services in the first five months of 2022, including requests deployed for on-premise contracts.
{"title":"Multilingual Natural Language Understanding for the FPT.AI Conversational Platform","authors":"Hong Phuong Le, Thi Thuy Linh Nguyen, Minh Tu Pham, Thanh Hai Vu","doi":"10.1109/KSE56063.2022.9953794","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953794","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"41 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126840112"}
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953747
Minh Tran Binh, Long Nguyen, D. N. Duc
Using improvement directions to control the evolution of multi-objective optimization algorithms is an interesting and effective method. Improvement direction techniques often evaluate the geometric properties of the solution set in the objective space and, based on those properties, adjust the evolutionary process so that it remains capable of both exploration and exploitation. The direction of improvement is usually determined from the convergence and diversity of the solution population. In fact, the distribution of the solution population can suggest an online adjustment of the evolutionary process to address the problem of balancing convergence and diversity. In this study, we identify empty regions in the solution population and use the centers of those regions, which we call bliss points, to direct and adjust algorithms that use improvement directions, thereby enhancing their quality. Experimental results are competitive and suggest that the approach can be applied to multi-objective evolutionary algorithms that use other geometric techniques.
{"title":"Using bliss points to enhance direction based multi-objective algorithms","authors":"Minh Tran Binh, Long Nguyen, D. N. Duc","doi":"10.1109/KSE56063.2022.9953747","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953747","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"36 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"128352746"}
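The idea in the bliss-point abstract above, locating an empty region in the solution population and taking its center, can be pictured on a two-objective front. The abstract does not say how empty regions are detected, so the sketch below makes a loud assumption: it treats the widest gap between neighbouring solutions (after sorting by the first objective) as the empty region and returns the midpoint of that gap. The function name `bliss_point` and the sample front are hypothetical.

```python
import math


def bliss_point(front):
    """Return the midpoint of the widest gap on a two-objective front.

    front: iterable of (f1, f2) objective vectors.
    Assumption: an "empty region" is the largest Euclidean gap between
    consecutive points once the front is sorted by the first objective.
    """
    pts = sorted(front)
    if len(pts) < 2:
        raise ValueError("need at least two solutions to find a gap")
    # Pick the pair of neighbours that are farthest apart.
    a, b = max(zip(pts, pts[1:]), key=lambda pair: math.dist(pair[0], pair[1]))
    # The centre of that gap is the candidate bliss point.
    return tuple((u + v) / 2 for u, v in zip(a, b))


front = [(0.0, 1.0), (0.1, 0.8), (0.2, 0.7), (0.8, 0.1), (1.0, 0.0)]
target = bliss_point(front)  # midpoint between (0.2, 0.7) and (0.8, 0.1)
```

A direction-based algorithm could then steer offspring toward `target` to fill the gap; how the paper actually feeds bliss points into the improvement direction is not stated in the abstract.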
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953772
Hoang-Viet Tran, Pham Ngoc Hung
Concolic testing is well known among software quality assurance methods thanks to its fully automated capability of generating test data, executing them, and producing code coverage reports. This paper presents ISDART, an improved version of SDART, one of the most recent advanced methods based on concolic testing, with the aim of increasing its performance. The key idea of the proposed method is to eliminate the time wasted on generating and executing random test data that do not increase code coverage. Initially, ISDART generates random test data only once. Then, using the code coverage information retrieved from the randomly generated test data, ISDART explores an uncovered test path, transforms it into test path constraints, solves those constraints, and generates new test data from the resulting solution. The process is repeated until no uncovered test path can be found. We have implemented both SDART and ISDART and performed experiments with some common unit functions. The experimental results show that ISDART outperforms SDART in terms of speed for the whole testing process while reducing the number of generated test data.
{"title":"An Improved Method of The Static Directed Automated Random Testing Method in Test Data Generation for C/C++ Projects","authors":"Hoang-Viet Tran, Pham Ngoc Hung","doi":"10.1109/KSE56063.2022.9953772","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953772","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"61 17","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"134226652"}
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953793
T. Dam, Thuy Anh Ta
This work concerns a stochastic fractional 0-1 program whose coefficients are assumed to be random and to follow a given distribution. To solve such a problem, one would need to sample over the randomness of the coefficients. However, in many situations the sample size is limited, which makes it difficult for existing approaches (e.g., the sample average approximation approach) to give good solutions. To deal with this issue, we explore a distributionally robust optimization (DRO) version of the fractional problem. We show that the DRO problem can be reformulated as an equivalent variance regularization version and can be further transformed into a mixed-integer second-order cone program (MISOCP), which an off-the-shelf solver (i.e., CPLEX) can handle. We then conduct computational experiments on synthetic instances, comparing our robust method against the conventional sample average approximation (SAA). Our results show that our approach is more effective than the SAA approach in protecting the decision-maker against bad scenarios.
{"title":"Distributionally Robust Fractional 0-1 Programming","authors":"T. Dam, Thuy Anh Ta","doi":"10.1109/KSE56063.2022.9953793","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953793","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"1 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130247305"}
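The contrast drawn in the abstract above between the sample average approximation (SAA) and a variance-regularized robust objective can be seen on a tiny instance. The sketch below is an illustration under assumptions, not the paper's formulation: it uses a single-ratio objective, a mean-minus-`rho`-times-standard-deviation score as a stand-in for the variance regularizer, and brute-force enumeration of binary decisions instead of the MISOCP reformulation solved with CPLEX. All names and numbers are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev


def ratio(x, num, den):
    """Fractional objective (n0 + n.x) / (d0 + d.x) for one sampled scenario."""
    n0, n = num
    d0, d = den
    top = n0 + sum(ni * xi for ni, xi in zip(n, x))
    bottom = d0 + sum(di * xi for di, xi in zip(d, x))
    return top / bottom


def saa_score(x, scenarios):
    """Sample average of the objective over all scenarios."""
    return mean(ratio(x, num, den) for num, den in scenarios)


def robust_score(x, scenarios, rho=1.0):
    """Sample mean penalized by rho times the sample standard deviation.

    Assumption: this simple penalty stands in for the variance
    regularization mentioned in the abstract.
    """
    values = [ratio(x, num, den) for num, den in scenarios]
    return mean(values) - rho * pstdev(values)


def best_decision(score, scenarios, n_vars, **kwargs):
    """Enumerate all 0-1 vectors and keep the one with the highest score."""
    return max(product((0, 1), repeat=n_vars),
               key=lambda x: score(x, scenarios, **kwargs))


# Two sampled scenarios: item 0 pays steadily, item 1 pays a lot or nothing.
scenarios = [
    ((1.0, [2.0, 7.0]), (1.0, [1.0, 1.0])),
    ((1.0, [2.0, 0.0]), (1.0, [1.0, 1.0])),
]
saa_choice = best_decision(saa_score, scenarios, 2)
robust_choice = best_decision(robust_score, scenarios, 2, rho=1.0)
```

With only two samples, SAA picks the volatile item because its average is highest, while the penalized score prefers the steady item. This mirrors the abstract's claim about protection against bad scenarios, though only on a made-up instance.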
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953782
L. Tran, Binh Van Duong, Binh T. Nguyen
The fast growth of e-commerce markets helps companies bring their products closer to customers and gives users many choices for online shopping. However, this growth also requires every company to have a proper strategy for retaining customers. As an emerging solution, sentiment analysis of users’ feedback using artificial intelligence is a timely way for business owners to understand their customers and clients, which could help them improve their business against competitors. Therefore, in the scope of our research, we present our results on the task of customer review sentiment analysis using the dataset provided in the Fashion and Beauty Review Rating competition (organized on Kaggle), where our solution reached first place with an RMSE score of 0.51269. Our proposed solution combines deep learning models (Bidirectional Long Short-term Memory, Bidirectional Gated Recurrent Unit, Convolutional Neural Network) and a rule-based method (a method that uses linguistic rules to predict the rating of reviews). We describe the solution in this paper, supported by analysis techniques that provide further insight.
{"title":"Sentiment Classification for Beauty-fashion Reviews","authors":"L. Tran, Binh Van Duong, Binh T. Nguyen","doi":"10.1109/KSE56063.2022.9953782","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953782","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"26 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114983392"}
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953765
Yuhui Yang, Koichi Ota, Wen Gu, S. Hasegawa
This research proposes an automatic region of interest (ROI) prediction architecture that uses a deep neural network to estimate learners’ ROI from the instructor’s behaviors in lecture archives, in order to generate ROI-zoomed videos that fit smaller screens such as those of smart devices. To achieve this goal, we first created a dataset of ROIs from learners’ gaze data recorded while they watched the archives, generating 16,039 ROI labels by clustering and smoothing, with the K-means algorithm, the gaze points obtained for one-second video segments. Next, we extracted the instructor’s behaviors as feature maps from the video segments, considering Frame Difference, Optical Flow, OpenPose, and temporal information. We then composed an encoder-decoder architecture that combines U-Net and ResNet, with these behavioral features as input, to build a deep neural network model for predicting ROI. In the experiment, the agreement between the ROI labels and the predicted regions was evaluated by Dice loss for each feature map; it improved from 0.9 for the single-image baseline to 0.4 with OpenPose and temporal features. The results indicate the potential of automatic content generation for smart devices through ROI prediction from the instructor’s behaviors.
{"title":"Automatic Region of Interest Prediction from Instructor’s Behaviors in Lecture Archives","authors":"Yuhui Yang, Koichi Ota, Wen Gu, S. Hasegawa","doi":"10.1109/KSE56063.2022.9953765","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953765","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"196 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"134101664"}
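Because the ROI abstract above reports its result as a Dice loss (0.9 for the baseline, 0.4 for the best feature set), it helps to recall that lower values mean better overlap. The sketch below uses the standard Dice loss definition on flattened masks; the smoothing constant `eps`, the function name, and the toy masks are assumptions for illustration, and the paper's exact variant is not given in the abstract.

```python
def dice_loss(pred, target, eps=1e-7):
    """Dice loss between two flattened masks of equal length.

    pred, target: sequences of 0/1 labels (or soft scores in [0, 1]).
    Returns 1 - 2*|A intersect B| / (|A| + |B|), so 0 is perfect overlap
    and values near 1 mean almost no overlap.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


label = [1, 1, 0, 0]
perfect = dice_loss([1, 1, 0, 0], label)   # identical masks
partial = dice_loss([1, 0, 0, 0], label)   # half of the ROI recovered
disjoint = dice_loss([0, 0, 1, 1], label)  # no overlap at all
```

Read this way, the move from 0.9 to 0.4 in the abstract corresponds to predicted regions that overlap the gaze-derived labels substantially more than the single-image baseline does.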
Pub Date: 2022-10-19 DOI: 10.1109/KSE56063.2022.9953624
Joe Hrzich, Gunjan Basra, Talal Halabi
Machine Learning (ML)-based intelligent services are gradually becoming the leading service design and delivery model in edge computing, where user and device data are outsourced to take part in large-scale big data analytics. This paradigm, however, entails challenging security and privacy concerns, which require rethinking the fundamental concepts behind performing ML. For instance, the encryption of sensitive data provides a straightforward solution that ensures data security and privacy. In particular, homomorphic encryption allows arbitrary computation on encrypted data and has gained a lot of attention recently. However, it has not been fully adopted by edge computing-based ML due to its potential impact on classification accuracy and model performance. This paper conducts an experimental evaluation of different types of homomorphic encryption techniques, namely Partial, Somewhat, and Fully Homomorphic encryption, over several ML models, which train on encrypted data and produce classification predictions based on encrypted input data. The results demonstrate two potential directions in the context of ML privacy at the network edge: privacy-preserving training and privacy-preserving classification. The performance of encryption-driven ML models is compared using different metrics such as accuracy and computation time for plaintext vs. encrypted text. This evaluation will guide future research in investigating which ML models perform better over encrypted data.
{"title":"Experimental Evaluation of Homomorphic Encryption in Cloud and Edge Machine Learning","authors":"Joe Hrzich, Gunjan Basra, Talal Halabi","doi":"10.1109/KSE56063.2022.9953624","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953624","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"55 1","pages":"0"},"publicationDate":"2022-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133821608"}
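To make "classification predictions based on encrypted input data" from the abstract above concrete, the sketch below scores a linear model on encrypted features with a partially (additively) homomorphic scheme. The abstract does not name the schemes it evaluates, so the choice of Paillier is an assumption, as are the tiny primes (insecure, for readability only), the integer features and weights, and every function name. A real deployment would use a vetted library and large keys.

```python
import math
import random


def keygen(p=293, q=433):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under public key n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # the blinding factor must be invertible mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_dot(n, enc_features, weights):
    """Compute Enc(sum w_i * x_i) from Enc(x_i) and plaintext integer weights.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    non-negative integer power multiplies its plaintext by that integer.
    """
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc


n, private_key = keygen()
enc_features = [encrypt(n, x) for x in [3, 5, 2]]      # done by the data owner
enc_score = encrypted_dot(n, enc_features, [2, 1, 4])   # done by the server
score = decrypt(n, private_key, enc_score)              # 2*3 + 1*5 + 4*2
```

The server never sees the features or the score in the clear. The data owner decrypts `score` and applies the decision threshold locally, which is one simple form of the privacy-preserving classification direction the abstract mentions.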