Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953781
K. H. Nguyen, Dat Cong Dinh, Hang Le, Dinh Dien
Cross-lingual Semantic Textual Similarity (STS) is a challenging problem in Natural Language Understanding, especially for low-resource languages like Vietnamese. Currently, one of the state-of-the-art approaches to this problem is to use a distilled multilingual Sentence Transformer model. However, there are few studies on how these models perform on English-Vietnamese language pairs. In this paper, we examine the performance of these models on English-Vietnamese STS tasks. Based on our findings, we propose possible improvements to this approach for future work.
Title: English-Vietnamese Cross-lingual Semantic Textual Similarity using Sentence Transformer model. Venue: 2022 14th International Conference on Knowledge and Systems Engineering (KSE).
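STS with sentence encoders is commonly scored by the cosine similarity between two sentence embeddings. The following is a minimal sketch of that comparison step only, assuming the embeddings have already been produced by some multilingual encoder; the vectors and the function name are hypothetical placeholders, not outputs of the models examined in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional embeddings standing in for encoder outputs.
english_sentence = [0.2, 0.7, 0.1, 0.4]
vietnamese_translation = [0.25, 0.65, 0.05, 0.45]
unrelated_sentence = [0.9, -0.1, 0.3, -0.6]

sim_pair = cosine_similarity(english_sentence, vietnamese_translation)
sim_unrelated = cosine_similarity(english_sentence, unrelated_sentence)
```

In this toy setup the translation pair scores higher than the unrelated pair; with a real encoder, such scores would be compared against human similarity ratings.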
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953755
Masaya Taniguchi, S. Tojo
A treebank corpus is a collection of trees that represent the constituency and dependency relations of sentences. We are motivated to extract grammar rules from the treebank, that is, to decompose the tree data structure and find grammar rules. After the extraction, we need to validate the adequacy of the grammar, so we inspect the generative power of the obtained grammar. In this phase, the syntactic head is a significant feature; however, the head information is missing from the obtained grammar. Hence, we propose to supplement the lost head information with the type-raising rule of categorial grammar (CG). We extend the same issue to combinatory categorial grammar (CCG) and solve it using generalized type-raising. Furthermore, we verify our grammar with a formal proof written in the proof assistant Isabelle/HOL.
Title: Losing a Head in Grammar Extraction
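For orientation, the standard type-raising rules of CCG can be written as follows. This is the textbook form in result-leftmost (Steedman-style) slash notation, which is an assumption about notation; the generalized type-raising proposed in the paper is not reproduced here.

```latex
\[
  X \;\Rightarrow_{\mathbf{T}}\; T/(T\backslash X)
  \qquad \text{(forward type-raising)}
\]
\[
  X \;\Rightarrow_{\mathbf{T}}\; T\backslash(T/X)
  \qquad \text{(backward type-raising)}
\]
```

For example, a subject noun phrase $NP$ can be raised to $S/(S\backslash NP)$, a category that looks for a verb phrase $S\backslash NP$ to its right and yields a sentence $S$.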
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953774
Ngan Ha Duong, Thuy Anh Ta
In this paper, we study a facility cost optimization problem in a competitive market. Our objective is to distribute an available budget among newly opened facilities to maximize the expected captured customer demand, assuming that customers select a facility to visit according to a random utility maximization model. Because the objective function of this problem is highly non-convex and challenging to solve exactly, we propose a technique that approximates the objective function by piecewise linear functions, making it possible to reformulate the problem as a mixed-integer linear or conic program, which can then be solved by a commercial solver such as CPLEX. We also explore an outer-approximation algorithm for solving the approximate problem. Computational results are provided to demonstrate the performance of our approaches.
Title: Approximation Methods for a Nonlinear Competitive Facility Cost Optimization Problem
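The idea of replacing a nonlinear function by linear pieces between breakpoints can be sketched as below. This is only an illustration under stated assumptions: the function `saturating_demand` is a hypothetical stand-in, not the paper's objective, the breakpoints are uniform, and the mixed-integer reformulation itself is not shown.

```python
import math


def build_breakpoints(f, lower, upper, segments):
    """Sample f at uniformly spaced breakpoints on [lower, upper]."""
    step = (upper - lower) / segments
    xs = [lower + i * step for i in range(segments)] + [upper]
    return [(x, f(x)) for x in xs]


def evaluate_piecewise(points, x):
    """Linearly interpolate between the two breakpoints that bracket x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the approximated interval")


def max_abs_error(f, points, samples=1000):
    """Largest sampled gap between f and its piecewise linear approximation."""
    lower, upper = points[0][0], points[-1][0]
    worst = 0.0
    for i in range(samples + 1):
        x = min(upper, lower + (upper - lower) * i / samples)
        worst = max(worst, abs(f(x) - evaluate_piecewise(points, x)))
    return worst


def saturating_demand(x):
    """Hypothetical concave stand-in for a nonlinear objective term."""
    return 1.0 - math.exp(-x)


coarse = build_breakpoints(saturating_demand, 0.0, 5.0, 4)
fine = build_breakpoints(saturating_demand, 0.0, 5.0, 16)
coarse_error = max_abs_error(saturating_demand, coarse)
fine_error = max_abs_error(saturating_demand, fine)
```

Adding breakpoints shrinks the approximation error at the price of more pieces, which in a mixed-integer reformulation generally means more variables and constraints.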
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953778
Hoang Giang Pham
In many real-world applications, cost factors play a significant role. Costs have been taken into consideration in numerous previous studies in machine learning, especially in building decision trees. This research also considers a cost-sensitive decision tree construction problem, under the assumption that test costs must be paid to obtain the values of the decision attribute and that a record must be classified without exceeding the spending cost threshold. Moreover, our problem considers records with multiple condition attributes. We construct a cost-constrained decision tree using a mixed-integer formulation, which enables us to identify the optimal trees. The experimental results demonstrate that our formulation satisfactorily handles small data sets with multiple condition attributes under different cost constraints.
Title: The Mixed-Integer Linear Programming for Cost-Constrained Decision Trees with Multiple Condition Attributes
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953614
Vy Giang Thao, Huynh Thi Khanh Chi, N. Hop
The Kanban system is a tool of the Just-in-time approach that many automotive companies have adopted to improve their in-house operations. However, the traditional Kanban system has some disadvantages in meeting performance targets in terms of inventory, delivery, quality, and cost. In this paper, we propose a new approach that combines Blockchain with an Electronic Kanban (E-Kanban) system to improve the traditional Kanban system for a pull leveling pattern associated with a parallel information system. The proposed system is simulated to validate the feasible solutions for continuous improvement purposes. A real case from a leading automotive company in Vietnam is investigated to illustrate the proposed system.
Title: Combining Blockchain with an E-Kanban System for Pull Leveling of an Assembly Line
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953771
Quoc-Dai Luong Tran, Anh-Cuong Le, V. Huynh
Conversational agents are becoming more popular and are applied in a wide range of practical areas. The main task of these agents is not only to generate context-appropriate responses to a given query but also to make the conversation human-like. Thanks to the ability of deep-learning-based models in natural language modeling, recent studies have made progress in designing conversational agents that can provide more semantically accurate responses. However, the naturalness of such conversations has not been given adequate attention in these studies. This paper aims to incorporate both accuracy and naturalness of conversation as criteria in developing a new model for conversational agents. To this end, inspired by the Turing test and by adversarial learning, we propose a model based on generative deep neural networks that generates accurate responses optimized by a mechanism that imitates human-generated conversations. Experimental results demonstrate that the proposed models produce more natural and accurate responses, yielding significant gains in BLEU scores.
Title: Towards a Human-like Chatbot using Deep Adversarial Learning
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953773
Tien Dung Huynh, Quoc Tuan Vu, Viet-Dung Nguyen, D. T. Hoang
The approximation technique in MPBoot effectively addresses the problem of maximum parsimony (MP) phylogenetic bootstrapping, an essential task in bioinformatics with diverse applications in evolutionary biology. In this paper, we investigate integrating the tree bisection and reconnection (TBR) rearrangement into MPBoot to increase its sampling performance in the search space, and we describe the MPBoot-TBR algorithm. Since the size of the TBR neighborhood is cubic in the number of taxa, we offer algorithmic strategies for swiftly evaluating a TBR move, searching quickly in the neighborhood of a specified remove-branch, and hill-climbing using TBR. Furthermore, the framework's stopping condition is adjusted because, compared to subtree pruning and regrafting, TBR requires fewer search iterations to converge to an acceptable MP score. In terms of bootstrap accuracy, MPBoot-TBR is comparable to MPBoot. In terms of MP score and computation time on real datasets, MPBoot-TBR outperforms the original MPBoot. We have implemented the proposed methods in the MPBoot-TBR program, the source code of which is accessible at https://github.com/HynDuf7/mpboot/tree/Huynh_Tien_Dung.
Title: Employing tree bisection and reconnection rearrangement for parsimony inference in MPBoot
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953770
Binh T. Nguyen, Tung Tran Nguyen Doan, S. T. Huynh, An Tran-Hoai Le, An Trong Nguyen, K. Tran, N. Ho, Trung T. Nguyen, Dang T. Huynh
Real estate is an enormous and essential field in many countries. Extracting useful information from real estate advertisement posts can help to better understand market conditions and uncover other vital insights, especially for the Vietnamese market. The address or location is a required part of the information describing a property. However, there are different ways to write address information in Vietnam. For this reason, detecting the text that represents the address in real estate advertisement posts is an essential and challenging task. This paper investigates the address detection and parsing task for the Vietnamese language. First, we create a dataset of real estate advertisements with 16 different attributes (entities) for each property and assign the correct label to each entity detected during the data annotation process. Then, we propose a practical approach that detects the locations of possible addresses inside a given real estate advertisement post and parses the localized address text into four levels of address information: City/Province, District/Town, Ward, and Street. The experimental results indicate that the $\mathrm{PhoBERT}_{\mathrm{base}}$ model achieves the best performance, with an F1-score of 0.8195. Finally, we compare our proposed method with other approaches and achieve the highest accuracy at all levels: City/Province (0.952), District/Town (0.9482), Ward (0.9225), and Street (0.8994); the combined accuracy of correctly detecting all four levels is 0.8367.
Title: ATPM-REAP: A Simple and Efficient Address Tracking and Parsing for Vietnamese Real Estate Advertisement Posts
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953751
H. Nguyen, Nam Q. Tran, H. Le
According to the World Health Organization (WHO), tuberculosis (TB) is the top deadly disease worldwide, especially in developing and underdeveloped countries, due to poverty and limited health resources. Early screening for TB is highly urgent because of the severe effects on patient health and the rapid spread of the disease. Among the methods of diagnosing tuberculosis, chest X-ray (CXR) images are often used for clinical diagnosis because of their convenience and low cost. Currently, research on Computer-Aided Diagnosis (CAD) systems uses machine learning to provide doctors with diagnostic, analytical, and disease-monitoring techniques. Graph neural networks (GNNs) have recently emerged as a research trend, and works using GNNs report high accuracy in many fields. In this paper, we present a solution for automatically diagnosing tuberculosis from CXR images using a graph neural network. We classify the CXR dataset into two classes (TB and non-TB). The proposed model achieves encouraging results: accuracy 99.33%, recall 99.07%, precision 99.63%, F1-score 99.35%, and AUC 99.97%.
Title: Diagnosing tuberculosis using graph neural network
Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953619
L. D. Vu, V. Pham, M. Nguyen, Hai-Chau Le
Sepsis is a life-threatening condition that is closely related to the response of the human body to an infection in the tissues and organs. Such a reaction impairs organ function. In this work, a novel algorithm is proposed for the diagnosis of pediatric sepsis, consisting of a random forest model and a combination of 9 genes. The proposed algorithm is built with a sequential gene selection procedure, which combines differential gene expression analysis with gene importance computed by the machine learning model to identify the most informative differentially expressed genes. A cross-validation procedure combined with different machine learning algorithms is used to estimate the diagnostic performance of the gene combinations and machine learning models. The selected gene combinations are then tested separately using various machine learning methods. The validation results, with an accuracy of 91.79%, a sensitivity of 57.33%, and a specificity of 100%, show that the proposed algorithm has potential for practical application in real clinical environments.
Title: Pediatric Sepsis Diagnosis Based on Differential Gene Expression and Machine Learning Method
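The three reported figures are related through the confusion matrix, as the sketch below shows. The counts are hypothetical, chosen only so that the rounded percentages match those in the abstract; the actual counts are not given there, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
    }


# Hypothetical counts: 75 sepsis cases and 315 controls, no false positives.
metrics = diagnostic_metrics(tp=43, fn=32, tn=315, fp=0)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

With these illustrative counts, a specificity of 100% means no control is flagged, while a sensitivity near 57% means a sizeable share of true cases is missed; accuracy stays high because controls dominate the sample.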