2022 14th International Conference on Knowledge and Systems Engineering (KSE): Latest Papers

English-Vietnamese Cross-lingual Semantic Textual Similarity using Sentence Transformer model
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953781
K. H. Nguyen, Dat Cong Dinh, Hang Le, Dinh Dien
Cross-lingual Semantic Textual Similarity (STS) is a challenging problem in Natural Language Understanding, especially for low-resource languages such as Vietnamese. One of the current state-of-the-art approaches to this problem is to use a distilled multilingual Sentence Transformer model. However, few studies have examined how these models perform on the English-Vietnamese language pair. In this paper, we inspect the performance of these models on English-Vietnamese STS tasks and, based on our findings, suggest possible directions for improving this approach in future work.
Citations: 0
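Cross-lingual STS with a Sentence Transformer is commonly scored by embedding both sentences into a shared vector space, taking the cosine similarity of the two embeddings, and reporting the Spearman rank correlation between those similarities and the human gold scores. A minimal pure-Python sketch of that scoring step, assuming the embeddings have already been produced by some multilingual model; the toy vectors and gold scores below are made up for illustration and are not from the paper:

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


# Hypothetical (English, Vietnamese) sentence embeddings and gold scores (0-5).
pairs = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.25]),  # near-paraphrase
    ([0.2, 1.0, 0.0], [0.6, 0.5, 0.3]),   # partially related
    ([0.0, 0.1, 1.0], [1.0, 0.2, 0.0]),   # unrelated
]
gold = [4.8, 2.5, 0.3]

predicted = [cosine(en, vi) for en, vi in pairs]
print(spearman(predicted, gold))  # ranks agree perfectly here, so this prints 1.0
```

Spearman is used rather than Pearson because only the ordering of the similarity scores matters, not their absolute scale.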
Losing a Head in Grammar Extraction
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953755
Masaya Taniguchi, S. Tojo
A treebank corpus is a collection of trees that represent the constituency and dependency relations of sentences. Our goal is to extract grammar rules from a treebank, that is, to decompose its tree data structures into grammar rules. After extraction, we need to validate the adequacy of the grammar by inspecting the generative power of the obtained grammar. In this phase the syntactic head is a significant feature; however, the head information is missing from the obtained grammar. Hence, we propose to supplement the lost head information with the type-raising rule of categorial grammar (CG). We extend the same issue to combinatory categorial grammar (CCG) and solve it using generalized type-raising. Furthermore, we verify our grammar with a formal proof written in the proof assistant Isabelle/HOL.
Citations: 0
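Forward type-raising in categorial grammar turns an argument category X into a functor over functions that take X, X ⇒ T/(T\X); for a subject, NP ⇒ S/(S\NP), so the raised subject can consume the verb phrase by forward application instead of the verb phrase consuming the subject. A toy Python sketch of forward/backward application plus type-raising, assuming a simple encoding of categories as strings and frozen dataclasses; it is illustrative only and is not the authors' Isabelle/HOL formalization or their generalized rule:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """Functor category: result/arg (argument to the right) or result\\arg (to the left)."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.direction}{self.arg})"


Cat = Union[str, Slash]  # atomic categories are plain strings such as "S" or "NP"


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y  =>  X"""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Y  X\\Y  =>  X"""
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None


def type_raise(x: Cat, t: Cat) -> Slash:
    """Forward type-raising: X  =>  T/(T\\X)"""
    return Slash(t, "/", Slash(t, "\\", x))


vp = Slash("S", "\\", "NP")  # intransitive verb phrase: S\NP

# Ordinary derivation: the verb phrase is the functor and consumes the subject.
print(backward_apply("NP", vp))  # S
# After raising NP to S/(S\NP), the subject is the functor.
raised = type_raise("NP", "S")
print(raised, forward_apply(raised, vp))  # (S/(S\NP)) S
```

Both derivations yield S; what changes is which constituent acts as the functor, which is the kind of structural information the paper uses type-raising to recover.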
Approximation Methods for a Nonlinear Competitive Facility Cost Optimization Problem
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953774
Ngan Ha Duong, Thuy Anh Ta
In this paper, we study a facility cost optimization problem in a competitive market. Our objective is to allocate an available budget among newly opened facilities so as to maximize the expected captured customer demand, assuming that customers choose which facility to visit according to a random utility maximization model. Because the objective function of this problem is highly non-convex and hard to optimize exactly, we propose to approximate it by piecewise linear functions, which allows the problem to be reformulated as a mixed-integer linear or conic program that can then be solved by a commercial solver such as CPLEX. We also explore an outer-approximation algorithm for solving the approximate problem. Computational results are provided to demonstrate the performance of our approaches.
Citations: 1
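A piecewise linear approximation samples the nonlinear function at a set of breakpoints and interpolates linearly between them; in a mixed-integer model such pieces are typically encoded with auxiliary binary or SOS2 variables. The sketch below shows only the approximation step and how its error shrinks as breakpoints are added, using a hypothetical concave function f(x) = 1 - exp(-x) as a stand-in for one term of the real objective (the paper's actual utility terms are not given in the abstract):

```python
import bisect
import math
from typing import Callable, List


def breakpoints(lo: float, hi: float, n: int) -> List[float]:
    """n + 1 evenly spaced breakpoints on [lo, hi]."""
    return [lo + (hi - lo) * k / n for k in range(n + 1)]


def piecewise_linear(f: Callable[[float], float], xs: List[float]) -> Callable[[float], float]:
    """Return g that interpolates f linearly between consecutive breakpoints xs."""
    ys = [f(x) for x in xs]

    def g(x: float) -> float:
        # Locate the segment [xs[i], xs[i + 1]] containing x (clamped to the range).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return g


def max_error(f, g, lo: float, hi: float, samples: int = 1000) -> float:
    """Largest |f - g| over a dense sample grid on [lo, hi]."""
    grid = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in grid)


f = lambda x: 1.0 - math.exp(-x)  # hypothetical concave objective term
for n in (2, 4, 8, 16):
    g = piecewise_linear(f, breakpoints(0.0, 5.0, n))
    print(n, round(max_error(f, g, 0.0, 5.0), 5))
```

The trade-off is the usual one: more breakpoints give a tighter approximation but add variables to the reformulated program.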
The Mixed-Integer Linear Programming for Cost-Constrained Decision Trees with Multiple Condition Attributes
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953778
Hoang Giang Pham
In many real-world applications, cost factors play a significant role, and numerous previous studies in machine learning have taken costs into account, especially when building decision trees. This research considers a cost-sensitive decision tree construction problem under the assumption that test costs must be paid to obtain the values of the decision attributes, and that each record must be classified without exceeding a spending cost threshold. Moreover, our problem considers records with multiple condition attributes. We construct a cost-constrained decision tree using a mixed-integer linear programming formulation, which enables us to identify optimal trees. The experimental results demonstrate that our formulation satisfactorily handles small data sets with multiple condition attributes under different cost constraints.
Citations: 0
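The cost constraint can be read as follows: classifying a record means walking from the root to a leaf and paying the test cost of each attribute queried on the way, and that total must not exceed the budget. The MILP enforces this while searching over all candidate trees; the sketch below only checks the constraint for one fixed tree. The tree, attribute names (a1, a2), costs, budget, and the rule that a repeated attribute is paid only once are all hypothetical assumptions, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str    # attribute tested at this node
    threshold: float  # go left if value <= threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: Dict[str, float], costs: Dict[str, float]) -> Tuple[str, float]:
    """Return (label, total test cost). Assumption: each attribute is paid for once per record."""
    paid = set()
    total = 0.0
    node = tree
    while isinstance(node, Node):
        if node.attribute not in paid:
            paid.add(node.attribute)
            total += costs[node.attribute]
        node = node.left if record[node.attribute] <= node.threshold else node.right
    return node.label, total


def within_budget(tree: Tree, records: List[Dict[str, float]], costs: Dict[str, float], budget: float) -> bool:
    """Cost constraint: every record is classified without exceeding the budget."""
    return all(classify(tree, r, costs)[1] <= budget for r in records)


# Hypothetical tree, test costs, and records.
tree = Node("a1", 5.0, Leaf("class_0"), Node("a2", 0.5, Leaf("class_1"), Leaf("class_2")))
costs = {"a1": 10.0, "a2": 40.0}
records = [{"a1": 3.0, "a2": 0.9}, {"a1": 7.0, "a2": 0.2}]

print([classify(tree, r, costs) for r in records])  # [('class_0', 10.0), ('class_1', 50.0)]
print(within_budget(tree, records, costs, budget=45.0))  # False: the second path costs 50
```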
Combining Blockchain with an E-Kanban System for Pull Leveling of an Assembly Line
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953614
Vy Giang Thao, Huynh Thi Khanh Chi, N. Hop
The Kanban system is a tool of the Just-in-Time approach that many automotive companies have adopted to improve their in-house operations. However, the traditional Kanban system has shortcomings in meeting performance targets for inventory, delivery, quality, and cost. In this paper, we propose a new approach that combines Blockchain with an Electronic Kanban (E-Kanban) system to improve on the traditional Kanban system for a pull leveling pattern associated with a parallel information system. The proposed system is simulated to validate feasible solutions for continuous improvement. A real case from a leading automotive company in Vietnam is investigated to illustrate the proposed system.
Citations: 0
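One property a blockchain adds to E-Kanban records is tamper evidence: each kanban event stores the hash of the previous one, so altering any past card breaks every later link. The following is a deliberately simplified, single-node hash chain (no consensus, no network, no smart contracts) with hypothetical card fields; it is not the authors' system:

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def digest(card: dict, prev_hash: str) -> str:
    """SHA-256 over the card's canonical JSON plus the previous block's hash."""
    data = json.dumps(card, sort_keys=True) + prev_hash
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class KanbanLedger:
    blocks: List[dict] = field(default_factory=list)

    def append(self, card: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        self.blocks.append({"card": card, "prev": prev, "hash": digest(card, prev)})

    def is_valid(self) -> bool:
        """Recompute every link; any edited card invalidates the chain."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != digest(block["card"], prev):
                return False
            prev = block["hash"]
        return True


# Hypothetical withdrawal and production kanban events on an assembly line.
ledger = KanbanLedger()
ledger.append({"type": "withdrawal", "part": "P-101", "qty": 20, "station": "A1"})
ledger.append({"type": "production", "part": "P-101", "qty": 20, "station": "S3"})
print(ledger.is_valid())  # True

ledger.blocks[0]["card"]["qty"] = 200  # tamper with an earlier card
print(ledger.is_valid())  # False
```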
Towards a Human-like Chatbot using Deep Adversarial Learning
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953771
Quoc-Dai Luong Tran, Anh-Cuong Le, V. Huynh
Conversational agents are becoming increasingly popular and are applied in a wide range of practical areas. Their main task is not only to generate context-appropriate responses to a given query but also to make the conversation human-like. Thanks to the strength of deep learning models in natural language modeling, recent studies have made progress in designing conversational agents that provide more semantically accurate responses. However, these studies have not given adequate attention to the naturalness of the conversation. This paper aims to incorporate both accuracy and naturalness, two important criteria of conversation, into a new model for conversational agents. To this end, inspired by the Turing test and by adversarial learning, we design a model based on generative deep neural networks that generates accurate responses, optimized by learning to imitate human-generated conversations. Experimental results demonstrate that the proposed models produce more natural and accurate responses, yielding significant gains in BLEU scores.
Citations: 2
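BLEU, the metric used to report the gains, combines clipped (modified) n-gram precisions with a brevity penalty. A minimal sentence-level version with uniform weights and no smoothing; established toolkits such as sacreBLEU or NLTK add corpus-level aggregation and smoothing, which this sketch omits, and the example sentences are hypothetical:

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed BLEU: geometric mean of clipped n-gram precisions times brevity penalty."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty uses the reference length closest to the hypothesis length.
    c = len(hypothesis)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


ref = "the cat is sitting on the mat".split()
hyp = "the cat is on the mat".split()
print(sentence_bleu(ref, [ref]))  # 1.0
print(sentence_bleu(hyp, [ref]))  # 0.0: no shared 4-gram, so unsmoothed BLEU-4 is zero
print(round(sentence_bleu(hyp, [ref], max_n=2), 4))
```

The zero in the second line is why sentence-level BLEU is usually smoothed or reported at corpus level.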
Employing tree bisection and reconnection rearrangement for parsimony inference in MPBoot
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953773
Tien Dung Huynh, Quoc Tuan Vu, Viet-Dung Nguyen, D. T. Hoang
The approximation technique in MPBoot effectively addresses maximum parsimony phylogenetic bootstrapping, an essential task in bioinformatics with diverse applications in evolutionary biology. In this paper, we investigate integrating the tree bisection and reconnection (TBR) rearrangement into MPBoot to improve its sampling of the search space, and we describe the resulting MPBoot-TBR algorithm. Since the size of the TBR neighborhood is cubic in the number of taxa, we offer algorithmic strategies for quickly evaluating a TBR move, quickly searching the neighborhood of a specified removed branch, and hill-climbing with TBR. Furthermore, we adjust the framework's stopping condition because, compared with subtree pruning and regrafting (SPR), TBR requires fewer search iterations to converge to an acceptable MP score. In bootstrap accuracy, MPBoot-TBR is comparable to MPBoot; in MP score and computation time on real datasets, MPBoot-TBR outperforms the original MPBoot. We have implemented the proposed methods in the MPBoot-TBR program, whose source code is available at https://github.com/HynDuf7/mpboot/tree/Huynh_Tien_Dung.
Citations: 0
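The MP score that a maximum parsimony search tries to minimize is the parsimony length of a tree: the minimum number of state changes needed to explain the alignment. For a fixed rooted binary tree it can be computed site by site with Fitch's algorithm; rearrangement searches such as SPR or TBR propose new topologies and keep those that lower this score. A small sketch of Fitch scoring on a hypothetical four-taxon alignment (textbook version, not MPBoot's optimized implementation):

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one site: (candidate state set, changes in this subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one extra state change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length summed over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"A": "AACT", "B": "AACT", "C": "GGCA", "D": "GGTA"}
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(parsimony_score(t1, alignment), parsimony_score(t2, alignment))  # 4 7
```

Here t1 is the more parsimonious topology; a rearrangement move that turns t2 into t1 would be accepted by a hill-climbing search.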
ATPM-REAP: A Simple and Efficient Address Tracking and Parsing for Vietnamese Real Estate Advertisement Posts
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953770
Binh T. Nguyen, Tung Tran Nguyen Doan, S. T. Huynh, An Tran-Hoai Le, An Trong Nguyen, K. Tran, N. Ho, Trung T. Nguyen, Dang T. Huynh
Real estate is a large and important sector in many countries. Extracting useful information from real estate advertisement posts can help in understanding market conditions and uncovering other insights, especially for the Vietnamese market. Among the descriptive information of a property, the address or location is mandatory. However, addresses are written in many different ways in Vietnam, so detecting the text that represents the address in a real estate advertisement post is an essential and challenging task. This paper investigates address detection and parsing for Vietnamese. First, we create a dataset of real estate advertisements with 16 different attributes (entities) per property and assign the correct label to each entity detected during data annotation. Then, we propose a practical approach that detects the locations of possible addresses inside a real estate advertisement post and parses the localized address text into four levels of address information: City/Province, District/Town, Ward, and Street. The experimental results indicate that the $\mathrm{PhoBERT}_{\mathrm{base}}$ model achieves the best performance, with an F1-score of 0.8195. Finally, we compare our proposed method with other approaches and achieve the highest accuracy at every level: City/Province (0.952), District/Town (0.9482), Ward (0.9225), and Street (0.8994); the combined accuracy of correctly detecting all four levels is 0.8367.
Citations: 0
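The reported numbers mix per-level accuracy (City/Province, District/Town, Ward, and Street each scored separately) with a stricter combined accuracy, where a post counts as correct only if all four levels match. A short sketch of that scoring, assuming predictions and gold labels are already parsed into four fields; the field names, sample addresses, and normalization rule are hypothetical assumptions, not necessarily the paper's matching procedure:

```python
import unicodedata
from typing import Dict, List

LEVELS = ("city_province", "district_town", "ward", "street")


def normalize(text: str) -> str:
    # Assumed matching rule: Unicode NFC (Vietnamese text mixes composed and
    # decomposed diacritics), case-folding, and collapsed whitespace.
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.casefold().split())


def address_accuracy(preds: List[Dict[str, str]], golds: List[Dict[str, str]]) -> Dict[str, float]:
    """Per-level accuracy plus 'all_levels', where every level must match."""
    hits = {level: 0 for level in LEVELS}
    all_hits = 0
    for pred, gold in zip(preds, golds):
        ok = [normalize(pred.get(level, "")) == normalize(gold[level]) for level in LEVELS]
        for level, match in zip(LEVELS, ok):
            hits[level] += match
        all_hits += all(ok)
    n = len(golds)
    scores = {level: hits[level] / n for level in LEVELS}
    scores["all_levels"] = all_hits / n
    return scores


# Hypothetical gold labels and predictions for two posts.
golds = [
    {"city_province": "Hồ Chí Minh", "district_town": "Quận 1", "ward": "Bến Nghé", "street": "Lê Lợi"},
    {"city_province": "Hà Nội", "district_town": "Cầu Giấy", "ward": "Dịch Vọng", "street": "Xuân Thủy"},
]
preds = [
    {"city_province": "hồ chí minh", "district_town": "Quận 1", "ward": "Bến Nghé", "street": "Lê Lợi"},
    {"city_province": "Hà Nội", "district_town": "Cầu Giấy", "ward": "Dịch Vọng Hậu", "street": "Xuân Thủy"},
]
print(address_accuracy(preds, golds))
```

Because one wrong level is enough to fail a post, the combined score is bounded above by the lowest per-level score, which matches the pattern in the reported figures.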
Diagnosing tuberculosis using graph neural network
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953751
H. Nguyen, Nam Q. Tran, H. Le
According to the World Health Organization (WHO), tuberculosis (TB) is one of the deadliest diseases worldwide, especially in developing and underdeveloped countries, owing to poverty and limited health resources. Early screening for TB is highly urgent because of the disease's severe effects on patient health and its rapid spread. Among the methods for diagnosing tuberculosis, chest X-ray (CXR) images are often used for clinical diagnosis because they are convenient and low-cost. Current research on Computer-Aided Diagnosis (CAD) systems uses machine learning to provide doctors with diagnostic, analytical, and disease-monitoring techniques. Graph neural networks (GNNs) have recently emerged as a research trend, and GNN-based work has achieved high accuracy in many fields. In this paper, we present a solution that automatically diagnoses tuberculosis from chest X-ray images using a graph neural network. We classify the CXR dataset into two classes (TB and non-TB). The proposed model achieves encouraging results: accuracy 99.33%, recall 99.07%, precision 99.63%, F1-score 99.35%, and AUC 99.97%.
Citations: 0
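A graph neural network can classify an X-ray only after the image has been turned into a graph, for example with image regions as nodes carrying feature vectors and adjacent regions as edges; the abstract does not say how the paper builds its graphs, so that construction is an assumption here. One common layer type, the graph convolution of Kipf and Welling, updates node features as H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A NumPy sketch of that layer with mean pooling and a logistic output for TB vs non-TB; the weights are random and untrained, so the printed probability is meaningless and only shows the data flow:

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ h @ w, 0.0)


def predict_tb_probability(adj, features, w1, w2, w_out) -> float:
    """Two GCN layers, mean pooling over nodes, then a logistic output."""
    a = normalized_adjacency(adj)
    h = gcn_layer(a, gcn_layer(a, features, w1), w2)
    graph_vec = h.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-graph_vec @ w_out)))


rng = np.random.default_rng(0)
n_nodes, n_feats, hidden = 6, 8, 16
# Hypothetical region graph: six image regions connected in a ring.
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    adj[i, (i + 1) % n_nodes] = adj[(i + 1) % n_nodes, i] = 1.0
features = rng.normal(size=(n_nodes, n_feats))
w1 = rng.normal(size=(n_feats, hidden))
w2 = rng.normal(size=(hidden, hidden))
w_out = rng.normal(size=hidden)
print(predict_tb_probability(adj, features, w1, w2, w_out))
```

In practice the weights would be learned with a binary cross-entropy loss over labelled TB and non-TB graphs.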
Pediatric Sepsis Diagnosis Based on Differential Gene Expression and Machine Learning Method
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953619
L. D. Vu, V. Pham, M. Nguyen, Hai-Chau Le
Sepsis is a life-threatening condition closely related to the body's response to an infection in its tissues and organs; this response leads to organ dysfunction. In this work, we propose a novel algorithm for diagnosing pediatric sepsis that uses a random forest model with a combination of 9 genes. The algorithm is built with a sequential gene selection procedure that combines differential gene expression analysis with gene importance computed by the machine learning model to identify the most informative differentially expressed genes. Cross-validation with different machine learning algorithms is used to estimate the diagnostic performance of the gene combinations and models, and the selected gene combinations are then tested separately using various machine learning methods. The validation results (accuracy 91.79%, sensitivity 57.33%, specificity 100%) show that the proposed algorithm has potential for practical application in real clinical settings.
Citations: 1