
Latest publications from the 2022 14th International Conference on Knowledge and Systems Engineering (KSE)

Adaptive Learning Models for Getting Insights into Multimodal Lifelog Data
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953616
Phuc-Thinh Nguyen, M. Nazmudeen, Minh-Son Dao, Duy-Dong Le
Regular exercise and a scientific diet can support weight control and benefit everyone’s health, especially athletes’. In recent years, although much research has been conducted in this field, only small groups of people were studied, and few models revealed links between weight and speed attributes (e.g., activities, wellbeing, habits) from which to extract tips that help people control their weight and running speed. In this research, we propose an approach that uses pattern mining and correlation discovery techniques to discover the optimal attributes over time for forecasting an athlete’s weight and speed for a sports event. Furthermore, we propose Adaptive Learning Models, which can learn from personal and public data to forecast a person’s weight or speed across demographic groups, such as young adults, middle-aged adults, and female or male members. Based on the above analysis, different approaches to building prediction models of athletes’ weight or running speed are examined on the primary data. Our suggested approach yields encouraging results when tested on public and private data sets.
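The correlation-discovery step described above can be illustrated with a small sketch (a generic illustration, not the authors’ implementation): rank lifelog attributes by the absolute Pearson correlation with the target series and keep the strongest ones.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_attributes(records, target, k=2):
    """Rank lifelog attributes by |correlation| with the target series."""
    scored = [(abs(pearson(vals, target)), name) for name, vals in records.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# toy lifelog: daily steps, sleep hours, screen time vs. daily weight
records = {
    "steps": [8000, 9500, 7000, 10000, 6000],
    "sleep": [7.0, 7.5, 6.0, 8.0, 6.5],
    "screen": [3.0, 2.0, 4.0, 1.5, 5.0],
}
weight = [70.2, 69.8, 70.9, 69.5, 71.3]
print(top_attributes(records, weight))  # → ['screen', 'steps']
```

The selected attributes would then feed the forecasting model; the attribute names and values here are hypothetical.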
Citations: 0
A New Approach for Vietnamese Aspect-Based Sentiment Analysis
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953759
Bao Le, M. Nguyen, Nhi Kieu-Phuong Nguyen, Binh T. Nguyen
Intelligent systems, especially smartphones, have become a crucial part of the modern world. These devices can handle a wide range of human tasks, from long-distance communication to healthcare assistance. Behind this tremendous success, customer feedback on a smartphone plays an integral role in the development process. This paper presents an improved approach for the Vietnamese Smartphone Feedback Dataset (UIT-ViSFD), carefully collected and annotated in 2021 (comprising 11,122 comments and their labels), by employing the pretrained PhoBERT model with a proper pre-processing method. In the experiments, we compare the approach with other transformer-based models such as XLM-R, DistilBERT, RoBERTa, and BERT. The experimental results show that the proposed method can surpass the state-of-the-art methods on the UIT-ViSFD corpus. As a result, our model achieves better macro-F1 scores for the Aspect and Sentiment Detection tasks, 86.03% and 78.76%, respectively. In addition, our approach could improve results on Aspect-Based Sentiment Analysis datasets in the Vietnamese language.
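The macro-F1 metric reported above averages per-class F1 so that rare aspect or sentiment classes weigh as much as frequent ones; a minimal sketch of how it is computed:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# toy sentiment labels, not taken from UIT-ViSFD
y_true = ["POS", "NEG", "NEU", "POS", "NEG"]
y_pred = ["POS", "NEG", "POS", "POS", "NEU"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.489
```

The misclassified NEU class contributes an F1 of 0, dragging the macro average down even though overall accuracy is 60%.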
Citations: 3
Knowledge Base Completion with transfer learning using BERT and fastText
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953802
Thuy-Anh Nguyen Thi, Thi-Hong Vuong, Thi-Hanh Le, X. Phan, Thi-Thao Le, Quang-Thuy Ha
Knowledge base completion (KBC) is the task of predicting and filling in missing information based on the current data in a knowledge base. Recently, one of the most feasible approaches, introduced by V. Kocijan and T. Lukasiewicz (2021), is to transfer knowledge from one collection of information to another without the need for entity or relation matching. Still, this work has neither scaled pre-training to larger models and datasets nor investigated the impact of the encoder architecture. In this work, we propose a method that combines the benefits of Bidirectional Encoder Representations from Transformers (BERT), fastText, a Gated Recurrent Unit (GRU), and a Fully Connected (FC) layer to improve the KBC task in Kocijan and Lukasiewicz’s model. The experimental results show the effectiveness of our proposed model on several popular datasets, including ReVerb20K, ReVerb45K, FB15K237, and WN18RR.
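fastText’s contribution in such a hybrid encoder is subword awareness: an out-of-vocabulary entity mention still receives a vector from its character n-grams. A hand-rolled sketch of that idea (the hashing scheme, bucket count, and toy table are illustrative assumptions, not the paper’s configuration):

```python
import hashlib

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams with fastText-style word boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def bucket(ngram, n_buckets):
    """Hash an n-gram into one of a fixed number of embedding buckets."""
    digest = hashlib.md5(ngram.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def subword_vector(word, table):
    """Average the bucket vectors of a word's n-grams (toy table)."""
    ids = [bucket(g, len(table)) for g in char_ngrams(word)]
    dim = len(table[0])
    return [sum(table[i][k] for i in ids) / len(ids) for k in range(dim)]

# tiny deterministic "embedding table" for demonstration only
table = [[(i * 31 + k) % 7 / 7.0 for k in range(2)] for i in range(1000)]
print(char_ngrams("BERT", 3, 3))  # → ['<BE', 'BER', 'ERT', 'RT>']
print(len(subword_vector("knowledge", table)))  # → 2
```

In the paper’s setting, such subword vectors would be combined with contextual BERT features before the GRU and FC layers.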
Citations: 1
A Coarse-to-fine Unsupervised Domain Adaptation Method for Cross-Mode Polyp Segmentation
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953621
Kieu Dang Nam, Thi-Oanh Nguyen, N. T. Thuy, D. V. Hang, D. Long, Tran Quang Trung, D. V. Sang
The goal of Unsupervised Domain Adaptation (UDA) is to transfer the knowledge of a model learned from a source domain with available labels to a target data domain without access to labels. However, UDA performance can greatly suffer from the domain shift caused by the misalignment of the data distributions of the two sources. Endoscopy can be performed under different light modes, including white-light imaging (WLI) and image-enhanced endoscopy (IEE). However, most current polyp datasets are collected in the WLI mode, since it is the standard and most popular mode in all endoscopy systems. Therefore, AI models trained on such WLI datasets can degrade strongly when applied to other light modes. To address this issue, this paper proposes a coarse-to-fine UDA method that first coarsely aligns the two data distributions at the input level using the Fourier transform in chromatic space, then finely aligns them at the feature level using fine-grained adversarial training. The backbone of our model is based on a powerful transformer architecture. Experimental results show that our proposed method effectively solves the domain shift issue and achieves a substantial performance improvement on cross-mode polyp segmentation for endoscopy.
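The coarse input-level step resembles Fourier-based style transfer: keep the source phase (content) while grafting in the target’s low-frequency amplitude (global appearance). A sketch on a single-channel image, with the band size `beta` as an assumed hyper-parameter rather than the paper’s value:

```python
import numpy as np

def fourier_align(src, tgt, beta=0.1):
    """Coarse input-level alignment: give the source image the target's
    low-frequency amplitude spectrum while keeping the source phase."""
    fs, ft = np.fft.fft2(src), np.fft.fft2(tgt)
    amp_s, pha_s = np.abs(fs), np.angle(fs)
    amp_t = np.abs(ft)
    h, w = src.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    # swap the centred low-frequency band of the amplitude spectrum
    amp_s = np.fft.fftshift(amp_s)
    amp_t = np.fft.fftshift(amp_t)
    ch, cw = h // 2, w // 2
    amp_s[ch - bh:ch + bh, cw - bw:cw + bw] = amp_t[ch - bh:ch + bh, cw - bw:cw + bw]
    amp_s = np.fft.ifftshift(amp_s)
    # recombine target-like amplitude with source phase
    return np.real(np.fft.ifft2(amp_s * np.exp(1j * pha_s)))

rng = np.random.default_rng(0)
src = rng.random((32, 32))   # stand-in for a WLI frame
tgt = rng.random((32, 32))   # stand-in for an IEE frame
aligned = fourier_align(src, tgt)
print(aligned.shape)  # → (32, 32)
```

Because the swapped band contains the DC term, the aligned image inherits the target’s global brightness while the source’s structural content is preserved in the phase.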
Citations: 0
An Iterated Local Search for the Talent Scheduling Problem with Location Costs
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953796
Thu Trang Hoa, Minh Anh Nguyen
The talent scheduling problem seeks to determine the movie shooting sequence that minimizes the total cost of the actors involved, which usually accounts for a significant portion of the cost of any real-world movie production. This paper introduces an extension of the talent scheduling problem that accounts for the costs of both filming locations and actors. To better capture reality, we allow the rental cost of a filming location to vary across the planning horizon. The objective is to find the shooting sequence and the start date for each scene that minimize the total cost, including actor and location costs, while ensuring all scenes are completed within the planning horizon. We first formulate the problem as a mixed integer linear programming (MILP) model, from which small instances can be solved to optimality by MILP solvers. Next, an iterated local search heuristic that can efficiently solve larger instances is developed. We then provide a new benchmark data set for this new variant of the talent scheduling problem. The results of computational experiments on the new benchmark instances suggest that our heuristic can outperform the MILP model solved by a commercial solver in terms of both solution quality and runtime.
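The iterated local search component follows the classic skeleton: descend to a local optimum, perturb the incumbent, re-descend, and keep the best sequence found. The sketch below uses a toy one-scene-per-day rental cost rather than the paper’s full actor-plus-location model:

```python
import random

def iterated_local_search(cost, n, iters=50, seed=0):
    """Generic ILS skeleton: swap-based local search to a local optimum,
    then perturb (segment reversal) and re-optimise, keeping the best."""
    rng = random.Random(seed)

    def local_search(seq):
        improved = True
        while improved:
            improved = False
            for i in range(n - 1):
                for j in range(i + 1, n):
                    cand = seq[:]
                    cand[i], cand[j] = cand[j], cand[i]
                    if cost(cand) < cost(seq):
                        seq, improved = cand, True
        return seq

    best = local_search(list(range(n)))
    for _ in range(iters):
        pert = best[:]
        i, j = sorted(rng.sample(range(n), 2))
        pert[i:j + 1] = reversed(pert[i:j + 1])  # perturbation step
        cand = local_search(pert)
        if cost(cand) < cost(best):
            best = cand
    return best

# toy instance: scene s shot on day d costs rent[d] * dur[s]
rent = [5, 1, 4, 2, 3, 1]
dur = [2, 1, 3, 1, 2, 1]
cost = lambda seq: sum(rent[d] * dur[s] for d, s in enumerate(seq))
best = iterated_local_search(cost, 6)
print(cost(best))  # → 21
```

For this toy cost, any swap-optimal sequence pairs the longest scenes with the cheapest days, so the local search alone already reaches the optimum of 21; the perturbation loop matters on harder landscapes.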
Citations: 0
An Unsupervised Learning Method to improve Legal Document Retrieval task at ALQAC 2022
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953618
D. Nguyen, Hieu Nguyen, Tung Le, Le-Minh Nguyen
Domain-specific document retrieval has been an important and challenging research topic in NLP, particularly for legal documents. The main challenge in the legal domain is the deep involvement of specialized expert knowledge, which makes the entire data collection and evaluation procedure complex and time-consuming. In this study, we propose a training data augmentation procedure and an unsupervised embedding learning method, and apply them to the Legal Document Retrieval task at the Automated Legal Question Answering Competition 2022 (ALQAC 2022). On this task, our method outperformed current standard models and achieved competitive results at ALQAC 2022.
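For context, the kind of lexical baseline such retrieval systems are measured against can be sketched as TF-IDF cosine ranking (a generic illustration, not the authors’ ALQAC system; the toy documents are invented):

```python
import math
from collections import Counter

def tfidf_retrieve(query, docs):
    """Return the index of the document with highest TF-IDF cosine
    similarity to the query (whitespace tokenisation, toy scale)."""
    n = len(docs)
    tokenised = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenised for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 1.0) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(d)), i) for i, d in enumerate(tokenised)]
    return max(scores)[1]

docs = ["civil code contract obligations",
        "criminal procedure evidence rules",
        "traffic law penalties for speeding"]
print(tfidf_retrieve("contract obligations", docs))  # → 0
```

Unsupervised embedding methods such as the one proposed aim to retrieve relevant articles even when the query shares no surface terms with them, which this lexical baseline cannot do.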
Citations: 2
Controlling Weight Update Probability of Sparse Features in Machine Learning
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953753
Joon-Choul Shin, Wansu Kim, Jusang Lee, Jieun Park, Cheolyoung Ock
In machine learning, the frequency of a feature in the training data can be used as the feature’s value; in this case, sparse features are likely to create overfitting problems during weight optimization. This is called the sparse data problem, and this paper proposes a method that reduces the probability of a weight update the sparser the feature is. We experimented with this method on four Natural Language Processing tasks, and the results showed that it had positive effects on all of them. On average, the method eliminated 8 out of every 100 errors. It also reduced the number of weight updates, cutting the learning time to 81% on the Named Entity Recognition task.
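The core idea, gating each weight update by a probability that grows with feature frequency, can be sketched as follows (the linear ramp and threshold are illustrative assumptions; the paper’s exact schedule may differ):

```python
import random

def update_prob(freq, threshold=10):
    """Chance of applying an update for a feature observed `freq` times:
    rare (sparse) features are updated less often, frequent ones always."""
    return min(1.0, freq / threshold)

def gated_update(weights, features, grad, freqs, rng):
    """Perceptron-style step where each feature's update fires only
    with its frequency-dependent probability."""
    for feat in features:
        if rng.random() < update_prob(freqs[feat]):
            weights[feat] = weights.get(feat, 0.0) + grad

weights = {}
freqs = {"the": 500, "rare-token": 1}  # hypothetical training-data counts
rng = random.Random(42)
for _ in range(100):
    gated_update(weights, ["the", "rare-token"], 0.1, freqs, rng)
print(weights["the"])  # frequent feature: updated on every step
```

The sparse feature accumulates only a fraction of the updates the frequent one does, which is the mechanism the paper credits for both the error reduction and the shorter training time.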
Citations: 0
Enhanced Task-based Knowledge for Lexicon-based Approach in Vietnamese Hate Speech Detection
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953615
Suong N. Hoang, Binh Duc Nguyen, Nam-Phong Nguyen, Son T. Luu, Hieu T. Phan, H. Nguyen
The explosion of free-text content on social media has brought an exponential propagation of hate speech. Hate speech is well defined in the community guidelines of many popular platforms such as Facebook, TikTok, and Twitter, where any communication that disparages minor or protected groups is considered hateful content. This paper first points out the sophisticated word-play of malicious users in a Vietnamese Hate Speech (VHS) dataset. We propose using the Center Loss during training to disambiguate the task-based sentence embedding, improving the model’s generalization. Moreover, a task-based lexical attention pooling is also proposed to highlight lexicon-level information, which is then combined into the sentence embedding. The experimental results show that the proposed method improves the F1 score on the ViHSD dataset, while training time and inference speed change insignificantly.
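Center Loss pulls each sentence embedding toward a learned per-class centre, L_c = 1/2 · Σ_i ‖x_i − c_{y_i}‖², which tightens clusters and disambiguates borderline embeddings. A minimal numeric sketch (the toy 2-d features and centres are invented for illustration):

```python
def center_loss(feats, labels, centers):
    """Center loss: L_c = 1/2 * sum_i ||x_i - c_{y_i}||^2,
    where c_{y_i} is the centre of sample i's class."""
    return 0.5 * sum(
        sum((x - c) ** 2 for x, c in zip(f, centers[y]))
        for f, y in zip(feats, labels)
    )

feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]   # toy sentence embeddings
labels = [0, 1, 0]                              # 0 = hateful, 1 = clean
centers = {0: [1.5, 0.0], 1: [0.0, 1.0]}        # current class centres
print(center_loss(feats, labels, centers))  # → 0.25
```

In practice this term is added, with a small weight, to the classification loss, and the centres themselves are updated toward the mini-batch mean of their class.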
Citations: 0
Effect of Cluster-based Sampling on the Over-smoothing Issue in Graph Neural Network
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953797
T. Hoang, Viet-Cuong Ta
Graph neural networks (GNNs) are among the dominant approaches for learning on graph-structured data and are used in various applications such as social networks and product recommendation. A GNN operates mainly via the message passing mechanism, in which a node receives information from related nodes to improve its internal representation. However, as the depth of the GNN increases, the message passing mechanism cuts off the high-frequency component of the nodes’ representations, which leads to the over-smoothing issue. In this paper, we propose the use of cluster-based sampling to reduce the smoothing effect of a high number of layers in a GNN. Given that each node is assigned to a specific region of the embedding space, cluster-based sampling is expected to propagate this information to the node’s neighbours, thus improving the nodes’ expressivity. Our approach is tested with several popular GNN architectures, and the experiments show that it reduces the smoothing effect compared with standard approaches under the Mean Average Distance metric.
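The Mean Average Distance metric used for evaluation can be read as the average pairwise cosine distance between node representations, so values near zero signal over-smoothing; a small sketch (toy vectors, and a simplified all-pairs form of the metric):

```python
import math

def mad(reps):
    """Mean Average Distance: mean cosine distance over all node pairs.
    Values near 0 indicate collapsed (over-smoothed) representations."""
    def cos_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    n = len(reps)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cos_dist(reps[i], reps[j]) for i, j in pairs) / len(pairs)

distinct = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # expressive embeddings
smoothed = [[1.0, 1.0], [1.0, 1.01], [1.01, 1.0]]   # nearly collapsed
print(mad(distinct), mad(smoothed))
```

A deep GNN whose MAD drifts toward zero across layers is losing the high-frequency signal the abstract describes; the proposed sampling aims to keep MAD high.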
Citations: 2
Federated Learning for Air Quality Index Prediction: An Overview
Pub Date : 2022-10-19 DOI: 10.1109/KSE56063.2022.9953790
Duy-Dong Le, Anh-Khoa Tran, Minh-Son Dao, M. Nazmudeen, Viet-Tiep Mai, Nhat-Ha Su
Air quality index forecasting in big cities is an exciting study area in smart cities and Internet-of-Things healthcare. In recent years, a large number of empirical, academic, and review papers using machine learning for air quality analysis have been published. However, most of those studies focused on traditional centralized processing on a single machine, and there have been few surveys of federated learning in this field. This overview aims to fill this gap and provide newcomers with a broader perspective to inform future research on this topic, especially on the multi-model approach. We examined over 70 carefully selected papers in this scope and found that multi-model federated learning is the most effective technique for enhancing air quality index prediction results. Therefore, this mechanism should be considered by the science community in the coming years.
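The baseline aggregation rule underlying most federated systems the survey covers is FedAvg: a server averages client model parameters weighted by local dataset size, so no raw sensor data leaves the stations. A minimal sketch on flat parameter vectors:

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: dataset-size-weighted mean of client
    parameter vectors (all clients share one model shape)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

# two hypothetical monitoring stations with 1 and 3 local samples
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
print(fed_avg(clients, sizes))  # → [2.5, 3.5]
```

Multi-model variants, which the survey highlights, relax the single-shared-model assumption here, e.g. by aggregating only within groups of clients that run the same architecture.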
{"title":"Federated Learning for Air Quality Index Prediction: An Overview","authors":"Duy-Dong Le, Anh-Khoa Tran, Minh-Son Dao, M. Nazmudeen, Viet-Tiep Mai, Nhat-Ha Su","doi":"10.1109/KSE56063.2022.9953790","DOIUrl":"https://doi.org/10.1109/KSE56063.2022.9953790","url":null,"abstract":"The air quality index forecast in big cities is an exciting study area in smart cities and Internet of Things healthcare. In recent years, a large number of empirical, academic, and review papers using machine learning for air quality analysis have been published. However, most of those studies focused on traditional centralized processing on a single machine, and there have been few surveys of federated learning in this field. This overview aims to fill this gap and provide newcomers with a broader perspective to inform future research on this topic, especially for the multi-model approach. We have examined over 70 carefully selected papers in this scope and discovered that multi-model federated learning is the most effective technique for enhancing the air quality index prediction result. Therefore, this mechanism needs to be considered by the science community in the coming years.","PeriodicalId":330865,"journal":{"name":"2022 14th International Conference on Knowledge and Systems Engineering (KSE)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130797396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
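The core federated-learning mechanism the survey covers can be sketched with a minimal single-model FedAvg example. This is a hypothetical illustration, not code from any surveyed paper: the station counts, pollutant features, and hyper-parameters are invented. Each "station" trains on its private AQI data, and a server only averages model weights (weighted by sample count), so raw measurements never leave a client.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([0.5, -2.0, 3.0])   # hypothetical pollutant-to-AQI weights

def make_station(n_samples):
    """Synthetic private dataset held by one monitoring station."""
    X = rng.normal(size=(n_samples, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
    return X, y

stations = [make_station(n) for n in (50, 80, 120)]

def local_fit(X, y, w, lr=0.1, epochs=20):
    """A few epochs of local gradient descent on least squares."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

w_global = np.zeros(3)
for _ in range(5):                                 # communication rounds
    local_ws = [local_fit(X, y, w_global) for X, y in stations]
    sizes = [len(y) for _, y in stations]
    # FedAvg: sample-size-weighted average of the client weights
    w_global = np.average(local_ws, axis=0, weights=sizes)

# the aggregated model recovers the shared signal without pooling raw data
assert np.allclose(w_global, true_w, atol=0.2)
```

The multi-model variants highlighted by the survey extend this loop by maintaining and aggregating several models rather than a single `w_global`.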