
Latest publications: Asian Conference on Intelligent Information and Database Systems

Speeding Up Recommender Systems Using Association Rules
Pub Date : 2022-11-16 DOI: 10.1007/978-3-031-21967-2_14
Eyad Kannout, H. Nguyen, Marek Grzegorowski
Citations: 1
Semantic Pivoting Model for Effective Event Detection
Pub Date : 2022-11-01 DOI: 10.48550/arXiv.2211.00709
Anran Hao, S. Hui, Jian Su
Event Detection, which aims to identify and classify mentions of event instances from unstructured articles, is an important task in Natural Language Processing (NLP). Existing techniques for event detection only use homogeneous one-hot vectors to represent the event type classes, ignoring the fact that the semantic meaning of the types is important to the task. Such an approach is inefficient and prone to overfitting. In this paper, we propose a Semantic Pivoting Model for Effective Event Detection (SPEED), which explicitly incorporates prior information during training and captures semantically meaningful correlations between input and events. Experimental results show that our proposed model achieves state-of-the-art performance and outperforms the baselines in multiple settings without using any external resources.
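The core idea in the abstract, replacing homogeneous one-hot class vectors with semantically meaningful type representations, can be illustrated with a toy sketch (not the authors' model; the 3-d word vectors below are made up for illustration):

```python
# Toy sketch: score a candidate trigger word against event-type *label*
# representations instead of one-hot class indices. With one-hot classes
# every type is equidistant; label embeddings let "bombing" prefer the
# semantically close type "attack". Vectors are hypothetical.
import math

TOY_VECS = {
    "attack":  [0.9, 0.1, 0.0],
    "bombing": [0.8, 0.2, 0.1],
    "marry":   [0.0, 0.9, 0.2],
    "wedding": [0.1, 0.8, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify_trigger(trigger, type_names):
    # Pick the event type whose label embedding is closest to the trigger.
    scores = {t: cosine(TOY_VECS[trigger], TOY_VECS[t]) for t in type_names}
    return max(scores, key=scores.get)

print(classify_trigger("bombing", ["attack", "marry"]))  # -> attack
```

A real model would learn these representations jointly during training rather than using fixed lookup vectors.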
Citations: 0
A practical method for occupational skills detection in Vietnamese job listings
Pub Date : 2022-10-26 DOI: 10.48550/arXiv.2210.14607
Viet-Trung Tran, Hai Cao, T. Cao
The Vietnamese labor market has been developing unevenly: the number of university graduates is growing, but so is the unemployment rate. This situation is often caused by the lack of accurate and timely labor market information, which leads to skill mismatches between the worker supply and actual market demands. To build a data monitoring and analytics platform for the labor market, one of the main challenges is to automatically detect occupational skills from labor-related data, such as resumes and job listings. Traditional approaches rely on an existing taxonomy and/or large annotated datasets to build Named Entity Recognition (NER) models; they are expensive and require huge manual effort. In this paper, we propose a practical methodology for skill detection in Vietnamese job listings. Rather than treating the task as a NER task, we consider it a ranking problem. We propose a pipeline in which phrases are first extracted and ranked by semantic similarity with the phrases' contexts, and then a final classification is employed to detect skill phrases. We collected three datasets and conducted extensive experiments. The results demonstrated that our methodology achieved better performance than a NER model on scarce datasets.
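The extract-then-rank-then-classify pipeline described above can be sketched with toy components (not the paper's implementation; token overlap stands in for semantic similarity, and a hypothetical skill lexicon stands in for the trained classifier):

```python
# Illustrative pipeline sketch: rank candidate phrases by similarity to
# their context, then classify the ranked phrases as skills or not.
def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def rank_candidates(candidates, context):
    # Rank candidate phrases by (toy) similarity to the surrounding context.
    return sorted(candidates, key=lambda p: jaccard(p, context), reverse=True)

def detect_skills(candidates, context, known_skill_words, threshold=1):
    # Final classification step: a hypothetical lexicon lookup stands in
    # for the learned classifier used in the paper.
    ranked = rank_candidates(candidates, context)
    return [p for p in ranked
            if sum(w in known_skill_words for w in p.split()) >= threshold]

context = "experience with python programming and sql databases required"
candidates = ["python programming", "sql databases", "our office"]
print(detect_skills(candidates, context, {"python", "sql", "programming"}))
```

Casting the task as ranking rather than sequence labeling avoids needing token-level NER annotations, which matches the paper's motivation for scarce data.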
Citations: 0
Keyword Extraction from Short Texts with a Text-To-Text Transfer Transformer
Pub Date : 2022-09-28 DOI: 10.48550/arXiv.2209.14008
Piotr Pęzik, Agnieszka Mikolajczyk-Barela, Adam Wawrzynski, Bartlomiej Niton, M. Ogrodniczuk
The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish (plT5) to the task of intrinsic and extrinsic keyword extraction from short text passages. The evaluation is carried out on the new Polish Open Science Metadata Corpus (POSMAC), which is released with this paper: a collection of 216,214 abstracts of scientific publications compiled in the CURLICAT project. We compare the results obtained by four different methods, i.e. plT5kw, extremeText, TermoPL, KeyBERT and conclude that the plT5kw model yields particularly promising results for both frequent and sparsely represented keywords. Furthermore, a plT5kw keyword generation model trained on the POSMAC also seems to produce highly useful results in cross-domain text labelling scenarios. We discuss the performance of the model on news stories and phone-based dialog transcripts which represent text genres and domains extrinsic to the dataset of scientific abstracts. Finally, we also attempt to characterize the challenges of evaluating a text-to-text model on both intrinsic and extrinsic keyword extraction.
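One evaluation challenge the abstract alludes to is scoring generated keywords against gold annotations. A common approach, sketched here as an assumption rather than the paper's own protocol, is exact-match F1 over normalized keyword sets:

```python
# Hedged sketch of a standard exact-match F1 evaluation for keyword
# extraction; the POSMAC evaluation protocol may differ in details.
def keyword_f1(predicted, gold):
    predicted = {k.lower().strip() for k in predicted}
    gold = {k.lower().strip() for k in gold}
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)            # exact matches after normalization
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(keyword_f1(["Machine Learning", "NLP"], ["machine learning", "corpora"]))
```

Exact matching penalizes near-synonyms and inflected forms, which is one reason evaluating generative models like plT5kw on extrinsic keywords is hard.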
Citations: 2
Image-based Contextual Pill Recognition with Medical Knowledge Graph Assistance
Pub Date : 2022-08-04 DOI: 10.48550/arXiv.2208.02432
Anh Duy Nguyen, Thuy-Dung Nguyen, H. Pham, T. Nguyen, Phi-Le Nguyen
Identifying pills from images captured under various conditions and backgrounds has become increasingly essential. Several efforts in the literature have been devoted to utilizing deep learning-based approaches to tackle the pill recognition problem. However, due to the high similarity between pills' appearances, misrecognition often occurs, leaving pill recognition a challenge. To this end, in this paper, we introduce a novel approach named PIKA that leverages external knowledge to enhance pill recognition accuracy. Specifically, we address a practical scenario (which we call contextual pill recognition), aiming to identify pills in a picture of a patient's pill intake. Firstly, we propose a novel method for modeling the implicit association between pills in the presence of an external data source, in this case, prescriptions. Secondly, we present a walk-based graph embedding model that transforms from the graph space to vector space and extracts condensed relational features of the pills. Thirdly, a final framework is provided that leverages both image-based visual and graph-based relational features to accomplish the pill identification task. Within this framework, the visual representation of each pill is mapped to the graph embedding space, which is then used to execute attention over the graph representation, resulting in a semantically rich context vector that aids in the final classification. To our knowledge, this is the first study to use external prescription data to establish associations between medicines and to classify them using this aiding information. The architecture of PIKA is lightweight and has the flexibility to incorporate into any recognition backbone. The experimental results show that by leveraging the external knowledge graph, PIKA can improve the recognition accuracy from 4.8% to 34.1% in terms of F1-score, compared to baselines.
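The fusion step described above, a visual representation attending over graph embeddings to form a context vector, can be sketched with toy numbers (dimensions and values are made up; this is not PIKA's code):

```python
# Toy sketch of attention fusion: a pill's visual vector attends over
# graph (prescription co-occurrence) node embeddings; the result is an
# attention-weighted context vector used for final classification.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(visual, graph_nodes):
    # Scores = dot(visual, node); context = weighted sum of the nodes.
    scores = [sum(v * g for v, g in zip(visual, node)) for node in graph_nodes]
    weights = softmax(scores)
    dim = len(visual)
    return [sum(w * node[d] for w, node in zip(weights, graph_nodes))
            for d in range(dim)]

visual = [1.0, 0.0]
graph_nodes = [[1.0, 0.0], [0.0, 1.0]]
ctx = attend(visual, graph_nodes)
# ctx leans toward the first node, the one most similar to the visual cue
```

In the paper this happens in a learned embedding space; here plain dot products stand in for the learned projection from visual space to graph space.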
Citations: 2
Detecting Spam Reviews on Vietnamese E-commerce Websites
Pub Date : 2022-07-27 DOI: 10.1007/978-3-031-21743-2_48
Co Van Dinh, Son T. Luu, A. Nguyen
Citations: 2
Efficient Classification with Counterfactual Reasoning and Active Learning
Pub Date : 2022-07-25 DOI: 10.48550/arXiv.2207.12086
A. Mohammed, D. Nguyen, Bao Duong, T. Nguyen
Data augmentation is one of the most successful techniques to improve the classification accuracy of machine learning models in computer vision. However, applying data augmentation to tabular data is a challenging problem since it is hard to generate synthetic samples with labels. In this paper, we propose an efficient classifier with a novel data augmentation technique for tabular data. Our method called CCRAL combines causal reasoning to learn counterfactual samples for the original training samples and active learning to select useful counterfactual samples based on a region of uncertainty. By doing this, our method can maximize our model's generalization on the unseen testing data. We validate our method analytically, and compare with the standard baselines. Our experimental results highlight that CCRAL achieves significantly better performance than those of the baselines across several real-world tabular datasets in terms of accuracy and AUC. Data and source code are available at: https://github.com/nphdang/CCRAL.
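The two ingredients the abstract names, counterfactual samples and uncertainty-based selection, can be sketched minimally (this is an illustration of the idea, not the CCRAL implementation; the scorer and the binary treatment feature are assumptions):

```python
# Minimal sketch: create counterfactuals by flipping one (assumed binary)
# treatment feature, then keep only those in a region of uncertainty,
# i.e. samples a stand-in classifier is unsure about.
def make_counterfactual(sample, treatment_idx):
    cf = list(sample)
    cf[treatment_idx] = 1 - cf[treatment_idx]  # flip the treatment feature
    return cf

def select_uncertain(samples, score_fn, low=0.4, high=0.6):
    # Active-learning style filter: keep samples near the decision boundary.
    return [s for s in samples if low <= score_fn(s) <= high]

def toy_score(sample):
    # Hypothetical classifier score: mean of the feature values.
    return sum(sample) / len(sample)

train = [[1, 0, 1], [0, 1, 1]]
cfs = [make_counterfactual(s, treatment_idx=0) for s in train]
augmented = select_uncertain(cfs, toy_score, low=0.3, high=0.7)
print(augmented)
```

Training on the original samples plus the selected counterfactuals is what the paper credits for the improved generalization on unseen tabular data.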
Citations: 0
A Novel Neural Network Training Method for Autonomous Driving Using Semi-Pseudo-Labels and 3D Data Augmentations
Pub Date : 2022-07-20 DOI: 10.1007/978-3-031-21967-2_18
Tamás Matuszka, Dániel Kozma
Citations: 3
Utility Driven Job Selection Problem on Road Networks
Pub Date : 2022-07-16 DOI: 10.48550/arXiv.2207.07831
Mayank Singhal, Suman Banerjee
In this paper, we study the problem of Utility Driven Job Selection on Road Networks, for which the inputs are: a road network whose vertices are a set of Points-Of-Interest (henceforth POIs) and whose edges are road segments joining the POIs, and a set of jobs, each with its originating POI, starting time, duration, and utility. A worker earns the utility associated with a job if (s)he performs it. As the jobs originate at different POIs, the worker has to move from one POI to another to take up a job, and some budget is available for this purpose. Any two jobs can be taken up by the worker only if the finishing time of the first job plus the traveling time from the POI of the first job to that of the second is less than or equal to the starting time of the second job. We call this the temporal constraint. The goal of this problem is to choose a subset of the jobs that maximizes the earned utility without violating the budget and temporal constraints. We present two solution approaches with detailed analysis. The first works by finding the locally optimal job at the end of every job; we call this the Best First Search Approach. The other approach is based on Nearest Neighbor Search on road networks. We perform a set of experiments with real-world trajectory datasets to demonstrate the efficiency and effectiveness of the proposed solution approaches. We observe that the proposed approaches lead to more utility compared to baseline methods.
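The temporal constraint can be made concrete with a small sketch (not the authors' implementation: travel times are a hypothetical lookup table, the budget is ignored, and a simple start-time-order greedy stands in for the full Best First Search approach):

```python
# Sketch of the temporal constraint and a greedy job-selection pass.
def feasible(prev, nxt, travel_time):
    # finish(prev) + travel(prev POI -> next POI) must not exceed start(next)
    finish = prev["start"] + prev["duration"]
    return finish + travel_time[(prev["poi"], nxt["poi"])] <= nxt["start"]

def greedy_select(jobs, travel_time):
    # Take every temporally feasible job in start-time order.
    chosen, current = [], None
    for job in sorted(jobs, key=lambda j: j["start"]):
        if current is None or feasible(current, job, travel_time):
            chosen.append(job)
            current = job
    return sum(j["utility"] for j in chosen), chosen

jobs = [
    {"poi": "A", "start": 0, "duration": 2, "utility": 5},
    {"poi": "B", "start": 2, "duration": 1, "utility": 4},  # infeasible after A
    {"poi": "B", "start": 4, "duration": 1, "utility": 3},
]
travel_time = {("A", "A"): 0, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 0}
total, chosen = greedy_select(jobs, travel_time)
print(total)  # -> 8
```

The middle job is skipped because the worker finishes job A at time 2 and needs 1 more unit to reach POI B, arriving after that job's start time.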
Citations: 0
SimCPSR: Simple Contrastive Learning for Paper Submission Recommendation System
Pub Date : 2022-05-12 DOI: 10.48550/arXiv.2205.05940
Duc H. Le, T. T. Doan, S. T. Huynh, Binh T. Nguyen
The recommendation system plays a vital role in many areas, especially academic fields, supporting researchers in submitting their work and increasing its acceptance through the conference or journal selection process. This study proposes a transformer-based model using transfer learning as an efficient approach for the paper submission recommendation system. By combining essential information (such as the title, the abstract, and the list of keywords) with the aims & scopes of journals, the model can recommend the Top K journals that maximize the acceptance of the paper. Our model was developed in two stages: (i) fine-tuning the pre-trained language model (LM) with a simple contrastive learning framework, using a simple supervised contrastive objective to fine-tune all parameters and encourage the LM to learn the document representation effectively; and (ii) training the fine-tuned LM on different combinations of the features for the downstream task. This study suggests a more advanced method for enhancing the efficiency of the paper submission recommendation system compared to previous approaches: we achieve Top 1, 3, 5, and 10 accuracies of 0.5173, 0.8097, 0.8862, and 0.9496 respectively on the test set when combining the title, abstract, and keywords as input features. Incorporating the journals' aims and scopes, our model shows an exciting result, reaching 0.5194, 0.8112, 0.8866, and 0.9496 for Top 1, 3, 5, and 10. We provide the implementation and datasets for further reference at https://github.com/hduc-le/SimCPSR.
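The ranking setup, concatenating title, abstract, and keywords and scoring journals by similarity to their aims & scopes, can be sketched with a toy bag-of-words similarity (an illustration only; the paper uses a fine-tuned LM's embeddings, and the journal corpus below is hypothetical):

```python
# Toy Top-K journal ranking: bag-of-words cosine similarity between the
# paper's combined text and each journal's aims-and-scopes text.
from collections import Counter
import math

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_journals(title, abstract, keywords, journals, k=3):
    # journals: {name: aims-and-scopes text}
    paper = " ".join([title, abstract, " ".join(keywords)])
    ranked = sorted(journals, key=lambda j: cosine(paper, journals[j]),
                    reverse=True)
    return ranked[:k]

journals = {
    "J. Machine Learning": "machine learning models and data",
    "J. Chemistry": "organic chemistry synthesis",
}
print(top_k_journals("Contrastive learning", "We fine-tune a language model",
                     ["machine learning"], journals, k=1))
```

Replacing the bag-of-words vectors with contrastively fine-tuned LM embeddings is what the paper's two-stage training provides.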
Citations: 0