
Computer Journal: Latest Publications

A Unified Framework to Discover Permutation Generation Algorithms
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab181 | Vol. 66(3), pp. 603-614
Pramod Ganapathi;Rezaul Chowdhury
We present two simple, intuitive and general algorithmic frameworks that can be used to design a wide variety of permutation generation algorithms. The frameworks can be used to produce 19 existing permutation algorithms, including the well-known algorithms of Heap, Wells, Langdon, Zaks, Tompkins and Lipski. We use the frameworks to design two new sorting-based permutation generation algorithms, one of which is optimal.
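For reference, a minimal Python sketch of Heap's algorithm, one of the 19 existing algorithms the frameworks reproduce, is given below. This is the standard textbook formulation, not the authors' unified framework itself.

```python
from typing import Iterator, List

def heap_permutations(items: List[int]) -> Iterator[List[int]]:
    """Generate all permutations of `items` with Heap's algorithm.

    Each permutation differs from the previous one by a single swap,
    which is what makes the algorithm attractive for permutation generation.
    """
    a = list(items)
    n = len(a)
    c = [0] * n          # c[i] counts the swaps performed at level i
    yield list(a)
    i = 0
    while i < n:
        if c[i] < i:
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]
            else:
                a[c[i]], a[i] = a[i], a[c[i]]
            yield list(a)
            c[i] += 1
            i = 0
        else:
            c[i] = 0
            i += 1

if __name__ == "__main__":
    for p in heap_permutations([1, 2, 3]):
        print(p)   # prints all 3! = 6 permutations
```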
Citations: 2
A Hybrid Strategy Improved Whale Optimization Algorithm for Web Service Composition
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab187 | Vol. 66(3), pp. 662-677
Chuanxiang Ju;Hangqi Ding;Benjia Hu
With the rapid growth in the number of web services on the Internet, various service providers offer many similar services with the same functionality but different quality-of-service (QoS) attributes. Quickly selecting, from many candidate services, a service composition that meets users' QoS requirements is a key problem that urgently needs to be solved. Optimization of web service composition is an NP-hard problem, and intelligent optimization algorithms have become the mainstream approach to solving it. This paper proposes a hybrid-strategy improved whale optimization algorithm based on chaotic initialization, a nonlinear convergence factor and mutation. By maintaining a balance between exploration and exploitation, the problem of slow or premature convergence is overcome to a certain extent. To evaluate its performance more accurately, the proposed algorithm was first tested on a set of standard benchmarks; simulations were then performed on a real quality-of-web-service dataset. Experimental results show that, on average, the proposed algorithm outperforms the original version and other meta-heuristic algorithms, and they verify the feasibility and stability of web service composition optimization.
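The abstract does not spell out the exact update rules, so the Python sketch below is only a rough illustration of the ingredients it names: a logistic-map chaotic initialization, a nonlinear (rather than linear) decay of the convergence factor, and a simple Gaussian mutation step wrapped around the standard whale-style position update. The specific formulas and parameter values are assumptions, not the paper's.

```python
import numpy as np

def chaotic_init(pop: int, dim: int, lo: float, hi: float, z: float = 0.7) -> np.ndarray:
    """Logistic-map chaotic initialization (one common choice of chaos map)."""
    x = np.empty((pop, dim))
    for i in range(pop):
        for j in range(dim):
            z = 4.0 * z * (1.0 - z)        # logistic map iteration
            x[i, j] = lo + z * (hi - lo)   # map the chaotic value into the search range
    return x

def woa_nonlinear(obj, dim=10, pop=30, iters=200, lo=-10.0, hi=10.0, seed=0):
    """Minimal whale-style optimization loop with a nonlinear convergence factor."""
    rng = np.random.default_rng(seed)
    X = chaotic_init(pop, dim, lo, hi)
    best = min(X, key=obj).copy()
    for t in range(iters):
        a = 2.0 * (1.0 - (t / iters) ** 2)            # nonlinear decay from 2 towards 0
        for i in range(pop):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                X[i] = best - A * np.abs(C * best - X[i])                              # encircling prey
            else:
                l = rng.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best  # spiral update
            if rng.random() < 0.1:                                                     # mutation strategy
                X[i] += rng.normal(0.0, 0.1 * (hi - lo), dim)
            X[i] = np.clip(X[i], lo, hi)
            if obj(X[i]) < obj(best):
                best = X[i].copy()
    return best, obj(best)

if __name__ == "__main__":
    def sphere(x: np.ndarray) -> float:
        return float(np.sum(x ** 2))

    _, f_best = woa_nonlinear(sphere)
    print(f"best sphere value: {f_best:.6f}")
```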
Citations: 5
Pattern Matching Model for Recognition of Stone Inscription Characters
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab177 | Vol. 66(3), pp. 554-564
K Durga Devi;P Uma Maheswari;Phani Kumar Polasi;R Preetha;M Vidhyalakshmi
Although countless significant works have addressed handwritten character recognition, very meager effort has been reported for inscription characters, especially Tamil stone inscriptions. The real challenges in handling stone inscriptions are dataset collection and foreground-background discrimination. To date, the archaeological department follows the traditional way of capturing, preserving and deciphering stone inscriptions, which is manual, time consuming and requires expert assistance. Hence, digitized recognition is essential, and an efficient pattern matching algorithm needs to be developed to deal with variations in the shape and size of the complex structured characters present in Tamil stone inscriptions. In this paper, an automated character recognition approach based on pattern matching is developed, in which character features are extracted using a pattern matching algorithm that helps achieve a good recognition rate. Recognition of ancient Tamil stone inscription characters and identification of their corresponding contemporary Tamil characters are performed by an Image-based Character Pattern Identification (ICPI) system. A Modified Speeded Up Robust Feature with Bag of Grapheme (MSURF-BoG) algorithm is implemented to detect the strongest key points of input characters with different orientations. These key point features are used to train an image model, called Bag of Grapheme (BoG), with code word creation. Hence, unsupervised key point features are extracted and pattern matching is performed. Eleventh-century Tamil stone inscriptions, containing 7 vowels and 17 consonants for a total of 24 characters, were taken as samples, and samples with different orientations of each of the 24 characters were used to train the system. The proposed system is evaluated by recognition accuracy, which reaches a maximum of 96% per character.
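MSURF-BoG itself is not detailed in the abstract; as a rough Python illustration of the bag-of-features matching idea only, the sketch below quantizes keypoint descriptors against a learned codebook and compares the resulting histograms. The descriptor extraction step, the codebook size and the random data in the demo are all assumptions.

```python
import numpy as np

def build_codebook(descriptors: np.ndarray, k: int = 16, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Learn a visual codebook ("bag of graphemes") with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign every descriptor to its nearest code word, then recompute the centers.
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bog_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize one character's keypoint descriptors into a normalized code-word histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-9)

def match_character(query_hist: np.ndarray, reference_hists: dict) -> str:
    """Return the label of the reference character whose histogram is closest to the query."""
    return min(reference_hists, key=lambda lbl: np.linalg.norm(query_hist - reference_hists[lbl]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_desc = rng.normal(size=(500, 64))        # stand-in for keypoint descriptors
    codebook = build_codebook(train_desc)
    refs = {"ka": bog_histogram(rng.normal(size=(40, 64)), codebook),
            "na": bog_histogram(rng.normal(size=(35, 64)), codebook)}
    query = bog_histogram(rng.normal(size=(38, 64)), codebook)
    print(match_character(query, refs))
```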
Citations: 5
A Novel Generation Method for Diverse Privacy Image Based on Machine Learning
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab176 | Vol. 66(3), pp. 540-553
Weina Niu;Yuheng Luo;Kangyi Ding;Xiaosong Zhang;Yanping Wang;Beibei Li
In recent years, deep neural networks have been extensively applied in various fields, and face recognition is one of the most important applications. Artificial intelligence has reached or even surpassed human capabilities in many fields. However, while artificial intelligence applications provide convenience to human lives, they also introduce the risk of privacy leakage. At present, privacy protection technology for human faces has received extensive attention. The research goals of face privacy protection mainly include face anonymization and data availability protection. Existing methods usually provide insufficient anonymity and make it hard to control the degree of image distortion, which makes it difficult to achieve the purpose of privacy protection. Moreover, they do not explicitly preserve the diversity of attributes such as emotions, expressions and ethnicities, so they cannot support data analysis tasks on non-identity attributes. This paper proposes a diverse privacy face image generation algorithm based on machine learning, called DIVFGEN. The algorithm jointly considers image distortion, identity mapping distance loss and emotion classification loss; it transforms the privacy protection target into the problem of generating adversarial examples against the recognition model and uses an adaptive optimization algorithm to generate anonymous and diverse privacy images. Experimental results on the Cohn-Kanade+ dataset show that our algorithm reduces the probability of the neural network recognizing the face from 98.6% to 4.8%, while sentiment is still classified accurately.
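DIVFGEN's exact losses, optimizer and recognition model are not given here, so the PyTorch sketch below only shows the generic shape of such an approach: optimize an additive perturbation that pushes the identity classifier away from the true label while penalizing visible distortion. The `face_model` argument, the Adam optimizer and the loss weights are placeholders, not the paper's choices.

```python
import torch
import torch.nn.functional as F

def privacy_perturb(image: torch.Tensor, label: int, face_model: torch.nn.Module,
                    steps: int = 100, lr: float = 0.01, dist_weight: float = 10.0) -> torch.Tensor:
    """Optimize an additive perturbation that suppresses identity recognition
    while keeping the image close to the original.

    `image` is a (C, H, W) tensor in [0, 1]; `face_model` is any classifier
    mapping an image batch to identity logits (a placeholder here).
    """
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([label])
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)
        logits = face_model(adv.unsqueeze(0))
        # Push the identity prediction away from the true label (negated cross-entropy)
        # while penalizing visible distortion of the image.
        identity_loss = -F.cross_entropy(logits, target)
        distortion_loss = dist_weight * torch.mean(delta ** 2)
        loss = identity_loss + distortion_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (image + delta).detach().clamp(0.0, 1.0)
```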
Citations: 3
Caviar-Sunflower Optimization Algorithm-Based Deep Learning Classifier for Multi-Document Summarization
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab193 | Vol. 66(3), pp. 727-742
Sheela J;Janet B
This paper proposes a multi-document summarization model using an optimization algorithm named CAVIAR Sun Flower Optimization (CAV-SFO). In this method, two classifiers, namely a Generative Adversarial Network (GAN) classifier and a Deep Recurrent Neural Network (Deep RNN), are utilized to generate scores for summarizing multiple documents. Initially, the simHash method is applied to remove duplicate/real-duplicate content from sentences. Then, the result is given to the proposed CAV-SFO-based GAN classifier to determine a score for each sentence. CAV-SFO is newly designed by incorporating CAVIAR into the Sun Flower Optimization (SFO) algorithm. In addition, a pre-processing step based on stop-word removal and stemming is applied to the duplicate-removed sentences from the input multi-document set. Afterward, text-based features are extracted from the pre-processed documents, and a CAV-SFO-based Deep RNN is introduced to generate a score; thereby, the internal model parameters are optimally tuned. Finally, the scores generated by the CAV-SFO-based GAN and the CAV-SFO-based Deep RNN are hybridized, and the final score is obtained using a multi-document compression ratio. The proposed model showed improved results, with a maximal precision of 0.989, maximal recall of 0.986, maximal F-Measure of 0.823, maximal Rouge-precision of 0.930 and maximal Rouge-recall of 0.870.
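The abstract uses simHash only as a duplicate-removal preprocessing step; a minimal Python sketch of simHash fingerprinting and near-duplicate detection follows. It is the standard word-level formulation, not necessarily the feature set or threshold used in the paper.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Compute a simHash fingerprint from a sentence's word features."""
    v = [0] * bits
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1   # vote each bit up or down per word
    fingerprint = 0
    for i in range(bits):
        if v[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def is_near_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    """Two sentences are near-duplicates if their fingerprints differ in few bits."""
    return bin(simhash(a) ^ simhash(b)).count("1") <= threshold

if __name__ == "__main__":
    print(is_near_duplicate("the cat sat on the mat", "the cat sat on a mat"))
    print(is_near_duplicate("the cat sat on the mat", "stock prices fell sharply today"))
```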
Citations: 1
An Incremental Hierarchical Clustering Based System For Record Linkage In E-Commerce Domain
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab179 | Vol. 66(3), pp. 581-602
Furkan Gözükara;Selma Ayşe Özel
In this study, a novel record linkage system for E-commerce products is presented. Our system aims to cluster the same products crawled from different E-commerce websites into the same cluster. The proposed system achieves a very high success rate by combining semi-supervised and unsupervised approaches. Unlike previously proposed systems in the literature, neither a training set nor structured corpora are necessary. The core of the system is based on Hierarchical Agglomerative Clustering (HAC); however, the HAC algorithm is modified to be dynamic so that it can efficiently cluster a stream of incoming new data. Since the proposed system does not depend on any prior data, it can cluster new products. The system uses a bag-of-words representation of product titles, employs a single distance metric, exploits multiple domain-based attributes and does not depend on the characteristics of the natural language used in the product records. To our knowledge, there is no commonly used tool or technique to measure the quality of a clustering task; therefore, in this study, we use ELKI (Environment for Developing KDD-Applications Supported by Index-Structures), an open-source data mining tool, to measure the performance of the clustering methods, and we show how to use ELKI for this purpose. To evaluate our system, we collect our own dataset and make it publicly available to researchers who study E-commerce product clustering. According to our experimental analysis, our proposed system achieves a 96.25% F-Measure, whereas the best of the other state-of-the-art clustering systems obtains an 89.12% F-Measure.
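The paper's dynamic HAC is not reproduced here; the Python sketch below only illustrates the incremental idea under simple assumptions: represent each product title as a bag-of-words vector and attach an incoming record to the closest existing cluster when its cosine similarity clears a threshold, otherwise open a new cluster. The threshold value and centroid update are illustrative choices.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class IncrementalLinker:
    """Stream product titles into clusters of (probably) identical products."""

    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        self.clusters = []  # list of (centroid Counter, list of titles)

    def add(self, title: str) -> int:
        vec = Counter(title.lower().split())
        best_idx, best_sim = -1, 0.0
        for idx, (centroid, _) in enumerate(self.clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_sim >= self.threshold:
            centroid, titles = self.clusters[best_idx]
            centroid.update(vec)          # fold the new title into the cluster centroid
            titles.append(title)
            return best_idx
        self.clusters.append((vec, [title]))
        return len(self.clusters) - 1

if __name__ == "__main__":
    linker = IncrementalLinker()
    for t in ["apple iphone 13 128gb blue", "iphone 13 128 gb blue apple", "samsung galaxy s21"]:
        print(t, "->", linker.add(t))   # first two titles share a cluster, the third starts a new one
```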
Citations: 2
Efficient Parameter Server Placement for Distributed Deep Learning in Edge Computing
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab188 | Vol. 66(3), pp. 678-691
Yalan Wu;Jiaquan Yan;Long Chen;Jigang Wu;Yidong Li
Parameter server (PS) placement is one of the most important factors for global model training in distributed deep learning. This paper formulates a novel PS placement problem under dynamically available storage capacity, with the objective of minimizing the training time of distributed deep learning under constraints on storage capacity and the number of local PSs. We then prove that the proposed problem is NP-hard. The training epochs are divided into two parts, i.e. the first epoch and the remaining epochs. For the first epoch, an approximation algorithm and a rounding algorithm are proposed to solve the problem. For the remaining epochs, an adjustment algorithm is proposed that continuously adjusts the PS placement decisions to decrease the training time of the global model. Simulation results show that the proposed approximation algorithm and rounding algorithm outperform existing works in all cases in terms of the training time of the global model. Meanwhile, the training time of the global model under the proposed approximation algorithm is very close to that of the optimal solution generated by a brute-force approach in all cases. Besides, the integrated algorithm outperforms existing works when the available storage capacity varies during training.
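The approximation and rounding algorithms themselves are not described in the abstract; the Python sketch below is only a plain greedy placeholder for the placement step: keep the candidate edge nodes whose available storage can hold a PS and pick the ones with the lowest estimated communication cost. The node attributes and cost model are assumptions.

```python
def greedy_ps_placement(nodes: dict, demand: float, k: int) -> list:
    """Pick up to k parameter-server hosts greedily.

    `nodes` maps node id -> (available_storage, estimated_comm_cost); a node is
    eligible only if it can hold the model shard (`demand`). This is a simple
    greedy heuristic for illustration, not the paper's approximation algorithm.
    """
    eligible = {n: cost for n, (cap, cost) in nodes.items() if cap >= demand}
    # Prefer the eligible nodes with the lowest estimated communication cost.
    return sorted(eligible, key=eligible.get)[:k]

if __name__ == "__main__":
    nodes = {"edge-a": (8.0, 1.2), "edge-b": (2.0, 0.4), "edge-c": (16.0, 0.9)}
    print(greedy_ps_placement(nodes, demand=4.0, k=2))   # ['edge-c', 'edge-a']
```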
Citations: 0
DeepSTF: A Deep Spatial–Temporal Forecast Model of Taxi Flow
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab178 | Vol. 66(3), pp. 565-580
Zhiqiang Lv;Jianbo Li;Chuanhao Dong;Zhihao Xu
Taxi flow forecasting is significant for planning transportation and allocating basic transportation resources. Flow forecasting for adjacent urban areas differs from fixed-point flow forecasting: the data are more complex and diverse, which makes them more challenging to forecast. This paper introduces a deep spatial–temporal forecast (DeepSTF) model for flow forecasting in adjacent urban areas, which divides the urban area into grids so that it has a graph structure. The model builds a spatial–temporal calculation block that uses a graph convolutional network to extract spatial correlation features and two-layer temporal convolutional networks to extract time-dependent features. Based on the theory of dilated convolution and causal convolution, the model overcomes the under-fitting that other models exhibit when computing on rapidly changing data. To improve prediction accuracy, we treat weather as an implicit factor and let it participate in the feature calculation process. A comparison experiment is conducted between our model and seven existing traffic flow forecast models. The experimental results show that the model has better long-term traffic prediction capabilities and performs well on various evaluation indicators.
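The exact DeepSTF architecture is not specified here; the PyTorch sketch below is a rough stand-in for the spatial–temporal calculation block it describes: a one-layer graph convolution over the grid graph followed by a dilated, causally trimmed convolution over time. The layer sizes, the normalized adjacency matrix and the kernel/dilation settings are assumptions.

```python
import torch
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """GCN-then-dilated-temporal-convolution block in the spirit of the abstract.

    Input shape: (batch, time, nodes, features). `a_hat` is a normalized
    adjacency matrix over the grid of urban regions.
    """

    def __init__(self, a_hat: torch.Tensor, in_dim: int, hidden: int):
        super().__init__()
        self.register_buffer("a_hat", a_hat)
        self.theta = nn.Linear(in_dim, hidden)                    # GCN projection weight
        # Dilated convolution along the time axis; padding = dilation * (kernel - 1).
        self.temporal = nn.Conv2d(hidden, hidden, kernel_size=(3, 1),
                                  padding=(4, 0), dilation=(2, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial step: aggregate neighbouring regions with a_hat, then project.
        h = torch.relu(self.theta(torch.einsum("nm,btmf->btnf", self.a_hat, x)))
        # Temporal step: channels-first layout, convolve over time, keep the first
        # T outputs so each step only depends on current and past frames (causal).
        h = h.permute(0, 3, 1, 2)                                 # (batch, hidden, time, nodes)
        h = torch.relu(self.temporal(h))[:, :, : x.size(1), :]
        return h.permute(0, 2, 3, 1)                              # (batch, time, nodes, hidden)

if __name__ == "__main__":
    n = 4
    a_hat = torch.eye(n)                                          # placeholder adjacency
    block = SpatialTemporalBlock(a_hat, in_dim=2, hidden=8)
    out = block(torch.randn(1, 12, n, 2))
    print(out.shape)                                              # torch.Size([1, 12, 4, 8])
```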
Citations: 31
Declarative Programming with Intensional Sets in Java Using JSetL
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab195 | Vol. 66(3), pp. 763-784
Maximiliano Cristiá;Andrea Fois;Gianfranco Rossi
Intensional sets are sets specified by a property rather than by enumerating their elements. In previous work, we proposed a decision procedure for a first-order logic language that provides restricted intensional sets (RIS), i.e. a sub-class of intensional sets that are guaranteed to denote finite (though unbounded) sets. In this paper, we show how RIS can be exploited as a convenient programming tool in a conventional setting as well, namely the imperative object-oriented language Java. We do this by considering a Java library, called JSetL, that integrates into Java the notions of logical variable, (set) unification and constraints that are typical of constraint logic programming languages. We show how JSetL is naturally extended to accommodate RIS and RIS constraints, and how this extension can be exploited, on the one hand, to support a more declarative style of programming and, on the other hand, to effectively enhance the expressive power of the constraint language provided by the library.
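JSetL's actual API is not shown in the abstract, so no JSetL code is attempted here; the short Python sketch below only illustrates the underlying notion of a restricted intensional set, i.e. a set defined by a property over a finite domain rather than by enumeration.

```python
from typing import Callable, Iterable, Iterator

class RestrictedIntensionalSet:
    """Conceptual stand-in for a restricted intensional set: elements are not
    enumerated up front but defined by a property over a finite domain.

    Plain Python for illustration only; unrelated to the JSetL API.
    """

    def __init__(self, domain: Iterable[int], predicate: Callable[[int], bool]):
        self.domain = list(domain)     # finite, though possibly large, domain
        self.predicate = predicate     # the defining property

    def __contains__(self, x: int) -> bool:
        return x in self.domain and self.predicate(x)

    def __iter__(self) -> Iterator[int]:
        return (x for x in self.domain if self.predicate(x))

if __name__ == "__main__":
    evens = RestrictedIntensionalSet(range(10), lambda x: x % 2 == 0)
    print(list(evens))   # [0, 2, 4, 6, 8]
    print(3 in evens)    # False
```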
Citations: 3
Defending Against Data Poisoning Attacks: From Distributed Learning to Federated Learning
IF 1.4 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2021-10-01 | DOI: 10.1093/comjnl/bxab192 | Vol. 66(3), pp. 711-726
Yuchen Tian;Weizhe Zhang;Andrew Simpson;Yang Liu;Zoe Lin Jiang
Federated learning (FL), a variant of distributed learning (DL), supports the training of a shared model without accessing private data from different sources. Despite its benefits with regard to privacy preservation, FL's distributed nature and privacy constraints make it vulnerable to data poisoning attacks. Existing defenses, primarily designed for DL, are typically not well adapted to FL. In this paper, we study such attacks and defenses. In doing so, we start from the perspective of DL and then consider a real-world FL scenario, with the aim of exploring the requisites of a desirable defense in FL. Our study shows that (i) the batch size used in each training round affects the effectiveness of defenses in DL, (ii) the defenses investigated are somewhat effective and moderately influenced by batch size in FL settings and (iii) non-IID data makes it more difficult to defend against data poisoning attacks in FL. Based on these findings, we discuss the key challenges and possible directions in defending against such attacks in FL. In addition, we propose Detect and Suppress Potential Outliers (DSPO), a defense against data poisoning attacks in FL scenarios. Our results show that DSPO outperforms other defenses in several cases.
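DSPO's detection rule is not described in the abstract; the Python sketch below is a generic robust-aggregation illustration of the detect-and-suppress idea: drop client updates whose distance to the coordinate-wise median update is anomalously large, then average the rest. The z-score threshold and the toy data are assumptions.

```python
import numpy as np

def suppress_outliers_and_aggregate(client_updates: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """Aggregate client model updates while suppressing likely-poisoned ones.

    `client_updates` has shape (num_clients, num_params). An update is treated
    as an outlier if its distance to the coordinate-wise median update deviates
    from the mean distance by more than `z_thresh` standard deviations.
    This is a generic robust-aggregation sketch, not the DSPO algorithm itself.
    """
    median_update = np.median(client_updates, axis=0)
    dists = np.linalg.norm(client_updates - median_update, axis=1)
    mu, sigma = dists.mean(), dists.std() + 1e-12
    keep = (dists - mu) / sigma <= z_thresh            # suppress far-away updates
    return client_updates[keep].mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = rng.normal(0.0, 0.1, size=(9, 5))
    poisoned = np.full((1, 5), 5.0)                    # one malicious client pushing a large update
    agg = suppress_outliers_and_aggregate(np.vstack([honest, poisoned]))
    print(np.round(agg, 3))                            # stays close to the honest updates
```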
Citations: 4