This paper presents an original investigation into the domain of violence detection in videos, introducing an innovative approach tailored to the unique challenges of a federated learning environment. The study encompasses a comprehensive exploration of machine learning techniques, leveraging spatio-temporal features extracted from benchmark video datasets. In a notable departure from conventional methodologies, we introduce a novel architecture, the “Diff Gated” network, designed to streamline preprocessing and training while simultaneously enhancing accuracy. Our exploration of advanced machine learning techniques, such as super-convergence and transfer learning, expands the horizons of federated learning, offering a broader range of practical applications. Moreover, our research introduces a method for seamlessly adapting centralized datasets to the federated learning context, bridging the gap between traditional machine learning and federated learning approaches. The outcome of this study is a remarkable advancement in the field of violence detection, with our federated learning model consistently outperforming state-of-the-art models, underscoring the transformative potential of our contributions. This work represents a significant step forward in the application of machine learning techniques to critical societal challenges.
{"title":"Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures","authors":"Quentin Pajon, Swan Serre, Hugo Wissocq, Léo Rabaud, Siba Haidar, Antoun Yaacoub","doi":"10.1007/s11390-024-3702-7","DOIUrl":"https://doi.org/10.1007/s11390-024-3702-7","url":null,"abstract":"<p>This paper presents an original investigation into the domain of violence detection in videos, introducing an innovative approach tailored to the unique challenges of a federated learning environment. The study encompasses a comprehensive exploration of machine learning techniques, leveraging spatio-temporal features extracted from benchmark video datasets. In a notable departure from conventional methodologies, we introduce a novel architecture, the “Diff Gated” network, designed to streamline preprocessing and training while simultaneously enhancing accuracy. Our exploration of advanced machine learning techniques, such as super-convergence and transfer learning, expands the horizons of federated learning, offering a broader range of practical applications. Moreover, our research introduces a method for seamlessly adapting centralized datasets to the federated learning context, bridging the gap between traditional machine learning and federated learning approaches. The outcome of this study is a remarkable advancement in the field of violence detection, with our federated learning model consistently outperforming state-of-the-art models, underscoring the transformative potential of our contributions. This work represents a significant step forward in the application of machine learning techniques to critical societal challenges.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"152 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142178639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-3767-3
Fei Du, Xin-Jian Ma, Jing-Ru Yang, Yi Liu, Chao-Ran Luo, Xue-Bin Wang, Hai-Ou Jiang, Xiang Jing
Since OpenAI opened access to ChatGPT, large language models (LLMs) have become an increasingly popular topic, attracting researchers' attention from many domains. However, public researchers face several problems when developing LLMs, given that most LLMs are produced by industry and their training details are typically undisclosed. Since datasets are an essential component of LLM development, this paper conducts a holistic survey of the training datasets used in both the pre-training and fine-tuning processes. The paper first summarizes 16 pre-training datasets and 16 fine-tuning datasets used in state-of-the-art LLMs. Secondly, based on the properties of the pre-training and fine-tuning processes, it comments on pre-training datasets in terms of quality, quantity, and their relation to models, and on fine-tuning datasets in terms of quality, quantity, and concerns. The study then critically identifies the problems and research trends in current LLM datasets. It helps public researchers train and investigate LLMs through visual cases and provides useful comments to the research community regarding data development. To the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat LLMs. The survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to LLM research by pointing out the existing problems of LLM studies from the perspective of data.
{"title":"A Survey of LLM Datasets: From Autoregressive Model to AI Chatbot","authors":"Fei Du, Xin-Jian Ma, Jing-Ru Yang, Yi Liu, Chao-Ran Luo, Xue-Bin Wang, Hai-Ou Jiang, Xiang Jing","doi":"10.1007/s11390-024-3767-3","DOIUrl":"https://doi.org/10.1007/s11390-024-3767-3","url":null,"abstract":"<p>Since OpenAI opened access to ChatGPT, large language models (LLMs) become an increasingly popular topic attracting researchers’ attention from abundant domains. However, public researchers meet some problems when developing LLMs given that most of the LLMs are produced by industries and the training details are typically unrevealed. Since datasets are an important setup of LLMs, this paper does a holistic survey on the training datasets used in both the pre-train and fine-tune processes. The paper first summarizes 16 pre-train datasets and 16 fine-tune datasets used in the state-of-the-art LLMs. Secondly, based on the properties of the pre-train and fine-tune processes, it comments on pre-train datasets from quality, quantity, and relation with models, and comments on fine-tune datasets from quality, quantity, and concerns. This study then critically figures out the problems and research trends that exist in current LLM datasets. The study helps public researchers train and investigate LLMs by visual cases and provides useful comments to the research community regarding data development. To the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat LLMs. The survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to the LLM study by pointing out the existing problems of LLM studies from the perspective of data.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"16 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-4143-z
Zhong-Zheng Peng, Yi-Xin Yang, Jin-Hui Tang, Jin-Shan Pan
Video colorization aims to add color to grayscale or monochrome videos. Although existing methods have achieved substantial and noteworthy results in image colorization, video colorization presents more formidable obstacles due to the additional requirement of temporal consistency. Moreover, systematic reviews of video colorization methods are rare. In this paper, we review existing state-of-the-art video colorization methods. Because maintaining spatial-temporal consistency is pivotal to video colorization, we further review these methods from that perspective to gain deeper insight into how they have evolved. Video colorization methods fall into four main categories: optical-flow based methods, scribble-based methods, exemplar-based methods, and fully automatic methods. However, optical-flow based methods rely heavily on accurate optical-flow estimation, scribble-based methods require extensive user interaction and modification, exemplar-based methods face challenges in obtaining suitable reference images, and fully automatic methods often struggle to meet specific colorization requirements. We also discuss the existing challenges and highlight several future research opportunities worth exploring.
{"title":"Video Colorization: A Survey","authors":"Zhong-Zheng Peng, Yi-Xin Yang, Jin-Hui Tang, Jin-Shan Pan","doi":"10.1007/s11390-024-4143-z","DOIUrl":"https://doi.org/10.1007/s11390-024-4143-z","url":null,"abstract":"<p>Video colorization aims to add color to grayscale or monochrome videos. Although existing methods have achieved substantial and noteworthy results in the field of image colorization, video colorization presents more formidable obstacles due to the additional necessity for temporal consistency. Moreover, there is rarely a systematic review of video colorization methods. In this paper, we aim to review existing state-of-the-art video colorization methods. In addition, maintaining spatial-temporal consistency is pivotal to the process of video colorization. To gain deeper insight into the evolution of existing methods in terms of spatial-temporal consistency, we further review video colorization methods from a novel perspective. Video colorization methods can be categorized into four main categories: optical-flow based methods, scribble-based methods, exemplar-based methods, and fully automatic methods. However, optical-flow based methods rely heavily on accurate optical-flow estimation, scribble-based methods require extensive user interaction and modifications, exemplar-based methods face challenges in obtaining suitable reference images, and fully automatic methods often struggle to meet specific colorization requirements. We also discuss the existing challenges and highlight several future research opportunities worth exploring.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"64 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-3872-3
Lei Guan, Dong-Sheng Li, Ji-Ye Liang, Wen-Jian Wang, Ke-Shi Ge, Xi-Cheng Lu
Deep learning has become the cornerstone of artificial intelligence, playing an increasingly important role in industry and everyday life. However, as the complexity of the problems being solved increases, deep learning models become increasingly intricate, resulting in a proliferation of large language models with an astonishing number of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to the significant challenge of training such “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP, comprehensively compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques to achieve load balance for both intra-node and inter-node training. Furthermore, the main techniques to optimize computation, storage, and communication are presented, and potential research directions are discussed.
{"title":"Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview","authors":"Lei Guan, Dong-Sheng Li, Ji-Ye Liang, Wen-Jian Wang, Ke-Shi Ge, Xi-Cheng Lu","doi":"10.1007/s11390-024-3872-3","DOIUrl":"https://doi.org/10.1007/s11390-024-3872-3","url":null,"abstract":"<p>Deep learning has become the cornerstone of artificial intelligence, playing an increasingly important role in human production and lifestyle. However, as the complexity of problem-solving increases, deep learning models become increasingly intricate, resulting in a proliferation of large language models with an astonishing number of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to addressing the significant challenge of training “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP. It also comprehensively compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques to achieve load balance for both intra-node and inter-node training. Furthermore, the main techniques to optimize computation, storage, and communication are presented, with potential research directions being discussed.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"61 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-2883-4
Fabio Caffaro, Giuseppe Rizzo
For decades, humanity has fantasized about artificial intelligence tools able to converse fluently with human beings. Numerous efforts have been proposed, ranging from ELIZA to modern voice assistants. Despite the large interest in this research and innovation field, there is a lack of common understanding of the concept of conversational agents, and widespread over-expectations hide the current limitations of existing solutions. This work proposes a literature review on the subject with a focus on the most promising type of conversational agents: those powered by knowledge bases, which can supply the ground knowledge to hold conversations autonomously on different topics. We describe a conceptual architecture to define knowledge-enhanced conversational agents and investigate different domains of application. We conclude this work by listing some promising research pathways for future work.
{"title":"Knowledge-Enhanced Conversational Agents","authors":"Fabio Caffaro, Giuseppe Rizzo","doi":"10.1007/s11390-024-2883-4","DOIUrl":"https://doi.org/10.1007/s11390-024-2883-4","url":null,"abstract":"<p>Humanity has fantasized about artificial intelligence tools able to discuss with human beings fluently for decades. Numerous efforts have been proposed ranging from ELIZA to the modern vocal assistants. Despite the large interest in this research and innovation field, there is a lack of common understanding on the concept of conversational agents and general over expectations that hide the current limitations of existing solutions. This work proposes a literature review on the subject with a focus on the most promising type of conversational agents that are powered on top of knowledge bases and that can offer the ground knowledge to hold conversation autonomously on different topics. We describe a\u0000conceptual architecture to define the knowledge-enhanced conversational agents and investigate different domains of applications. We conclude this work by listing some promising research pathways for future work.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"31 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-3814-0
Rui Jiang, Guang-Cong Zheng, Teng Li, Tian-Rui Yang, Jing-Dong Wang, Xi Li
Diffusion models have recently emerged as powerful generative models, producing high-fidelity samples across domains. Despite this, they face two key challenges: speeding up the time-consuming iterative generation process, and controlling and steering that process. Existing surveys provide broad overviews of diffusion model advancements but lack comprehensive coverage centered specifically on techniques for controllable generation. This survey seeks to address this gap by providing a comprehensive and coherent review of controllable generation in diffusion models. We provide a detailed taxonomy defining controllable generation for diffusion models, categorized by formulation, methodologies, and evaluation metrics. By enumerating the range of methods researchers have developed for enhanced control, we aim to establish controllable diffusion generation as a distinct subfield warranting dedicated focus. With this survey, we contextualize recent results, provide a dedicated treatment of controllable diffusion model generation, and outline limitations and future directions. To demonstrate applicability, we highlight controllable diffusion techniques for major computer vision tasks. By consolidating methods and applications for controllable diffusion models, we hope to catalyze further innovations in reliable and scalable controllable generation.
{"title":"A Survey of Multimodal Controllable Diffusion Models","authors":"Rui Jiang, Guang-Cong Zheng, Teng Li, Tian-Rui Yang, Jing-Dong Wang, Xi Li","doi":"10.1007/s11390-024-3814-0","DOIUrl":"https://doi.org/10.1007/s11390-024-3814-0","url":null,"abstract":"<p>Diffusion models have recently emerged as powerful generative models, producing high-fidelity samples across domains. Despite this, they have two key challenges, including improving the time-consuming iterative generation process and controlling and steering the generation process. Existing surveys provide broad overviews of diffusion model advancements. However, they lack comprehensive coverage specifically centered on techniques for controllable generation. This survey seeks to address this gap by providing a comprehensive and coherent review on controllable generation in diffusion models. We provide a detailed taxonomy defining controlled generation for diffusion models. Controllable generation is categorized based on the formulation, methodologies, and evaluation metrics. By enumerating the range of methods researchers have developed for enhanced control, we aim to establish controllable diffusion generation as a distinct subfield warranting dedicated focus. With this survey, we contextualize recent results, provide the dedicated treatment of controllable diffusion model generation, and outline limitations and future directions. To demonstrate applicability, we highlight controllable diffusion techniques for major computer vision tasks application. By consolidating methods and applications for controllable diffusion models, we hope to catalyze further innovations in reliable and scalable controllable generation.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"13 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-023-2519-0
Yang-Su Liu, Zhen-Zhe Zheng, Fan Wu, Gui-Hai Chen
Large quantities of high-quality data are critical to the success of machine learning in diverse applications. Faced with the dilemma of data silos, where data is difficult to circulate, emerging data markets attempt to break the dilemma by facilitating data exchange on the Internet. Crowdsourcing, on the other hand, is one of the important methods for efficiently collecting large amounts of high-value data in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across crowdsourcing and data markets. We propose a new metric of data value, defined as the uncertainty reduction of a Bayesian machine learning model when the data is integrated into model training. Guided by this data value metric, we design a mechanism called the Shapley Value Mechanism with Individual Rationality (SV-IR), in which a greedy algorithm with a constant approximation ratio selects the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value respects the individual rationality constraints. We further propose a fair reward distribution method for data holders with various effort levels under the charge of a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR approaches the optimal solution and outperforms state-of-the-art methods.
{"title":"When Crowdsourcing Meets Data Markets: A Fair Data Value Metric for Data Trading","authors":"Yang-Su Liu, Zhen-Zhe Zheng, Fan Wu, Gui-Hai Chen","doi":"10.1007/s11390-023-2519-0","DOIUrl":"https://doi.org/10.1007/s11390-023-2519-0","url":null,"abstract":"<p>Large-quantity and high-quality data is critical to the success of machine learning in diverse applications. Faced with the dilemma of data silos where data is difficult to circulate, emerging data markets attempt to break the dilemma by facilitating data exchange on the Internet. Crowdsourcing, on the other hand, is one of the important methods to efficiently collect large amounts of data with high-value in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across the crowdsourcing and data markets. We propose a new metric of data value as the uncertainty reduction of a Bayesian machine learning model by integrating the data into model training. Guided by this data value metric, we design a mechanism called Shapley Value Mechanism with Individual Rationality (SV-IR), in which we design a greedy algorithm with a constant approximation ratio to greedily select the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value, respecting the individual rationality constraints. We further propose a fair reward distribution method for the data holders with various effort levels under the charge of a data broker. We demonstrate the fairness of the compensation determination rule and reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR could approach the optimal solution, and outperforms state-of-the-art methods.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"50 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-024-3914-x
Yin Xu, Ming-Jun Xiao, Chen Wu, Jie Wu, Jin-Rui Zhou, He Sun
Federated learning (FL) is an emerging privacy-preserving distributed computing paradigm that enables numerous clients to collaboratively train machine learning models without transmitting their private datasets to the central server. Unlike most existing research, where the local datasets of clients are assumed to remain unchanged throughout the whole FL process, this paper addresses scenarios where clients’ datasets need to be updated periodically and the server can incentivize clients to employ datasets that are as fresh as possible for local model training. Our primary objective is to design a client selection strategy that minimizes the loss of the global FL model within a constrained budget. To this end, we introduce the concept of “Age of Information” (AoI) to quantitatively assess the freshness of local datasets and conduct a theoretical analysis of the convergence bound in our AoI-aware FL system. Based on the convergence bound, we formulate our problem as a restless multi-armed bandit (RMAB) problem. Next, we relax the RMAB problem and apply the Lagrangian Dual approach to decouple it into multiple subproblems. Finally, we propose a Whittle’s Index Based Client Selection (WICS) algorithm to determine the set of selected clients. Comprehensive simulations substantiate that the proposed algorithm effectively reduces training loss and enhances learning accuracy compared with state-of-the-art methods.
{"title":"Age-of-Information-Aware Federated Learning","authors":"Yin Xu, Ming-Jun Xiao, Chen Wu, Jie Wu, Jin-Rui Zhou, He Sun","doi":"10.1007/s11390-024-3914-x","DOIUrl":"https://doi.org/10.1007/s11390-024-3914-x","url":null,"abstract":"<p>Federated learning (FL) is an emerging privacy-preserving distributed computing paradigm, enabling numerous clients to collaboratively train machine learning models without the necessity of transmitting clients’ private datasets to the central server. Unlike most existing research where the local datasets of clients are assumed to be unchanged over time throughout the whole FL process, our study addresses such scenarios in this paper where clients’ datasets need to be updated periodically, and the server can incentivize clients to employ as fresh as possible datasets for local model training. Our primary objective is to design a client selection strategy to minimize the loss of the global model for FL loss within a constrained budget. To this end, we introduce the concept of “Age of Information” (AoI) to quantitatively assess the freshness of local datasets and conduct a theoretical analysis of the convergence bound in our AoI-aware FL system. Based on the convergence bound, we further formulate our problem as a restless multi-armed bandit (RMAB) problem. Next, we relax the RMAB problem and apply the Lagrangian Dual approach to decouple it into multiple subproblems. Finally, we propose a Whittle’s Index Based Client Selection (WICS) algorithm to determine the set of selected clients. In addition, comprehensive simulations substantiate that the proposed algorithm can effectively reduce training loss and enhance the learning accuracy compared with some state-of-the-art methods.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"29 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141737254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-023-3788-3
Wei-Ming Hu, Qiang Wang, Jin Gao, Bing Li, Stephen Maybank
CNN (convolutional neural network) based real-time trackers usually do not carry out online network updates in order to maintain rapid tracking speed. This inevitably limits their adaptability to changes in object appearance. Correlation filter based trackers, in contrast, can update their model parameters online in real time. In this paper, we present an end-to-end lightweight network architecture, namely the Discriminant Correlation Filter Network (DCFNet). A differentiable DCF (discriminant correlation filter) layer is incorporated into a Siamese network architecture in order to learn the convolutional features and the correlation filter simultaneously, and the correlation filter can be efficiently updated online. In previous work, we introduced a joint scale-position space to DCFNet, forming a scale DCFNet that predicts object scale and position simultaneously. We combine the scale DCFNet with a convolutional-deconvolutional network, learning both the high-level embedding space representations and the low-level fine-grained representations of images. The adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding are complementary for visual tracking. The back-propagation is derived in the Fourier frequency domain throughout, preserving the efficiency of the DCF. Extensive evaluations on the OTB (Object Tracking Benchmark) and VOT (Visual Object Tracking Challenge) datasets demonstrate that the proposed trackers are fast while maintaining tracking accuracy.
{"title":"DCFNet: Discriminant Correlation Filters Network for Visual Tracking","authors":"Wei-Ming Hu, Qiang Wang, Jin Gao, Bing Li, Stephen Maybank","doi":"10.1007/s11390-023-3788-3","DOIUrl":"https://doi.org/10.1007/s11390-023-3788-3","url":null,"abstract":"<p>CNN (convolutional neural network) based real time trackers usually do not carry out online network update in order to maintain rapid tracking speed. This inevitably influences the adaptability to changes in object appearance. Correlation filter based trackers can update the model parameters online in real time. In this paper, we present an end-to-end lightweight network architecture, namely Discriminant Correlation Filter Network (DCFNet). A differentiable DCF (discriminant correlation filter) layer is incorporated into a Siamese network architecture in order to learn the convolutional features and the correlation filter simultaneously. The correlation filter can be efficiently updated online. In previous work, we introduced a joint scale-position space to the DCFNet, forming a scale DCFNet which carries out the predictions of object scale and position simultaneously. We combine the scale DCFNet with the convolutional-deconvolutional network, learning both the high-level embedding space representations and the low-level fine-grained representations for images. The adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding are complementary for visual tracking. The back-propagation is derived in the Fourier frequency domain throughout the entire work, preserving the efficiency of the DCF. Extensive evaluations on the OTB (Object Tracking Benchmark) and VOT (Visual Object Tracking Challenge) datasets demonstrate that the proposed trackers have fast speeds, while maintaining tracking accuracy.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"2 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22, DOI: 10.1007/s11390-023-2007-6
Xiao-Lu Liu, Hong-Yun Xu, Jia-Ming Chen, Zhou-Xing Su, Zhi-Peng Lyu, Jun-Wen Ding
In a local search algorithm, one of the most important features is the definition of its neighborhood, which is crucial to the algorithm's performance. In this paper, we present an analysis of neighborhood combination search for solving the single-machine scheduling problem with sequence-dependent setup time, with the objective of minimizing total weighted tardiness (SMSWT). First, we propose a new neighborhood structure named Block Swap (B1), which can be considered an extension of the previously widely used Block Move (B2) neighborhood, together with a fast incremental evaluation technique to enhance its evaluation efficiency. Second, based on the Block Swap and Block Move neighborhoods, we present two kinds of combined neighborhood structures: neighborhood union (denoted by B1⋃B2) and token-ring search (denoted by B1 → B2). Third, we incorporate the neighborhood union and token-ring search into two representative metaheuristic algorithms, the Iterated Local Search Algorithm (ILSnew) and the Hybrid Evolutionary Algorithm (HEAnew), to investigate their performance. Extensive experiments show the competitiveness of the token-ring combination of the two neighborhoods. Tested on the 120 public benchmark instances, our HEAnew is highly competitive in both solution quality and computational time compared with exact algorithms and recent metaheuristics. We have also tested HEAnew with the selected neighborhood combination search on the 64 public benchmark instances of the single-machine scheduling problem with sequence-dependent setup time; HEAnew matches the optimal or best-known results for all 64 instances. In particular, the computational time for reaching the best-known results on five challenging instances is reduced by at least 61.25%.
{"title":"Neighborhood Combination Search for Single-Machine Scheduling with Sequence-Dependent Setup Time","authors":"Xiao-Lu Liu, Hong-Yun Xu, Jia-Ming Chen, Zhou-Xing Su, Zhi-Peng Lyu, Jun-Wen Ding","doi":"10.1007/s11390-023-2007-6","DOIUrl":"https://doi.org/10.1007/s11390-023-2007-6","url":null,"abstract":"<p>In a local search algorithm, one of its most important features is the definition of its neighborhood which is crucial to the algorithm’s performance. In this paper, we present an analysis of neighborhood combination search for solving the single-machine scheduling problem with sequence-dependent setup time with the objective of minimizing total weighted tardiness (SMSWT). First, We propose a new neighborhood structure named Block Swap (B1) which can be considered as an extension of the previously widely used Block Move (B2) neighborhood, and a fast incremental evaluation technique to enhance its evaluation efficiency. Second, based on the Block Swap and Block Move neighborhoods, we present two kinds of neighborhood structures: neighborhood union (denoted by B1⋃B2) and token-ring search (denoted by B1 → B2), both of which are combinations of B1 and B2. Third, we incorporate the neighborhood union and token-ring search into two representative metaheuristic algorithms: the Iterated Local Search Algorithm (ILS<sub>new</sub>) and the Hybrid Evolutionary Algorithm (HEA<sub>new</sub>) to investigate the performance of the neighborhood union and token-ring search. Extensive experiments show the competitiveness of the token-ring search combination mechanism of the two neighborhoods. Tested on the 120 public benchmark instances, our HEA<sub>new</sub> has a highly competitive performance in solution quality and computational time compared with both the exact algorithms and recent metaheuristics. We have also tested the HEA<sub>new</sub> algorithm with the selected neighborhood combination search to deal with the 64 public benchmark instances of the single-machine scheduling problem with sequence-dependent setup time. HEA<sub>new</sub> is able to match the optimal or the best known results for all the 64 instances. In particular, the computational time for reaching the best well-known results for five challenging instances is reduced by at least 61.25%.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"64 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}