A strategy refers to the rules by which an agent chooses among the available actions to achieve its goals. Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments to improve the system’s utility, decrease the overall cost, and increase the probability of mission success. This paper proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining that separates an intricate policy into several simple sub-policies and organizes their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art deep reinforcement learning (DRL) method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. Our method achieves state-of-the-art performance on the standard continuous control benchmarks in the OpenAI Gym environment. The results demonstrate the promising potential of the BSAC method to significantly improve training efficiency. Furthermore, we extend the topic to multi-agent systems (MAS), discussing potential research fields and directions.
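The core idea, factoring a joint policy into chained sub-policies, can be illustrated with a minimal sketch. This is a hypothetical tabular example, not the paper's SAC-based implementation; the sub-policy names and distributions are invented for illustration:

```python
# Sketch of the BSN factorization: the joint policy is a product of
# sub-policies linked by a Bayesian chain, p(a1, a2 | s) = p(a1 | s) * p(a2 | s, a1).
# Sub-policy names and the tabular form are illustrative assumptions.

def sub_policy_move(state):
    # p(a1 | s): distribution over a discrete "move" action
    return {"left": 0.3, "right": 0.7}

def sub_policy_grip(state, move_action):
    # p(a2 | s, a1): the second sub-policy conditions on the first action
    if move_action == "right":
        return {"open": 0.9, "close": 0.1}
    return {"open": 0.4, "close": 0.6}

def joint_policy_prob(state, a1, a2):
    # Chain-rule factorization of the joint policy
    return sub_policy_move(state)[a1] * sub_policy_grip(state, a1)[a2]

# The factorization still defines a proper distribution over joint actions
total = sum(
    joint_policy_prob(None, a1, a2)
    for a1 in ("left", "right")
    for a2 in ("open", "close")
)
```

Because each factor is a simple conditional distribution, each sub-policy can be trained by its own (smaller) actor network while the product remains a valid joint policy.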
{"title":"Bayesian Strategy Networks Based Soft Actor-Critic Learning","authors":"Qin Yang, Ramviyas Parasuraman","doi":"10.1145/3643862","DOIUrl":"https://doi.org/10.1145/3643862","url":null,"abstract":"<p>A strategy refers to the rules that the agent chooses the available actions to achieve goals. Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments to improve the system’s utility, decrease the overall cost, and increase mission success probability. This paper proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining to separate an intricate policy into several simple sub-policies and organize their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art DRL method – soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. Our method achieves the state-of-the-art performance on the standard continuous control benchmarks in the OpenAI Gym environment. The results demonstrate that the promising potential of the BSAC method significantly improves training efficiency. 
Furthermore, we extend the topic to the Multi-Agent systems (MAS), discussing the potential research fields and directions.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"40 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many deep learning works on financial time-series forecasting focus on predicting the future prices/returns of individual assets from numerical price-related information for trading, and hence propose models designed for univariate, single-task, and/or unimodal settings. Forecasting for investment and risk management, however, involves multiple tasks in multivariate settings: forecasts of the expected returns and risks of assets in portfolios, and of the correlations between these assets. As different sources/types of time-series influence the future returns, risks, and correlations of assets in different ways, it is also important to capture time-series from different modalities. Hence, this paper addresses financial time-series forecasting for investment and risk management in a multivariate, multitask, and multimodal setting. Financial time-series forecasting is challenging due to the low signal-to-noise ratios typical of financial time-series, and because the intra-series and inter-series relationships of assets evolve across time. To address these challenges, our proposed Temporal Implicit Multimodal Network (TIME) model adaptively learns implicit inter-series relationship networks between assets from multimodal financial time-series at multiple time-steps. TIME then uses dynamic network and temporal encoding modules to jointly capture such evolving relationships, multimodal financial time-series, and temporal representations. Our experiments show that TIME outperforms other state-of-the-art models on multiple forecasting tasks and on investment and risk management applications.
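To make the notion of an inter-series relationship network concrete, a naive baseline would link two assets at a time-step when their recent returns are strongly correlated. TIME itself learns such relationships implicitly and end-to-end; the fixed correlation rule, window, and threshold below are illustrative assumptions only:

```python
# Naive sketch of an inter-series relationship network: connect assets
# whose return series are highly correlated. TIME learns these
# relationships adaptively rather than from a fixed rule like this.

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx ** 0.5 * vy ** 0.5)

def implicit_network(returns, threshold=0.8):
    # returns: {asset_name: [return series]} -> set of undirected edges
    assets = sorted(returns)
    edges = set()
    for i, a in enumerate(assets):
        for b in assets[i + 1:]:
            if abs(correlation(returns[a], returns[b])) >= threshold:
                edges.add((a, b))
    return edges

edges = implicit_network({"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 1, 2]})
```

Rebuilding such a network at every time-step is what makes the relationships "evolving"; a learned model replaces the hard threshold with trainable edge weights.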
{"title":"Temporal Implicit Multimodal Networks for Investment and Risk Management","authors":"Gary Ang, Ee-Peng Lim","doi":"10.1145/3643855","DOIUrl":"https://doi.org/10.1145/3643855","url":null,"abstract":"<p>Many deep learning works on financial time-series forecasting focus on predicting future prices/returns of individual assets with numerical price-related information for trading, and hence propose models designed for univariate, single task and/or unimodal settings. Forecasting for investment and risk management involves multiple tasks in multivariate settings: forecasts of expected returns and risks of assets in portfolios, and correlations between these assets. As different sources/types of time-series influence future returns, risks and correlations of assets in different ways, it is also important to capture time-series from different modalities. Hence, this paper addresses financial time-series forecasting for investment and risk management in a multivariate, multitask and multimodal setting. Financial time-series forecasting is however challenging due to the low signal-to-noise ratios typical in financial time-series, and as intra-series and inter-series relationships of assets evolve across time. To address these challenges, our proposed Temporal Implicit Multimodal Network (TIME) model learns implicit inter-series relationship networks between assets from multimodal financial time-series at multiple time-steps adaptively. TIME then uses dynamic network and temporal encoding modules to jointly capture such evolving relationships, multimodal financial time-series and temporal representations. 
Our experiments show that TIME outperforms other state-of-the-art models on multiple forecasting tasks and investment and risk management applications.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"24 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139656700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kezhi Lu, Qian Zhang, Danny Hughes, Guangquan Zhang, Jie Lu
Recommender systems are one of the most successful applications of AI for providing personalized e-services to customers. However, data sparsity presents enormous challenges that hinder the further development of advanced recommender systems. Although cross-domain recommendation partly overcomes data sparsity by transferring knowledge from a source domain with relatively dense data to augment data in the target domain, current methods do not handle heterogeneous data very well. For example, using today’s cross-domain transfer learning schemes with data comprising clicks, ratings, user reviews, item metadata, and knowledge graphs will likely result in a poorly performing model: user preferences will not be comprehensively profiled, and accurate recommendations will not be generated. To solve these three challenges, i.e., handling heterogeneous data, avoiding negative transfer, and dealing with data sparsity, we designed a new end-to-end deep adversarial multi-channel transfer network for cross-domain recommendation named AMT-CDR. Heterogeneous data is handled by constructing a cross-domain graph based on real-world knowledge graphs; we used Freebase and YAGO. Negative transfer is prevented through an adversarial learning strategy that maintains consistency across the different data channels. Finally, data sparsity is addressed with an end-to-end neural network that considers data across multiple channels and generates accurate recommendations by leveraging knowledge from both the source and target domains. Extensive experiments on three dual-target cross-domain recommendation tasks demonstrate the superiority of AMT-CDR compared to eight state-of-the-art methods. All source code is available at https://github.com/bjtu-lucas-nlp/AMT-CDR.
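The cross-domain graph construction step can be sketched in a few lines: user-item interactions from both domains are merged into one graph, and knowledge-graph entity links bridge items across domains. The adjacency-dict representation and the entity names are illustrative assumptions, not the paper's actual data structures:

```python
# Hypothetical sketch of cross-domain graph construction: merge user-item
# edges from both domains and bridge items via shared knowledge-graph
# entities (e.g., a book and its film adaptation in Freebase).

def build_cross_domain_graph(source_edges, target_edges, kg_links):
    graph = {}

    def add_edge(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for user, item in source_edges + target_edges:
        add_edge(user, item)
    for item_a, item_b in kg_links:  # items mapping to the same KG entity
        add_edge(item_a, item_b)
    return graph

g = build_cross_domain_graph(
    [("u1", "book:dune")],          # source domain (books)
    [("u2", "movie:dune")],         # target domain (movies)
    [("book:dune", "movie:dune")],  # KG bridge between the domains
)
```

The bridge edges are what let signal from a dense source domain propagate to sparse target-domain items during message passing.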
{"title":"AMT-CDR: A Deep Adversarial Multi-channel Transfer Network for Cross-domain Recommendation","authors":"Kezhi Lu, Qian Zhang, Danny Hughes, Guangquan Zhang, Jie Lu","doi":"10.1145/3641286","DOIUrl":"https://doi.org/10.1145/3641286","url":null,"abstract":"<p>Recommender systems are one of the most successful applications of using AI for providing personalized e-services to customers. However, data sparsity is presenting enormous challenges that are hindering the further development of advanced recommender systems. Although cross-domain recommendation partly overcomes data sparsity by transferring knowledge from a source domain with relatively dense data to augment data in the target domain, the current methods do not handle heterogeneous data very well. For example, using today’s cross-domain transfer learning schemes with data comprising clicks, ratings, user reviews, item meta data, and knowledge graphs will likely result in a poorly-performing model. User preferences will not be comprehensively profiled, and accurate recommendations will not be generated. To solve these three challenges – i.e., handling heterogeneous data, avoiding negative transfer, and dealing with data sparsity – we designed a new end-to-end deep <b>a</b>dversarial <b>m</b>ulti-channel <b>t</b>ransfer network for <b>c</b>ross-<b>d</b>omain <b>r</b>ecommendation named AMT-CDR. Heterogeneous data is handled by constructing a cross-domain graph based on real-world knowledge graphs – we used Freebase and YAGO. Negative transfer is prevented through an adversarial learning strategy that maintains consistency across the different data channels. And data sparsity is addressed with an end-to-end neural network that considers data across multiple channels and generates accurate recommendations by leveraging knowledge from both the source and target domains. 
Extensive experiments on three dual-target cross-domain recommendation tasks demonstrate the superiority of AMT-CDR compared to eight state-of-the-art methods. All source code is available at https://github.com/bjtu-lucas-nlp/AMT-CDR.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"324 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139580959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dylan Molho, Jiayuan Ding, Wenzhuo Tang, Zhaoheng Li, Hongzhi Wen, Yixin Wang, Julian Venegas, Wei Jin, Renming Liu, Runze Su, Patrick Danaher, Robert Yang, Yu Leo Lei, Yuying Xie, Jiliang Tang
Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, and heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we present a comprehensive survey of deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications, while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.
{"title":"Deep Learning in Single-Cell Analysis","authors":"Dylan Molho, Jiayuan Ding, Wenzhuo Tang, Zhaoheng Li, Hongzhi Wen, Yixin Wang, Julian Venegas, Wei Jin, Renming Liu, Runze Su, Patrick Danaher, Robert Yang, Yu Leo Lei, Yuying Xie, Jiliang Tang","doi":"10.1145/3641284","DOIUrl":"https://doi.org/10.1145/3641284","url":null,"abstract":"<p>Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning through different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss the future directions and the most recent challenges. 
This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"124 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139580611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The vehicle routing problem with time windows (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within a limited inference time, and there are three major challenges: a) directly optimizing the goal under several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve the VRPTW. First, we propose an agent model that encodes the constraints into input features and applies a harsh policy on the output when generating deterministic results. Second, we design a time-penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. The results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines, and can generate solutions within seconds where existing heuristic baselines take minutes, while maintaining solution quality. Moreover, we thoroughly analyse our solution and draw meaningful implications from its real-time response ability.
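The time-penalty augmented reward can be sketched as the negative travel cost minus a penalty proportional to how far the arrival time falls outside a customer's window. The penalty coefficient and the linear penalty shape below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a time-penalty augmented reward for VRPTW: reward = -cost
# minus a penalty for arriving before the earliest or after the latest
# allowed time. The coefficient is an illustrative assumption.

def augmented_reward(travel_cost, arrival, window, penalty_coef=2.0):
    earliest, latest = window
    # Linear violation: how far outside [earliest, latest] the arrival is
    violation = max(0.0, earliest - arrival) + max(0.0, arrival - latest)
    return -travel_cost - penalty_coef * violation
```

Because the penalty is differentiable almost everywhere and enters the reward directly, the time-window constraint shapes the policy gradient instead of being enforced by a hard feasibility mask.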
{"title":"Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window","authors":"Zefang Zong, Tong Xia, Meng Zheng, Yong Li","doi":"10.1145/3625232","DOIUrl":"https://doi.org/10.1145/3625232","url":null,"abstract":"<p>Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to (11.7% ) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. 
Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"161 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139556135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie
Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level but also at the societal level, for a better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. First, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Second, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs on different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at https://github.com/MLGroupJLU/LLM-eval-survey.
{"title":"A Survey on Evaluation of Large Language Models","authors":"Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie","doi":"10.1145/3641289","DOIUrl":"https://doi.org/10.1145/3641289","url":null,"abstract":"<p>Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: <i>what to evaluate</i>, <i>where to evaluate</i>, and <i>how to evaluate</i>. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. 
We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"22 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139562259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chiao-Ting Chen, Chi Lee, Szu-Hao Huang, Wen-Chih Peng
The significant increase in credit card transactions can be attributed to the rapid growth of online shopping and digital payments, particularly during the COVID-19 pandemic. To safeguard cardholders, e-commerce companies, and financial institutions, the implementation of an effective, real-time fraud detection method using modern artificial intelligence techniques is imperative. However, the development of machine-learning-based approaches for fraud detection faces challenges such as inadequate transaction representation, noisy labels, and data imbalance. Additionally, practical considerations like dynamic thresholds, concept drift, and verification latency need to be appropriately addressed. In this study, we designed a fraud detection method that accurately extracts a series of spatially and temporally representative features to precisely describe credit card transactions. Furthermore, several auxiliary self-supervised objectives were developed to model cardholders’ behavior sequences. By employing intelligent sampling strategies, potentially noisy labels were eliminated, thereby reducing the level of data imbalance. The developed method encompasses various innovative functions that cater to practical usage requirements. We applied this method to two real-world datasets, and the results indicated a higher F1 score compared to the most commonly used online fraud detection methods.
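One simple form such an intelligent sampling step could take is dropping majority-class (legitimate) transactions whose model-predicted fraud probability strongly disagrees with their label, on the grounds that they are likely mislabeled. This also shrinks the majority class, easing imbalance. The rule and threshold are illustrative assumptions, not the paper's actual sampling strategy:

```python
# Hypothetical sketch of noisy-label filtering: a transaction labeled
# legitimate (0) but scored as very likely fraudulent is treated as a
# potential noisy label and removed from the training set.

def intelligent_sample(transactions, noise_threshold=0.9):
    # transactions: list of (features, label, predicted_fraud_prob)
    kept = []
    for features, label, fraud_prob in transactions:
        is_suspect = (label == 0 and fraud_prob > noise_threshold)
        if not is_suspect:
            kept.append((features, label, fraud_prob))
    return kept

data = [
    (("tx_a",), 0, 0.95),  # labeled legit, scored as fraud -> suspect
    (("tx_b",), 0, 0.10),  # consistent legit -> kept
    (("tx_c",), 1, 0.99),  # consistent fraud -> kept
]
clean = intelligent_sample(data)
```

In practice the predicted probabilities would come from an earlier training round, making this a form of self-training-based label cleaning.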
{"title":"Credit Card Fraud Detection via Intelligent Sampling and Self-supervised Learning","authors":"Chiao-Ting Chen, Chi Lee, Szu-Hao Huang, Wen-Chih Peng","doi":"10.1145/3641283","DOIUrl":"https://doi.org/10.1145/3641283","url":null,"abstract":"<p>The significant increase in credit card transactions can be attributed to the rapid growth of online shopping and digital payments, particularly during the COVID-19 pandemic. To safeguard cardholders, e-commerce companies, and financial institutions, the implementation of an effective and real-time fraud detection method using modern artificial intelligence techniques is imperative. However, the development of machine-learning-based approaches for fraud detection faces challenges such as inadequate transaction representation, noise labels, and data imbalance. Additionally, practical considerations like dynamic thresholds, concept drift, and verification latency need to be appropriately addressed. In this study, we designed a fraud detection method that accurately extracts a series of spatial and temporal representative features to precisely describe credit card transactions. Furthermore, several auxiliary self-supervised objectives were developed to model cardholders’ behavior sequences. By employing intelligent sampling strategies, potential noise labels were eliminated, thereby reducing the level of data imbalance. The developed method encompasses various innovative functions that cater to practical usage requirements. 
We applied this method to two real-world datasets, and the results indicated a higher F1 score compared to the most commonly used online fraud detection methods.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"3 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139555992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jin Han, Yun-feng Ren, Alessandro Brighente, Mauro Conti
Video surveillance systems provide a means to detect the presence of potentially malicious drones in the surroundings of critical infrastructures. In particular, these systems collect images and feed them to a deep-learning classifier able to detect the presence of a drone in the input image. However, current classifiers are not efficient at identifying drones that disguise themselves with the image background, e.g., hiding in front of a tree. Furthermore, video-based detection systems rely heavily on the image’s brightness, and darkness imposes significant challenges on drone detection. Both phenomena increase the chances for attackers to get close to critical infrastructures without being spotted, and hence to gather sensitive information or cause physical damage, possibly leading to safety threats.
In this paper, we propose RANGO, a drone detection algorithm able to detect drones in challenging images where the target is difficult to distinguish from the background. RANGO is based on a deep learning architecture that exploits a Preconditioning Operation (PREP), which highlights the target via the difference between the target gradient and the background gradient. The idea is to highlight features that will be useful for classification. After PREP, RANGO uses multiple convolution kernels to make the final decision on the presence of the drone. We test RANGO on a drone image dataset composed of multiple existing datasets, to which we add samples of birds and planes. We then compare RANGO with multiple existing approaches to show its superiority. When tested on images with disguised drones, RANGO attains an increase of 6.6% mean Average Precision (mAP) compared to the YOLOv5 solution. When tested on the conventional dataset, RANGO improves the mAP by approximately 2.2%, thus confirming its effectiveness also in the general scenario.
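The gradient-difference idea behind PREP can be illustrated on a 1-D intensity signal: regions whose local gradient deviates from an estimated background gradient are emphasized. The 1-D setting and the crude background estimate (global mean gradient) are illustrative assumptions; the actual PREP operates on 2-D images inside a deep architecture:

```python
# Toy sketch of gradient-difference preconditioning: emphasize positions
# where the local intensity gradient differs from the background gradient,
# so a low-contrast (disguised) target stands out.

def prep(signal):
    # Finite-difference gradient of a 1-D intensity signal
    grads = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    # Crude background-gradient estimate: the mean gradient
    background = sum(grads) / len(grads)
    # Response: deviation of each local gradient from the background
    return [abs(g - background) for g in grads]
```

On a flat background with a single bump, the response is zero everywhere except at the bump's edges, which is exactly the target-highlighting behavior the preconditioning aims for.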
{"title":"RANGO: A Novel Deep Learning Approach to Detect Drones Disguising from Video Surveillance Systems","authors":"Jin Han, Yun-feng Ren, Alessandro Brighente, Mauro Conti","doi":"10.1145/3641282","DOIUrl":"https://doi.org/10.1145/3641282","url":null,"abstract":"<p>Video surveillance systems provide means to detect the presence of potentially malicious drones in the surroundings of critical infrastructures. In particular, these systems collect images and feed them to a deep-learning classifier able to detect the presence of a drone in the input image. However, current classifiers are not efficient in identifying drones that disguise themselves with the image background, e.g., hiding in front of a tree. Furthermore, video-based detection systems heavily rely on the image’s brightness, where darkness imposes significant challenges in detecting drones. Both these phenomena increase the possibilities for attackers to get close to critical infrastructures without being spotted and hence be able to gather sensitive information or cause physical damages, possibly leading to safety threats. </p><p>In this paper, we propose RANGO, a drone detection arithmetic able to detect drones in challenging images where the target is difficult to distinguish from the background. RANGO is based on a deep learning architecture that exploits a Preconditioning Operation (PREP) that highlights the target by the difference between the target gradient and the background gradient. The idea is to highlight features that will be useful for classification. After PREP, RANGO uses multiple convolution kernels to make the final decision on the presence of the drone. We test RANGO on a drone image dataset composed of multiple already existing datasets to which we add samples of birds and planes. We then compare RANGO with multiple currently existing approaches to show its superiority. 
When tested on images with disguising drones, RANGO attains an increase of 6.6% in mean Average Precision (mAP) compared to the YOLOv5 solution. When tested on the conventional dataset, RANGO improves the mAP by approximately 2.2%, thus confirming its effectiveness also in the general scenario.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"152 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139555995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
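The abstract describes PREP only at a high level: highlight the target via the difference between the target gradient and the background gradient. Below is a minimal NumPy sketch of that idea for a grayscale image, using the global mean gradient magnitude as a crude background estimate; the function name and every detail are hypothetical illustrations, not RANGO's actual implementation:

```python
import numpy as np

def prep(image: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of a PREP-style preconditioning step:
    emphasize pixels whose gradient magnitude exceeds the global
    (background) gradient level."""
    gy, gx = np.gradient(image.astype(float))  # per-axis finite differences
    grad = np.hypot(gx, gy)                    # per-pixel gradient magnitude
    background = grad.mean()                   # crude background-gradient estimate
    highlighted = np.clip(grad - background, 0.0, None)
    # Normalize to [0, 1] so the map can be fed to a downstream classifier.
    if highlighted.max() > 0:
        highlighted /= highlighted.max()
    return highlighted

# Synthetic example: a small bright square on a flat dark background.
img = np.zeros((32, 32))
img[10:15, 10:15] = 1.0
out = prep(img)
```

On this synthetic input the output is zero in flat regions and peaks along the square's edges, which is the kind of target-highlighting signal a PREP-style step would hand to the convolutional classifier.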
Recommendation models are deployed in a variety of commercial applications in order to provide personalized services for users.
However, most of them rely on users’ original rating records, which are often collected by a centralized server for model training and may thus cause privacy issues.
Recently, some centralized federated recommendation models have been proposed to protect users’ privacy; however, they still require a server to coordinate the whole model training process.
In response, we propose a novel privacy-aware decentralized federated recommendation (DFedRec) model, which is lossless in recommendation performance compared with the traditional centralized model and is thus more accurate than other models in this line.
Specifically, we design a privacy-aware structured client-level graph for sharing model parameters during training, which kills two birds with one stone: it protects users’ privacy via randomly sampled fake entries and reduces the communication cost by sharing model parameters only with related neighboring users.
With the help of the privacy-aware structured client-level graph, we propose two novel collaborative training mechanisms for the serverless setting: a batch algorithm, DFedRec(b), and a stochastic one, DFedRec(s), where the former requires an anonymity mechanism while the latter does not. Both are equivalent to PMF trained on a centralized server and are thus lossless.
We then provide a formal analysis of the privacy guarantees of our methods and conduct extensive empirical studies on three public datasets with explicit feedback, which show the effectiveness of DFedRec, i.e., it is privacy-aware, communication-efficient, and lossless.
{"title":"Decentralized Federated Recommendation with Privacy-Aware Structured Client-Level Graph","authors":"Zhitao Li, Zhaohao Lin, Feng Liang, Weike Pan, Qiang Yang, Zhong Ming","doi":"10.1145/3641287","DOIUrl":"https://doi.org/10.1145/3641287","url":null,"abstract":"<p>Recommendation models are deployed in a variety of commercial applications in order to provide personalized services for users. </p><p>However, most of them rely on the users’ original rating records that are often collected by a centralized server for model training, which may cause privacy issues. </p><p>Recently, some centralized federated recommendation models are proposed for the protection of users’ privacy, which however requires a server for coordination in the whole process of model training. </p><p>As a response, we propose a novel privacy-aware decentralized federated recommendation (DFedRec) model, which is lossless compared with the traditional model in recommendation performance and is thus more accurate than other models in this line. </p><p>Specifically, we design a privacy-aware structured client-level graph for the sharing of the model parameters in the process of model training, which is a one-stone-two-bird strategy, i.e., it protects users’ privacy via some randomly sampled fake entries and reduces the communication cost by sharing the model parameters only with the related neighboring users. </p><p>With the help of the privacy-aware structured client-level graph, we propose two novel collaborative training mechanisms in the setting without a server, including a batch algorithm DFedRec(b) and a stochastic one DFedRec(s), where the former requires the anonymity mechanism while the latter does not. They are both equivalent to PMF trained in a centralized server and are thus lossless. 
</p><p>We then provide formal analysis of privacy guarantee of our methods and conduct extensive empirical studies on three public datasets with explicit feedback, which show the effectiveness of our DFedRec, i.e., it is privacy aware, communication efficient, and lossless.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"41 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139514800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
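The fake-entry idea above can be illustrated in a few lines. This is a hypothetical sketch, not the paper's actual protocol: a client mixes the item-factor updates for its truly rated items with dummy updates for randomly sampled decoy items, so a neighbor receiving the message cannot tell which items were actually rated. All names, and the choice of zero-valued dummy updates, are assumptions for illustration:

```python
import random

def shared_item_updates(rated_items, all_items, true_updates, n_fake=2, rng=None):
    """Hypothetical sketch of the fake-entry privacy idea: pad the real
    item updates with decoy entries so the recipient cannot distinguish
    rated items from randomly sampled fakes."""
    rng = rng or random.Random(0)
    # Sample decoy items the client has NOT rated.
    fake_pool = [i for i in all_items if i not in rated_items]
    fakes = rng.sample(fake_pool, min(n_fake, len(fake_pool)))
    message = dict(true_updates)   # updates for truly rated items
    for i in fakes:
        message[i] = 0.0           # dummy (no-op) update for a decoy item
    return message

# A client who rated items 1 and 2 shares updates padded with two decoys.
msg = shared_item_updates({1, 2}, range(6), {1: 0.5, 2: -0.3}, n_fake=2)
```

Sending dummy updates for decoys trades a little extra communication for set-membership privacy; the receiving neighbor simply applies every update in the message, and the zero-valued ones leave its item factors unchanged.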
With the ever-increasing dataset size and data storage capacity, there is a strong need to build systems that can effectively utilize these vast datasets to extract valuable information. Large datasets often exhibit sparsity and pose cold-start problems, necessitating the development of responsible recommender systems. Knowledge graphs are well suited to responsibly representing information related to recommendation scenarios. However, many studies overlook explicit encoding of contextual information, which is crucial for reducing the bias of multi-layer propagation. Additionally, existing methods stack multiple layers to encode high-order neighbor information while disregarding the relational information between items and entities. This oversight hampers their ability to capture the collaborative signal latent in user-item interactions. This is particularly important in health informatics, where knowledge graphs consist of various entities connected to items through different relations. Ignoring the relational information renders such methods insufficient for modeling user preferences. This work presents an end-to-end recommendation framework named Knowledge Graph Enhanced Contextualized Attention-Based Network (KGCAN). It explicitly encodes both the relational and contextual information of entities to preserve the original entity information. Furthermore, a user-specific attention mechanism is employed to capture personalized recommendations. The proposed model is validated on three benchmark datasets through extensive experiments. The experimental results demonstrate that KGCAN outperforms existing KG-based recommendation models. Additionally, a case study from the healthcare domain is discussed, highlighting the importance of attention mechanisms and high-order connectivity in responsible recommendation systems for health informatics.
{"title":"Knowledge Graph Enhanced Contextualized Attention-Based Network for Responsible User-Specific Recommendation","authors":"Ehsan Elahi, Sajid Anwar, Babar Shah, Zahid Halim, Abrar Ullah, Imad Rida, Muhammad Waqas","doi":"10.1145/3641288","DOIUrl":"https://doi.org/10.1145/3641288","url":null,"abstract":"<p>With the ever-increasing dataset size and data storage capacity, there is a strong need to build systems that can effectively utilize these vast datasets to extract valuable information. Large datasets often exhibit sparsity and pose cold start problems, necessitating the development of responsible recommender systems. Knowledge graphs have utility in responsibly representing information related to recommendation scenarios. However, many studies overlook explicitly encoding contextual information, which is crucial for reducing the bias of multi-layer propagation. Additionally, existing methods stack multiple layers to encode high-order neighbor information, while disregarding the relational information between items and entities. This oversight hampers their ability to capture the collaborative signal latent in user-item interactions. This is particularly important in health informatics, where knowledge graphs consist of various entities connected to items through different relations. Ignoring the relational information renders them insufficient for modeling user preferences. This work presents an end-to-end recommendation framework named Knowledge Graph Enhanced Contextualized Attention-Based Network (KGCAN). It explicitly encodes both relational and contextual information of entities to preserve the original entity information. Furthermore, a user-specific attention mechanism is employed to capture personalized recommendations. The proposed model is validated on three benchmark datasets through extensive experiments. The experimental results demonstrate that KGCAN outperforms existing KG-based recommendation models. 
Additionally, a case study from the healthcare domain is discussed, highlighting the importance of attention mechanisms and high-order connectivity in the responsible recommendation system for health informatics.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"37 1","pages":""},"PeriodicalIF":5.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139516919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
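The user-specific attention mechanism described above can be sketched as a relation-scored softmax aggregation: each neighboring entity is weighted by how relevant its connecting relation is to the given user. This is a minimal hypothetical illustration of the general idea; KGCAN's actual scoring function is not specified in the abstract, and the dot-product scoring here is an assumption:

```python
import numpy as np

def user_specific_attention(user_vec, relation_vecs, entity_vecs):
    """Hypothetical sketch of user-specific attention over a KG neighborhood:
    score each neighbor by the match between its connecting relation and the
    user, softmax the scores, and take the weighted sum of entity embeddings."""
    scores = relation_vecs @ user_vec        # one relevance score per edge
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ entity_vecs             # aggregated neighborhood vector

# Toy neighborhood: the first relation matches the user strongly,
# so the entity reached through it should dominate the aggregation.
user = np.ones(2)
relations = np.array([[10.0, 10.0], [0.0, 0.0]])
entities = np.array([[1.0, 0.0], [0.0, 1.0]])
agg = user_specific_attention(user, relations, entities)
```

With one relation strongly matching the user vector, the aggregation is dominated by the entity reached through that relation, which is how relation-aware attention personalizes the propagated KG signal for each user.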