Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10191
Xinchi Qiu, Yan Gao, Lorenzo Sani, Heng Pan, Wanru Zhao, Pedro Gusmão, Mina Alibeigi, Alexandru Iacob, Nicholas D. Lane
Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices while keeping data localized. The deployment of FL in numerous real-world applications faces delays, primarily due to the prevalent reliance on supervised tasks. Generating detailed labels at edge devices, if feasible, is demanding, given resource constraints and the imperative for continuous data updates. In addressing these challenges, solutions such as federated semi-supervised learning (FSSL), which relies on unlabeled clients' data and a limited amount of labeled data on the server, become pivotal. In this paper, we propose FedAnchor, an innovative FSSL method that introduces a unique double-head structure, called anchor head, paired with the classification head trained exclusively on labeled anchor data on the server. The anchor head is empowered with a newly designed label contrastive loss based on the cosine similarity metric. Our approach mitigates the confirmation bias and overfitting issues associated with pseudo-labeling techniques based on high-confidence model prediction samples. Extensive experiments on CIFAR10/100 and SVHN datasets demonstrate that our method outperforms the state-of-the-art method by a significant margin in terms of convergence rate and model accuracy.
{"title":"FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients","authors":"Xinchi Qiu, Yan Gao, Lorenzo Sani, Heng Pan, Wanru Zhao, Pedro Gusmão, Mina Alibeigi, Alexandru Iacob, Nicholas D. Lane","doi":"10.48550/arXiv.2402.10191","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10191","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139962730"}
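The FedAnchor abstract describes an anchor head that compares client data with labeled anchor data through cosine similarity. The following is only a minimal sketch of that general idea (assigning a pseudo-label by average cosine similarity to per-class anchor embeddings); it is not the paper's label contrastive loss, and the function names, the toy two-dimensional embeddings, and the class names are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def pseudo_label(embedding, anchors_by_class):
    """Pick the class whose anchor embeddings are most similar on average.

    Hypothetical helper: `anchors_by_class` maps a label to a list of
    embeddings of labeled anchor samples held by the server.
    """
    best_label, best_score = None, -math.inf
    for label, anchors in anchors_by_class.items():
        score = sum(cosine_similarity(embedding, a) for a in anchors) / len(anchors)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy anchor embeddings (assumed values, not from the paper).
anchors = {
    "cat": [[1.0, 0.1], [0.9, 0.2]],
    "dog": [[0.1, 1.0], [0.2, 0.8]],
}
label, score = pseudo_label([0.8, 0.3], anchors)
print(label, round(score, 3))
```

Labeling by similarity to anchors, rather than by the model's own high-confidence predictions, is the kind of design the abstract credits with reducing confirmation bias.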
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09797
Hyewon Han, Naveen Kumar
In this work, we propose a novel cross-talk rejection framework for a multi-channel multi-talker setup for a live multiparty interactive show. Our far-field audio setup is required to be hands-free during live interaction and comprises four adjacent talkers with directional microphones in the same space. Such setups often introduce heavy cross-talk between channels, resulting in reduced automatic speech recognition (ASR) and natural language understanding (NLU) performance. To address this problem, we propose a voice activity detection (VAD) model for all talkers that uses multichannel information, and its output is then used to filter audio for downstream tasks. We adopt a synthetic training data generation approach through playback and re-recording for such scenarios, simulating challenging speech overlap conditions. We train our models on this synthetic data and demonstrate that our approach outperforms single-channel VAD models and an energy-based multi-channel VAD algorithm in various acoustic environments. In addition to VAD results, we also present multiparty ASR evaluation results to highlight the impact of using our VAD model for filtering audio in downstream tasks, which significantly reduces the insertion error.
{"title":"A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings","authors":"Hyewon Han, Naveen Kumar","doi":"10.48550/arXiv.2402.09797","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09797","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139962792"}
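The abstract compares the proposed model against an energy-based multi-channel VAD algorithm without specifying it. As a rough sketch of what such a baseline might look like, the code below marks a channel as active in a frame only if its energy exceeds a threshold and is within a factor of the loudest channel. The function names, the `threshold` and `dominance` parameters, and the sample values are all assumptions, not details from the paper.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)

def energy_vad(channels, frame_len, threshold, dominance=2.0):
    """Per-frame, per-channel speech decisions from energy alone.

    channels: list of equal-length sample lists, one per talker microphone.
    A channel is active if its frame energy is at least `threshold` and at
    least 1/`dominance` of the loudest channel's energy, which suppresses
    weaker copies of another talker's speech (cross-talk).
    """
    n_frames = len(channels[0]) // frame_len
    decisions = []
    for f in range(n_frames):
        start = f * frame_len
        energies = [frame_energy(ch[start:start + frame_len]) for ch in channels]
        loudest = max(energies)
        decisions.append(
            [e >= threshold and e * dominance >= loudest for e in energies]
        )
    return decisions

# Talker 0 speaks in the first frame; microphone 1 only picks up leakage.
mic0 = [1.0, -1.0, 0.0, 0.0]
mic1 = [0.3, -0.3, 0.0, 0.0]
vad = energy_vad([mic0, mic1], frame_len=2, threshold=0.05)
print(vad)
```

A fixed threshold like this is sensitive to microphone gain and room acoustics, which is one plausible reason a learned multichannel model could do better across environments.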
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10184
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang
There is a trilemma in reinforcement learning from human feedback (RLHF): the incompatibility between highly diverse contexts, low labeling cost, and reliable alignment performance. Here we aim to mitigate such incompatibility through the design of dataset information structures during reward modeling, and meanwhile propose new, generalizable methods of analysis that have wider applications, including potentially shedding light on goal misgeneralization. Specifically, we first reexamine the RLHF process and propose a theoretical framework portraying it as an autoencoding process over text distributions. Our framework formalizes the RLHF objective of ensuring distributional consistency between human preference and large language model (LLM) behavior. Based on this framework, we introduce a new method to model generalization in the reward modeling stage of RLHF, the induced Bayesian network (IBN). Drawing from random graph theory and causal analysis, it enables empirically grounded derivation of generalization error bounds, a key improvement over classical methods of generalization analysis. An insight from our analysis is the superiority of the tree-based information structure in reward modeling, compared to chain-based baselines in conventional RLHF methods. We derive that in complex contexts with limited data, the tree-based reward model (RM) induces up to $\Theta(\log n/\log\log n)$ times less variance than the chain-based RM, where $n$ is the dataset size. As validation, we demonstrate that on three NLP tasks, the tree-based RM achieves a 65% win rate on average against chain-based baselines. Looking ahead, we hope to extend the IBN analysis to help understand the phenomenon of goal misgeneralization.
{"title":"Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective","authors":"Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang","doi":"10.48550/arXiv.2402.10184","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10184","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139962930"}
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09721
Tao Lin, Yiling Chen
We study a repeated Bayesian persuasion problem (and more generally, any generalized principal-agent problem with complete information) where the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal's signals. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. This reduction allows us to show that: if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is arbitrarily close to the principal's optimal utility in the classic non-learning model with commitment; if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility significantly more than the optimal utility in the non-learning model with commitment. The difference between the principal's obtainable utility in the learning model and the non-learning model is bounded by the agent's regret (swap-regret). If the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These conclusions hold not only for Bayesian persuasion, but also for any generalized principal-agent problem with complete information, including Stackelberg games and contract design.
{"title":"Persuading a Learning Agent","authors":"Tao Lin, Yiling Chen","doi":"10.48550/arXiv.2402.09721","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09721","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139962947"}
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10082
Enrique Mármol Campos, Aurora González-Vidal, José Luis Hernández Ramos, A. Gómez-Skarmeta
Federated Learning (FL) represents a promising approach to typical privacy concerns associated with centralized Machine Learning (ML) deployments. Despite its well-known advantages, FL is vulnerable to security attacks such as Byzantine behaviors and poisoning attacks, which can significantly degrade model performance and hinder convergence. The effectiveness of existing approaches to mitigate complex attacks, such as median, trimmed mean, or Krum aggregation functions, has been only partially demonstrated in the case of specific attacks. Our study introduces a novel robust aggregation mechanism utilizing the Fourier Transform (FT), which can effectively handle sophisticated attacks without prior knowledge of the number of attackers. Employing this data technique, weights generated by FL clients are projected into the frequency domain to ascertain their density function, selecting the one exhibiting the highest frequency. Consequently, malicious clients' weights are excluded. Our proposed approach was tested against various model poisoning attacks, demonstrating superior performance over state-of-the-art aggregation methods.
{"title":"FedRDF: A Robust and Dynamic Aggregation Function against Poisoning Attacks in Federated Learning","authors":"Enrique Mármol Campos, Aurora González-Vidal, José Luis Hernández Ramos, A. Gómez-Skarmeta","doi":"10.48550/arXiv.2402.10082","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10082","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139962979"}
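For context on the existing aggregation functions the FedRDF abstract names, here is a small sketch of two of them, the coordinate-wise median and the trimmed mean. It does not implement the paper's Fourier-based method. The function names, the `trim` parameter, and the client weight values are illustrative assumptions.

```python
from statistics import median

def coordinate_median(client_weights):
    """Coordinate-wise median over a list of client weight vectors."""
    return [median(column) for column in zip(*client_weights)]

def trimmed_mean(client_weights, trim):
    """Per coordinate, drop the `trim` smallest and `trim` largest values
    and average the rest. `trim` must be chosen in advance."""
    if 2 * trim >= len(client_weights):
        raise ValueError("trim removes every client")
    aggregated = []
    for column in zip(*client_weights):
        kept = sorted(column)[trim:len(column) - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated

# Three honest clients and one poisoned update (toy values).
clients = [
    [1.0, 2.0],
    [1.2, 2.1],
    [0.8, 1.9],
    [100.0, -50.0],  # malicious client
]
med = coordinate_median(clients)
tm = trimmed_mean(clients, trim=1)
print(med, tm)
```

Note that `trimmed_mean` needs `trim` to be at least the number of attackers to be effective, which illustrates why an aggregation rule that works without prior knowledge of that number is attractive.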
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10172
Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell
Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than $20\%$ and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than $30\%$.
{"title":"OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models","authors":"Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell","doi":"10.48550/arXiv.2402.10172","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10172","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139963003"}
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.10210
Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
{"title":"Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation","authors":"Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu","doi":"10.48550/arXiv.2402.10210","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10210","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139963019"}
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09896
Anastasios K. Papazafeiropoulos, Hanxiao Ge, P. Kourtessis, T. Ratnarajah, S. Chatzinotas, S. Papavassiliou
Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a promising implementation of RIS-assisted systems that enables full-space coverage. However, STAR-RIS as well as conventional RIS suffer from the double-fading effect. Thus, in this paper, we propose the marriage of active RIS and STAR-RIS, denoted as ASTARS, for massive multiple-input multiple-output (mMIMO) systems, and we focus on the energy splitting (ES) and mode switching (MS) protocols. Compared to prior literature, we consider the impact of correlated fading, and we base our analysis on the two-timescale protocol, which depends on statistical channel state information (CSI). On this ground, we propose a channel estimation method for ASTARS with reduced overhead that accounts for its architecture. Next, we derive a closed-form expression for the achievable sum-rate for both types of users in the transmission and reflection regions in a unified approach with significant practical advantages such as reduced complexity and overhead, which result in a lower number of required iterations for convergence compared to an alternating optimization (AO) approach. Notably, we simultaneously optimize the amplitudes, the phase shifts, and the active amplifying coefficients of the ASTARS by applying the projected gradient ascent method (PGAM). Remarkably, the proposed optimization can be executed once every several coherence intervals, which reduces the processing burden considerably. Simulations corroborate the analytical results, provide insight into the effects of fundamental variables on the sum achievable SE, and present the superiority of 16 ASTARS compared to passive STAR-RIS for a practical number of surface elements.
{"title":"Two-Timescale Design for Active STAR-RIS Aided Massive MIMO Systems","authors":"Anastasios K. Papazafeiropoulos, Hanxiao Ge, P. Kourtessis, T. Ratnarajah, S. Chatzinotas, S. Papavassiliou","doi":"10.48550/arXiv.2402.09896","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09896","journal":{"name":"ArXiv"},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"139963118"}
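The ASTARS abstract relies on the projected gradient ascent method (PGAM). The sketch below shows the generic pattern only: take a gradient step, then project back onto the feasible set. The objective here is a toy concave function with a box constraint, standing in for the paper's sum-rate and its amplitude, phase, and amplification constraints, none of which are reproduced. All names, the step size, and the iteration count are assumptions.

```python
def project_box(x, lo, hi):
    """Project each coordinate onto the interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]

def projected_gradient_ascent(grad, x0, lo, hi, step=0.1, iters=200):
    """Maximize a differentiable function over a box via PGAM.

    grad: function returning the gradient (a list) at a point.
    """
    x = project_box(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        # Ascent step followed by projection onto the feasible set.
        x = project_box([v + step * gv for v, gv in zip(x, g)], lo, hi)
    return x

# Toy objective f(x) = -sum((x_i - t_i)^2); its unconstrained maximizer
# is `target`, but the second coordinate lies outside the box [0, 1].
target = [0.5, 1.7]

def toy_gradient(x):
    return [-2.0 * (v - t) for v, t in zip(x, target)]

x_star = projected_gradient_ascent(toy_gradient, [0.0, 0.0], lo=0.0, hi=1.0)
print(x_star)
```

The first coordinate converges to its unconstrained optimum while the second is held at the boundary, which is the behavior the projection step is meant to enforce.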
Pub Date: 2024-02-15 DOI: 10.48550/arXiv.2402.09897
M. Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir
The COVID-19 pandemic has had adverse effects on both physical and mental health. During this pandemic, numerous studies have focused on gaining insights into health-related perspectives from social media. In this study, our primary objective is to develop a machine learning-based web application for automatically classifying COVID-19-related discussions on social media. To achieve this, we label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We collected data using the Twitter API and labeled a total of 6,667 tweets into five different classes: health risks, prevention, symptoms, transmission, and treatment. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms, including Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbour, Logistic Regression, and Linear SVC. Additionally, we used four deep learning algorithms: LSTM, CNN, RNN, and BERT, for classification. Overall, we achieved a maximum F1 score of 90.43% with the CNN algorithm in deep learning. The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches. Our study not only contributes to the field of health-related data analysis but also provides a valuable resource in the form of a web-based tool for efficient data classification, which can aid in addressing public health challenges and increasing awareness during pandemics. We made the dataset and application publicly available, which can be downloaded from this link https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
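The results above are reported as F1 scores over five classes. As a reminder of how such a score is computed, here is a small sketch of per-class and macro-averaged F1. The abstract does not say which averaging the authors used, so macro averaging is an assumption, and the labels and predictions are made-up toy data.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)

# Toy labels drawn from three of the five classes named in the abstract.
y_true = ["symptoms", "treatment", "symptoms", "prevention"]
y_pred = ["symptoms", "treatment", "prevention", "prevention"]
score = macro_f1(y_true, y_pred, ["symptoms", "treatment", "prevention"])
print(round(score, 4))
```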
{"title":"COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions","authors":"M. Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir","doi":"10.48550/arXiv.2402.09897","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09897","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"11 5","pages":""},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139963180"}
Recent advancements in multi-modal artificial intelligence (AI) have revolutionized the fields of stock market forecasting and heart rate monitoring. Utilizing diverse data sources can substantially improve prediction accuracy. Nonetheless, additional data may not always align with the original dataset. Interpolation methods are commonly used to handle missing values in modal data, though they may exhibit limitations when information is sparse. To address this challenge, we propose a Modality Completion Deep Belief Network-Based Model (MC-DBN). This approach utilizes implicit features of complete data to compensate for gaps between itself and additional incomplete data. It ensures that the enhanced multi-modal data closely aligns with the dynamic nature of the real world to enhance the effectiveness of the model. We evaluate the MC-DBN model on two datasets from the stock market forecasting and heart rate monitoring domains. Comprehensive experiments showcase the model's capacity to bridge the semantic divide present in multi-modal data, subsequently enhancing its performance. The source code is available at: https://github.com/logan-0623/DBN-generate
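The abstract contrasts MC-DBN with interpolation, the usual way of filling missing values. The sketch below shows only that baseline, plain linear interpolation, and is not the MC-DBN method. The function name, the use of `None` as the missing-value marker, and the heart-rate-like numbers are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: the linear-interpolation baseline for missing
# values, NOT the MC-DBN model. None marks a missing observation (an assumption).


def interpolate_missing(series):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; leading or trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            step = (series[right] - series[left]) / (right - left)
            filled[i] = series[left] + (i - left) * step
    return filled


# Hypothetical readings: a short gap inside a steady rise is recovered well.
print(interpolate_missing([60, None, None, 90]))

# If the true values were [60, 120, 60] and the spike is the missing sample,
# interpolation cannot recover it: the gap is filled from its neighbours.
print(interpolate_missing([60, None, 60]))
```

The second call illustrates the limitation the abstract points to: when observations are sparse, interpolation can only reproduce what the neighbouring samples imply, so any structure that falls entirely inside a gap is lost.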
{"title":"MC-DBN: A Deep Belief Network-Based Model for Modality Completion","authors":"Zihong Luo, Haochen Xue, Mingyu Jin, Chengzhi Liu, Zile Huang, Chong Zhang, Shuliang Zhao","doi":"10.48550/arXiv.2402.09782","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09782","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"21 2","pages":""},"publicationDate":"2024-02-15","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139963234"}