ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09797

Hyewon Han, Naveen Kumar

In this work, we propose a novel cross-talk rejection framework for a multi-channel multi-talker setup for a live multiparty interactive show. Our far-field audio setup is required to be hands-free during live interaction and comprises four adjacent talkers with directional microphones in the same space. Such setups often introduce heavy cross-talk between channels, resulting in reduced automatic speech recognition (ASR) and natural language understanding (NLU) performance. To address this problem, we propose a voice activity detection (VAD) model for all talkers using multichannel information, which is then used to filter audio for downstream tasks. We adopt a synthetic training data generation approach through playback and re-recording for such scenarios, simulating challenging speech overlap conditions. We train our models on this synthetic data and demonstrate that our approach outperforms single-channel VAD models and an energy-based multi-channel VAD algorithm in various acoustic environments. In addition to VAD results, we also present multiparty ASR evaluation results to highlight the impact of using our VAD model to filter audio for downstream tasks, which significantly reduces the insertion error.
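For orientation, an energy-based multi-channel VAD of the kind the abstract uses as a baseline can be pictured roughly as below. This is a minimal sketch, not the authors' model and not their exact baseline: the function name `energy_vad`, the frame length, the energy floor, and the dominance ratio are all assumptions chosen for illustration. A frame is attributed to a talker only when that talker's channel is loud enough and clearly louder than every other channel.

```python
from typing import List


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def energy_vad(channels: List[List[float]], frame_len: int = 4,
               floor: float = 0.01, ratio: float = 2.0) -> List[List[bool]]:
    """Per-channel, per-frame activity decisions (hypothetical baseline).

    A channel is marked active in a frame when its energy exceeds `floor`
    and is at least `ratio` times the energy of every other channel.
    """
    n_frames = min(len(c) for c in channels) // frame_len
    decisions = [[False] * n_frames for _ in channels]
    for f in range(n_frames):
        start, stop = f * frame_len, (f + 1) * frame_len
        energies = [frame_energy(c[start:stop]) for c in channels]
        for i, e in enumerate(energies):
            others = [x for j, x in enumerate(energies) if j != i]
            loudest_other = max(others) if others else 0.0
            decisions[i][f] = e > floor and e >= ratio * loudest_other
    return decisions


# Toy input: talker 0 speaks in frame 0 and leaks into channel 1 at a lower
# level; talker 1 speaks alone in frame 1.
ch0 = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
ch1 = [0.2, -0.2, 0.2, -0.2, 0.4, -0.4, 0.4, -0.4]
print(energy_vad([ch0, ch1]))  # [[True, False], [False, True]]
```

With a dominance ratio above 1, a rule like this can never mark two talkers active in the same frame, which hints at why overlapping speech is a hard case for purely energy-based decisions.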



Citations: 0

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10184

Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang

There is a trilemma in reinforcement learning from human feedback (RLHF): the incompatibility between highly diverse contexts, low labeling cost, and reliable alignment performance. Here we aim to mitigate such incompatibility through the design of dataset information structures during reward modeling, and meanwhile propose new, generalizable methods of analysis that have wider applications, including potentially shedding light on goal misgeneralization. Specifically, we first reexamine the RLHF process and propose a theoretical framework portraying it as an autoencoding process over text distributions. Our framework formalizes the RLHF objective of ensuring distributional consistency between human preference and large language model (LLM) behavior. Based on this framework, we introduce a new method to model generalization in the reward modeling stage of RLHF, the induced Bayesian network (IBN). Drawing from random graph theory and causal analysis, it enables empirically grounded derivation of generalization error bounds, a key improvement over classical methods of generalization analysis. An insight from our analysis is the superiority of the tree-based information structure in reward modeling, compared to chain-based baselines in conventional RLHF methods. We derive that in complex contexts with limited data, the tree-based reward model (RM) induces up to $\Theta(\log n/\log\log n)$ times less variance than the chain-based RM, where $n$ is the dataset size. As validation, we demonstrate that on three NLP tasks, the tree-based RM achieves a 65% win rate on average against chain-based baselines. Looking ahead, we hope to extend the IBN analysis to help understand the phenomenon of goal misgeneralization.



Citations: 0

Persuading a Learning Agent

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09721

Tao Lin, Yiling Chen

We study a repeated Bayesian persuasion problem (and more generally, any generalized principal-agent problem with complete information) where the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal's signals. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. This reduction allows us to show that: if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is arbitrarily close to the principal's optimal utility in the classic non-learning model with commitment; if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility significantly more than the optimal utility in the non-learning model with commitment. The difference between the principal's obtainable utility in the learning model and the non-learning model is bounded by the agent's regret (swap-regret). If the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These conclusions hold not only for Bayesian persuasion, but also for any generalized principal-agent problem with complete information, including Stackelberg games and contract design.
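The benchmark the abstract compares against, the principal's optimal utility in the non-learning model with commitment, can be made concrete with the textbook binary case. The sketch below is only an illustration under stated assumptions: the state and action are binary, the receiver takes the sender's preferred action exactly when the posterior reaches a threshold, the sender's utility is the probability of that action, and the function name and the numbers are hypothetical rather than taken from the paper.

```python
def optimal_commitment_utility(prior: float, threshold: float = 0.5) -> float:
    """Sender's best achievable probability of the favorable action.

    Assumes a binary state with P(good) = prior and a receiver who acts
    if and only if the posterior probability of the good state is at
    least `threshold`.
    """
    if prior >= threshold:
        return 1.0  # the receiver already acts without any persuasion
    # Recommend the action with certainty in the good state. In the bad
    # state, recommend it with probability q, chosen so that the posterior
    # after a recommendation equals the threshold exactly.
    q = prior * (1 - threshold) / ((1 - prior) * threshold)
    return prior + (1 - prior) * q


print(round(optimal_commitment_utility(0.3), 6))  # 0.6
```

With a prior of 0.3 and a threshold of 0.5 the sender obtains the action with probability 0.6, twice what full disclosure would give. The abstract's results concern how close a principal without commitment can get to this kind of value when the agent is a learning algorithm.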



Citations: 0

FedRDF: A Robust and Dynamic Aggregation Function against Poisoning Attacks in Federated Learning

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10082

Enrique Mármol Campos, Aurora González-Vidal, José Luis Hernández Ramos, A. Gómez-Skarmeta

Federated Learning (FL) represents a promising approach to typical privacy concerns associated with centralized Machine Learning (ML) deployments. Despite its well-known advantages, FL is vulnerable to security attacks such as Byzantine behaviors and poisoning attacks, which can significantly degrade model performance and hinder convergence. The effectiveness of existing approaches to mitigate complex attacks, such as median, trimmed mean, or Krum aggregation functions, has been only partially demonstrated in the case of specific attacks. Our study introduces a novel robust aggregation mechanism utilizing the Fourier Transform (FT), which can effectively handle sophisticated attacks without prior knowledge of the number of attackers. Employing this data technique, weights generated by FL clients are projected into the frequency domain to ascertain their density function, selecting the one exhibiting the highest frequency. Consequently, malicious clients' weights are excluded. Our proposed approach was tested against various model poisoning attacks, demonstrating superior performance over state-of-the-art aggregation methods.
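The idea of picking the most densely populated value among client weights can be sketched as follows. This is not the FedRDF algorithm: the abstract estimates the density through the Fourier Transform, whereas this sketch swaps in a directly evaluated Gaussian kernel density estimate, and the function names, bandwidth, and grid size are assumptions made purely for illustration.

```python
import math
from typing import List


def density_mode(values: List[float], bandwidth: float = 0.1,
                 grid_size: int = 201) -> float:
    """Grid point with the highest Gaussian-kernel density over `values`."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    step = (hi - lo) / (grid_size - 1)
    best_x, best_density = lo, -1.0
    for k in range(grid_size):
        x = lo + k * step
        # Unnormalized kernel density: the constant factor does not
        # change which grid point attains the maximum.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def robust_aggregate(client_weights: List[List[float]],
                     bandwidth: float = 0.1) -> List[float]:
    """Coordinate-wise density-mode aggregation (illustrative only)."""
    return [density_mode(list(coord), bandwidth) for coord in zip(*client_weights)]


# Four honest clients near (1.0, -2.0) and one poisoned update.
updates = [[1.00, -2.00], [1.02, -1.98], [0.98, -2.02], [1.01, -2.01], [9.0, 7.0]]
agg = robust_aggregate(updates)
print(agg)
```

On this toy input a plain average of the first coordinate would be about 2.6, while the density mode stays close to 1.0. Note that the rule needs no estimate of how many clients are malicious, which matches the property the abstract emphasizes.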



Citations: 0

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10172

Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than 20% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than 30%.
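To make the task concrete, the sketch below shows the kind of translation involved: a short natural-language description turned into an integer linear program and then solved. Everything here is hypothetical, including the problem text, the numbers, and the function name `solve_by_enumeration`. OptiMUS writes code for real (MI)LP solvers, whereas this sketch substitutes brute-force enumeration, which only works for tiny integer problems.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical description: "Chairs earn 3 and tables earn 5 per unit.
# A chair needs 1 labor hour and 3 units of wood; a table needs 2 labor
# hours and 1 unit of wood. 14 labor hours and 18 units of wood are
# available. How many of each should be built?"

Constraint = Tuple[Sequence[int], int]  # (coefficients, right-hand side), meaning a . x <= b


def solve_by_enumeration(profit: Sequence[int], constraints: List[Constraint],
                         upper: int = 20) -> Tuple[Optional[int], Optional[Tuple[int, ...]]]:
    """Maximize profit . x over integer x in [0, upper]^n subject to A x <= b."""
    best_value, best_plan = None, None
    for plan in product(range(upper + 1), repeat=len(profit)):
        if any(sum(a * x for a, x in zip(coeffs, plan)) > rhs
               for coeffs, rhs in constraints):
            continue  # violates at least one resource limit
        value = sum(p * x for p, x in zip(profit, plan))
        if best_value is None or value > best_value:
            best_value, best_plan = value, plan
    return best_value, best_plan


profit = (3, 5)                    # objective coefficients (chairs, tables)
constraints = [((1, 2), 14),       # labor hours
               ((3, 1), 18)]       # wood units
print(solve_by_enumeration(profit, constraints))  # (37, (4, 5))
```

The hard part that the abstract targets is the first step, reading a long description and producing the coefficients and constraints correctly. Once that model exists, solving it is routine for a dedicated solver.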


Citations: 0

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10210

Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.



Citations: 0

Two-Timescale Design for Active STAR-RIS Aided Massive MIMO Systems

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09896

Anastasios K. Papazafeiropoulos, Hanxiao Ge, P. Kourtessis, T. Ratnarajah, S. Chatzinotas, S. Papavassiliou

Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a promising implementation of RIS-assisted systems that enables full-space coverage. However, STAR-RIS as well as conventional RIS suffer from the double-fading effect. Thus, in this paper, we propose the marriage of active RIS and STAR-RIS, denoted as ASTARS, for massive multiple-input multiple-output (mMIMO) systems, and we focus on the energy splitting (ES) and mode switching (MS) protocols. Compared to prior literature, we consider the impact of correlated fading, and we base our analysis on the two-timescale protocol, which depends on statistical channel state information (CSI). On this ground, we propose a channel estimation method for ASTARS with reduced overhead that accounts for its architecture. Next, we derive a closed-form expression for the achievable sum-rate for both types of users in the transmission and reflection regions in a unified approach with significant practical advantages such as reduced complexity and overhead, which result in a lower number of required iterations for convergence compared to an alternating optimization (AO) approach. Notably, we simultaneously optimize the amplitudes, the phase shifts, and the active amplifying coefficients of the ASTARS by applying the projected gradient ascent method (PGAM). Remarkably, the proposed optimization can be executed once every several coherence intervals, which reduces the processing burden considerably. Simulations corroborate the analytical results, provide insight into the effects of fundamental variables on the achievable sum spectral efficiency (SE), and present the superiority of 16 ASTARS compared to passive STAR-RIS for a practical number of surface elements.
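The projected gradient ascent method named in the abstract follows a simple pattern: take a gradient step, then project back onto the feasible set. The sketch below shows that pattern only. The toy concave objective stands in for the paper's sum-rate expression, which is not given here, and the box constraint is an assumed stand-in for limits such as a maximum amplification. The paper's actual feasible set, which also involves phase shifts, is not reproduced.

```python
from typing import Callable, List


def project_box(x: List[float], lo: float, hi: float) -> List[float]:
    """Euclidean projection onto the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(max(v, lo), hi) for v in x]


def projected_gradient_ascent(grad: Callable[[List[float]], List[float]],
                              x0: List[float], lo: float, hi: float,
                              step: float = 0.1, iters: int = 200) -> List[float]:
    """Repeat: move along the gradient, then project onto the feasible box."""
    x = project_box(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        x = project_box([xi + step * gi for xi, gi in zip(x, g)], lo, hi)
    return x


# Toy objective f(x) = -sum((x_i - t_i)^2), whose gradient is 2 * (t - x).
# The second target coordinate lies outside the box, so the projection binds.
target = [0.4, 1.7]


def grad_f(x: List[float]) -> List[float]:
    return [2.0 * (t - xi) for t, xi in zip(target, x)]


solution = projected_gradient_ascent(grad_f, [0.0, 0.0], lo=0.0, hi=1.0)
print([round(v, 4) for v in solution])  # [0.4, 1.0]
```

Because all variables are updated in the same step, there is no alternation between blocks of variables, which is the structural difference from an alternating optimization scheme.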



Citations: 0

COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09897

M. Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir

The COVID-19 pandemic has had adverse effects on both physical and mental health. During this pandemic, numerous studies have focused on gaining insights into health-related perspectives from social media. In this study, our primary objective is to develop a machine learning-based web application for automatically classifying COVID-19-related discussions on social media. To achieve this, we label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We collected data using the Twitter API and labeled a total of 6,667 tweets into five different classes: health risks, prevention, symptoms, transmission, and treatment. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms: Decision Tree, Random Forest, Stochastic Gradient Descent, AdaBoost, K-Nearest Neighbour, Logistic Regression, and Linear SVC. Additionally, we used four deep learning algorithms for classification: LSTM, CNN, RNN, and BERT. Overall, the highest F1 score, 90.43%, was achieved by the CNN deep learning model. Among the traditional machine learning approaches, Linear SVC achieved the highest F1 score, 86.13%. Our study not only contributes to the field of health-related data analysis but also provides a valuable resource in the form of a web-based tool for efficient data classification, which can aid in addressing public health challenges and increasing awareness during pandemics. We made the dataset and application publicly available at https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
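Since the abstract reports its results as F1 scores over five classes, a small worked computation may help in reading them. The abstract does not say which averaging is used, so the macro average below is an assumption, and the labels and predictions are made-up toy data rather than anything from the COVIDHealth dataset.

```python
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 score of each class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels only; four of the five class names from the abstract are used.
y_true = ["symptoms", "treatment", "prevention", "symptoms", "transmission", "treatment"]
y_pred = ["symptoms", "treatment", "symptoms", "symptoms", "transmission", "prevention"]
print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))  # 0.6167
```

Macro averaging weights every class equally, so a rarely seen class that is classified poorly pulls the score down as much as a frequent one would. A support-weighted average would generally give a different number on the same predictions.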



Citations: 0

MC-DBN: A Deep Belief Network-Based Model for Modality Completion

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09782

Zihong Luo, Haochen Xue, Mingyu Jin, Chengzhi Liu, Zile Huang, Chong Zhang, Shuliang Zhao

Recent advancements in multi-modal artificial intelligence (AI) have revolutionized the fields of stock market forecasting and heart rate monitoring. Utilizing diverse data sources can substantially improve prediction accuracy. Nonetheless, additional data may not always align with the original dataset. Interpolation methods are commonly used to handle missing values in modal data, though they may exhibit limitations when information is sparse. To address this challenge, we propose a Modality Completion Deep Belief Network-Based Model (MC-DBN). This approach uses implicit features of the complete data to compensate for gaps between it and additional incomplete data. It ensures that the enhanced multi-modal data closely aligns with the dynamic nature of the real world, improving the effectiveness of the model. We evaluate the MC-DBN model on two datasets from the stock market forecasting and heart rate monitoring domains. Comprehensive experiments showcase the model's capacity to bridge the semantic divide present in multi-modal data, subsequently enhancing its performance. The source code is available at: https://github.com/logan-0623/DBN-generate

Citations: 0

Mitigating subjectivity and bias in AI development indices: A robust approach to redefining country rankings

ArXiv

Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10122

B. S. Campello, G. D. Pelegrina, R. Pelissari, Ricardo Suyama, L. T. Duarte

Countries worldwide have been implementing different national strategies for Artificial Intelligence (AI) to shape policy priorities and guide their AI development. Several AI indices have emerged to assess countries' progress in AI development, aiding decision-making on investments and policy choices. Typically, these indices combine multiple indicators using linear additive methods such as weighted sums, which are limited in their ability to account for interactions among indicators. Another limitation concerns the use of deterministic weights, which can be perceived as subjective and vulnerable to debate and scrutiny, especially by nations that feel disadvantaged. To mitigate these problems, we conduct a methodological analysis to derive AI indices based on multiple criteria decision analysis. First, we assess correlations between different AI dimensions and employ the Choquet integral to model them. Then, we apply the Stochastic Multicriteria Acceptability Analysis (SMAA) to conduct a sensitivity analysis using both the weighted sum and the Choquet integral in order to evaluate the stability of the indices with regard to the weights. Finally, we introduce a novel ranking methodology based on SMAA, which considers several sets of weights to derive the ranking of countries. As a result, instead of using predefined weights, the proposed approach derives the ranking from the probabilities of countries occupying a specific position. In the computational analysis, we utilize the data employed in The Global AI Index proposed by Tortoise. Results reveal correlations in the data, and our approach effectively mitigates bias. In the sensitivity analysis, we scrutinize changes in the ranking resulting from weight adjustments. We demonstrate that our proposed rankings closely align with those derived from weight variations, proving to be more robust.
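
The idea of ranking by the probability of occupying each position can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain weighted-sum aggregator (the abstract also uses the Choquet integral), weights drawn uniformly from the simplex, and hypothetical toy data; the function names `sample_weights` and `rank_acceptability` are invented for illustration.

```python
import random


def sample_weights(n, rng):
    """Draw one weight vector uniformly from the simplex (normalized exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def rank_acceptability(scores, n_samples=5000, seed=0):
    """Estimate the probability that each alternative occupies each rank.

    scores: dict mapping a name to a list of indicator values
            (higher is better, all indicators on the same scale).
    Returns a dict mapping each name to a list of probabilities,
    where index 0 is first place.
    """
    rng = random.Random(seed)
    names = list(scores)
    n_criteria = len(next(iter(scores.values())))
    counts = {name: [0] * len(names) for name in names}
    for _ in range(n_samples):
        w = sample_weights(n_criteria, rng)
        # Weighted-sum aggregation: an assumption made for this sketch.
        totals = {
            name: sum(wi * xi for wi, xi in zip(w, scores[name]))
            for name in names
        }
        ordered = sorted(names, key=lambda name: totals[name], reverse=True)
        for rank, name in enumerate(ordered):
            counts[name][rank] += 1
    return {name: [c / n_samples for c in counts[name]] for name in names}


# Hypothetical toy data: three countries scored on three normalized indicators.
toy = {
    "A": [0.9, 0.8, 0.7],
    "B": [0.6, 0.9, 0.4],
    "C": [0.3, 0.2, 0.5],
}
acceptability = rank_acceptability(toy)
```

In this toy data, country A beats country C on every indicator, so no sampled weight vector can place C above A; the estimated probabilities reflect that without fixing any single set of weights.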

Citations: 0

Latest ArXiv articles