
Latest ArXiv Publications

Probabilistic Neural Circuits
Pub Date: 2024-03-10 DOI: 10.1609/aaai.v38i15.29675
Pedro Zuidberg Dos Martires
Probabilistic circuits (PCs) have gained prominence in recent years as a versatile framework for discussing probabilistic models that support tractable queries and are yet expressive enough to model complex probability distributions. Nevertheless, tractability comes at a cost: PCs are less expressive than neural networks. In this paper we introduce probabilistic neural circuits (PNCs), which strike a balance between PCs and neural nets in terms of tractability and expressive power. Theoretically, we show that PNCs can be interpreted as deep mixtures of Bayesian networks. Experimentally, we demonstrate that PNCs constitute powerful function approximators.
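The tractable queries the abstract refers to can be seen in a toy sum-product circuit: normalization and marginals fall out of a single bottom-up evaluation pass. The sketch below is purely illustrative (invented parameters, two binary variables), not the paper's PNC architecture.

```python
# Minimal sum-product circuit over two binary variables X1, X2.
# Leaves are Bernoulli distributions; each product node assumes
# independence within a mixture component; the root sum node mixes
# two components. All parameters are made up for illustration.

def bernoulli(p, x):
    """Probability of binary outcome x under Bernoulli(p)."""
    return p if x == 1 else 1.0 - p

def circuit(x1, x2):
    # Component 1: X1 ~ Bern(0.9), X2 ~ Bern(0.2)
    c1 = bernoulli(0.9, x1) * bernoulli(0.2, x2)
    # Component 2: X1 ~ Bern(0.1), X2 ~ Bern(0.7)
    c2 = bernoulli(0.1, x1) * bernoulli(0.7, x2)
    # Root sum node with mixture weights 0.6 / 0.4
    return 0.6 * c1 + 0.4 * c2

# Tractability in miniature: the distribution is exactly normalized,
# and a marginal is just the circuit with a leaf replaced by 1.
total = sum(circuit(a, b) for a in (0, 1) for b in (0, 1))
print(round(total, 10))  # → 1.0
```

The "deep mixture of Bayesian networks" interpretation in the paper stacks many such sum and product layers; the point here is only that each query is one pass over the circuit.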
Citations: 0
Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information
Pub Date: 2024-03-09 DOI: 10.1109/icassp48485.2024.10448469
Qiaochu Huang, Xu He, Boshi Tang, Hao-Wen Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen M. Meng
Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose ExpressiveBailando, a novel dance generation method designed to generate expressive dances, concurrently taking all three factors into account. Specifically, we mitigate the issue of speed homogenization by incorporating frequency information into VQ-VAE, thus improving dance dynamics. Additionally, we integrate music style information by extracting genre- and beat-related features with a pre-trained music model, hence achieving improvements in the other two factors. Extensive experimental results demonstrate that our proposed method can generate dances with high expressiveness and outperforms existing methods both qualitatively and quantitatively.
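The VQ-VAE mentioned in the abstract rests on one core operation: snapping each encoder output to its nearest codebook entry. A minimal sketch of that quantization step follows; the codebook values are invented and this is not ExpressiveBailando's actual model.

```python
# The vector-quantization step at the heart of a VQ-VAE: each encoder
# output vector z is replaced by its nearest codebook entry.
# Codebook contents here are illustrative only.

def quantize(z, codebook):
    """Return the index and value of the codebook vector nearest to z."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
    idx = dists.index(min(dists))
    return idx, codebook[idx]

codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(quantize((0.9, 1.2), codebook))  # → (1, (1.0, 1.0))
```

The paper's contribution augments the inputs to this pipeline with frequency information; the quantization mechanism itself is standard.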
Citations: 1
UniSparse: An Intermediate Language for General Sparse Format Customization
Pub Date: 2024-03-09 DOI: 10.1145/3649816
Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, Zhiru Zhang
The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by utilizing sparsity pattern- or target-aware data structures and layouts to enhance memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively customizing the sparse formats. Additionally, because these frameworks represent formats using a limited set of per-dimension attributes, they lack the flexibility to accommodate numerous new variations of custom sparse data structures and layouts. To overcome this deficiency, we propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives. We also develop a compiler leveraging the MLIR infrastructure, which supports adaptive customization of formats, and automatic code generation of format conversion and compute operations for heterogeneous architectures. We demonstrate the efficacy of our approach through experiments running commonly-used sparse linear algebra operations with specialized formats on multiple different hardware targets, including an Intel CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory (PIM) device.
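The decoupling the abstract describes — logical sparse structure versus low-level memory layout — can be illustrated with the simplest possible format change: the same nonzeros (coordinates plus values) re-laid-out from COO triplets into CSR arrays. This is plain Python standing in for the idea, not UniSparse's MLIR-based IR.

```python
# Toy illustration of separating a sparse tensor's logical content
# (row/col/value triplets) from its storage layout (CSR arrays).
# Assumes the triplets are already sorted by row.

def coo_to_csr(rows, cols, vals, n_rows):
    """Convert row-sorted COO triplets into CSR (indptr, indices, data)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1          # count nonzeros per row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]  # prefix-sum into row pointers
    return indptr, cols, vals

# 3x3 matrix with nonzeros at (0,1)=5, (1,0)=2, (2,2)=7
indptr, indices, data = coo_to_csr([0, 1, 2], [1, 0, 2], [5, 2, 7], 3)
print(indptr)  # → [0, 1, 2, 3]
```

In UniSparse's terms, a conversion like this would be expressed with its query/mutation/layout primitives and compiled automatically rather than hand-written per format pair.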
Citations: 0
Content Moderation Justice and Fairness on Social Media: Comparisons Across Different Contexts and Platforms
Pub Date: 2024-03-09 DOI: 10.1145/3613905.3650882
Jie Cai, Aashka Patel, Azadeh Naderi, D. Y. Wohn
Social media users may perceive moderation decisions by the platform differently, which can lead to frustration and dropout. This study investigates users' perceived justice and fairness of online moderation decisions when they are exposed to various illegal versus legal scenarios, retributive versus restorative moderation strategies, and user-moderated versus commercially moderated platforms. We conduct an online experiment on 200 American social media users of Reddit and Twitter. Results show that retributive moderation delivers higher justice and fairness for commercially moderated than for user-moderated platforms in illegal violations; restorative moderation delivers higher fairness for legal violations than illegal ones. We discuss the opportunities for platform policymaking to improve moderation system design.
Citations: 0
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks
Pub Date: 2024-03-09 DOI: 10.1109/icassp48485.2024.10446945
Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li
Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. Particularly, it provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to exploit temporal speech information. Experimental results demonstrate that our sVAD achieves remarkable noise robustness and meanwhile maintains low power consumption and a small footprint, making it a promising solution for real-world VAD applications.
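The power efficiency of SNNs comes from neurons that only emit binary spikes when their membrane potential crosses a threshold. The sketch below shows a bare leaky integrate-and-fire (LIF) neuron, the basic unit behind models like sVAD; the decay and threshold values are illustrative, and this is not the paper's sRNN classifier.

```python
# Minimal leaky integrate-and-fire (LIF) neuron. The membrane potential
# leaks each step, accumulates input current, and emits a spike (then
# resets) when it crosses the threshold. Parameters are illustrative.

def lif_run(inputs, decay=0.5, threshold=1.0):
    """Return the binary spike train produced by an input current sequence."""
    v, spikes = 0.0, []
    for i in inputs:
        v = decay * v + i          # leaky integration of input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0                # reset after a spike
        else:
            spikes.append(0)
    return spikes

print(lif_run([0.4, 0.4, 0.4, 0.9, 0.1]))  # → [0, 0, 0, 1, 0]
```

Because downstream layers only see sparse binary events rather than dense activations, most multiply-accumulate work is skipped, which is the source of the low power consumption claimed for SNN-based VADs.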
Citations: 0
A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation
Pub Date: 2024-03-09 DOI: 10.1145/3613905.3651057
Yao Lyu, He Zhang, Shuo Niu, Jie Cai
Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being applied, and the methodologies content creators employ with Gen-AI tools during the creation process. This study initially explores this emerging area through a qualitative analysis of 68 YouTube videos demonstrating Gen-AI usage. Our research focuses on identifying the content domains, the variety of tools used, the activities performed, and the nature of the final products generated by Gen-AI in the context of user-generated content.
Citations: 1
Assessing User Apprehensions About Mixed Reality Artifacts and Applications: The Mixed Reality Concerns (MRC) Questionnaire
Pub Date: 2024-03-09 DOI: 10.1145/3613904.3642631
Christopher Katins, Pawel W. Wo'zniak, Aodi Chen, Ihsan Tumay, Luu Viet Trinh Le, John Uschold, Thomas Kosch
Current research in Mixed Reality (MR) presents a wide range of novel use cases for blending virtual elements with the real world. This yet-to-be-ubiquitous technology challenges how users currently work and interact with digital content. While offering many potential advantages, MR technologies introduce new security, safety, and privacy challenges. Thus, it is relevant to understand users' apprehensions towards MR technologies, ranging from security concerns to social acceptance. To address this challenge, we present the Mixed Reality Concerns (MRC) Questionnaire, designed to assess users' concerns towards MR artifacts and applications systematically. The development followed a structured process considering previous work, expert interviews, iterative refinements, and confirmatory tests to analytically validate the questionnaire. The MRC Questionnaire offers a new method of assessing users' critical opinions to compare and assess novel MR artifacts and applications regarding security, privacy, social implications, and trust.
Citations: 0
Deciphering Crypto Twitter
Pub Date: 2024-03-09 DOI: 10.1145/3614419.3644026
In-Soon Kang, Maruf Ahmed Mridul, Abraham Sanders, Yao Ma, Thilanka Munasinghe, Aparna Gupta, O. Seneviratne
Cryptocurrency is a fast-moving space, with a continuous influx of new projects every year. However, an increasing number of incidents in the space, such as hacks and security breaches, threaten the growth of the community and the development of technology. This dynamic and often tumultuous landscape is vividly mirrored and shaped by discussions within Crypto Twitter, a key digital arena where investors, enthusiasts, and skeptics converge, revealing real-time sentiments and trends through social media interactions. We present our analysis on a Twitter dataset collected during a formative period of the cryptocurrency landscape. We collected 40 million tweets using cryptocurrency-related keywords and performed a nuanced analysis that involved grouping the tweets by semantic similarity and constructing a tweet and user network. We used sentence-level embeddings and autoencoders to create K-means clusters of tweets and identified six groups of tweets and their topics to examine different cryptocurrency-related interests and the change in sentiment over time. Moreover, we discovered sentiment indicators that point to real-life incidents in the crypto world, such as the FTX incident of November 2022. We also constructed and analyzed different networks of tweets and users in our dataset by considering the reply and quote relationships and analyzed the largest components of each network. Our networks reveal a structure of bot activity in Crypto Twitter and suggest that they can be detected and handled using a network-based approach. Our work sheds light on the potential of social media signals to detect and understand crypto events, benefiting investors, regulators, and curious observers alike, as well as the potential for bot detection in Crypto Twitter using a network-based approach.
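The clustering step described above (sentence-level embeddings grouped into K-means clusters) can be sketched in miniature. Real sentence embeddings would come from a language model and an autoencoder; the 2-D points below merely stand in for them, and this is a generic k-means, not the authors' pipeline.

```python
# Toy k-means on stand-in "embedding" vectors: alternate between
# assigning each point to its nearest center and moving each center
# to the mean of its assigned points.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: move each center to the mean of its group
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

pts = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9)]
print(kmeans(pts, [(0.0, 0.0), (5.0, 5.0)]))
```

In the study, each resulting cluster of tweet embeddings is inspected for its topic, which is how the six thematic groups mentioned in the abstract were obtained.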
Citations: 0
CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning
Pub Date: 2024-03-09 DOI: 10.1109/icassp48485.2024.10446756
Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He
Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL), whose aim is to recognize novel A-O compositions based on foregone knowledge. Existing methods based on disentangled representation learning lose sight of the contextual dependency between the A-O primitive pairs. Inspired by this, we propose a novel A-O disentangled framework for CZSL, namely Class-specified Cascaded Network (CSCNet). The key insight is to first classify one primitive and then specify the predicted class as a prior for guiding the recognition of the other primitive in a cascaded fashion. To this end, CSCNet constructs Attribute-to-Object and Object-to-Attribute cascaded branches, in addition to a composition branch modeling the two primitives as a whole. Notably, we devise a parametric classifier (ParamCls) to improve the matching between visual and semantic embeddings. By improving the A-O disentanglement, our framework achieves superior results compared to previous competitive methods.
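The cascaded idea can be shown in miniature: predict one primitive first, then use it as a prior to re-weight scores for the other. The scores and prior table below are invented toy numbers, and this lookup-based sketch is far simpler than CSCNet's learned branches.

```python
# Miniature of a cascaded attribute-then-object prediction: the
# predicted attribute conditions a prior that biases object scores.
# All scores and priors are made up for illustration.

def cascade(attr_scores, obj_scores, prior):
    """prior[attr][obj] re-weights object scores given the chosen attribute."""
    attr = max(attr_scores, key=attr_scores.get)
    biased = {o: s * prior[attr].get(o, 1.0) for o, s in obj_scores.items()}
    obj = max(biased, key=biased.get)
    return attr, obj

attr_scores = {"wet": 0.7, "dry": 0.3}
obj_scores = {"road": 0.45, "desert": 0.55}
prior = {"wet": {"road": 1.5, "desert": 0.2},
         "dry": {"road": 0.8, "desert": 1.2}}
print(cascade(attr_scores, obj_scores, prior))  # → ('wet', 'road')
```

Note how the contextual dependency changes the answer: "desert" scores highest on its own, but conditioning on the predicted attribute "wet" flips the object decision to "road", which is the kind of A-O interaction the abstract says attribute-only frameworks miss.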
Citations: 0
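The cascaded scoring idea described in the CSCNet abstract — classify one primitive first, then use its prediction as a prior for the other, and combine with a joint composition branch — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the probabilistic combination (averaging three normalized branches) and all function and variable names are assumptions.

```python
# Illustrative sketch of cascaded A-O composition scoring in the spirit
# of CSCNet. Branch 1 (A->O): p(a) * p(o|a). Branch 2 (O->A):
# p(o) * p(a|o). Branch 3: a joint softmax over (a, o) pairs.
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cascaded_scores(attr_logits, obj_logits_given_attr,
                    obj_logits, attr_logits_given_obj, joint_logits):
    """Return {(attr, obj): score}, averaging the three branches.

    attr_logits:           {attr: logit}
    obj_logits_given_attr: {attr: {obj: logit}}   # A->O cascade
    obj_logits:            {obj: logit}
    attr_logits_given_obj: {obj: {attr: logit}}   # O->A cascade
    joint_logits:          {(attr, obj): logit}   # composition branch
    """
    attrs, objs = list(attr_logits), list(obj_logits)
    p_a = dict(zip(attrs, softmax([attr_logits[a] for a in attrs])))
    p_o = dict(zip(objs, softmax([obj_logits[o] for o in objs])))
    # Conditional distributions, one softmax per predicted prior class.
    p_o_given_a = {a: dict(zip(objs, softmax(
        [obj_logits_given_attr[a][o] for o in objs]))) for a in attrs}
    p_a_given_o = {o: dict(zip(attrs, softmax(
        [attr_logits_given_obj[o][a] for a in attrs]))) for o in objs}
    pairs = list(joint_logits)
    p_joint = dict(zip(pairs, softmax([joint_logits[p] for p in pairs])))
    return {(a, o): (p_a[a] * p_o_given_a[a][o]
                     + p_o[o] * p_a_given_o[o][a]
                     + p_joint[(a, o)]) / 3.0
            for a in attrs for o in objs}
```

Because each branch is itself a normalized distribution over compositions, the averaged scores also sum to one, and the arg-max composition can be read off directly.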
Decoupling Degradations with Recurrent Network for Video Restoration in Under-Display Camera
Pub Date : 2024-03-08 DOI: 10.1609/aaai.v38i4.28144
Chengxu Liu, Xuan Wang, Yuanting Fan, Shuai Li, Xueming Qian
Under-display camera (UDC) systems are the foundation of full-screen display devices in which the lens mounts under the display. The pixel array of light-emitting diodes used for display diffracts and attenuates incident light, causing various degradations as the light intensity changes. Unlike general video restoration, which recovers video by treating different degradation factors equally, video restoration for UDC systems is more challenging in that it concerns removing diverse degradations over time while preserving temporal consistency. In this paper, we introduce a novel video restoration network, called D2RNet, specifically designed for UDC systems. It employs a set of Decoupling Attention Modules (DAM) that effectively separate the various video degradation factors. More specifically, a soft mask generation function is proposed to decompose each frame into flare and haze based on the diffraction arising from incident light of different intensities, followed by the proposed flare and haze removal components that leverage long- and short-term feature learning to handle the respective degradations. Such a design offers a targeted and effective solution to eliminating various types of degradation in UDC systems. We further extend our design to multiple scales to overcome the scale changes of degradation that often occur in long videos. To demonstrate the superiority of D2RNet, we propose a large-scale UDC video benchmark by gathering HDR videos and generating realistically degraded videos using the point spread function measured by a commercial UDC system. Extensive quantitative and qualitative evaluations demonstrate the superiority of D2RNet compared to other state-of-the-art video restoration and UDC image restoration methods.
{"title":"Decoupling Degradations with Recurrent Network for Video Restoration in Under-Display Camera","authors":"Chengxu Liu, Xuan Wang, Yuanting Fan, Shuai Li, Xueming Qian","doi":"10.1609/aaai.v38i4.28144","DOIUrl":"https://doi.org/10.1609/aaai.v38i4.28144","url":null,"abstract":"Under-display camera (UDC) systems are the foundation of full-screen display devices in which the lens mounts under the display. The pixel array of light-emitting diodes used for display diffracts and attenuates incident light, causing various degradations as the light intensity changes. Unlike general video restoration which recovers video by treating different degradation factors equally, video restoration for UDC systems is more challenging that concerns removing diverse degradation over time while preserving temporal consistency. In this paper, we introduce a novel video restoration network, called D2RNet, specifically designed for UDC systems. It employs a set of Decoupling Attention Modules (DAM) that effectively separate the various video degradation factors. More specifically, a soft mask generation function is proposed to formulate each frame into flare and haze based on the diffraction arising from incident light of different intensities, followed by the proposed flare and haze removal components that leverage long- and short-term feature learning to handle the respective degradations. Such a design offers an targeted and effective solution to eliminating various types of degradation in UDC systems. We further extend our design into multi-scale to overcome the scale-changing of degradation that often occur in long-range videos. To demonstrate the superiority of D2RNet, we propose a large-scale UDC video benchmark by gathering HDR videos and generating realistically degraded videos using the point spread function measured by a commercial UDC system. Extensive quantitative and qualitative evaluations demonstrate the superiority of D2RNet compared to other state-of-the-art video restoration and UDC image restoration methods.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
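The intensity-driven soft-mask decoupling described in the D2RNet abstract — splitting each frame into a flare component (dominated by high-intensity diffraction) and a haze component from the remaining light — can be illustrated with a minimal sketch. This is not the authors' implementation: the sigmoid mask form, the threshold and steepness values, and all names are assumptions for illustration only.

```python
# Minimal sketch of an intensity-based soft mask that decouples a frame
# into flare and haze parts, in the spirit of D2RNet's mask generation.
import math

def soft_mask(pixel, threshold=0.8, steepness=20.0):
    """Sigmoid weight: near 1 for bright (flare-prone) pixels, near 0 otherwise."""
    return 1.0 / (1.0 + math.exp(-steepness * (pixel - threshold)))

def decouple(frame, threshold=0.8):
    """Split a frame (list of intensities in [0, 1]) into flare and haze parts.

    Each pixel is softly routed: mask * pixel goes to the flare branch,
    (1 - mask) * pixel to the haze branch, so the two parts recompose
    the original frame exactly.
    """
    flare, haze = [], []
    for p in frame:
        m = soft_mask(p, threshold)
        flare.append(m * p)
        haze.append((1.0 - m) * p)
    return flare, haze
```

The soft (rather than hard) mask keeps the split differentiable, so in a network each branch can learn its own removal component while the decomposition stays exactly recomposable.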