PHAIN：通过瞬时频率相位感知优化进行音频绘制

IF 5.1 2区计算机科学 Q1 ACOUSTICS IEEE/ACM Transactions on Audio, Speech, and Language Processing Pub Date : 2024-09-18 DOI:10.1109/TASLP.2024.3463415

Tomoro Tanaka;Kohei Yatabe;Yasuhiro Oikawa

{"title":"PHAIN：通过瞬时频率相位感知优化进行音频绘制","authors":"Tomoro Tanaka;Kohei Yatabe;Yasuhiro Oikawa","doi":"10.1109/TASLP.2024.3463415","DOIUrl":null,"url":null,"abstract":"Audio inpainting restores locally corrupted parts of digital audio signals. Sparsity-based methods achieve this by promoting sparsity in the time-frequency (T-F) domain, assuming short-time audio segments consist of a few sinusoids. However, such sparsity promotion reduces the magnitudes of the resulting waveforms; moreover, it often ignores the temporal connections of sinusoidal components. To address these problems, we propose a novel phase-aware audio inpainting method. Our method minimizes the time variations of a particular T-F representation calculated using the time derivative of the phase. This promotes sinusoidal components that coherently fit in the corrupted parts without directly suppressing the magnitudes. Both objective and subjective experiments confirmed the superiority of the proposed method compared with state-of-the-art methods.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4471-4485"},"PeriodicalIF":5.1000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PHAIN: Audio Inpainting via Phase-Aware Optimization With Instantaneous Frequency\",\"authors\":\"Tomoro Tanaka;Kohei Yatabe;Yasuhiro Oikawa\",\"doi\":\"10.1109/TASLP.2024.3463415\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Audio inpainting restores locally corrupted parts of digital audio signals. Sparsity-based methods achieve this by promoting sparsity in the time-frequency (T-F) domain, assuming short-time audio segments consist of a few sinusoids. However, such sparsity promotion reduces the magnitudes of the resulting waveforms; moreover, it often ignores the temporal connections of sinusoidal components. To address these problems, we propose a novel phase-aware audio inpainting method. Our method minimizes the time variations of a particular T-F representation calculated using the time derivative of the phase. This promotes sinusoidal components that coherently fit in the corrupted parts without directly suppressing the magnitudes. Both objective and subjective experiments confirmed the superiority of the proposed method compared with state-of-the-art methods.\",\"PeriodicalId\":13332,\"journal\":{\"name\":\"IEEE/ACM Transactions on Audio, Speech, and Language Processing\",\"volume\":\"32 \",\"pages\":\"4471-4485\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE/ACM Transactions on Audio, Speech, and Language Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10684068/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10684068/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}

引用次数: 0

摘要

音频绘制可恢复数字音频信号中局部损坏的部分。基于稀疏性的方法通过提高时频（T-F）域的稀疏性来实现这一目标，假定短时音频片段由几个正弦波组成。然而，这种稀疏性提升降低了所得到的波形的幅度，而且往往忽略了正弦成分的时间联系。为了解决这些问题，我们提出了一种新颖的相位感知音频绘制方法。我们的方法能最大限度地减少利用相位的时间导数计算出的特定 T-F 表示法的时间变化。这样就能在不直接抑制幅度的情况下，促进正弦波成分与损坏部分的一致性。客观和主观实验都证实，与最先进的方法相比，我们提出的方法更胜一筹。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

PHAIN: Audio Inpainting via Phase-Aware Optimization With Instantaneous Frequency

Audio inpainting restores locally corrupted parts of digital audio signals. Sparsity-based methods achieve this by promoting sparsity in the time-frequency (T-F) domain, assuming short-time audio segments consist of a few sinusoids. However, such sparsity promotion reduces the magnitudes of the resulting waveforms; moreover, it often ignores the temporal connections of sinusoidal components. To address these problems, we propose a novel phase-aware audio inpainting method. Our method minimizes the time variations of a particular T-F representation calculated using the time derivative of the phase. This promotes sinusoidal components that coherently fit in the corrupted parts without directly suppressing the magnitudes. Both objective and subjective experiments confirmed the superiority of the proposed method compared with state-of-the-art methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE/ACM Transactions on Audio, Speech, and Language Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

11.30

自引率

11.10%

发文量

217

期刊介绍： The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.

期刊最新文献

List of Reviewers IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization MO-Transformer: Extract High-Level Relationship Between Words for Neural Machine Translation Online Neural Speaker Diarization With Target Speaker Tracking Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach