
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Double-DCCCAE: Estimation of Body Gestures From Speech Waveform
Jinhong Lu, Tianhang Liu, Shuzhuang Xu, H. Shimodaira
This paper presents an approach for body-motion estimation from the audio speech waveform, where context information in both the input and output streams is taken into account without using recurrent models. Previous works commonly use multiple frames of input to estimate one frame of motion data, so that the temporal information of the generated motion is little considered. To resolve these problems, we extend our previous work and propose a system, the double deep canonical-correlation-constrained autoencoder (D-DCCCAE), which encodes speech and motion segments into fixed-length embedded features that are well correlated with the segments of the other modality. The learnt motion embedded feature is estimated from the learnt speech embedded feature through a simple neural network and further decoded back to the sequential motion. The proposed pair of embedded features showed higher correlation with motion data than spectral features, and our model was preferred over the baseline model (BA) in terms of human-likeness and was comparable in terms of appropriateness.
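The correlation constraint at the heart of D-DCCCAE can be illustrated with a toy loss: given paired mini-batches of speech and motion embeddings, training maximises the per-dimension Pearson correlation between them. This is a minimal numpy sketch under that assumption; `correlation_loss` is an illustrative helper, not the authors' implementation.

```python
import numpy as np

def correlation_loss(z_speech, z_motion, eps=1e-8):
    # Negative mean per-dimension Pearson correlation between the two
    # embedding batches; minimising it pushes the embeddings to correlate.
    zs = z_speech - z_speech.mean(axis=0)
    zm = z_motion - z_motion.mean(axis=0)
    num = (zs * zm).sum(axis=0)
    den = np.sqrt((zs ** 2).sum(axis=0) * (zm ** 2).sum(axis=0)) + eps
    return -np.mean(num / den)
```

In a full system this term would be added to the two autoencoders' reconstruction losses, which is what "canonical-correlation-constrained" suggests.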
DOI: 10.1109/ICASSP39728.2021.9414660 (published 2021-06-06)
Citations: 9
Non-Convex Sparse Deviation Modeling Via Generative Models
Yaxi Yang, Hailin Wang, Haiquan Qiu, Jianjun Wang, Yao Wang
In this paper, a generative model is used to introduce the structural properties of the signal in place of the common sparsity hypothesis, and a non-convex compressed sensing sparse deviation model based on the generative model (ℓq-Gen) is proposed. By establishing the ℓq variant of the restricted isometry property (q-RIP) and the set-restricted eigenvalue condition (q-S-REC), the upper error bound of the optimal decoder is derived when the recovered signal is within the sparse deviation range of the generator. Furthermore, it is proved that a Gaussian matrix with a sufficient number of measurements guarantees, with high probability, good recovery for signals near the range of the generator. Finally, a series of experiments is carried out to verify the effectiveness and superiority of the ℓq-Gen model.
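The ℓq quasi-norm (0 < q < 1) that gives the model its name penalises deviations more aggressively than the ℓ1 norm as q shrinks. A minimal sketch of the quantity itself, independent of the paper's decoder:

```python
import numpy as np

def lq_quasinorm(x, q=0.5):
    # ℓq quasi-norm for 0 < q < 1: (sum |x_i|^q)^(1/q).
    # As q -> 0 it approaches counting the nonzero entries (ℓ0),
    # which is why it promotes sparsity more strongly than ℓ1.
    return np.sum(np.abs(x) ** q) ** (1.0 / q)
```

For a 1-sparse vector the quasi-norm equals the entry's magnitude, while spreading the same mass over several entries inflates it sharply, illustrating the sparsity-promoting effect.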
DOI: 10.1109/ICASSP39728.2021.9414170 (published 2021-06-06)
Citations: 1
An Adaptive Pyramid Single-View Depth Lookup Table Coding Method
Yangang Cai, Ronggang Wang, Song Gu, Jian Zhang, Wen Gao
As depth maps show unique characteristics like piecewise smooth regions bounded by sharp edges at depth discontinuities, new coding tools are required to approximate these signal characteristics. Moreover, the number of bits to signal the residual values for each segment can be further reduced by integrating a Depth Lookup Table (DLT), which maps depth values to valid depth values of the original depth map. The DLT is constructed based on an initial analysis of the input depth map and is then coded in the sequence header. In this paper, an adaptive pyramid single-view depth lookup table coding method is proposed, with the purpose of designing a clean syntax structure in the sequence header with reasonably good performance. Experiments show that the proposed method can reduce about 84.97% coding bits on average.
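The DLT idea can be sketched directly: collect the depth values that actually occur in the map, and signal residuals as indices into that table, so each entry needs only about ceil(log2 |table|) bits instead of the full bit depth. This is a hypothetical minimal version (`build_dlt` is not the codec's actual routine):

```python
import math
import numpy as np

def build_dlt(depth_map):
    # Gather the valid (actually occurring) depth values and map each
    # value to its table index; residuals can then be coded as indices.
    table = np.unique(depth_map)
    index_of = {int(v): i for i, v in enumerate(table)}
    return table, index_of

def index_bits(table):
    # Bits needed to signal one table index (vs. full 8-bit depth).
    return max(1, math.ceil(math.log2(len(table))))
```

On a map that uses only a handful of distinct depths, the index costs a couple of bits per value rather than eight, which is the source of the savings the DLT exploits.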
DOI: 10.1109/ICASSP39728.2021.9414584 (published 2021-06-06)
Citations: 1
Instrument Classification of Solo Sheet Music Images
Kevin Ji, Daniel Yang, T. Tsai
This paper studies instrument classification of solo sheet music. Whereas previous work has focused on instrument recognition in audio data, we instead approach the instrument classification problem using raw sheet music images. Our approach first converts the sheet music image into a sequence of musical words based on the bootleg score representation, and then treats the problem as a text classification task. We show that it is possible to significantly improve classifier performance by training a language model on unlabeled data, initializing a classifier with the pretrained language model weights, and then finetuning the classifier on labeled data. In this work, we train AWD-LSTM, GPT-2, and RoBERTa models on solo sheet music images from IMSLP for eight different instruments. We find that GPT-2 and RoBERTa slightly outperform AWD-LSTM, and that pretraining increases classification accuracy for RoBERTa from 34.5% to 42.9%. Furthermore, we propose two data augmentation methods that increase classification accuracy for RoBERTa by an additional 15%.
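The text-classification framing can be illustrated with a deliberately simple stand-in for the paper's finetuned language models: represent each piece as a bag of bootleg-score "words" and classify by cosine similarity to per-instrument centroids. All names here are hypothetical, and the toy data below is not from IMSLP.

```python
import numpy as np

def centroid_classifier(train_counts, train_labels):
    # Fit per-class mean bag-of-words vectors ("centroids") and return a
    # predictor that picks the class with highest cosine similarity.
    classes = sorted(set(train_labels))
    labels = np.array(train_labels)
    cents = np.stack([train_counts[labels == c].mean(axis=0) for c in classes])

    def predict(x):
        sims = cents @ x / (np.linalg.norm(cents, axis=1) * np.linalg.norm(x) + 1e-12)
        return classes[int(np.argmax(sims))]

    return predict
```

The real systems (AWD-LSTM, GPT-2, RoBERTa) model word *order* as well, which is what pretraining on unlabeled bootleg scores buys over this bag-of-words baseline.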
DOI: 10.1109/ICASSP39728.2021.9413732 (published 2021-06-06)
Citations: 2
Real-Time Radio Modulation Classification With An LSTM Auto-Encoder
Ziqi Ke, H. Vikalo
Identifying the modulation type of a received radio signal is a challenging problem encountered in many applications, including radio interference mitigation and spectrum allocation. The problem is rendered challenging by the existence of a large number of modulation schemes and numerous sources of interference. Existing methods for monitoring the spectrum readily collect large amounts of radio signals. However, existing state-of-the-art approaches to modulation classification struggle to reach the desired levels of accuracy at a computational cost that is practically feasible for low-cost computational platforms. To this end, we propose a learning framework based on an LSTM denoising auto-encoder designed to extract robust and stable features from noisy received signals and detect the underlying modulation scheme. The method uses a compact architecture that may be implemented on low-cost computational devices while achieving or exceeding state-of-the-art classification accuracy. Experimental results on realistic synthetic and over-the-air radio data show that the proposed framework classifies radio signals reliably and efficiently, often significantly outperforming state-of-the-art approaches.
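As a rough illustration of the denoising-autoencoder objective (the paper uses an LSTM, not a linear model), the sketch below corrupts the input with Gaussian noise and trains encoder/decoder weights by gradient descent to reconstruct the clean signal; `train_linear_dae` is a toy stand-in, not the authors' architecture.

```python
import numpy as np

def train_linear_dae(X, hidden=4, noise_std=0.5, lr=0.01, epochs=500, seed=0):
    # Minimal linear denoising autoencoder: encoder W, decoder V.
    # Each epoch corrupts X with Gaussian noise and takes one gradient
    # step on the reconstruction error against the *clean* X.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    V = rng.normal(scale=0.1, size=(hidden, X.shape[1]))
    for _ in range(epochs):
        noisy = X + rng.normal(scale=noise_std, size=X.shape)
        Z = noisy @ W              # encode the corrupted input
        err = Z @ V - X            # reconstruction error vs. clean target
        V -= lr * Z.T @ err / len(X)
        W -= lr * noisy.T @ (err @ V.T) / len(X)
    return W, V
```

Because the target is the clean signal, the bottleneck is forced to keep structure that survives the noise, which is the intuition behind using the learnt features for downstream modulation classification.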
DOI: 10.1109/ICASSP39728.2021.9414351 (published 2021-06-06)
Citations: 7
Applied Methods for Sparse Sampling of Head-Related Transfer Functions
Lior Arbel, Z. Ben-Hur, D. Alon, B. Rafaely
Production of high-fidelity spatial audio applications requires individual head-related transfer functions (HRTFs). As the acquisition of HRTFs is an elaborate process, interest lies in interpolating full-length HRTFs from sparse samples. Ear alignment is a recently developed pre-processing technique, shown to reduce an HRTF's spherical-harmonics order, thus permitting sparse sampling over fewer directions. This paper describes the application of two methods for ear-aligned HRTF interpolation from sparse samples: Orthogonal Matching Pursuit and Principal Component Analysis. These methods generate unique vector sets for HRTF representation. The methods were tested on an HRTF dataset, indicating that interpolation errors under small sampling schemes may be further reduced by up to 5 dB in comparison with spherical-harmonics interpolation.
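Orthogonal Matching Pursuit, one of the two methods applied, greedily selects dictionary atoms and refits all coefficients by least squares at each step. A generic sketch (the HRTF-specific dictionary construction is omitted):

```python
import numpy as np

def omp(D, y, n_nonzero):
    # Orthogonal Matching Pursuit: at each step pick the atom most
    # correlated with the residual, then re-solve least squares over the
    # selected support and update the residual.
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

With an orthonormal dictionary and a signal that is exactly sparse in it, this recovers the signal in as many iterations as there are nonzeros.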
DOI: 10.1109/ICASSP39728.2021.9413976 (published 2021-06-06)
Citations: 0
Multi-Scale Residual Network for Covid-19 Diagnosis Using Ct-Scans
Pratyush Garg, R. Ranjan, Kamini Upadhyay, M. Agrawal, D. Deepak
To mitigate the outbreak of the highly contagious COVID-19, we need a sensitive, robust automated diagnostic tool. This paper proposes a three-level approach to separate COVID-19 and pneumonia cases from normal patients using chest CT scans. At the first level, we fine-tune a multi-scale ResNet50 model for feature extraction from all slices of the CT scan for each patient. By using a multi-scale residual network, we can learn infections of different sizes, thereby making detection possible at early stages too. These extracted features are used to train a patient-level classifier at the second level. Four different classifiers are trained at this stage. Finally, the predictions of the patient-level classifiers are combined by training an ensemble classifier. We test the proposed method on three sets of data released for the ICASSP COVID-19 Signal Processing Grand Challenge (SPGC). The proposed method successfully classifies the three classes with a validation accuracy of 94.9% and a testing accuracy of 88.89%.
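The final stage combines the patient-level predictions; in its simplest form this can be sketched as a majority vote (the paper trains an ensemble classifier rather than voting, so this is only an illustration of the combination step):

```python
from collections import Counter

def ensemble_predict(predictions):
    # Majority vote across the patient-level classifiers' outputs.
    # A trained meta-classifier would replace this rule in practice.
    return Counter(predictions).most_common(1)[0][0]
```

The benefit of combining classifiers is that individual models' errors tend not to coincide, so the aggregate decision is more robust than any single level-two classifier.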
DOI: 10.1109/ICASSP39728.2021.9414426 (published 2021-06-06)
Citations: 20
Improving Dialogue Response Generation Via Knowledge Graph Filter
Yanmeng Wang, Ye Wang, Xingyu Lou, Wenge Rong, Zhenghong Hao, Shaojun Wang
Current generative dialogue systems tend to produce generic dialogue responses, which lack useful information and semantic coherence. A promising method to alleviate this problem is to integrate knowledge triples from a knowledge base. However, current approaches mainly augment the Seq2Seq framework with a knowledge-aware mechanism that retrieves a large number of knowledge triples without considering the specific dialogue context, which likely results in knowledge redundancy and incomplete knowledge comprehension. In this paper, we propose to leverage the contextual word representation of the dialogue post to filter out irrelevant knowledge with an attention-based triple filter network. We introduce a novel knowledge-enriched framework to integrate the filtered knowledge into the dialogue representation. Entity copy is further proposed to facilitate the integration of the knowledge during generation. Experiments on dialogue generation tasks have shown the proposed framework's promising potential.
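The triple filter can be sketched as dot-product attention between a dialogue-context vector and knowledge-triple embeddings, keeping only the top-scoring triples; `filter_triples` is a hypothetical reduction of the network, not the paper's model:

```python
import numpy as np

def filter_triples(context_vec, triple_vecs, top_k=2):
    # Score each triple embedding against the dialogue context with
    # dot-product attention, softmax the scores, keep the top-k triples.
    scores = triple_vecs @ context_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    order = np.argsort(weights)[::-1][:top_k]
    return order, weights[order]
```

Filtering before generation is what distinguishes this from retrieving every matched triple: the decoder only ever sees knowledge weighted by its relevance to the current post.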
DOI: 10.1109/ICASSP39728.2021.9414324 (published 2021-06-06)
Citations: 4
Perceptual Quality Assessment for Recognizing True and Pseudo 4k Content
Wenhan Zhu, Guangtao Zhai, Xiongkuo Min, Xiaokang Yang, Xiao-Ping Zhang
To meet the imperative demand for monitoring the quality of Ultra High-Definition (UHD) content in multimedia industries, we propose an efficient no-reference (NR) image quality assessment (IQA) metric to distinguish original from pseudo 4K content and measure its quality. First, we establish a database of more than 3000 4K images, composed of natural 4K images together with upscaled versions interpolated from 1080p and 720p images by fourteen algorithms. To improve computing efficiency, our model segments the input image and selects three representative patches by local variance. Then, we extract histogram features and cut-off frequency features in the frequency domain, as well as natural scene statistics (NSS) based features, from the representative patches. Finally, we employ a support vector regressor (SVR) to aggregate these extracted features into an overall quality metric that predicts the quality score of the target image. Extensive experimental comparisons using seven common evaluation indicators demonstrate that the proposed model outperforms competitive NR IQA methods and has a strong ability to distinguish true from pseudo 4K images.
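The patch-selection step can be sketched directly: split the image into non-overlapping patches and keep those with the highest local variance as the representative regions (the paper keeps three). `top_variance_patches` is an assumed helper name:

```python
import numpy as np

def top_variance_patches(img, patch=8, k=3):
    # Tile the image into non-overlapping patches and return the k with
    # the highest variance; textured regions reveal upscaling artifacts
    # better than flat ones.
    h, w = img.shape
    patches = [img[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    patches.sort(key=lambda p: p.var(), reverse=True)
    return patches[:k]
```

Working on a few representative patches instead of the full 4K frame is where the computing-efficiency gain comes from.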
DOI: 10.1109/ICASSP39728.2021.9414932 (published 2021-06-06)
Citations: 1
Checking PRNU Usability on Modern Devices
C. Albisani, Massimo Iuliani, Alessandro Piva
The image source identification task is mainly addressed by exploiting the unique traces of the sensor pattern noise, which ensures a negligible false-alarm rate when comparing patterns extracted from different devices, even of the same brand or model. However, most recent smartphones are equipped with proprietary in-camera processing that can expose unexpected correlated patterns within images belonging to different sensors. In this paper, we first highlight that wrong source attribution can happen on smartphones of the same brand when images are acquired both in default and in bokeh mode. While the bokeh mode is proved to introduce a correlated pattern due to specific in-camera post-processing, we show that natural images also exhibit this issue, even when a reference from flat images is available. Furthermore, different camera models expose different correlation patterns, since these are reasonably related to developers' choices. Then, we propose a general strategy that allows the forensic practitioner to determine whether a questioned device may suffer from these correlated patterns, thus avoiding the risk of false image attribution.
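PRNU matching typically reduces to a normalised cross-correlation between noise residuals, with a threshold deciding same-sensor attribution; the correlated processing patterns discussed here inflate exactly this statistic across different devices. A generic sketch:

```python
import numpy as np

def ncc(a, b):
    # Normalised cross-correlation between two noise residuals.
    # PRNU matching declares a common sensor when this exceeds a
    # decision threshold; near-zero values indicate different sensors.
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

When in-camera processing injects a shared component into residuals from different sensors, this statistic rises above the threshold even without a matching PRNU, producing the false attributions the paper warns about.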
C. Albisani, M. Iuliani, and A. Piva, "Checking PRNU Usability on Modern Devices," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021. doi: 10.1109/ICASSP39728.2021.9413611
Citations: 4
Journal: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)