
Latest publications: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific

An event-related brain potential study on the impact of speech recognition errors
S. Sakti, Y. Odagaki, Takafumi Sasakura, Graham Neubig, T. Toda, Satoshi Nakamura
Most automatic speech recognition (ASR) systems, which aim for perfect transcription of utterances, are trained and tuned by minimizing the word error rate (WER). In this framework, all errors (substitutions, deletions, insertions) on any word are treated uniformly, even though their impact is not the same. How large the impact is, and exactly what the differences are, remain unknown. Several studies have proposed possible alternatives to the WER metric, but no analysis has investigated how the human brain processes language and perceives the effect of mistaken ASR output. In this research we use event-related brain potential (ERP) studies to directly analyze brain activity in response to ASR errors. Our results reveal that the peak amplitudes of the positive shift after substitution and deletion violations are much larger than after insertion violations. This finding indicates that humans perceive each error differently based on its impact on the whole sentence. Building on these results, we formulated a new weighted word error rate metric based on the ERP findings: ERP-WWER. We re-evaluated ASR performance using the new ERP-WWER metric and compared and discussed the results against the standard WER.
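The weighted-WER idea described above can be sketched as a standard edit-distance computation whose substitution, deletion, and insertion costs differ. The weights below are illustrative placeholders, not the ERP-derived values used for ERP-WWER:

```python
def weighted_wer(ref, hyp, w_sub=1.0, w_del=1.0, w_ins=0.5):
    """Edit-distance WER in which each error type carries its own cost.

    The default weights are hypothetical; the paper derives its weights
    from the measured ERP amplitudes.
    """
    R, H = len(ref), len(hyp)
    # d[i][j]: minimum weighted cost of aligning ref[:i] with hyp[:j]
    d = [[0.0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i * w_del
    for j in range(1, H + 1):
        d[0][j] = j * w_ins
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1][j - 1] + (0.0 if ref[i - 1] == hyp[j - 1] else w_sub)
            d[i][j] = min(sub, d[i - 1][j] + w_del, d[i][j - 1] + w_ins)
    return d[R][H] / R
```

With `w_sub = w_del = w_ins = 1` this reduces to the ordinary WER; lowering `w_ins` expresses the finding that insertions are perceived as less disruptive.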
DOI: https://doi.org/10.1109/APSIPA.2014.7041620
Citations: 2
A pilot user's prospective in mobile robotic telepresence system
Muhammad Sikandar Lal Khan, S. Réhman, P. L. Hera, Feng Liu, Haibo Li
In this work we present an interactive video conferencing system specifically designed to enhance the video teleconferencing experience of a pilot user. We use an Embodied Telepresence System (ETS) that was previously designed to enhance the video teleconferencing experience of the collaborators. Here we deploy the ETS in a novel scenario to improve the pilot user's experience during distance communication. The ETS is used to adjust the pilot user's view at the remote location (e.g. a remotely located conference/meeting). A velocity profile control for the ETS is developed, which is implicitly controlled by the pilot user's head. An experiment was conducted to test whether the view adjustment capability of the ETS increases the pilot user's collaboration experience in video conferencing. In the user study, participants (pilot users) interacted both through the ETS and through a traditional computer-based video conferencing tool. Overall, the user study suggests that our approach is effective and enhances the video conferencing experience for the pilot user.
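One simple way to realize a head-driven velocity profile is a dead-zone plus saturation mapping from head yaw to pan velocity. The function below is a sketch under assumed parameter values (dead zone, gain, and velocity limit are made up), not the control law from the paper:

```python
def head_to_velocity(head_yaw_deg, dead_zone=5.0, gain=0.8, v_max=30.0):
    """Map the pilot user's head yaw (degrees) to an ETS pan velocity
    (degrees/second): no motion inside the dead zone, then a linear ramp
    that saturates at v_max. All parameter values are illustrative.
    """
    if abs(head_yaw_deg) <= dead_zone:
        return 0.0
    v = gain * (abs(head_yaw_deg) - dead_zone)
    return min(v, v_max) if head_yaw_deg > 0 else -min(v, v_max)
```

The dead zone suppresses jitter from small involuntary head movements; saturation keeps the platform's motion bounded.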
DOI: https://doi.org/10.1109/APSIPA.2014.7041635
Citations: 7
Iterative depth recovery for multi-view video synthesis from stereo videos
Chen-Hao Wei, Chen-Kuo Chiang, S. Lai
We propose a novel depth map refinement algorithm and generate multi-view video sequences from two-view video sequences for modern autostereoscopic displays. To generate realistic contents for virtual views, high-quality depth maps are critical to the view synthesis results. We propose an iterative depth refinement approach, a joint error detection and correction algorithm, to refine depth maps that can be estimated by an existing stereo matching method or provided by a depth capturing device. Error detection targets two types of error: across-view color-depth-inconsistency and local color-depth-inconsistency. The detected error pixels are then corrected by searching for appropriate candidates under several constraints to amend the depth errors. The refinement process includes a trilateral filter that incorporates intensity, spatial, and temporal terms into the filter weighting to enhance consistency across frames. The proposed view synthesis framework features a disparity-based view interpolation method to alleviate translucent artifacts and a directional filter to reduce aliasing around object boundaries. Experimental results show that the proposed algorithm effectively fixes errors in the depth maps. In addition, the refined depth maps, together with the proposed view synthesis framework, significantly improve novel view synthesis on several benchmark datasets.
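The local color-depth-inconsistency check can be illustrated as flagging pixels whose depth disagrees sharply with neighbours of similar color. This is a minimal single-view sketch with made-up thresholds; the paper's detector also includes an across-view check that is not shown:

```python
def flag_local_inconsistencies(color, depth, win=1, color_tol=10, depth_tol=8):
    """Return a boolean map marking suspect depth pixels.

    A pixel is flagged when some neighbour in a (2*win+1)^2 window has a
    similar color (difference <= color_tol) but a very different depth
    (difference > depth_tol). color and depth are 2-D lists of equal
    size; both tolerances are illustrative.
    """
    h, w = len(depth), len(depth[0])
    flags = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        if (abs(color[ny][nx] - color[y][x]) <= color_tol
                                and abs(depth[ny][nx] - depth[y][x]) > depth_tol):
                            flags[y][x] = True
    return flags
```

The check is symmetric, so both sides of a suspicious depth discontinuity get flagged; the correction stage then decides which candidate depth to keep.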
DOI: https://doi.org/10.1109/APSIPA.2014.7041695
Citations: 5
Analytical prediction formula of random variation in high frequency performance of weak inversion scaled MOSFET
R. Banchuin, R. Chaisricharoen
This paper proposes an analytical prediction formula for the probability distribution of random variation in the high-frequency performance of a scaled MOSFET operated in the weak inversion region, where the physical defects of the MOSFET induced by the manufacturing process have been taken into account. Furthermore, the correlation between the process-parameter-related random variables that contribute to such variation, and the effects of physical differences between N-type and P-type MOSFETs, have also been considered. Since the scaled MOSFET is of interest, an up-to-date formula for the physical-defect-induced random variation in the parameters of such a scaled device has been used as the basis. The proposed formula can accurately predict the probability distribution of random variation in the high-frequency performance of a weak inversion scaled MOSFET with a confidence level of 99%. It has therefore been found to be an efficient alternative approach for the variability-aware design of various signal processing circuits and systems based on weak inversion scaled MOSFETs.
DOI: https://doi.org/10.1109/APSIPA.2014.7041810
Citations: 0
Compressed video quality assessment with modified MSE
Sudeng Hu, Lina Jin, C.-C. Jay Kuo
A method to adjust the mean-squared-error (MSE) value for coded video quality assessment is investigated in this work by incorporating subjective human visual experience. First, we propose a linear model between the mean opinion score (MOS) and a logarithmic function of the MSE value of coded video over a range of coding rates. This model is validated by experimental data. With further simplification, the model contains only one parameter, to be determined by video characteristics. Next, we adopt a machine learning method to learn this parameter. Specifically, we select features to classify video content into groups, where the videos in each group are more homogeneous in their characteristics. A proper model parameter can then be trained and predicted within each video group. Experimental results on a coded video database demonstrate the effectiveness of the proposed algorithm.
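The first step, fitting MOS against log MSE, is ordinary one-variable least squares. A minimal sketch (the paper's further one-parameter simplification and the learned parameter are not reproduced):

```python
import math

def fit_mos_logmse(mse_values, mos_values):
    """Least-squares fit of the linear model MOS ~ a * log(MSE) + b."""
    xs = [math.log(m) for m in mse_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(mos_values) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, mos_values))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx
```

The slope `a` is expected to be negative, since MOS falls as the coding distortion (MSE) grows.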
DOI: https://doi.org/10.1109/APSIPA.2014.7041643
Citations: 2
3D object modeling with a Kinect camera
Mayoore S. Jaiswal, Jun Xie, Ming-Ting Sun
RGB-D (Kinect-style) cameras are novel low-cost sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate the use of such cameras for acquiring multiple images of an object from multiple viewpoints and building complete 3D models of objects. Such models have applications in a wide range of industries. We implemented a complete 3D object model construction process with object segmentation, registration, global alignment, model denoising, and texturing, and studied the effects of these functions on the constructed 3D object models. We also developed a process for objective performance evaluation of the constructed 3D object models. We collected laser scan data as the ground truth using a Roland Picza LPX-600 laser scanner to compare against the 3D models created by our process.
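For the objective evaluation step, one common score is the mean nearest-neighbour distance from the reconstructed model to the ground-truth scan. The brute-force sketch below is an assumption about the kind of metric used; the listing above does not state the paper's exact formula:

```python
def mean_nn_distance(model_pts, truth_pts):
    """Average distance from each reconstructed 3D point to its nearest
    ground-truth point (brute force; fine for small point clouds)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return sum(min(d2(p, q) for q in truth_pts) ** 0.5
               for p in model_pts) / len(model_pts)
```

For large clouds a k-d tree would replace the inner `min` scan, but the score computed is the same.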
DOI: https://doi.org/10.1109/APSIPA.2014.7041821
Citations: 9
Detecting contrast agents in ultrasound image sequences for tumor diagnosis
K. Noro, Koichi Ito, Yukari Yanagisawa, M. Sakamoto, S. Mori, K. Shiga, T. Kodama, T. Aoki
This paper proposes a method for detecting contrast agents in ultrasound image sequences, toward diagnostic ultrasound imaging systems for tumor diagnosis. Conventional methods detect ultrasound contrast agents by simple subtraction of ultrasound images, and therefore need ultrasound image sequences both with and without contrast agents. Even if the subject moves slightly, the detection results of the conventional methods include significant errors. The proposed method instead employs a spatio-temporal analysis of the pixel intensity variation over several frames. It also employs motion estimation to select optimal image frames for detecting contrast agents. Through a set of experiments using mice, we demonstrate that the proposed method performs efficiently compared with the conventional methods.
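The core of the spatio-temporal analysis, pixel intensity variation over several frames, can be sketched as a per-pixel temporal variance map: contrast agents make intensities fluctuate strongly over time, so high-variance pixels are candidates. The paper's full method also adds motion estimation for frame selection, which is not shown here:

```python
def temporal_variance(frames):
    """Per-pixel intensity variance across a list of equally sized
    2-D frames (each frame is a list of rows)."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    var = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [f[y][x] for f in frames]
            mean = sum(vals) / n
            var[y][x] = sum((v - mean) ** 2 for v in vals) / n
    return var
```

Thresholding the resulting map would separate fluctuating (agent) pixels from static tissue.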
DOI: https://doi.org/10.1109/APSIPA.2014.7041598
Citations: 0
Block-based multiscale error concealment using low-rank completion
Mading Li, Jiaying Liu, Chong Ruan, Lu Liu, Zongming Guo
In this paper, we introduce a novel block-based multiscale error concealment method using low-rank completion. The proposed method searches for similar blocks and utilizes low-rank completion to recover the missing pixels. To make full use of the hidden redundant information in images, we seek more similar blocks by building an image pyramid. The blocks collected from the pyramid are more similar to each other, which leads to a more accurate recovery. Moreover, instead of recovering the missing block all at once, we propose a ringlike iterative process to progressively reduce the number of unknown pixels and further enhance the recovery result. Experimental results demonstrate the effectiveness of the proposed method.
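The low-rank completion step can be illustrated, in its simplest rank-1 form, by alternating a power-iteration rank-1 fit with re-imposing the known pixels. This is a generic stand-in for the completion applied to stacks of similar blocks, not the paper's exact solver:

```python
def rank1_complete(M, mask, iters=500):
    """Fill entries of M where mask is False, assuming M is (nearly)
    rank 1. Each iteration takes one power-iteration step toward the top
    singular pair, rebuilds the rank-1 estimate, and restores the
    observed entries."""
    m, n = len(M), len(M[0])
    X = [[M[i][j] if mask[i][j] else 0.0 for j in range(n)] for i in range(m)]
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
        v = [sum(X[i][j] * u[i] for i in range(m)) for j in range(n)]
        for i in range(m):
            for j in range(n):
                # keep known pixels fixed, estimate only the missing ones
                X[i][j] = M[i][j] if mask[i][j] else u[i] * v[j]
    return X
```

Practical low-rank completion keeps several singular components (e.g. via singular value thresholding) rather than fixing the rank at 1, but the alternating fill-and-fit structure is the same.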
DOI: https://doi.org/10.1109/APSIPA.2014.7041587
Citations: 0
Progressive audio scrambling via complete binary tree's traversal and wavelet transform
Twe Ta Oo, T. Onoye
In this paper, we first propose an effective audio scrambling method based on the pre-order traversal of a complete binary tree in the time domain. The proposed method is fast and simple and has a good scrambling effect. Then, to strengthen the anti-decryption capability, we present a wavelet-domain scheme that considers not only the pre-order but also the in-order and post-order based scrambling methods. First, an audio signal is wavelet-decomposed to retrieve the layers of wavelet coefficients. The coefficients in each layer are then scrambled by one of the three methods, chosen at random. Anyone without knowledge of the correct wavelet decomposition parameters and the scrambling method used for each layer will never be able to descramble the signal successfully. Moreover, the new scheme also achieves progressive scrambling, generating audio outputs at different quality levels by controlling the scrambling degree as required.
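The time-domain method's permutation comes from a pre-order traversal of a complete binary tree whose nodes, in level order, are the sample indices. A minimal sketch (the 0-based array indexing convention is an assumption, since the listing above does not fix it):

```python
def preorder_permutation(n):
    """Pre-order traversal of the complete binary tree holding indices
    0..n-1 in level order (children of node i are 2i+1 and 2i+2)."""
    order, stack = [], [0]
    while stack:
        i = stack.pop()
        if i < n:
            order.append(i)
            stack.append(2 * i + 2)  # pushed first, so visited after
            stack.append(2 * i + 1)  # the whole left subtree (LIFO)
    return order

def scramble(samples):
    """Reorder audio samples by the pre-order permutation."""
    return [samples[i] for i in preorder_permutation(len(samples))]
```

Descrambling applies the inverse permutation; in-order and post-order traversals of the same tree give the other two orders used in the wavelet-domain scheme.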
DOI: https://doi.org/10.1109/APSIPA.2014.7041525
Citations: 2
HMM-based Thai speech synthesis using unsupervised stress context labeling
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of synthetic speech, but tone modeling has the problem of requiring a manual context labeling process. To reduce the cost of stress context labeling, we propose an unsupervised technique for automatic labeling based on the characteristics of Thai stressed syllables, namely, large F0 movement and long duration. In the proposed technique, we use the log F0 variance and duration of each syllable to classify it into one of the stress-related context classes. Objective and subjective evaluation results show that the proposed context labeling gives performance comparable to labeling conducted carefully by a human in terms of the tone naturalness of synthetic speech.
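The two cues, log F0 variance and duration, suggest a simple per-syllable classifier. The sketch below is a toy threshold-based stand-in with made-up threshold values, not the paper's unsupervised labeling technique:

```python
import math

def stress_features(f0_contour, duration):
    """Log-F0 variance and duration for one syllable; unvoiced
    (non-positive F0) frames are skipped."""
    logs = [math.log(f) for f in f0_contour if f > 0]
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / len(logs), duration

def label_stress(syllables, var_thresh=0.005, dur_thresh=0.18):
    """Mark a syllable 'stressed' when both cues are high.
    Thresholds (in squared log-Hz and seconds) are illustrative."""
    return ['stressed'
            if stress_features(c, d)[0] > var_thresh and d > dur_thresh
            else 'unstressed'
            for c, d in syllables]
```

An unsupervised variant would learn the decision boundary from the data (e.g. by clustering the two features) instead of fixing thresholds.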
{"title":"HMM-based Thai speech synthesis using unsupervised stress context labeling","authors":"Decha Moungsri, Tomoki Koriyama, Takao Kobayashi","doi":"10.1109/APSIPA.2014.7041599","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041599","url":null,"abstract":"This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of the synthetic speech, but there is a problem of requiring a manual context labeling process in tone modeling. To reduce costs for the stress context labeling, we propose an unsupervised technique for automatic labeling based on the characteristics of Thai stressed syllables, namely, having high FO movement and long duration. In the proposed technique, we use log FO variance and duration of each syllable to classify it into one of stress-related context classes. Objective and subjective evaluation results show that the proposed context labeling gives comparable performance to that conducted carefully by a human in terms of tone naturalness of synthetic speech.","PeriodicalId":231382,"journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116881436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
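The labeling idea in the abstract above, classifying each syllable by its log F0 variance and duration, can be sketched as follows. This is a hypothetical illustration under stated assumptions: the thresholds, the class names, and the number of classes (`stressed`, `intermediate`, `unstressed`) are not taken from the paper.

```python
def stress_class(log_f0_values, duration_ms,
                 var_threshold=0.01, dur_threshold=250.0):
    """Assign a stress-related context class to one syllable.

    log_f0_values: per-frame log F0 samples for the syllable.
    duration_ms:   syllable duration in milliseconds.
    Thresholds are illustrative placeholders, not values from the paper.
    """
    mean = sum(log_f0_values) / len(log_f0_values)
    # Large log-F0 variance is a proxy for high F0 movement.
    var = sum((v - mean) ** 2 for v in log_f0_values) / len(log_f0_values)
    high_movement = var > var_threshold
    long_duration = duration_ms > dur_threshold
    if high_movement and long_duration:
        return "stressed"
    if high_movement or long_duration:
        return "intermediate"
    return "unstressed"
```

In an unsupervised setting such thresholds would typically be derived from the data (e.g. by clustering the variance/duration features) rather than fixed by hand as in this sketch.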