Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are the source coding methods described in this monograph. Building on the fundamentals of information theory and rate-distortion theory, the most relevant techniques used in source coding algorithms are described: entropy coding, quantization, and predictive and transform coding. The emphasis is placed on algorithms that are also used in video coding, which is covered in the second part of this two-part monograph.
{"title":"Source Coding: Part I of Fundamentals of Source and Video Coding","authors":"T. Wiegand, H. Schwarz","doi":"10.1561/2000000010","DOIUrl":"https://doi.org/10.1561/2000000010","url":null,"abstract":"Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are source coding methods that are described in this monograph. Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are described: entropy coding, quantization as well as predictive and transform coding. The emphasis is put onto algorithms that are also used in video coding, which will be explained in the other part of this two-part monograph.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"61 1","pages":"1-222"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82784981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In December 1974 the first realtime conversation on the ARPAnet took place between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts. This was the first successful application of realtime digital speech communication over a packet network and an early milestone in the explosion of realtime signal processing of speech, audio, images, and video that we all take for granted today. It could be considered the first instance of voice over Internet Protocol (VoIP), except that the Internet Protocol (IP) had not yet been established. In fact, the interest in realtime signal processing had an indirect, but major, impact on the development of IP. This is the story of the development of linear predictive coded (LPC) speech and how it came to be used in the first successful packet speech experiments. Several related stories are recounted as well. This is the second part of a two-part monograph on linear predictive coding (LPC) and the Internet protocol (IP). The first part presented an introduction to this history and a tutorial on linear prediction and its applications to speech, providing background and context for the technical history of the second part.
{"title":"A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol","authors":"R. Gray","doi":"10.1561/2000000036","DOIUrl":"https://doi.org/10.1561/2000000036","url":null,"abstract":"In December 1974 the first realtime conversation on the ARPAnet took place between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts. This was the first successful application of realtime digital speech communication over a packet network and an early milestone in the explosion of realtime signal processing of speech, audio, images, and video that we all take for granted today. It could be considered as the first voice over Internet Protocol (VoIP), except that the Internet Protocol (IP) had not yet been established. In fact, the interest in realtime signal processing had an indirect, but major, impact on the development of IP. This is the story of the development of linear predictive coded (LPC) speech and how it came to be used in the first successful packet speech experiments. Several related stories are recounted as well. \u0000 \u0000This is the second part of a two part monograph on linear predictive coding (LPC) and the Internet protocol (IP). The first part presented an introduction to this history and a tutorial on linear prediction and its applications to speech, providing background and context to the technical history of the second part.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"70 1","pages":"203-303"},"PeriodicalIF":0.0,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86222082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linear prediction has long played an important role in speech processing, especially in the development during the late 1960s of the first low bit rate speech compression/coding systems. The approach, which eventually became known as linear predictive coding (LPC), coincidentally came to fruition at the right time to be adopted as the speech compression technique in the first successful realtime packet speech communication through the nascent ARPAnet in December 1974 — the ancestor of voice over the Internet Protocol (IP) and, more generally, of realtime signal processing through the Internet. This first part of a two-part monograph on LPC and the IP provides a tutorial overview of linear prediction and its application to speech coding. A variety of viewpoints provides background and context for the second part, which comprises a technical and personal history of LPC, its use in the first packet speech demonstrations, and many related stories of the early applications of LPC and the prehistory of the Internet.
{"title":"A Survey of Linear Predictive Coding: Part I of Linear Predictive Coding and the Internet Protocol","authors":"R. Gray","doi":"10.1561/2000000029","DOIUrl":"https://doi.org/10.1561/2000000029","url":null,"abstract":"Linear prediction has long played an important role in speech processing, especially in the development during the late 1960s of the first low bit rate speech compression/coding systems. The approach, which eventually became known as linear predictive coding (LPC), coincidentally came to fruition at the right time to be adopted as the speech compression technique in the first successful realtime packet speech communication through the nascent ARPAnet in December 1974 — the ancestor of voice over the Internet Protocol (IP) and, more generally, of realtime signal processing through the Internet. This first part of a two part monograph on LPC and the IP provides a tutorial overview of linear prediction and its application to speech coding. A variety of viewpoints provides background and context for the second part, which comprises a technical and personal history of LPC, its use in the first packet speech demonstrations, and many related stories of the early applications of LPC and the prehistory of the Internet.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"28 1","pages":"153-202"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86131769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computer vision systems attempt to understand a scene and its components from mostly visual information. The geometry exhibited by the real world, the influence of material properties on scattering of incident light, and the process of imaging introduce constraints and properties that are key to interpreting scenes and recognizing objects, their structure and kinematics. In the presence of noisy observations and other uncertainties, computer vision algorithms make use of statistical methods for robust inference. In this monograph, we highlight the role of geometric constraints in statistical estimation methods, and how the interplay between geometry and statistics leads to the choice and design of algorithms for video-based tracking, modeling and recognition of objects. In particular, we illustrate the role of imaging, illumination, and motion constraints in classical vision problems such as tracking, structure from motion, metrology, activity analysis and recognition, and present appropriate statistical methods used in each of these problems.
{"title":"Statistical Methods and Models for Video-Based Tracking, Modeling, and Recognition","authors":"R. Chellappa, Aswin C. Sankaranarayanan, A. Veeraraghavan, P. Turaga","doi":"10.1561/2000000007","DOIUrl":"https://doi.org/10.1561/2000000007","url":null,"abstract":"Computer vision systems attempt to understand a scene and its components from mostly visual information. The geometry exhibited by the real world, the influence of material properties on scattering of incident light, and the process of imaging introduce constraints and properties that are key to interpreting scenes and recognizing objects, their structure and kinematics. In the presence of noisy observations and other uncertainties, computer vision algorithms make use of statistical methods for robust inference. In this monograph, we highlight the role of geometric constraints in statistical estimation methods, and how the interplay between geometry and statistics leads to the choice and design of algorithms for video-based tracking, modeling and recognition of objects. In particular, we illustrate the role of imaging, illumination, and motion constraints in classical vision problems such as tracking, structure from motion, metrology, activity analysis and recognition, and present appropriate statistical methods used in each of these problems.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"1 1","pages":"1-151"},"PeriodicalIF":0.0,"publicationDate":"2010-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79801602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion compensation exploits temporal correlation in a video sequence to yield high compression efficiency. Multiple reference frame motion compensation is an extension of motion compensation that exploits temporal correlation over a longer time scale. Devised mainly for increasing compression efficiency, it exhibits useful properties such as enhanced error resilience and error concealment. In this survey, we explore different aspects of multiple reference frame motion compensation, including multihypothesis prediction, global motion prediction, improved error resilience and concealment for multiple references, and algorithms for fast motion estimation in the context of multiple reference frame video encoders.
{"title":"Multiple Reference Motion Compensation: A Tutorial Introduction and Survey","authors":"A. Leontaris, P. Cosman, A. Tourapis","doi":"10.1561/2000000019","DOIUrl":"https://doi.org/10.1561/2000000019","url":null,"abstract":"Motion compensation exploits temporal correlation in a video sequence to yield high compression efficiency. Multiple reference frame motion compensation is an extension of motion compensation that exploits temporal correlation over a longer time scale. Devised mainly for increasing compression efficiency, it exhibits useful properties such as enhanced error resilience and error concealment. In this survey, we explore different aspects of multiple reference frame motion compensation, including multihypothesis prediction, global motion prediction, improved error resilience and concealment for multiple references, and algorithms for fast motion estimation in the context of multiple reference frame video encoders.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"28 1","pages":"247-364"},"PeriodicalIF":0.0,"publicationDate":"2009-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74532268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This survey gives an introduction to redundant signal representations called frames. These representations have recently emerged as yet another powerful tool in the signal processing toolbox and have become popular through use in numerous applications. Our aim is to familiarize a general audience with the area, while at the same time giving a snapshot of the current state-of-the-art.
{"title":"An Introduction to Frames","authors":"J. Kovacevic, A. Chebira","doi":"10.1561/2000000006","DOIUrl":"https://doi.org/10.1561/2000000006","url":null,"abstract":"This survey gives an introduction to redundant signal representations called frames. These representations have recently emerged as yet another powerful tool in the signal processing toolbox and have become popular through use in numerous applications. Our aim is to familiarize a general audience with the area, while at the same time giving a snapshot of the current state-of-the-art.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"121 1","pages":"1-94"},"PeriodicalIF":0.0,"publicationDate":"2008-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73465229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the prime goals of statistical estimation theory is the development of performance bounds when estimating parameters of interest in a given model, as well as constructing estimators that achieve these limits. When the parameters to be estimated are deterministic, a popular approach is to bound the mean-squared error (MSE) achievable within the class of unbiased estimators. Although it is well known that lower MSE can be obtained by allowing for a bias, in applications it is typically unclear how to choose an appropriate bias. In this survey we introduce MSE bounds that are lower than the unbiased Cramér–Rao bound (CRB) for all values of the unknowns. We then present a general framework for constructing biased estimators with smaller MSE than the standard maximum-likelihood (ML) approach, regardless of the true unknown values. Specializing the results to the linear Gaussian model, we derive a class of estimators that dominate least-squares in terms of MSE. We also introduce methods for choosing regularization parameters in penalized ML estimators that outperform standard techniques such as cross validation.
{"title":"Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramér-Rao Bound","authors":"Yonina C. Eldar","doi":"10.1561/2000000008","DOIUrl":"https://doi.org/10.1561/2000000008","url":null,"abstract":"One of the prime goals of statistical estimation theory is the development of performance bounds when estimating parameters of interest in a given model, as well as constructing estimators that achieve these limits. When the parameters to be estimated are deterministic, a popular approach is to bound the mean-squared error (MSE) achievable within the class of unbiased estimators. Although it is well-known that lower MSE can be obtained by allowing for a bias, in applications it is typically unclear how to choose an appropriate bias. \u0000 \u0000In this survey we introduce MSE bounds that are lower than the unbiased Cramer–Rao bound (CRB) for all values of the unknowns. We then present a general framework for constructing biased estimators with smaller MSE than the standard maximum-likelihood (ML) approach, regardless of the true unknown values. Specializing the results to the linear Gaussian model, we derive a class of estimators that dominate least-squares in terms of MSE. We also introduce methods for choosing regularization parameters in penalized ML estimators that outperform standard techniques such as cross validation.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"18 1","pages":"305-449"},"PeriodicalIF":0.0,"publicationDate":"2008-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85807630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This monograph describes current-day wavelet transform image coding systems. As in the first part, steps of the algorithms are explained thoroughly and set apart. An image coding system consists of several stages: transformation, quantization, set partition or adaptive entropy coding or both, decoding including rate control, inverse transformation, de-quantization, and optional processing (see Figure 1.6). Wavelet transform systems can provide many desirable properties besides high efficiency, such as scalability in quality, scalability in resolution, and region-of-interest access to the coded bitstream. These properties are built into the JPEG2000 standard, so its coding will be fully described. Since JPEG2000 codes subblocks of subbands, other methods, such as SBHP (Subband Block Hierarchical Partitioning) [3] and EZBC (Embedded Zero Block Coder) [8], that code subbands or their subblocks independently are also described. The emphasis in this part is on using the basic algorithms presented in the previous part in ways that achieve these desirable bitstream properties. In this vein, we describe a modification of the tree-based coding in SPIHT (Set Partitioning In Hierarchical Trees) [15], whose output bitstream can be decoded partially, corresponding to a designated region of interest, and is simultaneously quality and resolution scalable. This monograph is extracted and adapted from the forthcoming textbook entitled Digital Signal Compression: Principles and Practice by William A. Pearlman and Amir Said, Cambridge University Press, 2009.
{"title":"Set Partition Coding: Part II of Set Partition Coding and Image Wavelet Coding Systems","authors":"W. Pearlman, A. Said","doi":"10.1561/2000000014","DOIUrl":"https://doi.org/10.1561/2000000014","url":null,"abstract":"This monograph describes current-day wavelet transform image coding systems. As in the first part, steps of the algorithms are explained thoroughly and set apart. An image coding system consists of several stages: transformation, quantization, set partition or adaptive entropy coding or both, decoding including rate control, inverse transformation, de-quantization, and optional processing (see Figure 1.6). Wavelet transform systems can provide many desirable properties besides high efficiency, such as scalability in quality, scalability in resolution, and region-of-interest access to the coded bitstream. These properties are built into the JPEG2000 standard, so its coding will be fully described. Since JPEG2000 codes subblocks of subbands, other methods, such as SBHP (Subband Block Hierarchical Partitioning) [3] and EZBC (Embedded Zero Block Coder) [8], that code subbands or its subblocks independently are also described. The emphasis in this part is the use of the basic algorithms presented in the previous part in ways that achieve these desirable bitstream properties. In this vein, we describe a modification of the tree-based coding in SPIHT (Set Partitioning In Hierarchical Trees) [15], whose output bitstream can be decoded partially corresponding to a designated region of interest and is simultaneously quality and resolution scalable. \u0000 \u0000This monograph is extracted and adapted from the forthcoming textbook entitled Digital Signal Compression: Principles and Practice by William A. Pearlman and Amir Said, Cambridge University Press, 2009.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"102 1","pages":"181-246"},"PeriodicalIF":0.0,"publicationDate":"2008-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74046815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this two-part monograph is to present a tutorial on set partition coding, with emphasis and examples on image wavelet transform coding systems, and to describe its use in modern image coding systems. Set partition coding is a procedure that recursively splits groups of integer data or transform elements guided by a sequence of threshold tests, producing groups of elements whose magnitudes lie between two known thresholds, thereby setting the maximum number of bits required for their binary representation. In particular, each group's elements have magnitudes below a known threshold, so the number of bits needed to represent an element of that group is no more than the base-2 logarithm of the threshold, rounded up to the nearest integer. SPIHT (Set Partitioning in Hierarchical Trees) and SPECK (Set Partitioning Embedded blocK) are popular state-of-the-art image coders that use set partition coding as the primary entropy coding method. JPEG2000 and EZW (Embedded Zerotree Wavelet) use it in an auxiliary manner. Part I elucidates the fundamentals of set partition coding and explains the setting of thresholds and the block and tree modes of partitioning. Algorithms are presented for the techniques of AGP (Amplitude and Group Partitioning), SPIHT, SPECK, and EZW. Numerical examples are worked out in detail for the latter three techniques. Part II describes various wavelet image coding systems that use set partitioning primarily, such as SBHP (Subband Block Hierarchical Partitioning), SPIHT, and EZBC (Embedded Zero-Block Coder). The basic JPEG2000 coder is also described. The coding procedures and the specific methods are presented both logically and in algorithmic form, where possible. Besides the obvious objective of obtaining small file sizes, much emphasis is placed on achieving low computational complexity and desirable output bitstream attributes, such as embeddedness, scalability in resolution, and random access decodability. This monograph is extracted and adapted from the forthcoming textbook entitled Digital Signal Compression: Principles and Practice by William A. Pearlman and Amir Said, Cambridge University Press, 2009.
{"title":"Set Partition Coding: Part I of Set Partition Coding and Image Wavelet Coding Systems","authors":"W. Pearlman, A. Said","doi":"10.1561/2000000013","DOIUrl":"https://doi.org/10.1561/2000000013","url":null,"abstract":"The purpose of this two-part monograph is to present a tutorial on set partition coding, with emphasis and examples on image wavelet transform coding systems, and describe their use in modern image coding systems. Set partition coding is a procedure that recursively splits groups of integer data or transform elements guided by a sequence of threshold tests, producing groups of elements whose magnitudes are between two known thresholds, therefore, setting the maximum number of bits required for their binary representation. It produces groups of elements whose magnitudes are less than a certain known threshold. Therefore, the number of bits for representing an element in a particular group is no more than the base-2 logarithm of its threshold rounded up to the nearest integer. SPIHT (Set Partitioning in Hierarchical Trees) and SPECK (Set Partitioning Embedded blocK) are popular state-of-the-art image coders that use set partition coding as the primary entropy coding method. JPEG2000 and EZW (Embedded Zerotree Wavelet) use it in an auxiliary manner. Part I elucidates the fundamentals of set partition coding and explains the setting of thresholds and the block and tree modes of partitioning. Algorithms are presented for the techniques of AGP (Amplitude and Group Partitioning), SPIHT, SPECK, and EZW. Numerical examples are worked out in detail for the latter three techniques. Part II describes various wavelet image coding systems that use set partitioning primarily, such as SBHP (Subband Block Hierarchical Partitioning), SPIHT, and EZBC (Embedded Zero-Block Coder). The basic JPEG2000 coder is also described. The coding procedures and the specific methods are presented both logically and in algorithmic form, where possible. Besides the obvious objective of obtaining small file sizes, much emphasis is placed on achieving low computational complexity and desirable output bitstream attributes, such as embeddedness, scalability in resolution, and random access decodability. \u0000 \u0000This monograph is extracted and adapted from the forthcoming textbook entitled Digital Signal Compression: Principles and Practice by William A. Pearlman and Amir Said, Cambridge University Press, 2009.","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"109 1","pages":"95-180"},"PeriodicalIF":0.0,"publicationDate":"2008-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80675851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication. Starting in the 1960s, digital signal processing (DSP) assumed a central role in speech studies, and today DSP is the key to realizing the fruits of the knowledge that has been gained through decades of research. Concomitant advances in integrated circuit technology and computer architecture have aligned to create a technological environment with virtually limitless opportunities for innovation in speech communication applications. In this text, we highlight the central role of DSP techniques in modern speech communication research and applications. We present a comprehensive overview of digital speech processing that ranges from the basic nature of the speech signal, through a variety of methods of representing speech in digital form, to applications in voice communication and automatic synthesis and recognition of speech. The breadth of this subject does not allow us to discuss any aspect of speech processing in great depth; hence our goal is to provide a useful introduction to the wide range of important concepts that comprise the field of digital speech processing. A more comprehensive treatment will appear in the forthcoming book, Theory and Application of Digital Speech Processing [101].
{"title":"Introduction to Digital Speech Processing","authors":"L. Rabiner, R. Schafer","doi":"10.1561/2000000001","DOIUrl":"https://doi.org/10.1561/2000000001","url":null,"abstract":"Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication. Starting in the 1960s, digital signal processing (DSP), assumed a central role in speech studies, and today DSP is the key to realizing the fruits of the knowledge that has been gained through decades of research. Concomitant advances in integrated circuit technology and computer architecture have aligned to create a technological environment with virtually limitless opportunities for innovation in speech communication applications. In this text, we highlight the central role of DSP techniques in modern speech communication research and applications. We present a comprehensive overview of digital speech processing that ranges from the basic nature of the speech signal, through a variety of methods of representing speech in digital form, to applications in voice communication and automatic synthesis and recognition of speech. The breadth of this subject does not allow us to discuss any aspect of speech processing to great depth; hence our goal is to provide a useful introduction to the wide range of important concepts that comprise the field of digital speech processing. A more comprehensive treatment will appear in the forthcoming book, Theory and Application of Digital Speech Processing [101].","PeriodicalId":12340,"journal":{"name":"Found. Trends Signal Process.","volume":"73 1","pages":"1-194"},"PeriodicalIF":0.0,"publicationDate":"2007-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74071887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}