
Ninth IEEE International Symposium on Multimedia (ISM 2007): Latest Papers

Making Sense of Ubiquitous Media style
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.4412352
M. Muhlhauser
In the emerging post-PC era, more and more computers 'in the net' can see, hear, or feel. Since these computers are networked, they can cooperate in interpreting their 'sensations'. Cameras, camcorders, and similar devices will soon be wirelessly connected, doubling as mobile phones. In other words, multimedia goes ubiquitous. At the same time, users draw on the wealth of text-based information available on the global Internet. However, the potential of this 'cooperative sensation' and of global textual information is far from fully exploited: enabling computers to 'make more sense' of all this information has been, is, and will remain a grand challenge. The talk will provide a unified model for both multimedia sense-making and textual-information sense-making, and propose fostering the confluence of these two threads. Based on this unified view, it will suggest steps towards improved sense-making in the world of ubiquitous computers.
Citations: 1
The Role of QoE on IPTV Services style
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.46
J. Kishigami
IPTV (Internet Protocol TV) is one of the hottest topics among emerging services. This new media service has significant potential, because a wide variety of content can be enjoyed in a variety of ways. We are living in a content-centric world. The flood of data made possible by the evolution of hardware since the 60-year-old transistor technology has become a potential problem. The user experience of this new medium is thought to be a key factor in the success of an IPTV service. Since a very early stage of the ITU-T Focus Group on IPTV, QoE (Quality of Experience) has been considered a most important factor. This subjective concept should be measurable in the same manner as QoS. The metadata function for personalized service in IPTV will also be described.
Citations: 14
Quality Compressed Steganography Using Hidden Referenced Halftoning
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.50
Jing-Ming Guo, Jen-Ho Chen
Block truncation coding (BTC) is an efficient compression technique that offers good image quality. Nonetheless, the blocking effect inherent in BTC causes severe perceptual artifacts in high-compression-ratio applications. In this paper, error-diffused block truncation coding (EDBTC) is proposed to solve this problem. In EDBTC, the error between the original grayscale pixel value and the corresponding high or low mean substitute is diffused to a predefined neighborhood, so the average grayscale is preserved. In addition, since compressed data are widely distributed over the Internet, delivering extra messages secretly has also attracted considerable attention recently. We therefore propose compressed steganography using Hidden Referenced Halftoning (CSHRH), which combines error diffusion and ordered dithering to achieve secret communication in BTC images. The experimental results show that the approach achieves low complexity with good image quality. Moreover, CSHRH is extended to secret-sharing steganography (SSS) and color extension steganography (CES). SSS distributes a message across multiple host images and hence improves security. CES delivers secure messages via color-embedded CSHRH images. Both extensions have the additional benefit of achieving high-capacity message conveyance.
Citations: 3
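The error-diffused block truncation coding idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the diffusion kernel, block size, or scan order, so the Floyd-Steinberg weights, the raster scan, and the function names `edbtc_block` and `decode_block` are assumptions made for illustration only, not the authors' implementation.

```python
def edbtc_block(block):
    """Encode one square grayscale block as (low, high, bitmap).

    Assumption: Floyd-Steinberg weights and a raster scan stand in for the
    paper's "predefined neighborhood", which the abstract does not specify.
    """
    n = len(block)
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    upper = [p for p in pixels if p >= mean]
    lower = [p for p in pixels if p < mean]
    high = sum(upper) / len(upper)  # never empty: the maximum is >= the mean
    low = sum(lower) / len(lower) if lower else high

    work = [[float(p) for p in row] for row in block]
    bitmap = [[0] * n for _ in range(n)]
    # (row offset, column offset, weight) of the assumed diffusion kernel
    kernel = [(0, 1, 7 / 16), (1, -1, 3 / 16), (1, 0, 5 / 16), (1, 1, 1 / 16)]
    for y in range(n):
        for x in range(n):
            bit = 1 if work[y][x] >= mean else 0
            bitmap[y][x] = bit
            # Error between the (error-adjusted) pixel and its substitute
            err = work[y][x] - (high if bit else low)
            for dy, dx, weight in kernel:
                yy, xx = y + dy, x + dx
                if 0 <= yy < n and 0 <= xx < n:
                    work[yy][xx] += err * weight
    return low, high, bitmap


def decode_block(low, high, bitmap):
    """Rebuild the block from the two levels and the bitmap."""
    return [[high if bit else low for bit in row] for row in bitmap]
```

In this sketch, error that would spill outside the block is simply dropped; a full codec would have to decide how to handle block borders.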
VEIL: A System for Certifying Video Provenance
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.10
Ashish Gehani, U. Lindqvist
Traditionally, a consumer decided how much to trust a piece of data based on its source. As digital video cameras and editors become ubiquitous, an arbitrary video object is increasingly likely to be produced using a range of operations that combine clips from a multitude of sources. A consumer can determine the assurance level of the data by knowing its lineage. We describe a system to embed the provenance of the video into the data itself. As long as the video contains a predefined threshold of data (from the spatial and temporal domains), the entire lineage can be ascertained. We embed the metadata using subpixel linear interpolation between similar blocks in proximal frames. It can then be extracted in real time using a novel method for computing the embedded interpolation. We implemented the process in C and report the performance overhead it introduces for playing video files. We also characterize the tradeoff between the auxiliary channel's capacity (which limits the amount of provenance metadata that can be embedded) and the extent to which the video can be edited (in the spatial or temporal domains) while retaining complete lineage.
Citations: 14
Improving Throughput and Node Proximity of P2P Live Video Streaming through Overlay Adaptation
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.36
B. Biskupski, R. Cunningham, R. Meier
Due to the heterogeneity of the environment, in which hosts may have different bandwidth capacities and network distances between hosts vary, current mesh-based multicast protocols for video streaming over the Internet tend to utilise the available bandwidth inefficiently and often transfer large amounts of data between distant hosts. This limits system throughput, which reduces video quality, and imposes significant costs on Internet service providers (ISPs) through network traffic that leaves a provider's own network. This paper presents MeshTV, a mesh-based peer-to-peer (P2P) multicast protocol for streaming live video from a transmitter to numerous viewers. MeshTV proposes an algorithm for adapting the mesh overlay in which nodes explore their possible neighbour nodes and select neighbours so that data throughput is optimised and data is transmitted between nearby (low-latency) nodes, typically within the same ISP, thus reducing the costs to ISPs. Our evaluation demonstrates that the adaptation algorithm used in MeshTV can improve video streaming throughput by over 100% and typically reduces the distances (network latencies) between interacting nodes by 50% compared to unoptimised mesh overlays.
Citations: 11
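The neighbour-selection step described in the abstract above might look roughly like the following sketch. The abstract does not state MeshTV's actual selection rule, so the ranking used here (highest measured throughput first, lowest latency as tie-breaker) and the names `Candidate` and `select_neighbours` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A neighbour a node has explored, with its measured link qualities."""
    node_id: str
    throughput_kbps: float
    latency_ms: float


def select_neighbours(candidates, k):
    """Keep the k most attractive neighbours.

    Assumed rule (not taken from the paper): prefer higher throughput,
    and among equal throughputs prefer the lower-latency (nearby) node.
    """
    ranked = sorted(candidates, key=lambda c: (-c.throughput_kbps, c.latency_ms))
    return ranked[:k]
```

A node would presumably re-run such a selection periodically as it explores new candidates, so that the overlay keeps adapting.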
Accelerating Embedded Multimedia Applications with Versatile and Reconfigurable Instruction Fusion
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.23
A. Cheng
Continuously increasing demand for richer functionality, faster real-time communication, smaller feature size, longer battery life, stronger security, and higher reliability is pushing the design of portable multimedia applications into an era in which a single system consists of a general-purpose CPU interacting with several application-specific accelerators and coprocessors to satisfy increasingly diverse constraints imposed from many directions. The inter-component communication overhead, along with the engineering effort required to integrate, verify, and validate such heterogeneous systems, scales disproportionately as the complexity of such systems continues to skyrocket. Moreover, due to limited instruction encoding space and the need to maintain backward compatibility in future designs, designers are often forced to include only a very small subset of the desired functionality on chip, even though there may be more than enough silicon real estate to incorporate these specialized function units. This paper proposes a cost-effective technique for incorporating diverse functionality into a single multi-purpose, streamlined acceleration unit, named the Versatile Processing Unit (VPU), to replace the conventional ALU on a CPU. The proposed VPU can supply the general-purpose CPU with a rich set of streamlined operations, which may supersede some or even all of the heterogeneous cores. The superseded hardware components are removed to reduce the integration and communication overhead. The issues of limited instruction encoding space and future backward compatibility are resolved by our proposed dynamic instruction re-mapping technique, in which the instruction bit fields can be redefined on the fly to allow instruction space reuse at run time.
Citations: 1
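The idea of reusing a limited instruction encoding space by redefining opcodes at run time, as described in the abstract above, can be modelled in software as a mutable opcode table. This is only a toy model under assumed names (`RemappableDecoder`, `remap`, `execute`); the abstract does not describe how the hardware re-mapping is actually implemented.

```python
import operator


class RemappableDecoder:
    """Toy model of an opcode field whose meaning can be redefined at run time."""

    def __init__(self, opcode_bits):
        # The encoding space is fixed by the width of the opcode field.
        self.capacity = 1 << opcode_bits
        self.table = {}

    def remap(self, opcode, operation):
        """Bind (or rebind) an opcode to a two-operand operation."""
        if not 0 <= opcode < self.capacity:
            raise ValueError("opcode does not fit in the encoding space")
        self.table[opcode] = operation

    def execute(self, opcode, a, b):
        """Run whatever operation the opcode currently denotes."""
        return self.table[opcode](a, b)


decoder = RemappableDecoder(opcode_bits=2)
decoder.remap(0, operator.add)
first = decoder.execute(0, 2, 3)   # opcode 0 currently means "add"
decoder.remap(0, operator.mul)
second = decoder.execute(0, 2, 3)  # the same encoding now means "multiply"
```

The point of the model is that the number of distinct encodings stays fixed at `2 ** opcode_bits`, while the set of operations reachable over a program's lifetime can be larger.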
Efficient and Effective Video Copy Detection Based on Spatiotemporal Analysis
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.38
Chih-Yi Chiu, Cheng-Chih Yang, Chu-Song Chen
In this paper, a novel method is presented to detect video copies for a given video query. These copies and the query have identical or near-duplicate content, which might differ slightly in spatiotemporal structure. To address both efficiency and effectiveness, we adopt the bag-of-words model for video feature representation, and apply a coarse-to-fine matching scheme to analyze the video spatiotemporal structure. The proposed method can deal with various kinds of video transformations, such as cropping, zooming, speed change, and subsequence insertion/deletion, which are not well addressed by existing methods. In addition, two indexing methods are employed to speed up the matching process. Experimental results show that the proposed method is both efficient and effective.
Citations: 25
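As a rough illustration of the bag-of-words representation and the coarse matching stage mentioned in the abstract above, the sketch below quantises frame descriptors against a codebook and compares two clips by histogram intersection. The codebook, the descriptor format, and the use of histogram intersection as the coarse similarity are assumptions; the abstract does not specify any of them.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )


def bow_histogram(descriptors, codebook):
    """Normalised visual-word histogram for one video segment."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_intersection(h1, h2):
    """Coarse similarity in [0, 1]; 1 means identical word distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because the histogram ignores ordering, a coarse stage like this would tolerate frame reordering or speed changes; a finer stage would then have to check the spatiotemporal structure of the surviving candidates.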
Performance Analysis of Distributed Speech Recognition Services over Noisy 802.11b Wireless Networks
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.24
A. Rinotti, P. Demichelis, Juan Carlos De Martin
The performance of an AURORA-like distributed speech recognition (DSR) system over IEEE 802.11 WLANs is studied. The recognition features are packetized and sent over an 802.11b network, and recognition is performed at the receiver. Two different scenarios are simulated to analyze DSR performance in the presence of losses due either to low received power or to network congestion. Varying recognizer complexities, packet lengths, numbers of concurrent flows, and signal power levels are considered in both scenarios. Experimental results on a connected digits task show that for low signal power levels, the best recognition performance is obtained when speech features are sent in small IP packets, while in the case of network congestion the best performance is obtained by increasing the packet size.
Citations: 0
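The packet-size trade-off in the abstract above can be made concrete with a small sketch that groups feature frames into packets and computes the share of each packet taken up by headers. The 40-byte default assumes minimal IPv4 (20), UDP (8), and RTP (12) headers; whether the paper used this transport stack, and the per-frame payload size, are assumptions.

```python
def packetize(frames, frames_per_packet):
    """Group consecutive feature frames into packets."""
    return [
        frames[i:i + frames_per_packet]
        for i in range(0, len(frames), frames_per_packet)
    ]


def header_overhead(frames_per_packet, bytes_per_frame, header_bytes=40):
    """Fraction of each full packet spent on headers.

    header_bytes=40 assumes IPv4 + UDP + RTP with no options or extensions.
    """
    payload = frames_per_packet * bytes_per_frame
    return header_bytes / (header_bytes + payload)
```

Small packets lose fewer frames per corrupted packet, which is consistent with their advantage at low signal power, while large packets spend fewer bytes on headers and create fewer transmissions, which is consistent with their advantage under congestion.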
Detection of Questions in Arabic Audio Monologues Using Prosodic Features
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.37
O. Khan, W. Al-Khatib, L. Cheded
Prosody has been widely used in many speech-related applications including speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. An important application we investigate is that of identifying question sentences in Arabic monologue lectures. Languages other than Arabic have received a lot of attention in this regard. We approach this problem by first segmenting the sentences from the continuous speech using intensity and duration features. Prosodic features are then extracted from each sentence. These features are used as input to decision trees to classify each sentence as either a question or a non-question sentence. Our results suggest that questions are cued by more than one type of prosodic feature in natural Arabic speech. We used C4.5 decision trees for classification and achieved 75.7% accuracy. Feature-specific analysis further reveals that energy and fundamental frequency features are mainly responsible for discriminating between question and non-question sentences.
Citations: 11
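The first step named in the abstract above, segmenting sentences from continuous speech using intensity and duration, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `segment_by_pauses`, the frame-level intensity input, and both parameters (`threshold`, `min_pause_frames`) are assumptions chosen for illustration. The idea is simply that a sufficiently long run of low-intensity frames is treated as a sentence boundary.

```python
from typing import List, Tuple


def segment_by_pauses(
    intensity: List[float],
    threshold: float,
    min_pause_frames: int,
) -> List[Tuple[int, int]]:
    """Split a frame-level intensity contour into speech segments.

    A run of at least `min_pause_frames` consecutive frames below
    `threshold` is treated as a boundary (hypothetical rule, not taken
    from the paper). Returns half-open (start, end) frame ranges.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # first frame of the segment being built
    last_loud = None  # most recent frame at or above the threshold
    for i, value in enumerate(intensity):
        if value >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_pause_frames:
            # The pause is long enough: close the current segment.
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the recording.
        segments.append((start, last_loud + 1))
    return segments


if __name__ == "__main__":
    contour = [0, 5, 6, 0, 7, 0, 0, 0, 8, 9, 0]
    print(segment_by_pauses(contour, threshold=1.0, min_pause_frames=3))
```

Short dips in intensity (a single quiet frame inside a sentence) do not split the segment; only pauses that meet the duration criterion do. In practice both parameters would need tuning per speaker and recording condition.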
Joint Network and Rate Allocation for Video Streaming over Multiple Wireless Networks
Pub Date : 2007-12-10 DOI: 10.1109/ISM.2007.31
D. Jurca, W. Kellerer, E. Steinbach, Shoaib Khan, Srisakul Thakolsri, P. Frossard
We address the problem of video streaming over multiple parallel networks. In a setting where multiple users access different types of applications, we look for efficient ways of allocating network resources and selecting network paths for each application in order to maximize overall system performance. Our joint optimization problem consists of finding the appropriate application rate allocation and network parameters for each individual user such that a universal system quality metric is maximized. We introduce a specific mapping between the requirements of each considered application and the overall quality metric, and compare our results to other solutions based on throughput optimization strategies. The superiority and robustness of our approach are shown through extensive simulations in constant systems and in dynamic systems where clients can join or leave the access networks. Furthermore, we introduce heuristic algorithms that obtain good results and are inexpensive in terms of computation and execution time.
Citations: 16
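The abstract above mentions inexpensive heuristic algorithms for joint network selection and rate allocation but does not specify them. The following is a generic greedy sketch of that kind of heuristic, not the authors' method. Everything in it is an assumption made for illustration: the function names, the discrete rate levels per user, the logarithmic `quality` metric standing in for the paper's universal quality metric, and the rule of attaching each user to the network with the most spare capacity.

```python
import math
from typing import Dict, List, Tuple


def quality(rate_kbps: float) -> float:
    """Hypothetical concave quality metric (placeholder, not from the paper)."""
    return math.log1p(rate_kbps)


def greedy_allocate(
    capacities: Dict[str, float],
    rate_levels: Dict[str, List[float]],
) -> Dict[str, Tuple[str, float]]:
    """Assign each user one network and one rate level.

    capacities:  network name -> capacity in kbps
    rate_levels: user name -> ascending list of admissible rates in kbps
    Returns user -> (network, rate). Users that do not fit are omitted.
    """
    remaining = dict(capacities)
    chosen: Dict[str, Tuple[str, int]] = {}  # user -> (network, level index)

    # Step 1: admit each user at its lowest rate on the emptiest network.
    for user, levels in rate_levels.items():
        net = max(remaining, key=remaining.get)
        if remaining[net] < levels[0]:
            continue  # no network can carry even the base rate
        remaining[net] -= levels[0]
        chosen[user] = (net, 0)

    # Step 2: repeatedly apply the upgrade with the best quality gain per kbps.
    while True:
        best = None
        for user, (net, idx) in chosen.items():
            levels = rate_levels[user]
            if idx + 1 >= len(levels):
                continue  # already at the top level
            extra = levels[idx + 1] - levels[idx]
            if extra > remaining[net]:
                continue  # upgrade does not fit on this network
            gain = (quality(levels[idx + 1]) - quality(levels[idx])) / extra
            if best is None or gain > best[0]:
                best = (gain, user, net, idx + 1, extra)
        if best is None:
            break
        _, user, net, idx, extra = best
        remaining[net] -= extra
        chosen[user] = (net, idx)

    return {u: (net, rate_levels[u][idx]) for u, (net, idx) in chosen.items()}


if __name__ == "__main__":
    nets = {"wlan": 1000, "umts": 300}
    users = {"a": [100, 400, 800], "b": [100, 200]}
    print(greedy_allocate(nets, users))
```

A greedy pass like this runs in time polynomial in the number of users and rate levels, which is the usual motivation for such heuristics, but it carries no optimality guarantee. Maximizing a quality metric rather than raw throughput is what distinguishes the formulation in the abstract from throughput-based strategies.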