MULTIMEDIA '04: Latest Papers

Learning an image manifold for retrieval

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027532

Xiaofei He, Wei-Ying Ma, HongJiang Zhang

We consider the problem of learning a mapping function from low-level feature space to high-level semantic space. Under the assumption that the data lie on a submanifold embedded in a high-dimensional Euclidean space, we propose a relevance feedback scheme which is naturally conducted only on the image manifold in question rather than the total ambient space. While images are typically represented by feature vectors in R^n, the natural distance is often different from the distance induced by the ambient space R^n. Geodesic distances on the manifold are used to measure the similarities between images. However, when the number of data points is small, it is hard to discover the intrinsic manifold structure. Based on user interactions in a relevance feedback driven query-by-example system, the intrinsic similarities between images can be accurately estimated. We then develop an algorithmic framework to approximate the optimal mapping function by a Radial Basis Function (RBF) neural network. The semantics of a new image can be inferred by the RBF neural network. Experimental results show that our approach is effective in improving the performance of content-based image retrieval systems.
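
The difference between ambient and geodesic distance can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the manifold is approximated by a k-nearest-neighbour graph and that geodesic distance is the shortest path through that graph (an Isomap-style approximation the abstract does not specify). The function names and the half-circle data are illustrative.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest paths: an approximation of distance along the manifold."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Hypothetical data: 11 points on a half circle, a 1-D manifold embedded in R^2.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
graph = knn_graph(points, k=2)
geo = geodesic_distances(graph, 0)
straight = math.dist(points[0], points[10])
```

Here the two end points are 2.0 apart in the ambient space, while the path through the neighbour graph is close to the arc length of the half circle, which is the kind of gap the abstract refers to.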

Citations: 196

The dawn at my back

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027766

Carroll Parrott Blue

Interactive multimedia narrative in CD/DVD-ROMs and game design offers choice and non-linearity as communication signifiers. The viewer chooses where to go in an interactive design, viewing content in a non-traditional, non-linear fashion. However, the content is totally controlled by the author, not the reader. The content is pre-determined while the viewer controls the viewing path in real time. The Dawn At My Back: Memoir of a Black Texas Upbringing outlines three parallel stories of a mother, a daughter, and a society by detailing racism's impact on each element. The concept is to design a combination Book/DVD-ROM/Website that allows the reader direct interaction with the narrative. This project expands on the idea that interactivity is a dialogue between viewer and story by encouraging the book's reader to become a DVD-ROM user and website co-author.

Citations: 2

LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027576

Ye Wang, Min-Yen Kan, T. Nwe, Arun Shenoy, Jun Yin

We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem using a multimodal approach, where the appropriate pairing of audio and text processing helps create a more accurate system. Our audio processing technique uses a combination of top-down and bottom-up approaches, combining the strength of low-level audio features and high-level musical knowledge to determine the hierarchical rhythm structure, singing voice and chorus sections in the musical audio. Text processing is also employed to approximate the length of the sung passages using the textual lyrics. Results show an average error of less than one bar for per-line alignment of the lyrics on a test bed of 20 songs (sampled from CD audio and carefully selected for variety). We perform holistic and per-component testing and analysis and outline steps for further development.
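
The reported metric, alignment error measured in bars, can be made concrete with a small sketch. It assumes a constant tempo and time signature, so that one bar lasts beats_per_bar * 60 / bpm seconds; the abstract does not say how the authors computed the figure. The timestamps and function names are hypothetical.

```python
def bar_duration(bpm, beats_per_bar=4):
    """Length of one bar in seconds at a constant tempo."""
    return beats_per_bar * 60.0 / bpm


def mean_error_in_bars(predicted, reference, bpm, beats_per_bar=4):
    """Mean absolute start-time error per lyric line, expressed in bars."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / (len(predicted) * bar_duration(bpm, beats_per_bar))


# Hypothetical line start times in seconds for a 120 bpm song in 4/4.
predicted_starts = [10.0, 14.5, 19.0]
reference_starts = [10.5, 14.0, 20.0]
error_bars = mean_error_in_bars(predicted_starts, reference_starts, bpm=120)
```

At 120 bpm in 4/4 a bar lasts 2 seconds, so an average offset of about 0.67 seconds corresponds to roughly a third of a bar.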

Citations: 84

Privacy protecting data collection in media spaces

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027537

Jehan Wickramasuriya, M. Datt, S. Mehrotra, N. Venkatasubramanian

Around the world, as both crime and technology become more prevalent, officials find themselves relying more and more on video surveillance as a cure-all in the name of public safety. Used properly, video cameras help expose wrongdoing but typically come at the cost of privacy to those not involved in any maleficent activity. What if we could design intelligent systems that are more selective in what video they capture, and focus on anomalous events while protecting the privacy of authorized personnel? This paper proposes a novel way of combining sensor technology with traditional video surveillance in building a privacy protecting framework that exploits the strengths of these modalities and complements their individual limitations. Our fully functional system utilizes off-the-shelf sensor hardware (i.e. RFID, motion detection) for localization, and combines this with an XML-based policy framework for access control to determine violations within the space. This information is fused with video surveillance streams in order to make decisions about how to display the individuals being surveilled. To achieve this, we have implemented several video masking techniques that correspond to varying user privacy levels. These results were achievable in real-time at acceptable frame rates, while meeting our requirements for privacy preservation.
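
The link between an XML policy, a sensed identity, and a masking choice might look roughly like the sketch below. Everything specific in it is an assumption: the policy schema, the tag identifiers, the privacy level names, and the mask names are invented for illustration and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy document; the real schema is not given in the abstract.
POLICY_XML = """
<policy>
  <subject tag="rfid-001" level="high"/>
  <subject tag="rfid-002" level="low"/>
</policy>
"""

# Hypothetical mapping from privacy level to masking technique.
MASK_FOR_LEVEL = {"high": "block", "medium": "blur", "low": "outline"}


def load_policy(xml_text):
    """Parse the policy into a {tag id: privacy level} dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {s.get("tag"): s.get("level") for s in root.findall("subject")}


def choose_mask(policy, tag):
    """Pick a mask for a localized person; unknown people stay visible."""
    if tag is None or tag not in policy:
        return "none"  # no authorized tag sensed: leave the video unmasked
    return MASK_FOR_LEVEL[policy[tag]]


policy = load_policy(POLICY_XML)
```

The design choice shown here, masking authorized personnel and leaving unidentified people visible, follows the abstract's stated goal of focusing on anomalous events while protecting authorized personnel.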

Citations: 171

Thematic segmentation of meetings through document/speech alignment

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027714

Dalila Mekhaldi, D. Lalanne, R. Ingold

This article proposes a multimodal approach for segmenting meeting recordings. This bi-modal method takes advantage of the alignment of speech transcript with documents, in the context of meetings or lectures, where documents are discussed. The method first displays the alignment results as a set of nodes in a 2D space, where the two axes represent respectively the document content and the speech transcript. The most connected regions in this graph are detected using a clustering method. The final clusters are then projected on the speech axis. Finally, the obtained sequence of segments is considered as the thematic structure of the speech transcript. In this article, we present our bi-modal method and compare it with two other mono-modal thematic segmentation methods.
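
The pipeline described above (alignment nodes in 2D, clustering, projection onto the speech axis) can be sketched as follows. The abstract does not name the clustering method, so this sketch swaps in a simple single-linkage grouping with a gap threshold; the alignment pairs and function names are hypothetical.

```python
def cluster_points(points, max_gap):
    """Group (document_index, speech_index) nodes that lie within max_gap
    of each other on both axes (single-linkage, a stand-in clustering)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [
                j for j in unvisited
                if abs(points[i][0] - points[j][0]) <= max_gap
                and abs(points[i][1] - points[j][1]) <= max_gap
            ]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append([points[m] for m in members])
    return clusters


def project_on_speech_axis(clusters):
    """Turn each cluster into a (first, last) speech-segment interval."""
    return sorted(
        (min(s for _, s in c), max(s for _, s in c)) for c in clusters
    )


# Hypothetical alignments: (document block index, speech utterance index).
alignments = [(0, 0), (0, 1), (1, 2), (3, 6), (3, 7), (4, 8)]
segments = project_on_speech_axis(cluster_points(alignments, max_gap=1))
```

With this toy input the two dense regions project to the utterance ranges 0 to 2 and 6 to 8, which would be read as two thematic segments of the transcript.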

Citations: 11

Probabilistic delay guarantees using delay distribution measurement

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027734

Kartik Gopalan, T. Chiueh, Yow-Jian Lin

Carriers increasingly differentiate their wide-area connectivity offerings by means of customized services, such as virtual private networks (VPN) with Quality of Service (QoS) guarantees, or QVPNs. The key challenge faced by carriers is to maximize the number of QVPNs admitted by exploiting the statistical multiplexing nature of input traffic. While existing measurement-based admission control algorithms utilize statistical multiplexing along the bandwidth dimension, they do not satisfactorily exploit statistical multiplexing along the delay dimension to guarantee distinct per-QVPN delay bounds. This paper presents a Delay Distribution Measurement (DDM) based admission control algorithm, the first measurement-based approach that effectively exploits statistical multiplexing along the delay dimension. In other words, DDM exploits the well-known fact that the actual delay experienced by most packets of a QVPN is usually far smaller than its worst-case delay bound requirement, since multiple QVPNs rarely send traffic bursts at the same time. Additionally, DDM supports QVPNs with distinct probabilistic delay guarantees: QVPNs that can tolerate more delay violations can reserve fewer resources than those that tolerate fewer, even though they require the same delay bound. A comprehensive performance evaluation using Voice over IP traces shows that, when compared to deterministic admission control, DDM can potentially increase the number of admitted QVPNs (and link utilization) by up to a factor of 3.0 even when the delay violation probability is as small as 10^-5.
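
A probabilistic delay guarantee can be read as a bound on the fraction of packets whose delay exceeds the delay bound. The sketch below checks that condition against measured delays. It is only an illustration of the guarantee itself, not the DDM admission control algorithm, whose details the abstract does not give; the delay samples and function names are hypothetical.

```python
def violation_probability(delays_ms, bound_ms):
    """Empirical fraction of packets whose measured delay exceeds the bound."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    return sum(1 for d in delays_ms if d > bound_ms) / len(delays_ms)


def meets_guarantee(delays_ms, bound_ms, tolerated_probability):
    """True if the measured violation rate stays within the tolerated rate."""
    return violation_probability(delays_ms, bound_ms) <= tolerated_probability


# Hypothetical per-packet delay measurements in milliseconds.
measured = [2, 3, 3, 4, 5, 5, 6, 8, 12, 20]
rate = violation_probability(measured, bound_ms=10)
```

With these samples two of ten packets exceed the 10 ms bound, so a flow tolerating a violation rate of 0.25 would be satisfied while one tolerating only 0.1 would not, even though both ask for the same delay bound.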

Citations: 31

Advanced user interfaces for dynamic video browsing

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027694

Wolfgang Hürst, Georg Götz, Philipp Jarvers

In this demonstration we present three interface designs which enable users to visually browse video data by moving a slider thumb along the timeline. In such a case, scrolling granularity is usually limited because of the fixed length of the corresponding slider. In contrast, our interaction designs enable users to skim the data at different granularity levels by providing the possibility to continuously change the slider's scale, by using a nonlinear scale, and by enabling interactive manipulation of the scrolling speed, respectively.
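
One way a nonlinear slider scale could work is sketched below. It is an assumption for illustration, not the authors' design: the thumb position is taken as an offset in [-1, 1] around the current frame and mapped through a power curve, so small movements give fine-grained scrolling and large movements reach the ends of the file. The function name and the exponent are hypothetical.

```python
def slider_to_time(position, current_time, duration, exponent=3.0):
    """Map a thumb offset in [-1, 1] to a target time in seconds.

    0 keeps the current time, +1 jumps to the end, -1 to the start;
    the power curve makes movement near the centre fine-grained.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    magnitude = abs(position) ** exponent
    if position >= 0:
        return current_time + magnitude * (duration - current_time)
    return current_time - magnitude * current_time


# Hypothetical one-hour video, currently at the 10-minute mark.
halfway_right = slider_to_time(0.5, current_time=600.0, duration=3600.0)
```

Moving the thumb halfway to the right advances the position by only one eighth of the remaining video with an exponent of 3, whereas a linear slider would advance by half of it.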

Citations: 27

Interactive tele-journalism: low cost, live, interactive television news production

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027562

S. Every

The rise of the internet and the increasing availability of low cost means to create digital media have created an environment and an appetite in the audience for meaningful interaction with mass media. Television news is an area that holds great potential for community based programming and can be made to allow the audience a direct role in the production of such programming. The focus of this project has been to develop a working prototype of a system to support the live production of low cost, community orientated, interactive television news programs in which the audience has direct and immediate influence over the programming.

Citations: 1

Effective automatic image annotation via a coherent language model and active learning

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027732

Rong Jin, J. Chai, Luo Si

Image annotations allow users to access a large image database with textual queries. There have been several studies on automatic image annotation utilizing machine learning techniques, which automatically learn statistical models from annotated images and apply them to generate annotations for unseen images. One common problem shared by most previous learning approaches for automatic image annotation is that each annotated word is predicted for an image independently of other annotated words. In this paper, we propose a coherent language model for automatic image annotation that takes into account the word-to-word correlation by estimating a coherent language model for an image. This new approach has two important advantages: 1) it is able to automatically determine the annotation length to improve the accuracy of retrieval results, and 2) it can be used with active learning to significantly reduce the required number of annotated image examples. Empirical studies with the Corel dataset are presented to show the effectiveness of the coherent language model for automatic image annotation.

Citations: 153

Challenges of networked media: integrating the navigational features of browsing histories and media playlists into a media browser

MULTIMEDIA '04

Pub Date : 2004-10-10 DOI: 10.1145/1027527.1027643

André T. H. Pang, C. Parker, S. Pfeiffer

One of the goals of the Continuous Media Web project is to integrate digital media with the World Wide Web: media documents can hyperlink to and from other documents in the same way that HTML pages do. The dual capabilities of hyperlinking (1) to other documents while viewing a media clip, and (2) into precise time intervals in a media clip, enable greatly improved user interaction with media. We discuss the idea of a novel media browser application, which merges the concept of a traditional media player that presents video and audio to the user, with a Web browser that provides hyperlinking and navigation between networked (media) documents. The particular issue we address in this article concerns the primary navigational features: a media player relies on a playlist while a Web browser uses a browsing history for navigation. We discuss design and user interface issues that arise when integrating these two navigational features in a media browser.
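
The two navigation models being merged can be contrasted with a short sketch. It shows only the conventional behaviour of each structure, a fixed playlist with a cursor versus a browsing history that discards its forward entries on a new visit; it is not the integration the authors propose, and the class names and clip names are hypothetical.

```python
class Playlist:
    """Fixed, author-ordered list of clips with a cursor."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def next(self):
        """Advance to the following clip, stopping at the last one."""
        if self.index < len(self.items) - 1:
            self.index += 1
        return self.items[self.index]


class History:
    """Path the user actually took, with back and forward navigation."""

    def __init__(self, start):
        self.entries = [start]
        self.index = 0

    def visit(self, target):
        """Follow a hyperlink: forward entries are discarded, as in a Web browser."""
        del self.entries[self.index + 1:]
        self.entries.append(target)
        self.index += 1

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.entries[self.index]

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.entries[self.index]


history = History("clip-a")
history.visit("clip-b")
history.visit("clip-c")
history.back()           # now at clip-b
history.visit("clip-d")  # clip-c is dropped from the forward path
```

The playlist is known in advance and never changes as the user moves through it, while the history is rewritten by the user's choices, which is one reason a single "next" or "back" control is ambiguous in a combined media browser.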

Citations: 2
