
Latest publications: 2015 IEEE International Symposium on Multimedia (ISM)

UStream: Ultra-Metric Spanning Overlay Topology for Peer-to-Peer Streaming Systems
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.82
O. Ojo, A. Oluwatope, Olufemi Ogunsola
The last decade has seen a sharp increase in Internet traffic due to the dispersion of videos. Despite this growth, users' quality of experience (QoE) in peer-to-peer (P2P) streaming systems does not match that of conventional television service. The deployment of P2P streaming systems is affected by long delays, unplanned interruptions, flash crowds, high churn, and the choice of overlay structure. The overlay structure plays a significant role in ensuring that traffic is distributed over all physical links in a dynamic and fair manner; tree-based (TB) and mesh-based (MB) structures are the most popular. TB fails when the parent peer fails, which can lead to total collapse of the system, while MB is more vulnerable to flash crowds and high churn due to its unstructured pattern. This paper presents a novel P2P streaming topology (UStream) that uses a hybrid of TB and MB to address the disadvantages of both topologies and ensure an optimal solution. Furthermore, UStream adopts the features of an ultra-metric tree to ensure that the time taken from the root peer to any of the child peers is equal, and uses the spanning tree to monitor all peers at any point in time. UStream also employs a principle of chaos theory: the present determines the future, though the approximate present does not approximately determine the future. UStream was formalized using mathematical theories, and several theorems were proposed and proved to validate the topology.
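The equal root-to-leaf delay claimed by UStream follows from the defining property of an ultra-metric: the strong triangle inequality d(x,z) ≤ max(d(x,y), d(y,z)). A minimal sketch of checking that property over a pairwise-delay matrix (the function name and dict-of-dicts representation are illustrative, not from the paper):

```python
def is_ultrametric(d):
    """Check the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z))
    for a symmetric distance matrix given as a dict of dicts."""
    nodes = list(d)
    for x in nodes:
        for y in nodes:
            for z in nodes:
                if d[x][z] > max(d[x][y], d[y][z]):
                    return False
    return True

# In an ultra-metric tree, the distance between two leaves is the height of
# their lowest common ancestor, so all leaves end up equidistant from the root.
dist = {
    "a": {"a": 0, "b": 2, "c": 4},
    "b": {"a": 2, "b": 0, "c": 4},
    "c": {"a": 4, "b": 4, "c": 0},
}
print(is_ultrametric(dist))  # True
```

A topology maintaining this invariant over peer-to-peer delays would, by construction, give every child peer the same playback delay from the root.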
Citations: 3
A Dynamic Alpha Congestion Controller for WebRTC
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.63
R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab
Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to web browsers, allowing audio/video calls between browsers without the need to install any video telephony application. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited by its use of a fixed incoming-rate decrease factor, known as alpha (α). In this paper, we propose a dynamic alpha model that reduces the available receiving-bandwidth estimate during overuse, as indicated by the over-use detector. Experiments on our testbed show that, compared to a fixed alpha model, our proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time while keeping a similar packet loss rate and video quality.
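To make the fixed-versus-dynamic alpha distinction concrete, here is a hedged sketch of a multiplicative rate decrease whose alpha scales with overuse severity. The linear interpolation rule, the parameter range, and the 0..1 overuse signal are illustrative assumptions, not the paper's exact model (GCC's original rule applies one fixed alpha regardless of severity):

```python
def decrease_estimate(incoming_rate, overuse_signal, alpha_min=0.8, alpha_max=0.95):
    """Illustrative dynamic-alpha decrease of the bandwidth estimate: the
    stronger the overuse signal (0..1), the closer alpha moves to alpha_min,
    i.e. the harsher the cut. This linear rule is an assumption for the
    sketch, not the paper's exact adaptation law."""
    alpha = alpha_max - (alpha_max - alpha_min) * overuse_signal
    return alpha * incoming_rate

# A mild overuse keeps more of the incoming rate than a severe one.
mild = decrease_estimate(1000.0, 0.1)    # alpha = 0.935 -> ~935 kbps
severe = decrease_estimate(1000.0, 0.9)  # alpha = 0.815 -> ~815 kbps
```

A gentler cut under mild overuse is what lets such a controller sustain a higher incoming rate without worsening loss, which matches the trend the authors report.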
Citations: 3
Feature Level Fusion for Bimodal Facial Action Unit Recognition
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.116
Zibo Meng, Shizhong Han, Min Chen, Yan Tong
Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformations, frequent head movements, and partial occlusions. It is especially challenging when facial activities are accompanied by speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework that exploits information from both visual and audio channels in recognizing speech-related facial action units (AUs). In particular, features are first extracted from the visual and audio channels independently. Then, the audio features are aligned with the visual features to handle the difference in time scales and the time shift between the two signals. Finally, these aligned audio and visual features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset demonstrate that the proposed feature-level fusion framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially those that are "invisible" in the visual channel during speech. The improvement is even more pronounced when the facial images contain occlusions, which, fortunately, do not affect the audio channel.
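The align-then-concatenate pipeline the abstract describes can be sketched in a few lines. The nearest-neighbour alignment rule and the toy feature dimensions are simplifying assumptions; the paper's own alignment handles the time-scale difference and time shift more carefully:

```python
def align_audio_to_video(audio_feats, audio_times, video_times):
    """Nearest-neighbour alignment of audio feature frames to video
    timestamps, a simplified stand-in for the paper's alignment step."""
    aligned = []
    for t in video_times:
        i = min(range(len(audio_times)), key=lambda k: abs(audio_times[k] - t))
        aligned.append(audio_feats[i])
    return aligned

def fuse(visual_feats, audio_feats):
    """Feature-level fusion: concatenate the per-frame feature vectors."""
    return [v + a for v, a in zip(visual_feats, audio_feats)]

# Toy streams: audio is sampled faster than video, so it must be aligned first.
audio = [[0.1], [0.2], [0.3], [0.4]]
a_t = [0.00, 0.01, 0.02, 0.03]
video = [[1.0, 2.0], [3.0, 4.0]]
v_t = [0.00, 0.033]
fused = fuse(video, align_audio_to_video(audio, a_t, v_t))
print(fused)  # [[1.0, 2.0, 0.1], [3.0, 4.0, 0.4]]
```

The fused vectors then feed a single classifier per AU, which is what distinguishes feature-level fusion from fusing the two classifiers' decisions.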
Citations: 2
A Novel Semi-Supervised Dimensionality Reduction Framework for Multi-manifold Learning
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.73
Xin Guo, Tie Yun, L. Qi, L. Guan
In pattern recognition, the traditional single-manifold assumption can hardly guarantee the best classification performance, since data from multiple classes do not lie on a single manifold. When a dataset contains multiple classes and the structures of those classes differ, it is more reasonable to assume that each class lies on its own manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn the multiple manifolds corresponding to the multiple classes in a data set, using both labeled and unlabeled examples. In order to connect each unlabeled point to other points from the same manifold, a similarity-graph construction based on sparse manifold clustering is introduced when building the neighbourhood graph. Experimental results verify the advantages and effectiveness of this new framework.
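The neighbourhood-graph construction at the heart of such methods can be sketched as follows. Note the stand-in: the paper builds the graph via sparse manifold clustering precisely so that unlabeled points connect only to points on their own manifold, whereas this sketch uses a plain k-NN graph with Gaussian weights:

```python
import math

def knn_similarity_graph(points, k=2, sigma=1.0):
    """Simplified neighbourhood-graph construction: connect each point to
    its k nearest neighbours with Gaussian weights exp(-d^2 / (2*sigma^2)).
    Plain k-NN is a stand-in for the paper's sparse-manifold-clustering
    construction."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nbrs:
            w = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            W[i][j] = W[j][i] = max(W[i][j], w)
    return W

# Two well-separated clusters: within-cluster edges get substantial weight,
# and with a small k no edge crosses the gap between manifolds.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
W = knn_similarity_graph(points, k=1)
```

When the manifolds intersect or lie close together, plain k-NN links points across manifolds; that failure mode is what motivates the sparse-clustering-based construction.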
Citations: 2
Stepped-Line Text Layout with Phrased Segmentation for Readability Improvement of Japanese Electronic Text
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.87
Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima
We propose a new electronic text format with a stepped-line layout to optimize viewing position and improve the efficiency of reading Japanese text. Generally, the reader's eyes try to fixate on every phrase while reading Japanese text, yet no method has been proposed to date for optimizing the fixation position while reading. In spaced text such as English, the space characters provide boundary information for eye movement; in Japanese text, however, reading speed decreases when spaces are inserted between phrases. In the stepped-line text format proposed in this report, a text line is segmented and stepped down between phrases, and line breaks occur between phrases. To evaluate the effect of the stepped-line layout on reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. Reading with the stepped-line layout is approximately 13% faster than with the straight-line layout, while the number of fixations is approximately 11% lower. This is primarily achieved by a reduction in the number of regressions and an increase in forward saccade length. Moreover, 91% of participants did not experience illegibility or incongruity when reading the stepped-line layout, suggesting that it is a new technique for improving the efficiency of eye movements during reading without increasing cognitive load.
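A plain-text mock-up of the layout idea, assuming phrase segmentation is already done (the step width and rendering are illustrative only; the paper's layout engine is not described at this level of detail):

```python
def stepped_lines(phrases, step=2):
    """Render a phrase-segmented line as a 'staircase': each phrase is a
    line of its own, indented one step further right, so the eye has a
    predictable down-and-right saccade target at every phrase boundary."""
    return "\n".join(" " * (step * i) + p for i, p in enumerate(phrases))

print(stepped_lines(["The reader's eyes", "fixate on", "every phrase"]))
```

The interesting engineering problem, untouched here, is the phrase segmentation itself, since Japanese text carries no spaces to mark phrase boundaries.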
Citations: 4
Network Adaptive Textured Mesh Generation for Collaborative 3D Tele-Immersion
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.111
Kevin Desai, K. Bahirat, S. Raghuraman, B. Prabhakaran
3D Tele-Immersion (3DTI) has emerged as an efficient environment for virtual interactions and collaborations in a variety of fields such as rehabilitation, education, and gaming. In 3DTI, geographically distributed users are captured using multiple cameras and immersed in a single virtual environment. The quality of experience depends on the available network bandwidth, the quality of the generated 3D model, and the time taken for rendering. In a collaborative environment, achieving high-quality, high-frame-rate rendering while transmitting data to multiple sites with different bandwidths is challenging. In this paper we introduce a network-adaptive textured mesh generation scheme that transmits data of varying quality based on the available bandwidth. To reduce the volume of information transmitted, a visual-quality-based vertex selection approach is used to generate a sparse representation of the user. This sparse representation is then transmitted to the receiver side, where a sweep-line-based technique is used to generate a 3D mesh of the user. High visual quality is maintained by transmitting a high-resolution texture image compressed with a lossy compression algorithm. In our studies, users were unable to notice visual quality variations in the rendered 3D model even at 90% compression.
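The bandwidth-adaptive vertex selection can be sketched as a budgeted top-k pick. The per-vertex byte cost and the scalar quality score are assumptions for illustration; the paper's actual visual-quality criterion and wire format are not specified here:

```python
def select_vertices(vertices, scores, budget_bytes, bytes_per_vertex=12):
    """Illustrative bandwidth-adaptive vertex selection: keep the vertices
    with the highest visual-quality scores that fit within the byte budget.
    Both the scoring and the 12-byte (3 x float32) vertex size are
    assumptions, not the paper's exact scheme."""
    max_count = budget_bytes // bytes_per_vertex
    order = sorted(range(len(vertices)), key=lambda i: -scores[i])
    keep = sorted(order[:max_count])  # restore original order for meshing
    return [vertices[i] for i in keep]

verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
scores = [0.9, 0.2, 0.8, 0.5]
print(select_vertices(verts, scores, budget_bytes=24))  # [(0, 0, 0), (0, 1, 0)]
```

Per-site budgets would let a sender serve several receivers of differing bandwidth from one capture, re-meshing each sparse set on the receiver side as the abstract describes.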
Citations: 4
ExploratoryVideoSearch: A Music Video Search System Based on Coordinate Terms and Diversification
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.99
Kosetsu Tsukuda, Masataka Goto
Many people search for and watch music videos on video sharing websites. Although a vast variety of videos are uploaded, current search systems on video sharing websites let users search only a limited range of music videos for an input query. The problem worsens when a user lacks knowledge of a query, because the user cannot adjust that range by changing the query or adding keywords to it. In this paper, we propose a music video search system, called ExploratoryVideoSearch, that is coordinate-term aware and diversity aware. Our system focuses on artist-name queries for searching videos on YouTube and has two novel functions: (1) given an artist-name query, the system shows search results for the artist as well as for its coordinate terms, and (2) the system diversifies the search results for the query and its coordinate terms, and allows users to interactively change the diversity level. Coordinate terms are obtained from the Million Song Dataset and Wikipedia, while search results are diversified based on the tags attached to YouTube music videos. ExploratoryVideoSearch thus enables users to search a wide variety of music videos without requiring deep knowledge about a query.
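Tag-based diversification is commonly done with a greedy relevance-versus-redundancy trade-off. The sketch below uses Jaccard overlap on tag sets and a tunable trade-off weight standing in for the user-adjustable diversity level; it is an illustrative stand-in, not the system's exact algorithm:

```python
def diversify(results, k=3, trade_off=0.5):
    """Greedy diversification sketch: repeatedly pick the video that best
    balances relevance against tag overlap (Jaccard) with already-picked
    videos. trade_off=1.0 ranks purely by relevance; lower values favour
    diversity, mimicking an adjustable diversity level."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    picked, rest = [], list(results)
    while rest and len(picked) < k:
        def score(r):
            sim = max((jaccard(r["tags"], p["tags"]) for p in picked), default=0.0)
            return trade_off * r["rel"] - (1 - trade_off) * sim
        best = max(rest, key=score)
        picked.append(best)
        rest.remove(best)
    return [r["title"] for r in picked]

results = [
    {"title": "live",  "rel": 0.9, "tags": {"rock", "live"}},
    {"title": "live2", "rel": 0.8, "tags": {"rock", "live"}},
    {"title": "cover", "rel": 0.6, "tags": {"acoustic", "cover"}},
]
print(diversify(results, k=2))  # ['live', 'cover']
```

With trade_off near 1.0 the same call would return the two near-duplicate live videos, which is exactly the behaviour diversification is meant to suppress.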
Citations: 3
Automatic Correction of Programming Exercises in ViPLab
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.69
T. Richter, Jan Vanvinkenroye
We describe the ViPLab plug-in for the ILIAS learning management system (LMS), which offers students a virtual programming laboratory and hence allows them to run programming exercises without ever leaving the browser. In particular, this article introduces a new component of the system that automatically corrects programming exercises and hence significantly simplifies the implementation of programming classes in freshman courses.
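The core of any such auto-correction component is running a submission against reference test cases and comparing outputs. A minimal sketch, assuming Python submissions graded by stdin/stdout comparison (the paper does not specify its grading protocol, and a real grader additionally needs sandboxing and resource limits):

```python
import os
import subprocess
import sys
import tempfile

def grade(source_code, cases):
    """Minimal auto-correction sketch: run the submitted program once per
    test case, feeding stdin and comparing trimmed stdout to the expected
    answer. Sandboxing and resource limits are deliberately omitted."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source_code)
        path = f.name
    passed = 0
    try:
        for stdin, expected in cases:
            out = subprocess.run([sys.executable, path], input=stdin,
                                 capture_output=True, text=True, timeout=5)
            passed += out.stdout.strip() == expected
    finally:
        os.unlink(path)
    return passed, len(cases)

submission = "print(int(input()) * 2)"
print(grade(submission, [("3", "6"), ("5", "10")]))  # (2, 2)
```

Output comparison is only one grading strategy; structural checks or unit tests against the student's functions are common alternatives.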
Citations: 0
Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.35
Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang
With the advent of abundant multimedia data on the Internet, research efforts in multimodal machine learning have sought to utilize data from different modalities. Current approaches mostly focus on developing models that fuse low-level features from multiple modalities and learn a unified representation across modalities. However, most related work fails to justify why multimodal data and multimodal fusion should be used, and few leverage the complementary relation among different modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. Then we propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities (images and text). Experimental results on the UIUC-ISD dataset show that our ensemble approach outperforms approaches using only a single modality. Word sense disambiguation (WSD) is the use case we study to demonstrate the effectiveness of our probabilistic ensemble fusion model.
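A hedged sketch of what probabilistic ensemble fusion for WSD looks like at decision level: each modality produces a posterior over senses, and the ensemble combines them. The weighted-sum combination and renormalisation below are assumptions for illustration, not the paper's exact model:

```python
def ensemble_fuse(p_text, p_image, weight=0.5):
    """Sketch of probabilistic ensemble fusion: combine per-modality
    posteriors over word senses with a weighted sum and renormalise.
    The fixed-weight rule is an assumption, not the paper's model."""
    senses = set(p_text) | set(p_image)
    fused = {s: weight * p_text.get(s, 0.0) + (1 - weight) * p_image.get(s, 0.0)
             for s in senses}
    z = sum(fused.values())
    return {s: p / z for s, p in fused.items()}

# The complementary relation in action: text alone cannot disambiguate
# "bass", but the image modality tips the decision.
p_text = {"fish": 0.5, "instrument": 0.5}
p_image = {"fish": 0.9, "instrument": 0.1}
fused = ensemble_fuse(p_text, p_image)
print(max(fused, key=fused.get))  # fish
```

This toy case illustrates why the complementary relation matters: when one modality is uninformative for a given sense, the other can still resolve it, which a single-modality model cannot do.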
Citations: 5
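The abstract above does not spell out the fusion rule, so the following is only a minimal, hypothetical sketch of decision-level (late) ensemble fusion in the spirit it describes: each modality produces a posterior distribution over word senses, and the ensemble combines them as a convex combination. The function name `ensemble_fuse` and the fixed weights `w_image`/`w_text` are illustrative assumptions, not the paper's actual model, which would learn or calibrate how the modalities complement each other.

```python
def ensemble_fuse(p_image, p_text, w_image=0.5, w_text=0.5):
    """Fuse per-modality posteriors over word senses (hypothetical sketch).

    p_image, p_text: sequences of per-sense probabilities, each summing to 1.
    Returns a renormalized convex combination of the two distributions.
    """
    fused = [w_image * pi + w_text * pt for pi, pt in zip(p_image, p_text)]
    total = sum(fused)
    return [p / total for p in fused]  # renormalize for numerical safety


# Example: text strongly prefers sense 0, the image weakly prefers sense 1;
# the fused distribution lets the more confident modality dominate.
p_img = [0.4, 0.5, 0.1]
p_txt = [0.7, 0.2, 0.1]
fused = ensemble_fuse(p_img, p_txt)
predicted_sense = max(range(len(fused)), key=lambda i: fused[i])
```

The complementary relation the paper identifies would, under this sketch, show up as the two modalities disagreeing on individual examples while the fused posterior is more often correct than either alone.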
Democratizing Optometric Care: A Vision-Based, Data-Driven Approach to Automatic Refractive Error Measurement for Vision Screening
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.55
Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do
We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Refractive errors (e.g., nearsightedness or astigmatism) are common ocular problems which, if uncorrected, may lead to serious visual impairment. Diagnosing such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by utilizing the computational power inherent in consumer-grade devices and the advances made possible by multimedia computing. We present results showing that our system is able to match, and under certain conditions outperform, state-of-the-art medical devices.
Citations: 4
Journal
2015 IEEE International Symposium on Multimedia (ISM)