The last decade has seen a sharp increase in Internet traffic, driven largely by the spread of video content. Despite this growth, users' quality of experience (QoE) in peer-to-peer (P2P) streaming systems still does not match that of conventional television. The deployment of P2P streaming systems is hampered by long delays, unplanned interruptions, flash crowds, high churn, and the choice of overlay structure. The overlay structure plays a significant role in ensuring that traffic is distributed over all physical links in a dynamic and fair manner; tree-based (TB) and mesh-based (MB) overlays are the most popular. TB fails when a parent peer fails, which can lead to total collapse of the system, while MB is more vulnerable to flash crowds and high churn because of its unstructured pattern. This paper presents a novel P2P streaming topology (UStream) that hybridizes TB and MB to address the disadvantages of both. Furthermore, UStream adopts the properties of an ultra-metric tree to ensure that the time taken from the root peer to any child peer is equal, and a spanning tree to monitor all peers at any point in time. UStream also draws on a principle of chaos theory: the present state of a peer determines the future, although the approximate present does not approximately determine the future. UStream was formalized mathematically, and several theorems were proposed and proved to validate the topology.
{"title":"UStream: Ultra-Metric Spanning Overlay Topology for Peer-to-Peer Streaming Systems","authors":"O. Ojo, A. Oluwatope, Olufemi Ogunsola","doi":"10.1109/ISM.2015.82","DOIUrl":"https://doi.org/10.1109/ISM.2015.82","url":null,"abstract":"The last decade has seen a sharp increase in Internet traffic due to dispersion of videos. Despite this growth, user's quality of experience (QoE) in peer-to-peer (P2P) streaming systems does not match the conventional television service. The deployment of P2P streaming system is affected by long delay, unplanned interruption, flash crowd, high churn situation and choice of overlay structure. The overlay structure plays significant role in ensuring that traffic are distributed to all physical links in a dynamic and fair manner, tree-based (TB) and mesh-based (MB) are the most popular. TB fails in situations where there is failure at the parent peer which can lead to total collapse of the system while MB is more vulnerable to flash crowd and high churn situation due to its unstructured pattern. This paper presents a novel P2P streaming topology (UStream), using a hybrid of TB and MB to address the disadvantages of both topologies to ensure an optimal solution. Furthermore, UStream adopts the features of ultra-metric tree to ensure that the time taken from the root peer to any of the children's peer are equal and the spanning tree to monitor all the peers at any point in time. Ustream also employs the principle of chaos theory. The present peer determines the future, though the approximate present does not approximately determines the future. Ustream was formalized using mathematical theories. 
Several theorems were proposed and proved in validating this topology.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128677389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
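The ultra-metric tree property that UStream relies on, in which every three-point triangle has its two largest sides equal (so all root-to-leaf delays coincide), can be checked directly. The sketch below is illustrative only; the peer names and delay values are hypothetical, not taken from the paper:

```python
from itertools import combinations

def is_ultrametric(dist):
    """Check the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z))
    for a symmetric distance given as {(a, b): d} with sorted key pairs.
    Equivalent test: in every triangle, the two largest sides are equal."""
    nodes = sorted({n for pair in dist for n in pair})
    def d(a, b):
        return 0 if a == b else dist[tuple(sorted((a, b)))]
    for x, y, z in combinations(nodes, 3):
        sides = sorted((d(x, y), d(y, z), d(x, z)))
        if sides[2] > sides[1]:  # largest side strictly exceeds the second
            return False
    return True

# Hypothetical pairwise delays between four peers of an ultra-metric overlay.
delays = {("a", "b"): 2, ("a", "c"): 4, ("a", "d"): 4,
          ("b", "c"): 4, ("b", "d"): 4, ("c", "d"): 3}
```

Running `is_ultrametric(delays)` confirms the property holds for this toy overlay, whereas an ordinary metric such as distances 1, 2, 4 on three nodes fails it.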
R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab
Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to web browsers, allowing audio/video calls between browsers without the need to install any video telephony application. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited because it uses a fixed incoming-rate decrease factor, known as alpha (α). In this paper, we propose a dynamic alpha model to reduce the available receiving bandwidth estimate during overuse, as indicated by the over-use detector. Experiments on our testbed show that the proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time than a fixed alpha model, while maintaining a similar packet loss rate and video quality.
{"title":"A Dynamic Alpha Congestion Controller for WebRTC","authors":"R. Atwah, Razib Iqbal, S. Shirmohammadi, A. Javadtalab","doi":"10.1109/ISM.2015.63","DOIUrl":"https://doi.org/10.1109/ISM.2015.63","url":null,"abstract":"Video conferencing applications have significantly changed the way people communicate over the Internet. Web Real-Time Communication (WebRTC), drafted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups, has added new functionality to the web browsers, allowing audio/video calls between browsers without the need to install any video telephony applications. The Google Congestion Control (GCC) algorithm has been proposed as WebRTC's congestion control mechanism, but its performance is limited due to using a fixed incoming rate decrease factor, known as alpha (a). In this paper, we propose a dynamic alpha model to reduce the available receiving bandwidth estimate during overuse as indicated by the over-use detector. Experiments using our specific testbed show that our proposed model achieves a 33% higher incoming rate and a 16% lower round-trip time, while keeping a similar packet loss rate and video quality, compared to a fixed alpha model.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115476782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
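For orientation, GCC's receive-side decrease can be written as A_hat = alpha * R_hat, where R_hat is the measured incoming rate. The sketch below contrasts a fixed alpha with one hypothetical dynamic rule; the paper's actual update rule is not reproduced here, and the interpolation bounds are illustrative assumptions:

```python
FIXED_ALPHA = 0.85  # typical fixed decrease factor in GCC descriptions

def decrease_fixed(incoming_rate):
    """Fixed-alpha decrease: always cut the estimate by the same factor."""
    return FIXED_ALPHA * incoming_rate

def decrease_dynamic(incoming_rate, prev_estimate, lo=0.8, hi=0.95):
    """Hypothetical dynamic rule: the closer the incoming rate is to the
    previous estimate, the milder the decrease (alpha nearer `hi`);
    severe overuse pushes alpha toward `lo` for a harder cut."""
    ratio = min(incoming_rate / prev_estimate, 1.0) if prev_estimate else 1.0
    alpha = lo + (hi - lo) * ratio
    return alpha * incoming_rate
```

With an incoming rate of 900 kbps against a previous estimate of 1000 kbps, the dynamic rule picks alpha = 0.935 and backs off more gently than the fixed 0.85 factor would.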
Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformations, frequent head movements, and partial occlusions. It is especially challenging when facial activity is accompanied by speech. Instead of relying on information from the visual channel alone, this paper presents a novel fusion framework that exploits information from both the visual and audio channels to recognize speech-related facial action units (AUs). In particular, features are first extracted from the visual and audio channels independently. Then, the audio features are aligned with the visual features to handle the difference in time scales and the time shift between the two signals. Finally, the aligned audio and visual features are integrated via a feature-level fusion framework and used to recognize AUs. Experimental results on a new audiovisual AU-coded dataset demonstrate that the proposed feature-level fusion framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially those that are "invisible" in the visual channel during speech. The improvement is even larger under occlusions of the facial images, which, fortunately, do not affect the audio channel.
{"title":"Feature Level Fusion for Bimodal Facial Action Unit Recognition","authors":"Zibo Meng, Shizhong Han, Min Chen, Yan Tong","doi":"10.1109/ISM.2015.116","DOIUrl":"https://doi.org/10.1109/ISM.2015.116","url":null,"abstract":"Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformation, frequent head movements, and partial occlusions. It is especially challenging when the facial activities are accompanied with speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework, which exploits information from both visual and audio channels in recognizing speech-related facial action units (AUs). In particular, features are first extracted from visual and audio channels, independently. Then, the audio features are aligned with the visual features in order to handle the difference in time scales and the time shift between the two signals. Finally, these aligned audio and visual features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset have demonstrated that the proposed feature-level fusion framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially for those AUs that are \"invisible\" in the visual channel during speech. 
The improvement is more impressive with occlusions on the facial images, which, fortunately, would not affect the audio channel.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121448682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
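As a minimal sketch of the feature-level fusion idea, assuming features have already been extracted from each channel, the audio stream can be resampled onto the visual frame rate (a simple stand-in for the paper's alignment step) and concatenated per frame before classification; the feature dimensions below are illustrative:

```python
import numpy as np

def align_audio(audio_feats, n_visual_frames):
    """Nearest-neighbour resampling of audio frames onto visual frames."""
    idx = np.linspace(0, len(audio_feats) - 1, n_visual_frames)
    return audio_feats[np.round(idx).astype(int)]

def fuse(visual_feats, audio_feats):
    """Early fusion: per-frame concatenation of aligned feature vectors."""
    aligned = align_audio(audio_feats, visual_feats.shape[0])
    return np.concatenate([visual_feats, aligned], axis=1)

visual = np.random.rand(30, 128)   # 30 video frames, 128-D visual features
audio = np.random.rand(100, 13)    # 100 audio frames, 13-D MFCC-like features
fused = fuse(visual, audio)        # one 141-D fused vector per video frame
```

The fused matrix can then be fed to any frame-level AU classifier; real alignment would also compensate for the time shift between the signals, which this sketch omits.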
In pattern recognition, the traditional single-manifold assumption can hardly guarantee the best classification performance, since data from multiple classes do not lie on a single manifold. When a dataset contains multiple classes whose structures differ, it is more reasonable to assume that each class lies on its own manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn the multiple manifolds corresponding to the multiple classes in a dataset, using both the labeled and unlabeled examples. To connect each unlabeled point to other points from the same manifold, a similarity graph construction based on sparse manifold clustering is introduced when building the neighbourhood graph. Experimental results verify the advantages and effectiveness of the new framework.
{"title":"A Novel Semi-Supervised Dimensionality Reduction Framework for Multi-manifold Learning","authors":"Xin Guo, Tie Yun, L. Qi, L. Guan","doi":"10.1109/ISM.2015.73","DOIUrl":"https://doi.org/10.1109/ISM.2015.73","url":null,"abstract":"In pattern recognition, traditional single manifold assumption can hardly guarantee the best classification performance, since the data from multiple classes does not lie on a single manifold. When the dataset contains multiple classes and the structure of the classes are different, it is more reasonable to assume each class lies on a particular manifold. In this paper, we propose a novel framework of semi-supervised dimensionality reduction for multi-manifold learning. Within this framework, methods are derived to learn multiple manifold corresponding to multiple classes in a data set, including both the labeled and unlabeled examples. In order to connect each unlabeled point to the other points from the same manifold, a similarity graph construction, based on sparse manifold clustering, is introduced when constructing the neighbourhood graph. Experimental results verify the advantages and effectiveness of this new framework.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122012363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
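A simplified version of the neighbourhood-graph step can be sketched as follows; the Gaussian-weighted k-nearest-neighbour construction below is a stand-in for the paper's sparse-manifold-clustering-based graph, and the sample points are illustrative:

```python
import numpy as np

def knn_graph(X, k=2, sigma=1.0):
    """Symmetric k-NN similarity graph with Gaussian edge weights, so that
    points on the same (well-separated) manifold become connected while
    points on different manifolds stay disconnected."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]                  # skip the point itself
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                             # symmetrize

# Two tight clusters standing in for two class manifolds.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = knn_graph(X, k=1)
```

With k = 1 each point links only to its manifold-mate, which is the behaviour the paper's construction aims for when attaching unlabeled points.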
Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima
We propose a new electronic text format with a stepped-line layout that optimizes viewing position and improves the efficiency of reading Japanese text. Generally, a reader's eyes try to fixate on every phrase while reading Japanese text, yet no method has been proposed to optimize the fixation position during reading. In spaced text such as English, the space characters provide boundary information for eye movement; in Japanese text, however, inserting spaces between phrases decreases reading speed. In the proposed stepped-line format, a text line is segmented and stepped down between phrases, and line breaks fall between phrases. To evaluate the effect of the stepped-line layout on reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. Reading with the stepped-line layout is approximately 13% faster than with the straight-line layout, while the number of fixations is approximately 11% lower. This is achieved primarily by a reduction in the number of regressions and an increase in forward saccade length. Moreover, 91% of participants experienced no illegibility or incongruity when reading the stepped-line layout, suggesting that it is a new technique for improving the efficiency of eye movements during reading without increasing cognitive load.
{"title":"Stepped-Line Text Layout with Phrased Segmentation for Readability Improvement of Japanese Electronic Text","authors":"Jumpei Kobayashi, Takashi Sekiguchi, E. Shinbori, T. Kawashima","doi":"10.1109/ISM.2015.87","DOIUrl":"https://doi.org/10.1109/ISM.2015.87","url":null,"abstract":"We propose a new electronic text format with a stepped-line layout to optimize viewing position and to improve the efficiency of reading Japanese text. Generally, the reader's eyes try to fixate on every phrase while reading Japanese text. To date, no method has been proposed to optimize the fixation position while reading. In case of spaced text such as English, the space characters provide the boundary information for eye movement, however, in case of Japanese text, reading speed decreases by inserting spaces between phrases. With the new stepped-line text format proposed in this report, a text line is segmented and stepped down between phrases, moreover, line breaks are present between phrases. To evaluate the effect of the stepped-line layout on the reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. The reading speed for the new stepped-line layout is approximately 13% faster compared to the straight-line layout, whereas the number of fixations in the stepped-line layout is approximately 11% less than that in the straight-line layout. This is primarily achieved by a reduction in the number of regressions and an increase in the forward saccade length. 
Moreover, 91% of participants did not experience illegibility or incongruousness with the stepped-line layout reading, suggesting that the stepped-line layout is a new technique for improving the efficiency of eye movements while reading without any increase in cognitive load.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127156331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
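A toy rendering of the stepped-line idea (not the authors' implementation) steps each phrase down to a new line and indents it past the end of the previous phrase, so the next phrase begins roughly where a forward saccade would land; the romanized sample phrases are hypothetical:

```python
def stepped_layout(phrases):
    """Lay out pre-segmented phrases in a stepped-line pattern: each phrase
    starts on its own line, indented to just past the previous phrase's end."""
    lines, indent = [], 0
    for phrase in phrases:
        lines.append(" " * indent + phrase)
        indent += len(phrase)
    return "\n".join(lines)

print(stepped_layout(["Tokyo de", "atarashii hon wo", "katta."]))
```

Real Japanese text would first need phrase (bunsetsu) segmentation, which the paper presupposes and this sketch takes as given input.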
Kevin Desai, K. Bahirat, S. Raghuraman, B. Prabhakaran
3D Tele-Immersion (3DTI) has emerged as an effective environment for virtual interaction and collaboration in a variety of fields such as rehabilitation, education, and gaming. In 3DTI, geographically distributed users are captured by multiple cameras and immersed in a single virtual environment. The quality of experience depends on the available network bandwidth, the quality of the generated 3D model, and the rendering time. In a collaborative environment, achieving high-quality, high-frame-rate rendering while transmitting data to multiple sites with different bandwidths is challenging. In this paper, we introduce a network-adaptive textured mesh generation scheme that transmits data of varying quality based on the available bandwidth. To reduce the volume of information transmitted, a visual-quality-based vertex selection approach generates a sparse representation of the user. This sparse representation is then transmitted to the receiver, where a sweep-line-based technique generates a 3D mesh of the user. High visual quality is maintained by transmitting a high-resolution texture image compressed with a lossy algorithm. In our studies, users were unable to notice visual quality variations in the rendered 3D model even at 90% compression.
{"title":"Network Adaptive Textured Mesh Generation for Collaborative 3D Tele-Immersion","authors":"Kevin Desai, K. Bahirat, S. Raghuraman, B. Prabhakaran","doi":"10.1109/ISM.2015.111","DOIUrl":"https://doi.org/10.1109/ISM.2015.111","url":null,"abstract":"3D Tele-Immersion (3DTI) has emerged as an efficient environment for virtual interactions and collaborations in a variety of fields like rehabilitation, education, gaming, etc. In 3DTI, geographically distributed users are captured using multiple cameras and immersed in a single virtual environment. The quality of experience depends on the available network bandwidth, quality of the 3D model generated and the time taken for rendering. In a collaborative environment, achieving high quality, high frame rate rendering by transmitting data to multiple sites having different bandwidth is challenging. In this paper we introduce a network adaptive textured mesh generation scheme to transmit varying quality data based on the available bandwidth. To reduce the volume of information transmitted, a visual quality based vertex selection approach is used to generate a sparse representation of the user. This sparse representation is then transmitted to the receiver side where a sweep-line based technique is used to generate a 3D mesh of the user. High visual quality is maintained by transmitting a high resolution texture image compressed using a lossy compression algorithm. 
In our studies users were unable to notice visual quality variations of the rendered 3D model even at 90% compression.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
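The bandwidth-driven part of the pipeline can be sketched as below. The paper's visual-quality criterion is replaced here by random subsampling, and the byte budget, frame rate, and per-vertex size are illustrative assumptions:

```python
import numpy as np

def select_vertices(vertices, bytes_per_vertex, bandwidth_bps, frame_rate=30):
    """Keep only as many vertices per frame as the receiver's bandwidth
    allows; a real system would rank vertices by visual importance rather
    than sample uniformly at random."""
    budget = bandwidth_bps / 8 / frame_rate              # byte budget per frame
    keep = min(len(vertices), int(budget // bytes_per_vertex))
    idx = np.random.choice(len(vertices), size=keep, replace=False)
    return vertices[np.sort(idx)]

cloud = np.random.rand(10000, 3)                         # dense 3D capture
sparse = select_vertices(cloud, bytes_per_vertex=12,     # 3 x float32
                         bandwidth_bps=2_000_000)        # 2 Mbps link
```

At 2 Mbps and 30 fps the per-frame budget admits only a few hundred of the ten thousand captured vertices, which is why the receiver-side mesh reconstruction step matters.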
Many people search for and watch music videos on video sharing websites. Although a vast variety of videos is uploaded, current search systems on such websites return only a limited range of music videos for an input query. The problem is worse when a user lacks knowledge about the query, because the user cannot adjust the range by changing the query or adding keywords to it. In this paper, we propose a music video search system, ExploratoryVideoSearch, that is coordinate-term aware and diversity aware. Our system focuses on artist-name queries for searching videos on YouTube and has two novel functions: (1) given an artist-name query, the system shows search results for that artist as well as for its coordinate terms, and (2) the system diversifies the search results for the query and its coordinate terms and lets users interactively change the diversity level. Coordinate terms are obtained from the Million Song Dataset and Wikipedia, while search results are diversified based on tags attached to YouTube music videos. ExploratoryVideoSearch enables users to search a wide variety of music videos without requiring deep knowledge of a query.
{"title":"ExploratoryVideoSearch: A Music Video Search System Based on Coordinate Terms and Diversification","authors":"Kosetsu Tsukuda, Masataka Goto","doi":"10.1109/ISM.2015.99","DOIUrl":"https://doi.org/10.1109/ISM.2015.99","url":null,"abstract":"Many people search and watch music videos on video sharing websites. Although a vast variety of videos are uploaded, current search systems on video sharing websites allow users to search a limited range of music videos for an input query. Especially when a user does not have enough knowledge of a query, the problem gets worse because the user cannot customize the range by changing the query or adding some keywords to the original query. In this paper, we propose a music video search system, called ExploratoryVideoSearch, that is coordinate term aware and diversity aware. Our system focuses on artist name queries to search videos on YouTube and has two novel functions: (1) given an artist name query, the system shows a search result for the artist as well as those for its coordinate terms, and (2) the system diversifies search results for the query and its coordinate terms, and allows users to interactively change the diversity level. Coordinate terms are obtained by utilizing the Million Song Dataset and Wikipedia, while search results are diversified based on tags attached to YouTube music videos. 
ExploratoryVideoSearch enables users to search a wide variety of music videos without requiring deep knowledge about a query.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125511369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
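The adjustable-diversity behaviour can be illustrated with a small greedy re-ranker in the spirit of maximal marginal relevance; the scoring, tag sets, and the `diversity` knob below are illustrative, not taken from the paper:

```python
def diversify(videos, k=2, diversity=0.5):
    """Greedy re-ranking: videos is a list of (title, relevance, tag_set).
    diversity = 0 ranks purely by relevance; higher values penalize tag
    overlap (Jaccard similarity) with already-selected videos."""
    selected, pool = [], list(videos)
    while pool and len(selected) < k:
        def score(v):
            overlap = max((len(v[2] & s[2]) / max(len(v[2] | s[2]), 1)
                           for s in selected), default=0.0)
            return (1 - diversity) * v[1] - diversity * overlap
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return [v[0] for v in selected]

videos = [("live", 0.9, {"rock", "live"}),
          ("live2", 0.85, {"rock", "live"}),
          ("acoustic", 0.6, {"acoustic", "cover"})]
```

Raising `diversity` swaps the near-duplicate second live video for the acoustic one, which mirrors the interactive diversity-level control described above.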
We describe the ViPLab plug-in for the ILIAS learning management system (LMS), which offers students a virtual programming laboratory and thus lets them complete programming exercises without ever leaving the browser. In particular, this article introduces a new component of the system that automatically corrects programming exercises and hence significantly simplifies running programming classes in freshman courses.
{"title":"Automatic Correction of Programming Exercises in ViPLab","authors":"T. Richter, Jan Vanvinkenroye","doi":"10.1109/ISM.2015.69","DOIUrl":"https://doi.org/10.1109/ISM.2015.69","url":null,"abstract":"We describe the ViPLab plug-in for the ILIAS learning management system (LMS) that offers students a virtual programming laboratory and hence allows them to run programming exercises without ever having to leave the browser. In particular, this article introduces one new component of the system that allows automatic correction of programming exercises and hence simplifies the implementation of programming classes in freshmen courses significantly.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127050327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
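A minimal sketch of output-based automatic correction (not ViPLab's actual grader) runs a submission against a test case and compares its stdout with the expected result; the submission string below is a hypothetical example:

```python
import os
import subprocess
import sys
import tempfile

def grade(source_code, stdin_data, expected_stdout):
    """Write the submitted Python source to a temp file, run it with the
    given stdin, and return True iff its stdout matches the expectation."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], input=stdin_data,
                                capture_output=True, text=True, timeout=5)
        return result.stdout.strip() == expected_stdout.strip()
    finally:
        os.unlink(path)

submission = "print(int(input()) * 2)"
```

A production grader would additionally sandbox the process and report per-test diagnostics instead of a single boolean, but the run-and-compare core is the same.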
Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang
With the advent of abundant multimedia data on the Internet, there have been growing research efforts in multimodal machine learning to utilize data from different modalities. Current approaches mostly focus on models that fuse low-level features from multiple modalities and learn a unified representation across modalities. However, most related work fails to justify why multimodal data and multimodal fusion should be used, and few approaches leverage the complementary relation among modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. We then propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities, images and text. Experimental results on the UIUC-ISD dataset show that our ensemble approach outperforms approaches using a single modality. Word sense disambiguation (WSD) is the use case we study to demonstrate the effectiveness of our probabilistic ensemble fusion model.
{"title":"Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation","authors":"Yang Peng, D. Wang, Ishan Patwa, Dihong Gong, C. Fang","doi":"10.1109/ISM.2015.35","DOIUrl":"https://doi.org/10.1109/ISM.2015.35","url":null,"abstract":"With the advent of abundant multimedia data on the Internet, there have been research efforts on multimodal machine learning to utilize data from different modalities. Current approaches mostly focus on developing models to fuse low-level features from multiple modalities and learn unified representation from different modalities. But most related work failed to justify why we should use multimodal data and multimodal fusion, and few of them leveraged the complementary relation among different modalities. In this paper, we first identify the correlative and complementary relations among multiple modalities. Then we propose a probabilistic ensemble fusion model to capture the complementary relation between two modalities (images and text). Experimental results on the UIUC-ISD dataset show our ensemble approach outperforms approaches using only single modality. Word sense disambiguation (WSD) is the use case we studied to demonstrate the effectiveness of our probabilistic ensemble fusion model.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130536781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
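A minimal late-fusion sketch in the spirit of the ensemble model: each modality contributes a posterior over senses and the ensemble combines them. The weights here are illustrative constants rather than learned, and the "bass" example is hypothetical:

```python
import numpy as np

def ensemble_fuse(p_text, p_image, w_text=0.6, w_image=0.4):
    """Weighted combination of per-modality posteriors, renormalized so the
    fused vector is again a probability distribution over senses."""
    fused = w_text * np.asarray(p_text) + w_image * np.asarray(p_image)
    return fused / fused.sum()

# Two senses of "bass": index 0 = fish, index 1 = musical instrument.
p_text = [0.55, 0.45]    # the text context alone is nearly ambiguous
p_image = [0.9, 0.1]     # the accompanying image clearly shows a fish
fused = ensemble_fuse(p_text, p_image)
```

This is where the complementary relation pays off: the confident image posterior resolves a case the text posterior alone would call a near coin flip.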
Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do
We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Vision problems such as refractive errors (e.g., nearsightedness and astigmatism) are common ocular conditions which, if uncorrected, may lead to serious visual impairment. Diagnosing such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by exploiting the computational power of consumer-grade devices and the advances made possible by multimedia computing. We present results showing that our system is able to match, and under certain conditions outperform, state-of-the-art medical devices.
{"title":"Democratizing Optometric Care: A Vision-Based, Data-Driven Approach to Automatic Refractive Error Measurement for Vision Screening","authors":"Tiffany C. K. Kwok, N. M. Shum, G. Ngai, H. Leong, G. A. Tseng, Hoi-yi Choi, Ka-yan Mak, C. Do","doi":"10.1109/ISM.2015.55","DOIUrl":"https://doi.org/10.1109/ISM.2015.55","url":null,"abstract":"We present a vision-based, data-driven approach to identifying and measuring refractive errors in human subjects with low-cost, easily available equipment and no specialist training. Vision problems, such as refractive error (e.g. nearsightedness, astigmatism, etc) are common ocular problems, which, if uncorrected, may lead to serious visual impairment. The diagnosis of such defects conventionally requires expensive specialist equipment and trained personnel, which is a barrier in many parts of the developing world. Our approach aims to democratize optometric care by utilizing the computational power inherent in consumer-grade devices and the advances made possible by multimedia computing. We present results that show our system is able to match and outperform state-of-the-art medical devices under certain conditions.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130795839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}