"An Adaptive H.265/HEVC Encoding Control for 8K UHDTV Movies Based on Motion Complexity Estimation"
Shota Orihashi, Rintaro Harada, Y. Matsuo, J. Katto
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.74
In this paper, we propose a method to control H.265/HEVC encoding for 8K UHDTV moving pictures by detecting the amount and complexity of object motion. In 8K video, which has very high spatial resolution, motion strongly influences both encoding efficiency and processing time. The proposed method estimates motion features through an external process that matches local feature points between two frames, then selects an optimal prediction mode and determines the search ranges for motion vectors. Experiments show that local feature matching between frames can detect the motion complexity of 8K movies and that optimal encoding configurations can be selected accordingly. With this method, we achieve highly efficient, low-computation encoding.
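The motion-to-configuration mapping described above can be sketched as follows; the thresholds, mode names, and search-range values are illustrative assumptions, not the paper's actual parameters:

```python
import math

def motion_stats(displacements):
    """Mean magnitude and variance of matched feature-point displacements.
    Each displacement is a (dx, dy) pair from local feature matching
    between two consecutive frames."""
    mags = [math.hypot(dx, dy) for dx, dy in displacements]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    return mean, var

def choose_encoder_config(displacements, slow=2.0, complex_var=16.0):
    """Map motion statistics to a hypothetical encoder configuration:
    small, uniform motion -> narrow search range; larger or more erratic
    motion -> wider search range. Thresholds are illustrative only."""
    mean, var = motion_stats(displacements)
    if mean < slow and var < complex_var:
        return {"search_range": 16, "mode": "merge-biased"}
    if var < complex_var:
        return {"search_range": 64, "mode": "inter"}
    return {"search_range": 128, "mode": "full-rd-search"}
```

A near-static scene (tiny displacements) maps to the narrow search range, while fast, consistent panning maps to the wider one.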
"Single-View Food Portion Estimation Based on Geometric Models"
S. Fang, Chang Liu, F. Zhu, E. Delp, C. Boushey
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.67
In this paper we present a food portion estimation technique based on a single-view food image, used to estimate the amount of energy (in kilocalories) consumed at a meal. Unlike previous methods we have developed, the new technique estimates food portions without manual parameter tuning. Although single-view 3D scene reconstruction is in general an ill-posed problem, geometric models such as the shape of a container can help to partially recover the 3D parameters of food items in the scene. Based on the estimated 3D parameters of each food item and a reference object in the scene, the volume of each food item in the image can be determined. The weight of each food item can then be estimated using its density. We were able to achieve an error of less than 6% for the energy estimate of a meal image, assuming accurate segmentation and food classification.
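The volume-to-energy chain can be sketched for a single cylindrical food item; the cylinder model, parameter names, and values below are illustrative assumptions, not the paper's calibrated geometric models:

```python
import math

def estimate_energy_kcal(diameter_px, height_px, ref_px, ref_cm,
                         density_g_per_cm3, kcal_per_g):
    """Rough single-view portion estimate under an assumed cylindrical
    container model. A reference object of known size (ref_cm, appearing
    as ref_px in the image) fixes the pixel-to-cm scale; volume, weight,
    and energy then follow from the food's density and energy density."""
    cm_per_px = ref_cm / ref_px
    d_cm = diameter_px * cm_per_px
    h_cm = height_px * cm_per_px
    volume_cm3 = math.pi * (d_cm / 2) ** 2 * h_cm   # cylinder volume
    weight_g = volume_cm3 * density_g_per_cm3
    return weight_g * kcal_per_g
```

With a 5 cm reference object spanning 50 pixels, a 100 by 50 pixel cylinder of unit density and 1 kcal/g yields about 393 kcal.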
"Scene Classification Using External Knowledge Source"
Esfandiar Zolghadr, B. Furht
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.85
In this paper, we introduce a model for scene category recognition that uses metadata from a labeled training dataset. We define a measure of object-scene relevance and apply it to scene category classification to increase the coherence of objects in classification and annotation tasks. We show how our context-based extension of the supervised Latent Dirichlet Allocation (LDA) model increases recognition accuracy when the feature mix is influenced by our relevancy score. We demonstrate that the proposed approach performs well on the LabelMe dataset. A comparison between our proposed approach and state-of-the-art semi-supervised clustering algorithms using labeled data shows the effectiveness of our approach in the interpretation of scenes.
"Feature Extraction from 2D Images for Body Composition Analysis"
Ligaj Pradhan, Song Gao, Chengcui Zhang, B. Gower, S. Heymsfield, D. Allison, O. Affuso
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.117
Body volume and body shape have been used to estimate body composition in clinical research. However, determining body volume typically requires sophisticated and expensive equipment, and the use of body shape to predict body composition is limited by rater bias and poor reproducibility. In this paper, we introduce simple yet relatively accurate techniques for representing body volume and body shape that reduce the limitations of traditional approaches. We propose an automated method that constructs a 3D model of the body by accumulating ellipse-like slices formed from length and width features sampled from back- and side-profile images. Body volume is represented in pixels by summing the areas of the slices. Beyond representing body volume in pixels, we also extract shape features from the 2D images and cluster individuals according to their body shape. The body volume representation and the proposed shape features, together with meta-information including age, sex, race, height, and weight, can be used effectively in body composition prediction. Our results indicate that the body volume computed by the proposed method is reasonably accurate and that the extracted shape clusters provide important information for estimating body composition.
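The slice-accumulation step lends itself to a short sketch (a simplification of the method described above; real width and depth profiles would be sampled row by row from segmented silhouettes):

```python
import math

def body_volume_pixels(widths, depths):
    """Accumulate elliptical slices: for each sampled row, the width
    comes from the back-view silhouette and the depth from the
    side-view silhouette. Each slice contributes an ellipse area of
    pi * (w/2) * (d/2); summing slice areas gives a volume measure
    in pixel units (slice thickness = 1 pixel row)."""
    if len(widths) != len(depths):
        raise ValueError("profiles must be sampled at the same rows")
    return sum(math.pi * (w / 2) * (d / 2) for w, d in zip(widths, depths))
```

Two identical slices of width 4 and depth 2 give a volume of 4*pi pixel units.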
"Combining Diversity Queries and Visual Mining to Improve Content-Based Image Retrieval Systems: The DiVI Method"
Lúcio F. D. Santos, Rafael L. Dias, M. X. Ribeiro, A. Traina, C. Traina
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.115
This paper proposes a new approach to improving similarity queries with diversity, the Diversity and Visually-Interactive (DiVI) method, which employs Visual Data Mining techniques in Content-Based Image Retrieval (CBIR) systems. DiVI empowers users to understand how the similarity and diversity measures affect their queries, and increases the relevance of CBIR results according to user judgment. An overview of the image distribution in the database is shown to the user through a multidimensional projection. Users interact with the visual representation, changing the projected space or the query parameters according to their needs and prior knowledge. DiVI takes advantage of this activity to transparently reduce the semantic gap faced by CBIR systems. Empirical evaluation shows that DiVI increases precision for content-based querying and broadens the applicability and acceptance of similarity with diversity in CBIR systems.
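The abstract does not give DiVI's ranking formula; as a generic illustration of the similarity-with-diversity trade-off it addresses, a standard MMR-style greedy re-ranker (not the authors' method) looks like this:

```python
def diverse_rerank(candidates, sim_to_query, sim, k, lam=0.7):
    """Greedy similarity-with-diversity re-ranking (MMR-style sketch):
    each step picks the item maximizing
    lam * similarity-to-query - (1 - lam) * max similarity to the
    items already selected, so near-duplicates are penalized."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * sim_to_query(c) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lam, the second pick skips an item nearly identical to the first in favor of a more distinct one.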
"Automatic Fight Detection Based on Motion Analysis"
E. Fu, H. Leong, G. Ngai, S. Chan
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.98
Social signal processing is becoming an important topic in affective computing. In this paper, we focus on an important social interaction in real life: fighting. Fight detection is useful in public transportation, prisons, bars, and even sports, and a robust mechanism for detecting fights in video would be extremely valuable, especially in surveillance applications. Recent research has focused on extracting visual features from high-resolution video, leading to computationally expensive systems. We propose an approach that detects fights based on motion analysis, which is both intuitive and robust. Experimental results show that we can accurately detect fight activity in a variety of video surveillance settings.
"An Illumination-Robust Approach for Feature-Based Road Detection"
Zhenqiang Ying, Ge Li, Guozhen Tan
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.46
Road detection algorithms form a basis for intelligent vehicle systems designed to improve safety and efficiency for human drivers. In this paper, a novel road detection approach for handling illumination-related effects is proposed. First, a grayscale image of modified saturation is derived from the input color image during preprocessing, effectively diminishing cast shadows. Second, the road boundary lines are detected, providing an adaptive region of interest for the subsequent lane-marking detection. Finally, an improved feature-based method identifies lane markings within the shadows. Experimental results show that the proposed approach is robust against illumination-related effects.
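The exact "modified saturation" transform is not specified in the abstract; the following toy version uses plain saturation (1 - min/max) to illustrate why a channel-ratio grayscale suppresses cast shadows, which darken all RGB channels roughly proportionally:

```python
def saturation_gray(pixels):
    """Per-pixel saturation-based grayscale conversion (an illustrative
    stand-in, not the paper's transform). Input: list of (r, g, b)
    tuples in 0..255; output: list of 0..255 grayscale values of
    1 - min/max, which is largely invariant to shadow darkening."""
    out = []
    for r, g, b in pixels:
        mx = max(r, g, b)
        if mx == 0:
            out.append(0)
        else:
            out.append(round(255 * (1 - min(r, g, b) / mx)))
    return out
```

A sunlit gray road pixel and its half-brightness shadowed neighbor both map to 0, and a yellow lane marking keeps the same value in and out of shadow, so shadow edges largely vanish.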
"Earth Mover's Distance vs. Quadratic Form Distance: An Analytical and Empirical Comparison"
C. Beecks, M. S. Uysal, T. Seidl
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.76
More than a decade has passed since the Earth Mover's Distance and the Quadratic Form Distance were proposed as distance-based similarity measures for color-based image similarity. Through their use in various domains, they have developed into major general-purpose distance functions. In this paper, we subject both dissimilarity measures to a fundamental analytical and empirical analysis in order to reveal their strengths and weaknesses.
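Both measures have compact definitions; as a reference sketch (the 1-D histogram setting and unit ground distance between adjacent bins are simplifying assumptions):

```python
import math

def quadratic_form_distance(x, y, A):
    """QFD between histograms x and y:
    d(x, y) = sqrt((x - y)^T A (x - y)),
    where A[i][j] encodes the similarity between bins i and j."""
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[i] * A[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(q)

def emd_1d(x, y):
    """Earth Mover's Distance for 1-D histograms with unit ground
    distance between adjacent bins: the minimum mass flow equals the
    sum of absolute prefix-sum differences."""
    total, cum = 0.0, 0.0
    for a, b in zip(x, y):
        cum += a - b
        total += abs(cum)
    return total
```

Moving one unit of mass two bins over gives an EMD of 2, while with an identity similarity matrix the QFD reduces to plain Euclidean distance.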
"Collaborative Rehabilitation Support System: A Comprehensive Solution for Everyday Rehab"
Ziying Tang, Sonia Lawson, David Messing, Jin Guo, Ted Smith, Jinjuan Feng
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.62
Regular, repetitive rehabilitation exercises are crucial for enhancing and restoring functional ability and quality of life for those with cognitive and physical disabilities such as stroke. However, a substantial proportion of patients do not comply with the repetitive exercise regimen prescribed by their therapists, often for reasons that include a lack of motivation. Although interactive gaming systems such as Nintendo's Wii and haptic devices have made repetitive actions more fun and engaging for typically functioning individuals, challenges remain for those with impairments: expensive and complicated gaming systems are still not widely used by people with disabilities, and therapists and caregivers, who play an important role in rehabilitation, have not been sufficiently involved. To address these problems, we propose a collaborative rehabilitation support system (CRSS) that includes mobile-based interactive games. Our approach extends in-home rehabilitation to self-rehabilitation, allowing users to perform rehab anywhere and anytime. A pilot study involving stroke survivors, caregivers, therapists, and physicians was conducted to evaluate the system, and user feedback was highly positive.
"Betweenness Centrality Approaches for Image Retrieval"
B. Marshall, Anuya Ghanekar, John A. Springer, E. Matson
2015 IEEE International Symposium on Multimedia (ISM), December 2015. DOI: 10.1109/ISM.2015.83
To quantify the relatedness of social tags in an image collection, we examine the betweenness centrality measure. We represent the image collection as a multigraph in which nodes are social tags and edges bind the social tags of an image. We present our weighted betweenness centrality algorithm and compare it to the unweighted version on sparse and dense graphs. The MIRFLICKR and ImageCLEF benchmark image collections are used in our experimental evaluation. We observe an 11% increase in computation runtime when weighted edges are used to determine shortest paths within our image collections. We discuss the intended impact of our approach, in conjunction with node importance evaluation via the k-path centrality algorithm, for situation-aware path planning applications.
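For a toy unweighted tag graph, betweenness centrality can be computed by brute force (real collections need Brandes' algorithm; note that each unordered pair is counted in both directions here, so halve the scores for the usual undirected convention):

```python
from itertools import permutations

def betweenness(nodes, edges):
    """Brute-force betweenness for a small unweighted, undirected
    graph: enumerate all simple paths per node pair, keep the shortest
    ones, and credit each interior node with the fraction of shortest
    s-t paths passing through it."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def simple_paths(s, t, path):
        if s == t:
            yield list(path)
            return
        for n in adj[s]:
            if n not in path:
                path.append(n)
                yield from simple_paths(n, t, path)
                path.pop()

    score = {n: 0.0 for n in nodes}
    for s, t in permutations(nodes, 2):
        paths = list(simple_paths(s, t, [s]))
        if not paths:
            continue
        shortest = min(len(p) for p in paths)
        sp = [p for p in paths if len(p) == shortest]
        for p in sp:
            for n in p[1:-1]:           # interior nodes only
                score[n] += 1.0 / len(sp)
    return score
```

On the path graph A-B-C, every shortest path between A and C passes through B, so B gets all the centrality and the endpoints get none.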