The rule of thirds is one of the most important composition rules used by photographers to create high-quality photos. It states that placing important objects along the imaginary thirds lines, or around their intersections, often produces highly aesthetic photos. In this paper, we present a method to automatically determine whether a photo respects the rule of thirds. Detecting the rule of thirds in a photo requires semantic content understanding to locate important objects, which is beyond the state of the art. This paper instead makes use of recent saliency and generic objectness analysis and accordingly designs a range of features. Our experiments with a variety of saliency and generic objectness methods show that encouraging performance can be achieved in detecting the rule of thirds in photos.
{"title":"Rule of Thirds Detection from Photograph","authors":"Long Mai, Hoang Le, Yuzhen Niu, Feng Liu","doi":"10.1109/ISM.2011.23","DOIUrl":"https://doi.org/10.1109/ISM.2011.23","url":null,"abstract":"The rule of thirds is one of the most important composition rules used by photographers to create high-quality photos. The rule of thirds states that placing important objects along the imagery thirds lines or around their intersections often produces highly aesthetic photos. In this paper, we present a method to automatically determine whether a photo respects the rule of thirds. Detecting the rule of thirds from a photo requires semantic content understanding to locate important objects, which is beyond the state of the art. This paper makes use of the recent saliency and generic objectness analysis as an alternative and accordingly designs a range of features. Our experiment with a variety of saliency and generic objectness methods shows that an encouraging performance can be achieved in detecting the rule of thirds from photos.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121647939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harish Katti, Karthik Yadati, M. Kankanhalli, Tat-Seng Chua
We propose a semi-automated, eye-gaze-based method for affective analysis of videos. Pupillary dilation (PD) is introduced as a valuable behavioural signal for assessing subject arousal and engagement. We use PD information for computationally inexpensive, arousal-based composition of video summaries and descriptive storyboards. Video summarization and storyboard generation are done offline, after a subject has viewed the video. The method also includes novel eye-gaze analysis and fusion with content-based features to discover affective segments of videos and the regions of interest (ROIs) contained therein. The effectiveness of the framework is evaluated using experiments over a diverse set of clips, a significant pool of subjects, and comparison with a fully automated, state-of-the-art affective video summarization algorithm. Acquisition and analysis of PD information is demonstrated and used as a proxy for human visual attention in arousal-based video summarization and storyboard generation. An important contribution is to demonstrate the usefulness of PD information in identifying affective video segments with abstract semantics, or affective elements of discourse and storytelling, that are likely to be missed by automated methods. Another contribution is the use of eye fixations in close temporal proximity to PD-based events for key-frame extraction and subsequent storyboard generation. We also show how PD-based video summarization can be used either to generate a personalized video summary or to represent a consensus over the affective preferences of a larger group or community.
{"title":"Affective Video Summarization and Story Board Generation Using Pupillary Dilation and Eye Gaze","authors":"Harish Katti, Karthik Yadati, M. Kankanhalli, Tat-Seng Chua","doi":"10.1109/ISM.2011.57","DOIUrl":"https://doi.org/10.1109/ISM.2011.57","url":null,"abstract":"We propose a semi-automated, eye-gaze based method for affective analysis of videos. Pupillary Dilation (PD) is introduced as a valuable behavioural signal for assessment of subject arousal and engagement. We use PD information for computationally inexpensive, arousal based composition of video summaries and descriptive story-boards. Video summarization and story-board generation is done offline, subsequent to a subject viewing the video. The method also includes novel eye-gaze analysis and fusion with content based features to discover affective segments of videos and Regions of interest (ROIs) contained therein. Effectiveness of the framework is evaluated using experiments over a diverse set of clips, significant pool of subjects and comparison with a fully automated state-of-art affective video summarization algorithm. Acquisition and analysis of PD information is demonstrated and used as a proxy for human visual attention and arousal based video summarization and story-board generation. An important contribution is to demonstrate usefulness of PD information in identifying affective video segments with abstract semantics or affective elements of discourse and story-telling, that are likely to be missed by automated methods. Another contribution is the use of eye-fixations in the close temporal proximity of PD based events for key frame extraction and subsequent story board generation. We also show how PD based video summarization can to generate either a personalized video summary or to represent a consensus over affective preferences of a larger group or community.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133214170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a method for recognizing recipe ingredients based on the load on a chopping board when ingredients are cut. The load is measured by four sensors attached to the board. Each chop is detected by identifying a sharp falling edge in the load data. Load features, including the maximum value, duration, impulse, peak position, and kurtosis, are extracted and used for ingredient recognition. Experimental results showed a precision of 98.1% in chop detection and 67.4% in ingredient recognition with a support vector machine (SVM) classifier for 16 common ingredients.
{"title":"Cooking Ingredient Recognition Based on the Load on a Chopping Board during Cutting","authors":"Yoko Yamakata, Yoshiki Tsuchimoto, Atsushi Hashimoto, Takuya Funatomi, Mayumi Ueda, M. Minoh","doi":"10.1109/ISM.2011.69","DOIUrl":"https://doi.org/10.1109/ISM.2011.69","url":null,"abstract":"This paper presents a method for recognizing recipe ingredients based on the load on a chopping board when ingredients are cut. The load is measured by four sensors attached to the board. Each chop is detected by indentifying a sharp falling edge in the load data. The load features, including the maximum value, duration, impulse, peak position, and kurtosis, are extracted and used for ingredient recognition. Experimental results showed a precision of 98.1% in chop detection and 67.4% in ingredient recognition with a support vector machine (SVM) classifier for 16 common ingredients.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115025356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Goulart, Carlos Dias Maciel, R. Guido, Katia Cristina Silva Paulo, Ivan Nunes da Silva
In this paper, we present an automatic music genre classification scheme based on a Gaussian Mixture Model (GMM) classifier. The proposed technique adopts entropies and lacunarities as features for classification. Tests were carried out with four styles of Brazilian music, namely Axé, Bossa Nova, Forró, and Samba.
{"title":"Music Genre Classification Based on Entropy and Fractal Lacunarity","authors":"A. Goulart, Carlos Dias Maciel, R. Guido, Katia Cristina Silva Paulo, Ivan Nunes da Silva","doi":"10.1109/ISM.2011.94","DOIUrl":"https://doi.org/10.1109/ISM.2011.94","url":null,"abstract":"In this letter, we present an automatic music genre classification scheme based on a Gaussian Mixture Model (GMM) classifier. The proposed technique adopts entropies and lacunarities as features for the classifications. Tests were carried out with four styles of Brazilian music, namely Ax ´e, Bossa Nova, Forro ´, and Samba.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129938710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew D. Bagdanov, M. Bertini, A. Bimbo, Lorenzo Seidenari
This article describes an approach to adaptive video coding for video surveillance applications. Using a combination of low-level features with low computational cost, we show how it is possible to control the quality of video compression so that semantically meaningful elements of the scene are encoded with higher fidelity, while background elements are allocated fewer bits in the transmitted representation. Our approach is based on adaptive smoothing of individual video frames so that image features highly correlated with semantically interesting objects are preserved. Because it uses only low-level image features on individual frames, this adaptive smoothing can be seamlessly inserted into a video coding pipeline as a pre-processing stage. Experiments show that our technique is efficient, outperforms standard H.264 encoding at comparable bit rates, and preserves features critical for downstream detection and recognition.
{"title":"Adaptive Video Compression for Video Surveillance Applications","authors":"Andrew D. Bagdanov, M. Bertini, A. Bimbo, Lorenzo Seidenari","doi":"10.1109/ISM.2011.38","DOIUrl":"https://doi.org/10.1109/ISM.2011.38","url":null,"abstract":"This article describes an approach to adaptive video coding for video surveillance applications. Using a combination of low-level features with low computational cost, we show how it is possible to control the quality of video compression so that semantically meaningful elements of the scene are encoded with higher fidelity, while background elements are allocated fewer bits in the transmitted representation. Our approach is based on adaptive smoothing of individual video frames so that image features highly correlated to semantically interesting objects are preserved. Using only low-level image features on individual frames, this adaptive smoothing can be seamlessly inserted into a video coding pipeline as a pre-processing state. Experiments show that our technique is efficient, outperforms standard H.264 encoding at comparable bit rates, and preserves features critical for downstream detection and recognition.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114389273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The physical characteristics of current mobile devices impose significant constraints on the processing of 3D graphics. The remote rendering framework is considered a better choice in this regard. However, limited battery life is a critical constraint when using this approach, and earlier methods based on this framework suffered from high transmission frequency. We present a software solution to this problem whose key element, an Adaptive Splitting and Error Handling Mechanism, indirectly conserves battery power on mobile devices by reducing the transmission frequency. To achieve this goal, a geometric relation is maintained that tightly couples several consecutive Levels of Detail (LOD). Adaptive splitting can then approximate the LOD from a much coarser split base under the guidance of this relation. Data transmission between the server and the mobile device occurs only when an out-of-range LOD is about to be displayed. Our remote rendering architecture, based on the above approach, trades local splitting work for transmission, thereby alleviating the problem of frequent data transmission.
{"title":"An Adaptive Splitting and Transmission Control Method for Rendering Point Model on Mobile Devices","authors":"Yajie Yan, Xiaohui Liang, Ke Xie, Qinping Zhao","doi":"10.1145/1730804.1730977","DOIUrl":"https://doi.org/10.1145/1730804.1730977","url":null,"abstract":"The physical characteristics of current mobile devices impose significant constraints on the processing of 3D graphics. The remote rendering framework is considered a better choice in this regard. However, limited battery life is a critical constraint when using this approach. Earlier methods based on this framework suffered from high transmission frequency. We present a software solution to this problem with a key element, an Adaptive Splitting and Error Handling Mechanism that indirectly reserves the electricity in mobile devices by reducing the transmission frequency. To achieve this goal, a geometric relation is maintained that tightly couples several consecutive Levels of Detail (LOD). Adaptive Splitting can then approximate the LOD from a much coarser split base under the guidance of the relation. Data transmission between the server and mobile device occurs only when out-ranged LOD is about to be displayed. Our remote rendering architecture, based on the above approach, trades splitting process for transmission, thereby alleviating the problem of frequent data transmission.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117344168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}