Adaptive bitrate selection adjusts the quality of HTTP streaming video to a changing context. A number of different schemes have been proposed that use buffer state in selecting the appropriate video rate. However, the models describing the relationship between video quality levels and buffer occupancy are mostly heuristic, which often results in unstable and/or suboptimal quality. In this paper, we present a QoE-aware video rate evolution model based on buffer state changes. The scheme is evaluated in a real-world Internet environment, where it is shown to improve the stability of the video rate. A gain of up to 27% in average video rate can be achieved compared to the baseline ABR, and the average throughput utilisation at steady state reaches 100% in some of the investigated scenarios.
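As a rough illustration of the buffer-based selection rules this line of work builds on, here is a minimal sketch of a generic buffer-occupancy-to-rate mapping. The rate ladder and thresholds are invented for illustration; this is not the paper's QoE-aware evolution model.

```python
# Minimal sketch of a generic buffer-based bitrate selector: a piecewise
# mapping from buffer occupancy to video rate (illustrative thresholds and
# rate ladder; NOT the paper's QoE-aware evolution model).

BITRATES_KBPS = [350, 600, 1000, 2000, 4000]  # available representations

def select_rate(buffer_s, reservoir_s=10.0, cushion_s=30.0):
    """Map current buffer occupancy (seconds) to a video rate.

    Below the reservoir, always pick the lowest rate; above the cushion,
    the highest; in between, step linearly across the rate ladder.
    """
    if buffer_s <= reservoir_s:
        return BITRATES_KBPS[0]
    if buffer_s >= reservoir_s + cushion_s:
        return BITRATES_KBPS[-1]
    frac = (buffer_s - reservoir_s) / cushion_s
    return BITRATES_KBPS[int(frac * (len(BITRATES_KBPS) - 1))]
```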
{"title":"Modelling Video Rate Evolution in Adaptive Bitrate Selection","authors":"Yusuf Sani, A. Mauthe, C. Edwards","doi":"10.1109/ISM.2015.65","DOIUrl":"https://doi.org/10.1109/ISM.2015.65","url":null,"abstract":"Adaptive bitrate selection adjusts the quality of HTTP streaming video to a changing context. A number of different schemes have been proposed that use buffer state in the selection of the appropriate video rate. However, models describing the relationship between video quality levels and buffer occupancy are mostly based on heuristics, which often results in unstable and/or suboptimal quality. In this paper, we present a QoE-aware video rate evolution model based on buffer state changes. The scheme is evaluated within a real world Internet environment, where it is shown to improve the stability of the video rate. Up to 27% gain in average video rate can be achieved compared to the baseline ABR. The average throughput utilisation at a steady-state reaches 100% in some of the investigated scenarios.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134564965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The new video coding standard, MPEG-H Part 2 High Efficiency Video Coding (HEVC), or H.265, was developed to be roughly twice as efficient as H.264/AVC -- meaning H.265/HEVC could deliver the same quality as H.264/AVC using roughly half the bitrate. In this paper we describe a subjective experiment designed to test this claim, using 20 different 1080p, 29.97 fps scenes and 12 impairment levels spanning MPEG-2, H.264/AVC and H.265/HEVC. Additionally, we compare the results obtained from the subjective assessment to quality estimates from two objective metrics: VQM and PSNR. Our subjective results show that H.265/HEVC can deliver the same quality at half the bitrate of H.264/AVC, and can perform better at one quarter the bitrate of MPEG-2, in many, but not all, situations. For the 20 scenes coded with H.265/HEVC at 4 Mbps, mean opinion scores span 38% of the subjective scale, which underlines the importance of scene selection. Objective quality estimates for HEVC have a low correlation with the subjective results (0.60 for VQM, 0.64 for PSNR).
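For readers unfamiliar with how such metric-versus-MOS correlations are computed, a minimal Pearson-correlation sketch follows; the per-clip scores below are made-up placeholders, not the paper's data.

```python
# Toy check of how well an objective metric tracks subjective MOS, in the
# spirit of the paper's VQM/PSNR comparison (all values below are invented).
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

psnr_db = [32.1, 35.4, 38.0, 30.2, 36.7]  # objective score per clip (fake)
mos = [3.1, 3.9, 4.3, 2.8, 3.6]           # mean opinion score per clip (fake)
print(f"Pearson r = {pearson(psnr_db, mos):.2f}")
```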
{"title":"Characterization of the HEVC Coding Efficiency Advance Using 20 Scenes, ITU-T Rec. P.913 Compliant Subjective Methods, VQM, and PSNR","authors":"Andrew Catellier, M. Pinson","doi":"10.1109/ISM.2015.38","DOIUrl":"https://doi.org/10.1109/ISM.2015.38","url":null,"abstract":"The new video coding standard, MPEG-H Part 2 High Efficiency Video Coding (HEVC) or H.265, was developed to be roughly twice as efficient as H.264/AVC -- meaning H.265/HEVC could deliver the same quality as H.264/AVC using roughly half the bitrate. In this paper we describe a subjective experiment designed to test this claim. We present an experiment using 20 different 1080p 29.97 fps scenes and 12 impairment levels spanning MPEG-2, H.264/AVC and H.265/HEVC. Additionally we compare the results obtained from the subjective assessment to quality estimates from two objective metrics: VQM and PSNR. Our subjective results show that H.265/HEVC can deliver the same quality at half the bitrate compared to H.264/AVC and can perform better at one quarter the bitrate compared to MPEG-2 in many, but not all, situations. For all 20 scenes coded with H.265/HEVC at 4 Mbps mean opinion scores span 38% of the subjective scale, which indicates the importance of scene selection. Objective quality estimations of HEVC have a low correlation with subjective results (0.60 for VQM, 0.64 for PSNR).","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121087317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In streaming media over the Internet using HTTP, it is often desirable to minimize the duration of a stream's media segments (for example, to reduce camera-to-viewer delay when streaming a live sports event). The operational behavior of Internet transport protocols imposes a minimum segment duration below which segments cannot be transferred in time to avoid a playback stall. This paper proposes a method to calculate the minimum duration for the segments of a media stream that avoids a playback stall under given network transport conditions. The method can be used to select a segment duration for a media stream under anticipated worst-case network conditions.
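A hedged sketch of the kind of calculation involved, under a deliberately simplified transport model (a fixed per-request overhead plus throughput-limited transfer; the paper's own analysis of transport behavior is more detailed):

```python
# Simplified lower bound on segment duration for stall-free live streaming.
# Model (an assumption, not the paper's exact analysis): each segment of
# duration d costs a fixed per-request overhead t0 (e.g., one RTT) plus
# d * bitrate / throughput seconds to transfer, and must finish within d
# seconds to keep up with real time:
#     t0 + d * b / r <= d   =>   d >= t0 / (1 - b / r),  for r > b.

def min_segment_duration(t0_s, bitrate_bps, throughput_bps):
    if throughput_bps <= bitrate_bps:
        raise ValueError("throughput must exceed the media bitrate")
    return t0_s / (1.0 - bitrate_bps / throughput_bps)

# e.g., 100 ms request overhead, 3 Mbps media over a 5 Mbps link:
print(min_segment_duration(0.1, 3e6, 5e6))  # -> 0.25 s minimum duration
```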
{"title":"Calculating a Minimum Playable Duration for HTTP Streaming Media Segments","authors":"M. Thornburgh","doi":"10.1109/ISM.2015.39","DOIUrl":"https://doi.org/10.1109/ISM.2015.39","url":null,"abstract":"In streaming media over the Internet using HTTP, it is often desirable to minimize the duration of the media segments of a stream (for example, to reduce camera-to-viewer delay in streaming a live sports event). The operational behavior of Internet transport protocols limits the minimum duration of segments below which the segments can't be transferred in time to avoid a playback stall. This paper proposes a method to calculate the minimum duration for the segments of a media stream under given network transport conditions to avoid a playback stall. This method can be used to select the segment duration for a media stream under anticipated worst-case network conditions.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115770728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of imbalanced data is an important research problem, as many real-world data sets have skewed class distributions in which the majority of data instances (examples) belong to one class and far fewer instances belong to others. While in many applications the minority instances actually represent the concept of interest (e.g., fraud in banking operations or abnormal cells in medical data), a classifier induced from an imbalanced data set is likely to be biased towards the majority class and to show very poor classification accuracy on the minority class. Despite extensive research efforts, imbalanced data classification remains one of the most challenging problems in data mining and machine learning, especially for multimedia data. To tackle this challenge, in this paper we propose an extended deep learning approach that achieves promising performance in classifying skewed multimedia data sets. Specifically, we investigate the integration of bootstrapping methods with a state-of-the-art deep learning approach, Convolutional Neural Networks (CNNs), through extensive empirical studies. Since deep learning approaches such as CNNs are usually computationally expensive, we propose to feed low-level features to the CNNs and demonstrate that this achieves promising performance while saving substantial training time. The experimental results show the effectiveness of our framework in classifying severely imbalanced data in the TRECVID data set.
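A minimal sketch of the bootstrapping side of such a pipeline, assuming simple oversampling with replacement to the majority-class size (the paper's actual integration with CNNs and low-level features is more involved):

```python
# Bootstrap re-balancing of a skewed data set before classifier training
# (illustrative; the specific resampling scheme is an assumption).
import random

def bootstrap_balance(instances, labels):
    """Oversample every minority class up to the majority-class size."""
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    bal_x, bal_y = [], []
    for y, xs in by_class.items():
        # sample with replacement until the class matches the majority count
        resampled = xs + [random.choice(xs) for _ in range(target - len(xs))]
        bal_x.extend(resampled)
        bal_y.extend([y] * target)
    return bal_x, bal_y
```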
{"title":"Deep Learning for Imbalanced Multimedia Data Classification","authors":"Yilin Yan, Min Chen, M. Shyu, Shu‐Ching Chen","doi":"10.1109/ISM.2015.126","DOIUrl":"https://doi.org/10.1109/ISM.2015.126","url":null,"abstract":"Classification of imbalanced data is an important research problem as lots of real-world data sets have skewed class distributions in which the majority of data instances (examples) belong to one class and far fewer instances belong to others. While in many applications, the minority instances actually represent the concept of interest (e.g., fraud in banking operations, abnormal cell in medical data, etc.), a classifier induced from an imbalanced data set is more likely to be biased towards the majority class and show very poor classification accuracy on the minority class. Despite extensive research efforts, imbalanced data classification remains one of the most challenging problems in data mining and machine learning, especially for multimedia data. To tackle this challenge, in this paper, we propose an extended deep learning approach to achieve promising performance in classifying skewed multimedia data sets. Specifically, we investigate the integration of bootstrapping methods and a state-of-the-art deep learning approach, Convolutional Neural Networks (CNNs), with extensive empirical studies. Considering the fact that deep learning approaches such as CNNs are usually computationally expensive, we propose to feed low-level features to CNNs and prove its feasibility in achieving promising performance while saving a lot of training time. The experimental results show the effectiveness of our framework in classifying severely imbalanced data in the TRECVID data set.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116711842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel fast Coding Tree Unit partitioning method for the HEVC/H.265 encoder is proposed in this paper. The method relies on neural networks trained at run time to make fast Coding Unit splitting decisions. In contrast to state-of-the-art solutions, it does not require any pre-training and adapts readily to dynamic changes in video content. Through an efficient sampling strategy and a multi-threaded implementation, the presented technique successfully mitigates the computational overhead inherent to the training process, both in overall processing performance and in the initial encoding delay. The experiments show that the proposed method reduces HEVC/H.265 encoding time by up to 65% with negligible rate-distortion penalties.
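As an illustration of what a run-time trained split predictor can look like, here is a sketch using an online logistic model rather than the paper's neural networks; the feature set and learning rate are assumptions.

```python
# Online-trained split/no-split predictor for CU partitioning (illustrative
# logistic model; the paper uses run-time trained neural networks with its
# own features and multi-threaded training).
import math

class OnlineSplitPredictor:
    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def prob_split(self, feats):
        z = self.b + sum(w * f for w, f in zip(self.w, feats))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, did_split):
        """Train on CUs whose split the full RD search actually decided."""
        err = self.prob_split(feats) - (1.0 if did_split else 0.0)
        self.w = [w - self.lr * err * f for w, f in zip(self.w, feats)]
        self.b -= self.lr * err

# feats could be, e.g., (normalized CU variance, neighbor depth, QP) -- assumed.
```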
{"title":"Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision","authors":"S. Momcilovic, N. Roma, L. Sousa, I. Milentijevic","doi":"10.1109/ISM.2015.70","DOIUrl":"https://doi.org/10.1109/ISM.2015.70","url":null,"abstract":"A novel fast Coding Tree Unit partitioning for HEVC/H.265 encoder is proposed in this paper. This method relies on run-time trained neural networks for fast Coding Units splitting decisions. Contrasting to state-of-the-art solutions, this method does not require any pre-training and provides a high adaptivity to the dynamic changes in video contents. By an efficient sampling strategy and a multi-thread implementation, the presented technique successfully mitigates the computational overhead inherent to the training process on both the overall processing performance and on the initial encoding delay. The experiments show that the proposed method successfully reduces the HEVC/H.265 encoding time for up to 65% with negligible rate-distortion penalties.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122766020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new interface that facilitates content navigation in videos on devices with touchscreens. The interface allows both coarse-grained and fine-grained navigation in an intuitive way and enables better performance when used to locate specific scenes in videos. We implemented the interface on a 5.5-inch smartphone and tested it with 24 users. Our results show that for video navigation tasks the proposed interface significantly outperforms the seeker-bar interface commonly used in video players on mobile devices. Moreover, we found that the Scrubbing Wheel interaction concept has a much lower perceived workload than the widely used seeker bar and was the preferred tool for locating scenes in videos for all users in our study.
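A sketch of the basic geometry such a wheel-style control implies, mapping angular finger movement to a seek offset; the actual Scrubbing Wheel design (granularity law, wheel placement) is not detailed in the abstract, so everything below is an assumption.

```python
# Illustrative mapping from a circular "scrubbing" gesture to a seek offset
# in seconds (hypothetical parameters, not the paper's exact design).
import math

def seek_offset(prev_xy, cur_xy, center_xy, secs_per_turn=60.0):
    """Convert angular movement around the wheel center into seconds.

    One full revolution maps to secs_per_turn seconds of video; a smaller
    value gives fine-grained navigation, a larger value coarse-grained.
    """
    def angle(p):
        return math.atan2(p[1] - center_xy[1], p[0] - center_xy[0])
    delta = angle(cur_xy) - angle(prev_xy)
    # unwrap to the shortest rotation direction
    if delta > math.pi:
        delta -= 2 * math.pi
    elif delta < -math.pi:
        delta += 2 * math.pi
    return (delta / (2 * math.pi)) * secs_per_turn
```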
{"title":"Scrubbing Wheel: An Interaction Concept to Improve Video Content Navigation on Devices with Touchscreens","authors":"Klaus Schöffmann, Lukas Burgstaller","doi":"10.1109/ISM.2015.20","DOIUrl":"https://doi.org/10.1109/ISM.2015.20","url":null,"abstract":"We propose a new interface that facilitates content navigation in videos on devices with touchscreen interaction. This interface allows both coarse-grained and fine-grained navigation in an intuitive way and enables better performance when used to locate specific scenes in videos. We implemented this interface on a 5.5-inch smartphone and tested it with 24 users. Our results show that for video navigation tasks the proposed interface significantly outperforms the seekerbar interface, commonly used with video players on mobile devices. Moreover, we found that the interaction concept of the Scrubbing Wheel has a much lower perceived workload than the widely used seeker-bar, and is the preferred tool to locate scenes in videos for all tested users in our study.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129725257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Today, video content is delivered to a myriad of devices over different communication networks. Video delivery must be adapted to the available bandwidth, screen size, resolution and decoding capability of the end-user devices. In this work we present an approach to predicting the time needed to transcode a video, given the input video and the target transcoding parameters. To obtain enough information on the characteristics of real-world online videos and the transcoding parameters needed to model transcoding time, we built a video characteristics dataset using data collected from a large video-on-demand system, YouTube. The dataset contains one million randomly sampled video instances, each described by 10 fundamental video characteristics. We report our analysis of the dataset, which provides insightful statistics on fundamental online video characteristics that can be further exploited to optimize or model components of multimedia processing systems. We also present experimental results for transcoding time prediction models based on support vector machines, linear regression and a multi-layer perceptron feed-forward artificial neural network.
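A minimal sketch of the three regression setups the abstract names, run on synthetic stand-in data (the real dataset has 10 video characteristics per instance; the feature columns and data below are placeholders):

```python
# Compare the three regressor families named in the paper on fake data
# shaped like the described dataset (10 features per video instance).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# columns might be duration, bitrate, width, height, fps, ... (assumed)
X = rng.uniform(size=(500, 10))
y = X @ rng.uniform(size=10) + 0.1 * rng.standard_normal(500)  # fake times

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for model in (LinearRegression(), SVR(), MLPRegressor(max_iter=2000, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "R^2:", round(model.score(X_te, y_te), 3))
```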
{"title":"Analysis and Transcoding Time Prediction of Online Videos","authors":"Tewodros Deneke, S. Lafond, J. Lilius","doi":"10.1109/ISM.2015.100","DOIUrl":"https://doi.org/10.1109/ISM.2015.100","url":null,"abstract":"Today, video content is delivered to a myriad of devices over different communication networks. Video delivery must be adapted to the available bandwidth, screen size, resolution and the decoding capability of the end user devices. In this work we present an approach to predict the transcoding time of a video into another given transcoding parameters and an input video. To obtain enough information on the characteristics of real world online videos and their transcoding parameters needed to model transcoding time, we built a video characteristics dataset, using data collected from a large video-on-demand system, YouTube. The dataset contains a million randomly sampled video instances listing 10 fundamental video characteristics. We report our analysis on the dataset which provides insightful statistics on fundamental online video characteristics that can be further exploited to optimize or model components of a multimedia processing systems. We also present experimental results on transcoding time prediction models, based on support vector machines, linear regression and multi-layer perceptron feed forward artificial neural network.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126232280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Musicians often lack the ability to harmonize their own voices within a track. To help with this, a tool can be developed that detects the scale or key in which a track is sung and synthesizes pitches to form a triad chord or a tetrachord (a combination of three or four notes that fits the scale's harmony) for each corresponding tone in the melody. In this paper, we present a fast and precise method to detect the pitch of the voice and shift it to the appropriate frequencies, consequently building a harmony out of the original melody. Four techniques are involved in this sequential process: segmentation into consonant and vowel intervals, pitch detection by the McLeod Pitch Method (MPM), functional harmony for establishing a cadence, and pitch shifting by means of a phase vocoder.
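Of the four stages, pitch detection is the most self-contained; a simplified sketch of the MPM core follows, computing the normalised square difference function (NSDF) and picking a peak. The full method's key-maximum selection and clarity thresholds are simplified away here.

```python
# Simplified core of the McLeod Pitch Method: the NSDF, whose highest
# maximum after the first positive lobe estimates the pitch period.

def nsdf(x):
    n = len(x)
    out = []
    for tau in range(n):
        acf = sum(x[j] * x[j + tau] for j in range(n - tau))
        m = sum(x[j] ** 2 + x[j + tau] ** 2 for j in range(n - tau))
        out.append(2.0 * acf / m if m > 0 else 0.0)
    return out

def detect_pitch(x, sample_rate):
    curve = nsdf(x)
    # naive peak picking: skip the initial positive lobe, then take the
    # global maximum (MPM's key-maximum rules are more careful than this)
    tau = 1
    while tau < len(curve) - 1 and curve[tau] > 0:
        tau += 1
    best = max(range(tau, len(curve) - 1), key=lambda t: curve[t], default=0)
    return sample_rate / best if best else 0.0
```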
{"title":"Synthetic Voice Harmonization: A Fast and Precise Method","authors":"Juan Gremes, Nicola Palavecino, Lucas Seeber, Santiago Herrero","doi":"10.1109/ISM.2015.122","DOIUrl":"https://doi.org/10.1109/ISM.2015.122","url":null,"abstract":"Musicians often lack the ability to harmonize their voices within a track. To help with this matter, a tool can be developed for detecting the scale or key in which a track is sung and synthesizing pitches to make a triadchord or a tetrachord (combinations of three or four notes that fit in the scale's harmony) for each corresponding tone in the melody. In this paper, we present a fast and precise method to detect the pitch of voice and shift it to the appropriate frequencies, consequently building up a harmony out of the original melody. Four techniques are involved in this sequential process: segmentation into consonant and vowel intervals, pitch detection by the McLeod Pitch Method (MPM), functional harmony for establishing a cadence, and pitch shifting by means of a phase vocoder.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126439324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we aim to develop a method to create personalized, high-presence multi-channel content for a sports game through real-time content curation of the various media streams captured or created by spectators. We use the live TV broadcast as ground truth and construct a machine-learning model that automatically curates multiple videos captured by spectators from different angles and zoom levels. The live TV broadcast of a baseball game follows curation rules that select a specific camera angle for specific scenes (e.g., a pitcher throwing a ball). As inputs to the model, we use metadata such as image features (e.g., whether the pitcher is on screen) in each fixed interval of the baseball videos, together with game progress data (e.g., the inning number and the batting order). The output is the camera ID (among the multiple spectator cameras) at each point in time. For evaluation, we targeted Spring-Selection high-school baseball games. As training data, we used the image features, game progress data, and camera position at each point in time in the TV broadcast. We captured videos of a baseball game from 7 different points in Hanshin Koshien Stadium with handheld video cameras and generated a sample data set by dividing the videos into fixed-interval segments. We split the sample data set into training and test sets and evaluated our method with two validation schemes: (1) 10-fold cross-validation and (2) hold-out (e.g., training on the first and second innings and testing on the third). As a result, our method predicted the camera-switching timings with an accuracy (F-measure) of 72.53% on weighted average for the base camera work and 92.1% for the fixed camera work.
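Structurally, the curation model is a per-interval classifier from image-feature and game-progress inputs to a camera ID. A sketch on synthetic stand-in data with an inning-style hold-out split follows; the feature columns and classifier choice are assumptions, not the paper's exact setup.

```python
# Per-interval camera selection as classification (toy data; real inputs
# are image features plus game progress, labels are TV-broadcast cameras).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# rows: [pitcher_on_screen, batter_on_screen, inning, batting_order] (assumed)
X = np.column_stack([rng.integers(0, 2, 600), rng.integers(0, 2, 600),
                     rng.integers(1, 10, 600), rng.integers(1, 10, 600)])
y = (X[:, 0] + X[:, 2] % 3).astype(int)  # fake camera IDs

split = 400  # stand-in for the paper's inning-based hold-out split
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:split], y[:split])
print("weighted F-measure:", f1_score(y[split:], clf.predict(X[split:]), average="weighted"))
```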
{"title":"Automatic Content Curation System for Multiple Live Sport Video Streams","authors":"Kazuki Fujisawa, Yuko Hirabe, H. Suwa, Yutaka Arakawa, K. Yasumoto","doi":"10.1109/ISM.2015.17","DOIUrl":"https://doi.org/10.1109/ISM.2015.17","url":null,"abstract":"In this paper, we aim to develop a method to create personalized and high-presence multi-channel contents for a sport game through realtime content curation from various media streams captured/created by spectators. We use the live TV broadcast as a ground truth data and construct a machine learning-based model to automatically conduct curation from multiple videos which spectators captured from different angles and zoom levels. The live TV broadcast of a baseball game has some curation rules which select a specific angle camera for some specific scenes (e.g., a pitcher throwing a ball). As inputs for constructing a model, we use meta data such as image feature data (e.g., a pitcher is on the screen) in each fixed interval of baseball videos and game progress data (e.g., the inning number and the batting order). Output is the camera ID (among multiple cameras of spectators) at each point of time. For evaluation, we targeted Spring-Selection high-school baseball games. As training data, we used image features, game progress data, and the camera position at each point of time in the TV broadcast. We used videos of a baseball game captured from 7 different points in Hanshin Koshien Stadium with handy video cameras and generated sample data set by dividing the videos to fixed interval segments. We divided the sample data set into the training data set and the test data set and evaluated our method through two validation methods: (1) 10-fold crossvalidation method and (2) hold-out methods (e.g., learning first and second innings and testing third inning). As a result, our method predicted the camera switching timings with accuracy (F-measure) of 72.53% on weighted average for the base camera work and 92.1% for the fixed camera work.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"436 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115808721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the explosion of multimedia data, demand for sophisticated multimedia knowledge discovery systems has increased. The multimodal nature of multimedia data is a major barrier to knowledge extraction, and representing multimodal data in a unimodal space is advantageous for any mining task. We first represent multimodal multimedia documents in a unimodal space by converting the multimedia objects into signal objects. The dynamic nature of glowworms motivated us to propose the Glowworm Swarm Optimization based Multimedia Document Clustering (GSOMDC) algorithm to group multimedia documents into topics. The resulting purity and entropy values indicate that the GSOMDC algorithm successfully clusters multimedia documents into topics. The goodness of the clustering is further evaluated by performing cluster-based retrieval of the multimedia documents, which achieves good precision values.
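For context, the two core updates of standard glowworm swarm optimization, which GSOMDC builds on, can be sketched as follows. Parameter values are the commonly used GSO defaults; neighbor selection is simplified to a uniform random choice, and the document-to-signal conversion and clustering logic are not shown.

```python
# Core glowworm swarm optimization updates: luciferin refresh and movement
# toward a brighter neighbour (simplified; canonical GSO picks the
# neighbour probabilistically by luciferin difference).
import math
import random

def luciferin_update(lucif, fitness, rho=0.4, gamma=0.6):
    """l_i <- (1 - rho) * l_i + gamma * J(x_i)."""
    return [(1 - rho) * l + gamma * j for l, j in zip(lucif, fitness)]

def move_toward_brighter(pos, lucif, i, radius=1.0, step=0.03):
    """Move glowworm i one small step toward a brighter neighbour in range."""
    brighter = [j for j in range(len(pos))
                if j != i and lucif[j] > lucif[i]
                and 0 < math.dist(pos[i], pos[j]) <= radius]
    if not brighter:
        return pos[i]
    target = pos[random.choice(brighter)]
    d = math.dist(pos[i], target)
    return tuple(p + step * (t - p) / d for p, t in zip(pos[i], target))
```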
{"title":"A New Glowworm Swarm Optimization Based Clustering Algorithm for Multimedia Documents","authors":"K. Pushpalatha, S. AnanthanarayanaV.","doi":"10.1109/ISM.2015.94","DOIUrl":"https://doi.org/10.1109/ISM.2015.94","url":null,"abstract":"Due to the explosion of multimedia data, the demand for the sophisticated multimedia knowledge discovery systems has been increased. The multimodal nature of multimedia data is the big barrier for knowledge extraction. The representation of multimodal data in a unimodal space will be more advantageous for any mining task. We initially represent the multimodal multimedia documents in a unimodal space by converting the multimedia objects into signal objects. The dynamic nature of the glowworms motivated us to propose the Glowworm Swarm Optimization based Multimedia Document Clustering (GSOMDC) algorithm to group the multimedia documents into topics. The better purity and entropy values indicates that the GSOMDC algorithm successfully clusters the multimedia documents into topics. The goodness of the clustering is evaluated by performing the cluster based retrieval of multimedia documents with better precision values.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134028245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}