Semantic annotation of multimedia data is needed for tasks such as content-based indexing of databases and for making inferences about the activities taking place in the environment. In this paper, we present a top-level ontology that provides a framework for describing semantic features in video. We do this in three steps. First, we identify the key components of semantic descriptions, such as objects and events, and show how domain-specific ontologies can be developed from them. Second, we present a set of predicates for composing events and for describing various spatio-temporal relationships between events and entities. Third, we develop a scheme for reasoning with the developed ontologies to infer complex events from simple events using relational algebra. Finally, we demonstrate the utility of our framework by developing an ontology for a specific domain, and we conclude by analyzing the performance of the reasoning mechanism on simulated events in this domain.
P. Natarajan and R. Nevatia, "EDF: A Framework for Semantic Annotation of Video," Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05), 17 Oct. 2005. doi:10.1109/ICCV.2005.255
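The relational-algebra style of event inference described in the abstract can be sketched as a join of simple-event relations under a temporal predicate. This is a minimal illustration, not the paper's EDF formalism: the event tuples, the `before` predicate, and the `compose` operator are all invented for the example.

```python
# Hypothetical sketch: inferring a complex event from simple events by
# joining two event relations under an Allen-style temporal predicate.
from collections import namedtuple

Event = namedtuple("Event", ["name", "actor", "start", "end"])

def before(a, b):
    """Allen-style BEFORE: interval a ends strictly before b starts."""
    return a.end < b.start

def compose(name, left, right, predicate):
    """Relational join of two event relations on a shared actor and a
    spatio-temporal predicate, yielding a derived (complex) event."""
    return [
        Event(name, a.actor, a.start, b.end)
        for a in left
        for b in right
        if a.actor == b.actor and predicate(a, b)
    ]

approach = [Event("approach", "person1", 0, 4)]
enter = [Event("enter", "person1", 6, 8)]

# Infer the complex event "enter_building" = approach BEFORE enter.
inferred = compose("enter_building", approach, enter, before)
```

The derived event spans the union of its constituents' intervals; richer predicates (OVERLAPS, DURING, spatial proximity) would slot into `compose` the same way.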
We present a framework for autonomous behaviour in vision-based artificial cognitive systems, achieved by imitation through coupled percept-action (stimulus and response) exemplars. Attributed Relational Graphs (ARGs) are used as a symbolic representation of scene information (percepts). A measure of similarity between ARGs is implemented using a graph isomorphism algorithm and is used to group the percepts hierarchically. By hierarchically grouping percept exemplars into progressively more general models, each coupled to a progressively more general Gaussian action model, we attempt to model the percept space and create a direct mapping to associated actions. The system is built on a simulated shape-sorter puzzle that stands in for a robust vision system. Spatio-temporal hypothesis exploration is performed efficiently in a Bayesian framework using a particle filter to propagate game play over time.
L. Ellis and R. Bowden, "A Generalised Exemplar Approach to Modeling Perception Action Coupling," Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05), 17 Oct. 2005. doi:10.1109/ICCV.2005.254
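The ARG similarity measure at the heart of the approach above can be caricatured with a set-overlap score over attributed nodes and relation edges. This stands in for the paper's graph-isomorphism-based measure; the graph encoding and the toy scenes are assumptions made for illustration.

```python
# Illustrative attributed-relational-graph (ARG) similarity: Jaccard
# overlap of attributed nodes and labelled edges. A real system would use
# (sub)graph isomorphism; this sketch only conveys the idea of comparing
# symbolic scene descriptions.
def arg_similarity(g1, g2):
    """Each graph g = (nodes, edges): nodes are (id, attribute) pairs,
    edges are (id1, relation, id2) triples."""
    n1, e1 = set(g1[0]), set(g1[1])
    n2, e2 = set(g2[0]), set(g2[1])
    union = len(n1 | n2) + len(e1 | e2)
    if union == 0:
        return 1.0  # two empty scenes are trivially identical
    return (len(n1 & n2) + len(e1 & e2)) / union

# Two shape-sorter scenes differing only in the shape above the hole.
square_scene = ([("a", "square"), ("b", "hole")], [("a", "above", "b")])
circle_scene = ([("a", "circle"), ("b", "hole")], [("a", "above", "b")])

sim = arg_similarity(square_scene, circle_scene)
```

Hierarchical grouping would then cluster percept graphs by this score, attaching progressively broader action models to progressively coarser clusters.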
This paper presents an original approach to the symbol grounding problem involved in semantic image interpretation, i.e. the problem of mapping between image data and semantic data. Our approach involves the following aspects of cognitive vision: knowledge acquisition and representation, reasoning, and machine learning. The symbol grounding problem is treated as a problem in its own right, and we propose an independent cognitive system dedicated to symbol grounding. This symbol grounding system introduces an intermediate layer between the semantic interpretation problem (reasoning at the semantic level) and the image processing problem. An important aspect of the work concerns the use of two ontologies to ease communication between the different layers: a visual concept ontology and an image processing ontology. We use two approaches to solve the symbol grounding problem: a machine learning approach and an a priori knowledge-based approach.
C. Hudelot, N. Maillot, and M. Thonnat, "Symbol Grounding for Semantic Image Interpretation: From Image Data to Semantics," Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05), 17 Oct. 2005. doi:10.1109/ICCV.2005.258
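The a priori knowledge route to grounding can be pictured as hand-written rules mapping low-level image measurements to visual-concept symbols, the intermediate vocabulary between pixels and semantics. Everything here is invented for illustration: the feature names, thresholds, and concept labels are not from the paper.

```python
# Toy "a priori knowledge" grounding: rules from numeric image features to
# symbols in a (hypothetical) visual concept ontology. A learned grounding
# would replace these thresholds with trained classifiers.
def ground(features):
    """Map numeric region-level features to visual concept symbols."""
    concepts = []
    if features.get("circularity", 0.0) > 0.9:
        concepts.append("round")
    if features.get("mean_hue", 0.0) < 30:
        concepts.append("reddish")
    return concepts

# A segmented region described by its low-level measurements.
region = {"circularity": 0.95, "mean_hue": 12}
symbols = ground(region)
```

The semantic layer then reasons over symbols like `round` and `reddish` (e.g. "a tomato is round and reddish") without ever touching pixel-level data, which is precisely the decoupling the intermediate layer provides.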
We approach the problem of large-scale satellite image browsing from a content-based retrieval and semantic categorization perspective. A two-stage method for query-based automatic retrieval of satellite image patches is proposed. The semantic category of a query patch is determined, and patches from that category are ranked based on an image similarity measure. Semantic categorization is performed by a learning approach involving the two-dimensional multi-resolution hidden Markov model (2-D MHMM). Patches that do not belong to any trained category are handled using a support vector machine (SVM) based classifier. Experiments yield promising results in modeling semantic categories within satellite images using 2-D MHMM, producing accurate and convenient browsing. We also show that prior semantic categorization improves retrieval performance.
A. Parulekar, R. Datta, J. Li, and J. Z. Wang, "Large-Scale Satellite Image Browsing Using Automatic Semantic Categorization," Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05), 17 Oct. 2005. doi:10.1109/ICCV.2005.257
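The two-stage flow above, categorize first, then rank only within the predicted category, can be sketched with stand-in models. The paper uses 2-D MHMMs for categorization and an SVM for rejection; here both are replaced by nearest-centroid logic with a distance threshold, and all data is made up.

```python
# Sketch of two-stage retrieval: (1) assign the query patch to a semantic
# category, rejecting patches far from every category; (2) rank patches of
# that category by feature-space similarity. Nearest-centroid stands in
# for the paper's 2-D MHMM + SVM models.
import math

def categorize(patch, centroids, reject_dist=10.0):
    """Stage 1: nearest category centroid, with a reject option."""
    best, best_d = None, float("inf")
    for label, c in centroids.items():
        d = math.dist(patch, c)
        if d < best_d:
            best, best_d = label, d
    return best if best_d <= reject_dist else "unknown"

def retrieve(query, database, centroids, k=2):
    """Stage 2: rank only same-category patches by distance to the query."""
    cat = categorize(query, centroids)
    pool = [(math.dist(query, p), p) for c, p in database if c == cat]
    return [p for _, p in sorted(pool)[:k]]

centroids = {"urban": (1.0, 1.0), "water": (9.0, 9.0)}
db = [("urban", (1.2, 0.9)), ("urban", (2.0, 2.0)), ("water", (9.1, 8.8))]
results = retrieve((1.1, 1.0), db, centroids)
```

Restricting stage 2 to one category is what makes the scheme scale: the similarity search touches only a fraction of the database, which is also why prior categorization can improve retrieval precision.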
This paper describes an approach to incorporating semantic knowledge sources within a discriminative learning framework. We consider a joint scene categorization and region labelling task and assume that some semantic knowledge is available; for example, we might know which objects are allowed to appear in a given scene. Our goal is to use this knowledge to minimize the number of fully labelled examples (i.e. data for which each region in the image is labelled) required for learning. For each scene category, the probability of a given labelling of image regions is modelled by a Conditional Random Field (CRF). Our model extends the CRF framework by incorporating hidden variables and combining class-conditional CRFs into a joint framework for scene categorization and region labelling. We integrate semantic knowledge into the model by constraining the configurations that the latent region label variable can take, i.e. by constraining the possible region labellings for a given scene category. In a series of synthetic experiments designed to illustrate the feasibility of the approach, adding semantic constraints about object entailment increased region labelling accuracy given a fixed amount of fully labelled data.
A. Quattoni, M. Collins, and T. Darrell, "Incorporating Semantic Constraints into a Discriminative Categorization and Labelling Model," Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05), 17 Oct. 2005. doi:10.1109/ICCV.2005.256
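The effect of the semantic constraint above is easy to see by counting: restricting which region labels a scene category admits shrinks the latent label space the model must consider. The scene categories, label vocabulary, and allowed-label table below are invented purely to make that counting concrete.

```python
# Toy illustration of constraining latent region labels per scene
# category: only labels admissible for the scene survive, so the space of
# labelling configurations (and hence the learning problem) shrinks.
from itertools import product

ALLOWED = {
    "beach": {"sky", "sand", "water"},
    "street": {"sky", "road", "building", "car"},
}

def labellings(scene, n_regions, vocabulary):
    """Enumerate region labellings consistent with the scene category."""
    admissible = sorted(ALLOWED[scene] & vocabulary)
    return list(product(admissible, repeat=n_regions))

vocab = {"sky", "sand", "water", "road", "building", "car"}
beach_space = labellings("beach", 2, vocab)    # 3 labels -> 3**2 configs
street_space = labellings("street", 2, vocab)  # 4 labels -> 4**2 configs
unconstrained = len(vocab) ** 2                # 6**2 without constraints
```

In the CRF setting the same pruning applies to the support of the latent label variables rather than to an explicit enumeration, but the payoff is analogous: fewer admissible configurations means less fully labelled data is needed to pin the model down.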