Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7026188
Jose Rivera-Rubio, Saad Idrees, I. Alexiou, Lucas Hadjilucas, A. Bharath
Visual object recognition is just one of the many applications of camera-equipped smartphones. Recognising objects in photos taken with wearable and hand-held cameras is already possible through some of the larger internet search providers; yet there is little rigorous analysis of the quality of search results, particularly where there is great disparity in image quality. This has motivated us to develop the Small Hand-held Object Recognition Test (SHORT). It includes a dataset that is suitable for recognising hand-held objects from either snapshots or videos acquired using hand-held or wearable cameras. SHORT provides a collection of images and ground truth that help evaluate the different factors that affect recognition performance. In its present state, the dataset comprises a set of high-quality training images and a large set of nearly 135,000 smartphone-captured test images of 30 grocery products. In this paper, we discuss some open challenges in the visual recognition of objects that are being held by users. We evaluate the performance of a number of popular object recognition algorithms, with differing levels of complexity, when tested against SHORT.
{"title":"A dataset for Hand-Held Object Recognition","authors":"Jose Rivera-Rubio, Saad Idrees, I. Alexiou, Lucas Hadjilucas, A. Bharath","doi":"10.1109/ICIP.2014.7026188","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7026188","url":null,"abstract":"Visual object recognition is just one of the many applications of camera-equipped smartphones. The ability to recognise objects through photos taken with wearable and handheld cameras is already possible through some of the larger internet search providers; yet, there is little rigorous analysis of the quality of search results, particularly where there is great disparity in image quality. This has motivated us to develop the Small Hand-held Object Recognition Test (SHORT). This includes a dataset that is suitable for recognising hand-held objects from either snapshots or videos acquired using hand-held or wearable cameras. SHORT provides a collection of images and ground truth that help evaluate the different factors that affect recognition performance. At its present state, the dataset is comprised of a set of high quality training images and a large set of nearly 135,000 smartphone-captured test images of 30 grocery products. In this paper, we will discuss some open challenges in the visual object recognition of objects that are being held by users. 
We evaluate the performance of a number of popular object recognition algorithms, with differing levels of complexity, when tested against SHORT.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"65 1","pages":"5881-5885"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76468036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025245
S. Momcilovic, A. Ilic, N. Roma, L. Sousa
In this paper we propose an efficient method for collaborative H.264/AVC inter-prediction in heterogeneous CPU+GPU systems. In order to minimize the overall encoding time, the proposed method provides a stable and balanced load distribution of the most computationally demanding video encoding modules, by relying on accurate and dynamically built functional performance models. Based on an extensive RD analysis, an efficient temporally dependent prediction of the search area center is proposed, which allows dependency-aware workload partitioning and efficient GPU parallelization, while preserving high compression efficiency. The proposed method also introduces efficient communication-aware techniques, which maximize data reuse and decrease the overhead of expensive data transfers in collaborative video encoding. The experimental results show that the proposed method is capable of achieving real-time video encoding for very demanding video coding parameters, i.e. full HD video format, a 64×64 pixel search area and exhaustive motion estimation.
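The load-balancing idea in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible performance model, a single measured throughput per device, and splits macroblock rows in proportion to it. The function names, the `alpha` smoothing factor and the row-level granularity are hypothetical and are not taken from the paper, whose functional performance models are not described here.

```python
def partition_rows(total_rows, throughputs):
    """Split macroblock rows across devices in proportion to measured throughput.

    `throughputs` maps a device name to rows processed per second (assumed measured).
    """
    total = sum(throughputs.values())
    # Ideal fractional share for each device.
    shares = {dev: total_rows * t / total for dev, t in throughputs.items()}
    rows = {dev: int(s) for dev, s in shares.items()}
    # Give leftover rows to the devices with the largest fractional remainder.
    leftover = total_rows - sum(rows.values())
    by_remainder = sorted(shares, key=lambda d: shares[d] - rows[d], reverse=True)
    for dev in by_remainder[:leftover]:
        rows[dev] += 1
    return rows


def update_throughput(old, rows_done, seconds, alpha=0.5):
    """Blend the previous estimate with the latest measurement (rows per second)."""
    return (1 - alpha) * old + alpha * rows_done / seconds


if __name__ == "__main__":
    # A 1080p frame has 68 rows of 16x16 macroblocks.
    print(partition_rows(68, {"cpu": 1.0, "gpu": 3.0}))
```

Re-running `partition_rows` after each frame with refreshed throughput estimates is one simple way to keep the split balanced as device speeds drift.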
{"title":"Collaborative inter-prediction on CPU+GPU systems","authors":"S. Momcilovic, A. Ilic, N. Roma, L. Sousa","doi":"10.1109/ICIP.2014.7025245","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025245","url":null,"abstract":"In this paper we propose an efficient method for collaborative H.264/AVC inter-prediction in heterogeneous CPU+GPU systems. In order to minimize the overall encoding time, the proposed method provides stable and balanced load distribution of the most computationally demanding video encoding modules, by relying on accurate and dynamically built functional performance models. In an extensive RD analysis, an efficient temporary dependent prediction of the search area center is proposed, which allows dependency-aware workload partitioning and efficient GPU parallelization, while preserving high compression efficiency. The proposed method also introduces efficient communication-aware techniques, which maximize data reusing, and decrease the overhead of expensive data transfers in collaborative video encoding. The experimental results show that the proposed method is able of achieving real-time video encoding for very demanding video coding parameters, i.e. full HD video format, 64×64 pixels search area and the exhaustive motion estimation.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"230 1","pages":"1228-1232"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76508724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025200
Qingyun She, Zongqing Lu, Q. Liao
In this paper, we present an efficient vanishing point detection method for challenging road images. The detection process is based on the geometrical features of roads. The slope distribution of the line segments is analyzed to remove spurious lines. A distance-based weighting scheme is also used to suppress noise in the voting stage. The proposed algorithm has been tested on a natural data set from the Defense Advanced Research Projects Agency (DARPA). Experimental results with both quantitative and qualitative analyses are provided, which demonstrate the superiority of the proposed method over some state-of-the-art methods.
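As a rough illustration of distance-weighted voting, the sketch below scores each pairwise line intersection by how close every detected line passes to it. The exponential weight, the `sigma` parameter and the choice of pairwise intersections as candidates are assumptions made for this example; the abstract does not specify the paper's weighting function or its slope-based filtering, and the latter is omitted here.

```python
import math
from itertools import combinations


def line_coeffs(seg):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1 for a non-degenerate segment."""
    (x1, y1), (x2, y2) = seg
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def intersection(l1, l2):
    """Intersection point of two lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)


def estimate_vanishing_point(segments, sigma=5.0):
    """Pick the candidate whose distance-weighted vote total is highest."""
    lines = [line_coeffs(s) for s in segments]
    best, best_score = None, -1.0
    for l1, l2 in combinations(lines, 2):
        p = intersection(l1, l2)
        if p is None:
            continue
        # Each line votes with a weight that decays with its distance to the candidate.
        score = sum(math.exp(-abs(a * p[0] + b * p[1] + c) / sigma) for a, b, c in lines)
        if score > best_score:
            best, best_score = p, score
    return best


if __name__ == "__main__":
    segs = [((0, 150), (50, 100)), ((200, 150), (150, 100)),
            ((100, 200), (100, 120)), ((0, 0), (10, 300))]  # last one is an outlier
    print(estimate_vanishing_point(segs))
```

Because far-away lines contribute almost nothing, a single spurious segment cannot pull the estimate away from the point where most lines agree.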
{"title":"Vanishing point estimation for challenging road images","authors":"Qingyun She, Zongqing Lu, Q. Liao","doi":"10.1109/ICIP.2014.7025200","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025200","url":null,"abstract":"In this paper, we present an efficient vanishing point detection method for challenging road images. This detection process is based on the geometrical features of the roads. The slope distribution of the line segments is analyzed to reduce the spurious lines. A distance-based weighting scheme is also utilized to eliminate the voting noise in the voting stage. The proposed algorithm has been tested on a natural data set from Defense Advanced Research Projects Agency (DARPA). Experimental results with both quantitative and qualitative analyses are provided, which demonstrate the superiority of the proposed method over some state-of-the-art methods.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"23 1","pages":"996-1000"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77390017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025651
Hongbin Liu, Ying Chen
In the 3D extension of HEVC (High Efficiency Video Coding), namely 3D-HEVC, segment-wise DC coding (SDC) was adopted to represent the depth residual of Intra-coded depth blocks more efficiently. Instead of coding a pixel-wise residual as in HEVC, SDC codes one DC residual value for each segment of a Prediction Unit (PU) and skips transform and quantization. SDC was originally proposed for only a few modes: the DC mode, the Planar mode and the depth modeling mode (DMM), which separates a PU along an arbitrary straight line. This paper proposes a generic SDC method that also applies to the conventional angular Intra modes. For each depth prediction unit coded with an Intra prediction mode, the encoder can adaptively choose to code a pixel-wise residual or a segment-wise residual to achieve better compression efficiency. Experimental results show that the proposed method can reduce the total bit rate by about 1%, even though the depth views altogether consume a relatively low percentage of the total bit rate.
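The core idea of SDC, one DC residual per segment instead of one residual per pixel, can be sketched as follows. This is an illustrative simplification rather than the normative 3D-HEVC procedure: the segment label map, the use of a rounded mean difference as the DC value and the function names are all assumptions for this example.

```python
def sdc_encode(original, prediction, segments):
    """Return one DC residual per segment label instead of a per-pixel residual.

    All three arguments are equally sized 2D lists; `segments` holds a label per pixel.
    """
    sums = {}
    for o_row, p_row, s_row in zip(original, prediction, segments):
        for o, p, s in zip(o_row, p_row, s_row):
            total, count = sums.get(s, (0, 0))
            sums[s] = (total + (o - p), count + 1)
    # Rounded mean difference between original and predicted depth in each segment.
    return {s: round(total / count) for s, (total, count) in sums.items()}


def sdc_decode(prediction, segments, dc_residuals):
    """Reconstruct the block by adding each segment's DC residual to the prediction."""
    return [[p + dc_residuals[s] for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(prediction, segments)]


if __name__ == "__main__":
    original = [[10, 10, 50, 50], [10, 10, 50, 50]]
    prediction = [[8, 8, 53, 53], [8, 8, 53, 53]]
    segments = [[0, 0, 1, 1], [0, 0, 1, 1]]
    dc = sdc_encode(original, prediction, segments)
    print(dc, sdc_decode(prediction, segments, dc))
```

For piecewise-flat depth blocks like the one above, two integers reproduce the block exactly, which is why skipping transform and quantization can pay off for depth maps.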
{"title":"Generic segment-wise DC for 3D-HEVC depth intra coding","authors":"Hongbin Liu, Ying Chen","doi":"10.1109/ICIP.2014.7025651","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025651","url":null,"abstract":"In 3D extension of HEVC (High Efficiency Video Coding), namely, 3D-HEVC, segment-wise DC coding (SDC) was adopted to more efficiently represent the depth residual for Intra coded depth blocks. Instead of coding pixel-wise residual as in HEVC, SDC codes one DC residual value for each segment of a Prediction Unit (PU) and skips transform and quantization. SDC was originally proposed for only a couple of modes, including the DC mode, Planar mode and depth modeling mode (DMM), which has an arbitrary straight line separation of a PU. This paper proposes a generic SDC method that applies to the conventional angular Intra modes. For each depth prediction unit coded with Intra prediction mode, encoder can adaptively choose to code pixel-wise residual or segment-wise residual to achieve better compression efficiency. Experimental results show that proposed method can reduce the total bit rate by about 1% even though the depth views altogether consumes relatively low percentage of the total bit rate.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"44 1","pages":"3219-3222"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77569444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025624
Shao Huang, Weiqiang Wang, Hui Zhang
The need for fast image retrieval has recently increased tremendously in many application areas (biomedicine, military, commerce, education, etc.). In this work, we exploit saliency detection to select a group of salient regions and use an undirected graph to model the dependency among these regions, so that the similarity of two images can be measured by the similarity of their corresponding graphs. Identifying salient pixels reduces interference from irrelevant information and makes the image representation more effective. The graph model better characterizes the spatial constraints among salient regions. Comparison experiments are carried out on three representative, publicly available datasets (Holidays, UKB, and Oxford 5k), and the experimental results show that integrating the proposed method with SIFT-like local descriptors improves on existing state-of-the-art retrieval accuracy.
{"title":"Retrieving images using saliency detection and graph matching","authors":"Shao Huang, Weiqiang Wang, Hui Zhang","doi":"10.1109/ICIP.2014.7025624","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025624","url":null,"abstract":"The need for fast retrieving images has recently increased tremendously in many application areas (biomedicine, military, commerce, education, etc.). In this work, we exploit the saliency detection to select a group of salient regions and utilize an undirected graph to model the dependency among these salient regions, so that the similarity of images can be measured by calculating the similarity of the corresponding graphs. Identification of salient pixels can decrease interferences from irrelevant information, and make the image representation more effective. The introduction of the graph model can better characterize the spatial constraints among salient regions. The comparison experiments are carried out on the three representative datasets publicly available (Holidays, UKB, and Oxford 5k), and the experimental results show that the integration of the proposed method and the SIFT-like local descriptors can better improve the existing state-of-the-art retrieval accuracy.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"116 1","pages":"3087-3091"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77621207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025844
S. Bonettini, A. Benfenati, V. Ruggiero
Image restoration often requires the minimization of a convex, possibly nonsmooth functional, given by the sum of a data fidelity measure and a regularization term. To cope with the lack of smoothness, alternative formulations of the minimization problem can be exploited via the duality principle. Indeed, the primal-dual and the dual formulations have been well explored in the literature when the data suffer from Gaussian noise and, thus, the data fidelity term is quadratic. Unfortunately, most of the approaches proposed for the Gaussian case are difficult to apply to general data discrepancy terms, such as the Kullback-Leibler divergence. In this work we propose primal-dual methods which apply to the minimization of the sum of general convex functions and whose iteration is easy to compute, regardless of the form of the objective function, since it essentially consists of a subgradient projection step. We provide the convergence analysis and suggest some strategies to improve the convergence speed by means of a careful selection of the steplength parameters. Numerical experiments on Total Variation based denoising and deblurring problems from Poisson data show the behavior of the proposed method with respect to other state-of-the-art algorithms.
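For orientation, the kind of problem described in this abstract is commonly written as below. The notation is an assumption made for this note and is not taken from the paper: $H$ is the blurring operator, $b$ a known background term, $y$ the observed Poisson data, $\beta > 0$ the regularization parameter, and $0 \ln 0 = 0$ by convention.

```latex
\min_{x \ge 0} \; \mathrm{KL}(Hx + b;\, y) + \beta\, \mathrm{TV}(x),
\qquad
\mathrm{KL}(z;\, y) = \sum_i \left\{ y_i \ln\frac{y_i}{z_i} + z_i - y_i \right\},
\qquad
\mathrm{TV}(x) = \sum_i \bigl\| (\nabla x)_i \bigr\|_2 .

% Using the dual representation of the isotropic TV term gives a saddle-point form:
\min_{x \ge 0} \; \max_{\|p_i\|_2 \le 1 \;\forall i} \;
\mathrm{KL}(Hx + b;\, y) + \beta\, \langle \nabla x,\, p \rangle .
```

The saddle-point form shows why a primal-dual scheme is natural here: the nonsmooth TV term becomes a bilinear coupling with a dual variable $p$ constrained to a simple set, onto which projection is cheap.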
{"title":"Primal-dual first order methods for total variation image restoration in presence of poisson noise","authors":"S. Bonettini, A. Benfenati, V. Ruggiero","doi":"10.1109/ICIP.2014.7025844","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025844","url":null,"abstract":"Image restoration often requires the minimization of a convex, possibly nonsmooth functional, given by the sum of a data fidelity measure plus a regularization term. In order to face the lack of smoothness, alternative formulations of the minimization problem could be exploited via the duality principle. Indeed, the primal-dual and the dual formulation have been well explored in the literature when the data suffer from Gaussian noise and, thus, the data fidelity term is quadratic. Unfortunately, the most part of the approaches proposed for the Gaussian are difficult to apply to general data discrepancy terms, such as the Kullback-Leibler divergence. In this work we propose primal-dual methods which apply to the minimization of the sum of general convex functions and whose iteration is easy to compute, regardless of the form of the objective function, since it essentially consists in a subgradient projection step. We provide the convergence analysis and we suggest some strategies to improve the convergence speed by means of a careful selection of the steplength parameters. 
A numerical experience on Total Variation based denoising and deblurring problems from Poisson data shows the behavior of the proposed method with respect to other state-of-the-art algorithms.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"13 1","pages":"4156-4160"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77715713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7026081
S. Lameri, Paolo Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, S. Tubaro
Nowadays, a significant fraction of the available video content is created by reusing already existing online videos. In these cases, the source video is seldom reused as is. Instead, it is typically time-clipped to extract only a subset of the original frames, and other transformations are commonly applied (e.g., cropping, logo insertion, etc.). In this paper, we analyze a pool of videos related to the same event or topic. We propose a method that aims at automatically reconstructing the content of the original source videos, i.e., the parent sequences, by splicing together sets of near-duplicate shots seemingly extracted from the same parent sequence. The result of the analysis shows how content is reused, thus revealing the intent of content creators, and enables us to reconstruct a parent sequence even when it is no longer available online. In doing so, we make use of a robust-hash algorithm that allows us to detect whether groups of frames are near-duplicates. Based on that, we developed an algorithm to automatically find near-duplicate matchings between multiple parts of multiple sequences. All the near-duplicate parts are finally temporally aligned to reconstruct the parent sequence. The proposed method is validated with both synthetic and real-world datasets downloaded from YouTube.
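The near-duplicate detection and temporal alignment steps can be sketched in a few lines. The abstract does not say which robust hash the authors use, so this example substitutes a simple average hash as a stand-in; the frame representation (small grayscale grids), the `max_bits` tolerance and the function names are hypothetical.

```python
def average_hash(frame):
    """Bit list: 1 where a pixel is above the frame mean. `frame` is a small grayscale grid."""
    pixels = [v for row in frame for v in row]
    mean = sum(pixels) / len(pixels)
    return [1 if v > mean else 0 for v in pixels]


def hamming(h1, h2):
    """Number of differing bits between two equally long hashes."""
    return sum(a != b for a, b in zip(h1, h2))


def find_alignment(sequence, shot, max_bits=1):
    """Return the first offset in `sequence` where every frame of `shot` is a near-duplicate.

    Returns None if the shot does not match anywhere.
    """
    hs = [average_hash(f) for f in sequence]
    hq = [average_hash(f) for f in shot]
    for offset in range(len(hs) - len(hq) + 1):
        if all(hamming(hs[offset + i], hq[i]) <= max_bits for i in range(len(hq))):
            return offset
    return None


if __name__ == "__main__":
    sequence = [[[0, 255], [0, 255]], [[255, 0], [255, 0]], [[0, 0], [255, 255]],
                [[255, 255], [0, 0]], [[0, 255], [255, 0]]]
    # Frames 2 and 3 of the sequence with reduced contrast.
    shot = [[[10, 10], [245, 245]], [[245, 245], [10, 10]]]
    print(find_alignment(sequence, shot))
```

Once each shot has an offset relative to another sequence, overlapping shots can be chained on a common timeline, which is the intuition behind splicing them back into a parent sequence.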
{"title":"Who is my parent? Reconstructing video sequences from partially matching shots","authors":"S. Lameri, Paolo Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, S. Tubaro","doi":"10.1109/ICIP.2014.7026081","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7026081","url":null,"abstract":"Nowadays, a significant fraction of the available video content is created by reusing already existing online videos. In these cases, the source video is seldom reused as is. Conversely, it is typically time clipped to extract only a subset of the original frames, and other transformations are commonly applied (e.g., cropping, logo insertion, etc.). In this paper, we analyze a pool of videos related to the same event or topic. We propose a method that aims at automatically reconstructing the content of the original source videos, i.e., the parent sequences, by splicing together sets of near-duplicate shots seemingly extracted from the same parent sequence. The result of the analysis shows how content is reused, thus revealing the intent of content creators, and enables us to reconstruct a parent sequence also when it is no longer available online. In doing so, we make use of a robust-hash algorithm that allows us to detect whether groups of frames are near-duplicates. Based on that, we developed an algorithm to automatically find near-duplicate matchings between multiple parts of multiple sequences. All the near-duplicate parts are finally temporally aligned to reconstruct the parent sequence. 
The proposed method is validated with both synthetic and real world datasets downloaded from YouTube.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"32 1","pages":"5342-5346"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77815042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025522
P. Wang, V. Eglin, C. Largeron, Christophe Garcia
In this paper, we develop a comprehensive representation model for handwriting, which contains both morphological and topological information. An adapted Shape Context (SC) descriptor built on structural points is employed to describe the contour of the text. Graphs are first constructed by using the structural points as nodes and the skeleton of the strokes as edges. Based on these graphs, Topological Node Features (TNFs) of the n-neighbourhood are extracted. A Bag-of-Words representation model based on the TNFs is employed to depict the topological characteristics of word images. Moreover, a novel approach to word spotting using the proposed model is presented. The final distance is a weighted mixture of the SC cost and the TNF distribution comparison. Linear Discriminant Analysis (LDA) is used to learn the optimal weight for each part of the distance, taking writing styles into consideration. The evaluation of the proposed approach shows the significance of combining the properties of the handwriting from different aspects.
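The weighted mixture described here can be written down directly. In the sketch below the chi-square histogram distance and the fixed weights are assumptions for illustration: the abstract states that the weights are learned with LDA and does not name the measure used to compare TNF distributions.

```python
def chi_square(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def hybrid_distance(sc_cost, tnf_query, tnf_candidate, w_sc=0.5, w_tnf=0.5):
    """Weighted mix of a Shape Context matching cost and a TNF histogram distance.

    `w_sc` and `w_tnf` are placeholder weights; in the paper they are learned.
    """
    return w_sc * sc_cost + w_tnf * chi_square(tnf_query, tnf_candidate)


if __name__ == "__main__":
    # Rank two candidate word images against a query (smaller distance is better).
    query_hist = [0.5, 0.3, 0.2]
    candidates = {"word_a": (0.10, [0.5, 0.3, 0.2]), "word_b": (0.05, [0.1, 0.1, 0.8])}
    ranked = sorted(candidates, key=lambda k: hybrid_distance(candidates[k][0], query_hist, candidates[k][1]))
    print(ranked)
```

Here `word_b` has the lower contour cost but a very different topology histogram, so the combined distance ranks `word_a` first; this is the kind of complementary evidence the hybrid distance is meant to capture.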
{"title":"Handwritten word spotting based on a hybrid optimal distance","authors":"P. Wang, V. Eglin, C. Largeron, Christophe Garcia","doi":"10.1109/ICIP.2014.7025522","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025522","url":null,"abstract":"In this paper, we develop a comprehensive representation model for handwriting, which contains both morphological and topological information. An adapted Shape Context descriptor built on structural points is employed to describe the contour of the text. Graphs are first constructed by using the structural points as nodes and the skeleton of the strokes as edges. Based on graphs, Topological Node Features (TNFs) of n-neighbourhood are extracted. Bag-of-Words representation model based on the TNFs is employed to depict the topological characteristics of word images. Moreover, a novel approach for word spotting application by using the proposed model is presented. The final distance is a weighted mixture of the SC cost, and the TNF distribution comparison. Linear Discriminant Analysis (LDA) is used to learn the optimal weight for each part of the distance with the consideration of writing styles. The evaluation of the proposed approach shows the significance of combining the properties of the handwriting from different aspects.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"4 1","pages":"2580-2584"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79951751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7026182
Zhen Wang, Z. Long, G. Al-Regib, Asjad Amin, Mohamed Deriche
The identification of reservoir regions is closely related to the detection of faults in seismic volumes. However, most fault detection algorithms rely heavily on human intervention and are therefore inefficient. In this paper, we present a new technique that automatically tracks faults across a 3D seismic volume. To automate the process, we propose a two-way fault line projection based on estimated tracking vectors. In the tracking process, projected fault lines are integrated into a synthesized line, which serves as the tracked fault line, through an optimization process with local geological constraints. The tracking algorithm is evaluated using real-world seismic data sets with promising results. The proposed method provides accuracy comparable to detecting faults explicitly in every seismic section, while reducing computational complexity.
{"title":"Automatic fault tracking across seismic volumes via tracking vectors","authors":"Zhen Wang, Z. Long, G. Al-Regib, Asjad Amin, Mohamed Deriche","doi":"10.1109/ICIP.2014.7026182","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7026182","url":null,"abstract":"The identification of reservoir regions has a close relationship with the detection of faults in seismic volumes. However, only relying on human intervention, most fault detection algorithms are inefficient. In this paper, we present a new technique that automatically tracks faults across a 3D seismic volume. To implement automation, we propose a two-way fault line projection based on estimated tracking vectors. In the tracking process, projected fault lines are integrated into a synthesized line as the tracked fault line, through an optimization process with local geological constraints. The tracking algorithm is evaluated using real-world seismic data sets with promising results. The proposed method provides comparable accuracy to the detection of faults explicitly in every seismic section, and it also reduces computational complexity.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"12 1","pages":"5851-5855"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79071717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-10-01 DOI: 10.1109/ICIP.2014.7025809
Brian Ravenet, M. Ochs, C. Pelachaud
Virtual worlds are increasingly populated with autonomous conversational agents embodying different roles such as tutor, guide, or personal assistant. In order to create more engaging and natural interactions, these agents should be endowed with social capabilities such as expressing different social attitudes through their behaviors. In this paper, we present the architecture of a socio-conversational agent composed of communicative components that detect and respond verbally and non-verbally to the user's speech and convey different social attitudes. This paper presents the main components of this architecture. These descriptions are illustrated with scenarios of interaction.
{"title":"Architecture of a socio-conversational agent in virtual worlds","authors":"Brian Ravenet, M. Ochs, C. Pelachaud","doi":"10.1109/ICIP.2014.7025809","DOIUrl":"https://doi.org/10.1109/ICIP.2014.7025809","url":null,"abstract":"Virtual worlds are more and more populated with autonomous conversational agents embodying different roles like tutor, guide, or personal assistant. In order to create more engaging and natural interactions, these agents should be endowed with social capabilities such as expressing different social attitudes through their behaviors. In this paper, we present the architecture of a socio-conversational agent composed of communicative components to detect and respond verbally and non-verbally to the user's speech and to convey different social attitudes. This paper presents the main components of this architecture. These descrpitions are illustrated with scenarios of interaction.","PeriodicalId":6856,"journal":{"name":"2014 IEEE International Conference on Image Processing (ICIP)","volume":"76 1","pages":"3983-3987"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79287407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}