
2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA) — Latest Publications

Banknote portrait detection using convolutional neural network
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986895
Ryutaro Kitagawa, Yoshihiko Mochizuki, S. Iizuka, E. Simo-Serra, Hiroshi Matsuki, N. Natori, H. Ishikawa
Banknotes generally have different designs according to their denominations. Thus, if the characteristics of each design can be recognized, they can be used for sorting banknotes by denomination. The portrait on a banknote is one such characteristic that can be used for classification: a sorting system can be designed that recognizes the portrait on each banknote and sorts it accordingly. In this paper, our aim is to automate the configuration of such a sorting system by automatically detecting portraits in sample banknotes, so that it can be quickly deployed in a new target country. We use convolutional neural networks to detect portraits in a completely new set of banknotes, robust to variation in the ways they are shown, such as the size and orientation of the face.
Citations: 9
Robust registration of serial cell microscopic images using 3D Hilbert scan search
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986917
Yongwen Lai, S. Kamata, Zhizhong Fu
Microscopic images are very helpful for observing the details of cells because of their high resolution. Furthermore, generating a 3D cell structure from a series of images lets biologists and doctors view the cell structure from any angle. However, each cell slice is placed under the microscope separately, which introduces arbitrary rotation and translation between the serial slices. Moreover, the sectioning process can damage the cell structure, for example by tearing or warping. Therefore the serial slices must be registered before rendering the volume data in 3D. In this paper we propose a robust registration algorithm based on an improved 3D Hilbert scan search. We also put forward a simple but effective method to remove false matches between consecutive images. Finally, we correct the local deformation based on optical-flow theory and adopt a multi-resolution method. Our algorithm is tested on serial microscopy kidney cell images, and the experimental results show how accurate and robust our method is.
Citations: 0
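The registration above is driven by a Hilbert scan, which linearizes a grid while preserving spatial locality. As a minimal illustration of the underlying space-filling curve (the standard 2D index-to-coordinate mapping, not the paper's 3D variant or its search strategy), one can write:

```python
def d2xy(n, d):
    """Map a 1D Hilbert-curve index d to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so sub-curves connect
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Consecutive indices always land on grid-adjacent cells, which is the locality property a Hilbert scan search exploits.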
Pedestrian near-miss analysis on vehicle-mounted driving recorders
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986889
Teppei Suzuki, Y. Aoki, Hirokatsu Kataoka
Recently, demand for video analysis on vehicle-mounted driving recorders has been increasing in vision-based safety systems, such as those for autonomous vehicles. The technology must be positioned as one of the most important tasks; however, the conventional traffic datasets (e.g. KITTI, Caltech Pedestrian) do not include any dangerous scenes (near-miss scenes), even though the objective of a safety system is to avoid danger. In this paper, (i) we create a pedestrian near-miss dataset from vehicle-mounted driving recorders and (ii) propose a method that jointly learns pedestrian detection and danger-level prediction {high, low, no-danger} with convolutional neural networks (CNNs) based on ResNets. The results demonstrate the effectiveness of our approach, which achieved 68% accuracy on joint pedestrian detection and danger-label prediction, with a processing speed of 58.6 fps on the self-collected pedestrian near-miss dataset.
Citations: 7
Ball-like observation model and multi-peak distribution estimation based particle filter for 3D Ping-pong ball tracking
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986883
Ziwei Deng, Xina Cheng, T. Ikenaga
3D ball tracking is of great significance to ping-pong game analysis, with applications such as TV content and tactical analysis. The main obstacles to a high success rate in ping-pong ball tracking are the lack of unique features and the complexity of the background, which make it difficult to distinguish the ball from similar-looking noise. This paper proposes a ball-like observation model and a multi-peak distribution estimation to improve accuracy. For the ball-like observation model, we utilize the gradient feature along the edge of the upper semicircle to construct a histogram; in addition, a ball-size likelihood is proposed to handle noise that differs in size from the ball. The multi-peak distribution estimation aims at obtaining a precise ball position when the particles' weight distribution has multiple peaks. Experiments are based on ping-pong videos of an official match recorded from 4 viewpoints, comprising 122 hit cases with 2 pairs of players in total. The tracking success rate finally reaches 99.33%.
Citations: 3
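The abstract does not spell out how the peaks of the particle weight distribution are found. As a hedged 1D sketch of the general idea (the grouping rule, `radius`, and the peak representation are all assumptions, not the paper's method), one can greedily grow weighted clusters and report them heaviest first:

```python
def find_weight_peaks(particles, weights, radius=1.0):
    """Group weighted 1D particle positions into peaks: each particle joins the
    nearest existing peak within `radius`, otherwise it seeds a new peak.
    Returns (center, total_weight) pairs sorted heaviest first."""
    order = sorted(range(len(particles)), key=lambda i: -weights[i])
    peaks = []  # each peak: [weighted position sum, total weight]
    for i in order:
        p, w = particles[i], weights[i]
        for peak in peaks:
            center = peak[0] / peak[1]
            if abs(p - center) <= radius:
                peak[0] += w * p
                peak[1] += w
                break
        else:
            peaks.append([w * p, w])
    result = [(s / t, t) for s, t in peaks]
    return sorted(result, key=lambda c: -c[1])
```

Under this sketch, the heaviest returned peak would play the role of the ball position estimate when the distribution is multi-modal.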
Event based surveillance video synopsis using trajectory kinematics descriptors
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986848
Wei-Cheng Wang, P. Chung, Chun-Rong Huang, Wei-Yun Huang
Video synopsis has shown promising performance in visual surveillance, but the rearranged foreground objects may occlude each other in a disorderly way, which makes it hard for end users to identify the targets. In this paper, a novel event-based video synopsis method is proposed that uses the clustering results of the trajectories of foreground objects. To represent the kinematic events of each trajectory, trajectory kinematics descriptors are applied. Then, affinity propagation is used to cluster trajectories with similar kinematic events. Finally, each kinematic event group is used to generate an event-based synopsis video. As the experiments show, the generated event-based synopsis videos effectively and efficiently reduce the length of the surveillance videos and are much clearer for browsing compared with state-of-the-art video synopsis methods.
Citations: 12
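The abstract does not define the trajectory kinematics descriptor itself. A hypothetical minimal version (the choice of statistics here is an assumption, not the paper's descriptor) summarizes speed and acceleration along a 2D trajectory, yielding a fixed-length vector that a clustering method such as affinity propagation could consume:

```python
import math

def kinematics_descriptor(points, dt=1.0):
    """Hypothetical trajectory kinematics descriptor: (mean speed, max speed,
    mean acceleration magnitude) for a 2D trajectory given as (x, y) points."""
    # finite-difference velocities between consecutive points
    vel = [((x2 - x1) / dt, (y2 - y1) / dt)
           for (x1, y1), (x2, y2) in zip(points, points[1:])]
    speed = [math.hypot(vx, vy) for vx, vy in vel]
    # finite-difference acceleration magnitudes between consecutive velocities
    acc = [math.hypot((vx2 - vx1) / dt, (vy2 - vy1) / dt)
           for (vx1, vy1), (vx2, vy2) in zip(vel, vel[1:])]
    return (sum(speed) / len(speed), max(speed),
            sum(acc) / len(acc) if acc else 0.0)
```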
Mixture particle filter with block jump biomechanics constraint for volleyball players lower body parts tracking
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986861
Fanglu Xie, Xina Cheng, T. Ikenaga
Tracking volleyball players' body parts is very important for block and jump height calculation, which can be applied to TV content and tactical analysis. This paper proposes a mixture particle filter with a block-jump biomechanics constraint based on a 3D articulated human model. Using mixture particle filters to track different body parts effectively reduces the degrees of freedom of the human model and lets each particle filter track its specific target more accurately. The block-jump biomechanics constraint drives an adaptive prediction model and likelihood model that specialize the particle filter for block tracking. The experiments are based on videos of the final of the 2014 Japan Inter High School Games of Men's Volleyball in Tokyo. The tracking success rate reached 93.9% for the left foot and 93.8% for the right foot.
Citations: 2
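For readers unfamiliar with the particle-filter machinery that both of the tracking papers above build on, here is a generic 1D bootstrap predict-weight-resample cycle (a textbook sketch only; the papers' mixture construction, biomechanics constraint, and 3D models are not reproduced):

```python
import random

def particle_filter_step(particles, weights, observe, motion_std=0.5):
    """One predict-weight-resample cycle of a 1D bootstrap particle filter.
    `observe(x)` returns the likelihood of the current measurement given state x."""
    # predict: diffuse particles with a random-walk motion model
    particles = [x + random.gauss(0.0, motion_std) for x in particles]
    # weight: score each particle against the measurement
    weights = [observe(x) for x in particles]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # resample: draw a new equally-weighted set proportional to the weights
    particles = random.choices(particles, weights=weights, k=len(particles))
    return particles, [1.0 / len(particles)] * len(particles)
```

A constraint such as the block-jump model above would enter through the prediction step (biasing the motion model) and the `observe` likelihood.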
FPGA implementation of high frame rate and ultra-low delay vision system with local and global parallel based matching
Pub Date : 2017-07-19 DOI: 10.23919/MVA.2017.7986857
Tingting Hu, T. Ikenaga
High-frame-rate, ultra-low-delay image processing systems play an increasingly important role in human-machine interactive applications, which call for a better user experience. Current vision-chip-based work targets video with simple patterns or shapes in order to reach higher speeds, while real-life applications require a more complicated system. This paper proposes a BRIEF-based matching system with high frame rate and ultra-low delay for specific object tracking, implemented on an FPGA board. Local-parallel and global-pipeline based matching and a 4-1-4 thread transformation are proposed for the implementation of this system: the former for high-speed matching, the latter to reduce the enormous resource cost caused by the highly parallel and pipelined structure. In the broader framework, the proposed image processing system is parallelized and pipelined for a high throughput that meets the demands of a high-frame-rate, ultra-low-delay system. Evaluation results show that the proposed image processing core can work at 1306 fps with 0.808 ms delay at a resolution of 640×480. A system using the image processing core and a camera with a 784 fps frame rate and 640×480 resolution is designed.
Citations: 9
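BRIEF descriptors are binary strings compared by Hamming distance, which is what makes the matching cheap enough for a parallel FPGA pipeline. A software sketch of just the matching step (descriptors modeled as Python integers; the paper's hardware architecture and the BRIEF sampling pattern itself are not reproduced):

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptor(query, candidates):
    """Return (index, distance) of the candidate descriptor closest to `query`."""
    best = min(range(len(candidates)),
               key=lambda i: hamming(query, candidates[i]))
    return best, hamming(query, candidates[best])
```

In hardware, each XOR-and-popcount comparison maps to a small fixed-latency circuit, so all candidates can be scored in parallel.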
Automatic extraction and recognition of shoe logos with a wide variety of appearance
Pub Date : 2017-05-08 DOI: 10.23919/MVA.2017.7986838
Kazunori Aoki, W. Ohyama, T. Wakabayashi
A logo is a symbolic design intended not only to identify a product's manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because, unlike those of many other products, their appearance varies greatly. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearance using a limited number of training samples. The proposed method employs maximally stable extremal regions (MSERs) for the initial region extraction, an iterative algorithm for region grouping, and gradient features with a support vector machine for logo recognition. The results of performance evaluation experiments on a logo dataset with a wide variety of appearance show that the proposed method achieves promising performance for both logo extraction and recognition.
Citations: 2
Real-time recognition of sign language gestures and air-writing using leap motion
Pub Date : 2017-05-08 DOI: 10.23919/MVA.2017.7986825
Pradeep Kumar, Rajkumar Saini, S. Behera, D. P. Dogra, P. Roy
A sign language is generally composed of three main parts: manual signs, which are gestures made by hand or finger movements; non-manual signs, such as facial expressions or body postures; and finger-spelling, where signers spell out words with gestures to convey the meaning. In the literature, researchers have proposed various Sign Language Recognition (SLR) systems focusing on only one part of the sign language; the combination of different parts has not been explored much. In this paper, we present a framework to recognize manual signs and finger-spellings using a Leap Motion sensor. In the first phase, a Support Vector Machine (SVM) classifier is used to differentiate between manual and finger-spelling gestures. Next, two BLSTM-NN classifiers are used for the recognition of manual signs and finger-spelling gestures using sequence-classification and sequence-transcription based approaches, respectively. A dataset of 2240 sign gestures, consisting of 28 isolated manual signs and 28 finger-spelled words, has been recorded involving 10 users. We have obtained an overall accuracy of 63.57% in real-time recognition of sign gestures.
Citations: 47
A linear method for recovering the depth of Ultra HD cameras using a kinect V2 sensor
Pub Date : 2017-05-08 DOI: 10.23919/MVA.2017.7986908
Yuan Gao, M. Ziegler, Frederik Zilly, Sandro Esquivel, R. Koch
Depth-Image-Based Rendering (DIBR) is a mature and important method for producing free-viewpoint videos. On the one hand, most current DIBR research focuses on systems with low-resolution cameras, even though many Ultra HD rendering devices have been launched on the market. On the other hand, the quality and accuracy of the depth image directly affect the final rendering result. Therefore, in this paper we try to improve the recovery of depth information for Ultra HD cameras with the help of a Kinect V2 sensor. To this end, a linear least-squares method is proposed that recovers the rigid transformation between a Kinect V2 and an Ultra HD camera using the depth information from the Kinect V2 sensor. In addition, a non-linear coarse-to-fine method based on Sparse Bundle Adjustment (SBA) is compared with this linear method. Experiments show that our proposed method performs better than the non-linear method for Ultra HD depth image recovery in both computing time and precision.
Citations: 3
Journal: 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)