
Latest publications from the 2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)

Is the transmission of depth data always necessary for 3D video streaming?
Li Yu, M. Hannuksela, T. Tillo, Chunyu Lin, M. Gabbouj
Depth data is of vital importance in 3D video streaming, as it allows flexible rendering of views at arbitrary viewpoints. Given this importance, the Multi-view Video plus Depth (MVD) format has conventionally been used, in which depth data is transmitted along with the texture data. In this work, we argue that transmitting the depth data is unnecessary when 1) bandwidth is limited and 2) viewpoint switching is infrequent. We propose that depth transmission be replaced by a receiver-side unit that estimates depth from the received multi-view videos. This replacement not only spares the bandwidth dedicated to depth transmission, but also achieves rate-distortion performance competitive with the MVD method.
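As an illustration of the receiver-side alternative, below is a minimal sketch of depth estimation from two decoded, rectified views. The abstract does not name a particular estimator; OpenCV's semi-global block matching stands in, and `focal_px` and `baseline_m` are assumed camera parameters.

```python
import cv2
import numpy as np

def estimate_depth(left_view, right_view, focal_px, baseline_m, num_disp=128):
    """Estimate a depth map (in metres) from two decoded, rectified views."""
    gray_l = cv2.cvtColor(left_view, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right_view, cv2.COLOR_BGR2GRAY)
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,   # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,              # penalty for small disparity jumps
        P2=32 * 5 * 5,             # penalty for large disparity jumps
    )
    # StereoSGBM returns fixed-point disparities scaled by 16
    disp = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0
    disp[disp <= 0] = np.nan       # occluded or unmatched pixels
    return focal_px * baseline_m / disp   # depth = f * B / d
```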
{"title":"Is the transmission of depth data always necessary for 3D video streaming?","authors":"Li Yu, M. Hannuksela, T. Tillo, Chunyu Lin, M. Gabbouj","doi":"10.1109/IPTA.2018.8608123","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608123","url":null,"abstract":"Depth data is of vital importance in the 3D video streaming, which allows the flexible rendering of views at arbitrary viewpoints. Given the importance of the depth data, the Multi-view Video plus Depth (MVD) format has been conventionally used. In the MVD format, the depth data is transmitted along with the texture data. In this work, we argue that the transmission of the depth data is not necessary in cases, when 1) the bandwidth is limited and 2) viewpoint switching is not frequent. We propose that the depth transmission could be replaced by a receiver- side unit that can estimate the depth from the received multi- view videos. This replacement does not only spare the bandwidth dedicated for the transmission of the depth, but also achieves a competitive rate-distortion performance with the MVD method.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130679802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Dilated Convolutional Network for Material Recognition
Xiaoyue Jiang, Junna Du, B. Sun, Xiaoyi Feng
Material is one of the intrinsic features of objects, so material recognition plays an important role in image understanding. The same material may take on various shapes and appearances while keeping the same physical characteristics, which poses great challenges for material recognition. Most recent material recognition methods are based on image patches and cannot give accurate segmentation results for each specific material. In this paper, we propose a deep-learning-based method that performs pixel-level material segmentation on whole images directly. In a classical convolutional network, the spatial size of the features shrinks as convolutional layers are stacked, losing the detail needed for pixel-wise segmentation. We therefore propose dilated convolutional layers to preserve feature detail. In addition, the dilated convolutional features are combined with traditional convolutional features to remove the artifacts brought in by dilated convolution. In experiments, the proposed dilated network demonstrated its effectiveness on the popular MINC dataset and its extended version.
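For illustration, a minimal PyTorch sketch of the core idea (not the authors' exact architecture): stacking convolutions with growing dilation enlarges the receptive field while the feature map keeps its full spatial size, so per-pixel class scores can be produced directly. The 23-class head matches the MINC material categories.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, in_ch=3, ch=64, num_classes=23):  # 23 = MINC categories
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            # dilation doubles per layer: receptive field grows, size does not
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(ch, num_classes, 1)  # per-pixel scores

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.randn(1, 3, 224, 224)
print(DilatedBlock()(x).shape)  # torch.Size([1, 23, 224, 224]) -- full resolution
```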
{"title":"Deep Dilated Convolutional Network for Material Recognition","authors":"Xiaoyue Jiang, Junna Du, B. Sun, Xiaoyi Feng","doi":"10.1109/IPTA.2018.8608160","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608160","url":null,"abstract":"Material is actually one of the intrinsic features for objects, consequently material recognition plays an important role in image understanding. For the same material, it may have various shapes and appearances, but keeps the same physical characteristic, which brings great challenges for material recognition. Most recent material recognition methods are based on image patches, and cannot give accurate segmentation results for each specific material. In this paper, we propose a deep learning based method to do pixel level material segmentation for whole images directly. In classical convolutional network, the spacial size of features becomes smaller and smaller with the increasing of convolutional layers, which loses the details for pixel-wise segmentation. Therefore we propose to use dilated convolutional layers to keep the details of features. In addition, the dilated convolutional features are combined with traditional convolutional features to remove the artifacts that are brough by dilated convolution. In the experiments, the proposed dilated network showed its effectiveness on the popular MINC dataset and its extended version.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134577342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A Look At Non-Cooperative Presentation Attacks in Fingerprint Systems
Emanuela Marasco, S. Cando, Larry L Tang, Luca Ghiani, G. Marcialis
The scientific literature lacks countermeasures specifically for fingerprint presentation attacks (PAs) realized with non-cooperative methods, even though, in realistic scenarios, individuals are unlikely to agree to duplicate their fingerprints. For example, replicas can be created from finger marks left on a surface without the person's knowledge. Existing anti-spoofing mechanisms are trained to detect presentation attacks realized with the cooperation of the user and are assumed to be able to identify non-cooperative spoofs as well. In this regard, latent prints are perceived to be of low quality and less likely to succeed in gaining unauthorized access; thus, they are expected to be blocked without the need for a dedicated presentation attack detection system. Currently, the lowest Presentation Attack Detection (PAD) error rates on spoofs from latent prints are achieved using frameworks involving Convolutional Neural Networks (CNNs) trained on cooperative PAs; however, the computational requirements of these networks do not make them easily portable to mobile applications. The focus of this paper is therefore to investigate the degree of success of spoofs made from latent fingerprints, in order to improve the understanding of their vitality features. Furthermore, we experimentally show the performance drop of existing liveness detectors when dealing with non-cooperative attacks, and we analyze the quality estimates of such spoofs, which are commonly believed to be of lower quality than molds fabricated with the user's consent.
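A minimal sketch of the kind of evaluation the paper reports: scoring one PAD model on cooperative versus latent-print attacks and comparing error rates. The metric names follow ISO/IEC 30107-3; the `model` and datasets in the usage comment are hypothetical.

```python
import numpy as np

def pad_error_rates(scores, labels, threshold=0.5):
    """APCER: attacks accepted as live; BPCER: live rejected as attack."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    attacks = labels == 0                           # 0 = attack, 1 = bona fide
    apcer = np.mean(scores[attacks] >= threshold)   # attack scored as live
    bpcer = np.mean(scores[~attacks] < threshold)   # live scored as attack
    return apcer, bpcer

# Hypothetical usage -- same detector, two attack types:
# apcer_coop, _ = pad_error_rates(model.predict_proba(X_coop)[:, 1], y_coop)
# apcer_latent, _ = pad_error_rates(model.predict_proba(X_latent)[:, 1], y_latent)
```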
{"title":"A Look At Non-Cooperative Presentation Attacks in Fingerprint Systems","authors":"Emanuela Marasco, S. Cando, Larry L Tang, Luca Ghiani, G. Marcialis","doi":"10.1109/IPTA.2018.8608133","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608133","url":null,"abstract":"Scientific literature lacks of countermeasures specifically for fingerprint presentation attacks (PAs) realized with non-cooperative methods; even though, in realistic scenarios, it is unlikely that individuals would agree to duplicate their fingerprints. For example, replicas can be created from finger marks left on a surface without the person’s knowledge. Existing anti-spoofing mechanisms are trained to detect presentation attacks realized with cooperation of the user and are assumed to be able to identify non-cooperative spoofs as well. In this regard, latent prints are perceived to be of low quality and less likely to succeed in gaining unauthorized access. Thus, they are expected to be blocked without the need of a particular presentation attack detection system. Currently, the lowest Presentation Attack Detection (PAD) error rates on spoofs from latent prints are achieved using frameworks involving Convolutional Neural Networks (CNNs) trained on cooperative PAs; however, the computational requirement of these networks does not make them easily portable for mobile applications. Therefore, the focus of this paper is to investigate the degree of success of spoofs made from latent fingerprints to improve the understanding of their vitality features. Furthermore, we experimentally show the performance drop of existing liveness detectors when dealing with non-cooperative attacks and analyze the quality estimates pertaining to such spoofs, which are commonly believed to be of lower quality compared to the molds fabricated with user’s consensus.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117109466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Study of Measures for Contour-based Recognition and Localization of Known Objects in Digital Images
H. Abdulrahman, Baptiste Magnier
Usually, the most important structures in an image are extracted by an edge detector. Once the extracted edges are binarized, they represent the shape-boundary information of an object. For an edge-based localization/matching process, the differences between a reference edge map and a candidate image are quantified by computing a performance measure. This study investigates supervised contour measures for determining the degree to which an object's shape differs from a desired position. Several distance measures are therefore evaluated under different shape alterations: translation, rotation, and scale change. Experiments on both synthetic and real images show which measures are accurate enough for object pose or matching estimation, which is useful in robotic tasks for refining the object pose.
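A representative example of the distance measures compared between a reference edge map and a candidate is the symmetric Hausdorff distance, sketched here with distance transforms; the study evaluates several such measures, not this one alone.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff(edges_a, edges_b):
    """Symmetric Hausdorff distance between two non-empty binary edge maps."""
    # distance from every pixel to the nearest contour pixel of the other map
    dist_to_b = distance_transform_edt(~edges_b)
    dist_to_a = distance_transform_edt(~edges_a)
    return max(dist_to_b[edges_a].max(), dist_to_a[edges_b].max())

ref = np.zeros((64, 64), bool); ref[20, 10:50] = True    # reference contour
cand = np.zeros((64, 64), bool); cand[25, 12:52] = True  # translated candidate
print(hausdorff(ref, cand))
```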
{"title":"A Study of Measures for Contour-based Recognition and Localization of Known Objects in Digital Images","authors":"H. Abdulrahman, Baptiste Magnier","doi":"10.1109/IPTA.2018.8608165","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608165","url":null,"abstract":"Usually, the most important structures in an image are extracted by an edge detector. Once extracted edges are binarized, they represent the shape boundary information of an object. For the edge-based localization/matching process, the differences between a reference edge map and a candidate image are quantified by computing a performance measure. This study investigates supervised contour measures for determining the degree to which an object shape differs from a desired position. Therefore, several distance measures are evaluated for different shape alterations: translation, rotation and scale change. Experiments on both synthetic and real images exhibit which measures are accurate enough for an object pose or matching estimation, useful for robot task as to refine the object pose.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129573547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Human-Computer Interaction using Finger Signing Recognition with Hand Palm Centroid PSO Search and Skin-Color Classification and Segmentation
Z. Hamici
This paper presents a novel image processing technique for recognizing the finger sign language alphabet. A human-computer interaction system is built on sign language recognition, serving as an interface between computers and hearing-impaired persons, or as an assistive technology in industrial robotics. The recognition is articulated around extracting the contours of the sign language alphabet, thereby converting image recognition into one-dimensional signal processing, which improves recognition efficiency and significantly reduces processing time. Image pre-processing is performed by a novel skin-color region segmentation defined inside the standard RGB (sRGB) color space, followed by morphological filtering to remove non-skin residuals. Afterwards, a circular correlation identifies the sign after extracting the sign's closed-contour vector and matching it against the target alphabet vectors. The closed-contour vector is generated around the hand-palm centroid, whose position is optimized by a particle swarm optimization search. Finally, a multi-objective function computes the recognition score. The results presented in this paper for skin-color segmentation, centroid search, and pattern recognition show the high effectiveness of the novel artificial vision engine.
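A minimal sketch of the pre-processing stage: rule-based skin-color segmentation directly in sRGB, followed by morphological clean-up. The thresholds below are the widely used Kovac et al. rule, assumed here for illustration rather than taken from the paper.

```python
import cv2
import numpy as np

def skin_mask(bgr):
    """Binary skin mask from an sRGB (OpenCV BGR) image, plus morphology."""
    px = bgr.astype(np.int32)
    b, g, r = px[..., 0], px[..., 1], px[..., 2]
    rule = ((r > 95) & (g > 40) & (b > 20) &
            (px.max(axis=2) - px.min(axis=2) > 15) &   # enough colour spread
            (np.abs(r - g) > 15) & (r > g) & (r > b))
    mask = rule.astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop non-skin specks
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
```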
{"title":"Human-Computer Interaction using Finger Signing Recognition with Hand Palm Centroid PSO Search and Skin-Color Classification and Segmentation","authors":"Z. Hamici","doi":"10.1109/IPTA.2018.8608145","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608145","url":null,"abstract":"This paper presents a novel image processing technique for recognizing finger signs language alphabet. A human-computer interaction system is built based on the recognition of sign language which constitutes an interface between the computer and hearing-impaired persons, or as an assistive technology in industrial robotics. The sign language recognition is articulated on the extraction of the contours of the sign language alphabets, therefore, converting image recognition into one dimensional signal processing, which improves the recognition efficiency and significantly reduces the processing time. The pre-processing of images is performed by a novel skin-color region segmentation defined inside the standard RGB (sRGB) color space, then a morphological filtering is used for non-skin residuals removal. Afterwards, a circular correlation achieves the identification of the sign language after extracting the sign closed contour vector and performing matching between extracted vector and target alphabets vectors. The closed contour vector is generated around the hand palm centroid with position optimized by a particle swarm optimization algorithm search. Finally, a multi-objective function is used for computing the recognition score. The results presented in this paper for skin color segmentation, centroid search and pattern recognition show high effectiveness of the novel artificial vision engine.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134638499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Acoustic Based Method for Automatic Segmentation of Images of Objects in Periodic Motion: detection of vocal folds edges case study
Bartosz Kopczynski, P. Strumiłło, Marcin Just, E. Niebudek-Bogusz
We describe a novel image segmentation technique for the automated detection of objects in periodic motion that generate acoustic waves. The method is based on measuring the similarity of two independently collected but time-synchronized data streams, i.e., the audio signal and the image sequence. This technique enables an automatic, optimized segmentation procedure for a sequence of images depicting an oscillating object. The proposed procedure has been validated on the problem of detecting the edges of vibrating vocal folds. The similarity between the synchronously collected sequence of laryngoscopic images and the voice signal is measured by applying time-frequency analysis. The developed segmentation technique and motion analysis method can be applied to the early detection of oscillation anomalies of the vocal folds, which may cause hoarse voice, also known as dysphonia. In particular, the image segmentation result can aid the phoniatrician in analyzing the vocal-fold phonation process and help in the early detection of voice anomalies.
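A minimal sketch of the underlying similarity idea, under the assumption that a per-frame glottal-area signal has been extracted from a candidate segmentation: the image-derived oscillation and the synchronized audio should agree in their dominant frequency, so their disagreement can score the segmentation.

```python
import numpy as np
from scipy.signal import spectrogram

def dominant_freq(signal, fs):
    """Frequency of the peak of the time-averaged spectrogram."""
    freqs, _, sxx = spectrogram(signal, fs=fs, nperseg=min(256, len(signal)))
    return freqs[np.argmax(sxx.mean(axis=1))]

def segmentation_score(area_per_frame, frame_rate, audio, audio_rate):
    """Smaller is better: spectral agreement between image and sound."""
    return abs(dominant_freq(area_per_frame, frame_rate)
               - dominant_freq(audio, audio_rate))
```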
{"title":"Acoustic Based Method for Automatic Segmentation of Images of Objects in Periodic Motion: detection of vocal folds edges case study","authors":"Bartosz Kopczynski, P. Strumiłło, Marcin Just, E. Niebudek-Bogusz","doi":"10.1109/IPTA.2018.8608152","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608152","url":null,"abstract":"We describe a novel image segmentation technique for automated detection of objects being in periodic motion that generates acoustic waves. The method is based on measuring similarity of two independently collected but time synchronized data, i.e. the audio signals and image sequences. Such a technique enables automatic and optimized segmentation procedure of a sequence of images depicting an oscillating object. The proposed segmentation procedure has been validated on the problem of detecting edges of vibrating vocal folds. The similarity measure of the synchronously collected sequence of laryngoscopic images and the voice signal is achieved by applying time-frequency analysis. The developed segmentation technique and motion analysis method can be applied for early detection of oscillation anomalies of the vocal folds which may cause hoarse voice, also known as dysphonia. In particular, the image segmentation result can aid the phoniatrist in the analysis of the vocal folds phonation process and help in early detection of voice anomalies.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132377393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Pedestrian Detection in Infrared Images Using Fast RCNN
Asad Ullah, Hongmei Xie, M. Farooq, Zhaoyun Sun
Compared to visible-spectrum images, infrared images are much clearer in poor lighting conditions. Infrared imaging devices can operate even without visible light, acquiring clear images of objects that aid efficient classification and detection. For image classification and object detection, CNNs, which belong to the class of feed-forward ANNs, have been used successfully. Fast RCNN combines the advantages of modern CNN detectors, i.e., RCNN and SPPnet, to classify object proposals more efficiently, resulting in better and faster detection. To further improve the detection rate and speed of Fast RCNN, two modifications are proposed in this paper: one for accuracy, in which an extra convolutional layer is added to the network (named Fast RCNN type 2), and one for speed, in which the input is reduced from three channels to one (named Fast RCNN type 3). Fast RCNN type 1 has a better detection rate than RCNN; compared to Fast RCNN type 1, type 2 achieves a better detection rate, while type 3 is faster.
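A sketch of the "type 3" single-channel idea, with torchvision's Faster R-CNN (API of torchvision ≥ 0.13 assumed) standing in for the paper's Fast R-CNN: the detector's stem is replaced so it consumes one-channel infrared frames directly.

```python
import torch
import torch.nn as nn
import torchvision

# Faster R-CNN as a stand-in; the paper modifies Fast R-CNN analogously
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
stem = model.backbone.body.conv1          # original stem expects 3 channels
model.backbone.body.conv1 = nn.Conv2d(
    1, stem.out_channels, kernel_size=7, stride=2, padding=3, bias=False)
model.transform.image_mean = [0.5]        # single-channel normalisation
model.transform.image_std = [0.5]

model.eval()
ir_frame = torch.randn(1, 480, 640)       # one grayscale infrared frame
print([d["boxes"].shape for d in model([ir_frame])])
```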
{"title":"Pedestrian Detection in Infrared Images Using Fast RCNN","authors":"Asad Ullah, Hongmei Xie, M. Farooq, Zhaoyun Sun","doi":"10.1109/IPTA.2018.8608121","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608121","url":null,"abstract":"Compared to visible spectrum image the infrared image is much clearer in poor lighting conditions. Infrared imaging devices are capable to operate even without the availability of visible light, acquires clear images of objects which are helpful in efficient classification and detection. For image object classification and detection, CNN which belongs to the class of feed-forward ANN, has been successfully used. Fast RCNN combines advantages of modern CNN detectors i.e. RCNN and SPPnet to classify object proposals more efficiently, resulting in better and faster detection. To further improve the detection rate and speed of Fast RCNN, two modifications are proposed in this paper. One for accuracy in which an extra convolutional layer is added to the network and named it as Fast RCNN type 2, the other for speed in which the input channel is reduced from three channel input to one and named as Fast RCNN type 3.Fast RCNN type 1 has better detection rate than RCNN and compare to Fast RCNN, Fast RCNN type 2 has better detection rate while Fast RCNN type 3 is faster.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125510226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Research on Low-Resolution Pedestrian Detection Algorithms based on R-CNN with Targeted Pooling and Proposal
Peng Shi, Jun Wu, Kai Wang, Yao Zhang, Jiapei Wang, Juneho Yi
We present effective low-resolution pedestrian detection using targeted pooling and the Region Proposal Network (RPN) in Faster R-CNN. Our method first rearranges the anchors of the RPN using an optimal hyper-parameter setting called the "Elaborate Setup". Second, it refines the granularity of the pooling operation in the ROI pooling layer. The experimental results demonstrate that the proposed RPN together with fine-grained pooling, which we call LRPD-R-CNN, achieves high average precision and robust performance on the VOC 2007 dataset. The method has great commercial potential and wide application prospects in computer vision, security, and intelligent cities.
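A minimal sketch of anchor retuning in torchvision's Faster R-CNN; the sizes and the pedestrian-like aspect ratio below are illustrative assumptions, not the paper's "Elaborate Setup" values.

```python
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)
anchor_gen = AnchorGenerator(
    sizes=((8,), (16,), (32,), (64,), (128,)),  # one small size per FPN level
    aspect_ratios=((2.44,),) * 5)               # height/width ~ 1/0.41, upright people
model = FasterRCNN(backbone, num_classes=2,     # pedestrian + background
                   rpn_anchor_generator=anchor_gen)
```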
{"title":"Research on Low-Resolution Pedestrian Detection Algorithms based on R-CNN with Targeted Pooling and Proposal","authors":"Peng Shi, Jun Wu, Kai Wang, Yao Zhang, Jiapei Wang, Juneho Yi","doi":"10.1109/IPTA.2018.8608142","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608142","url":null,"abstract":"We present an effective low-resolution pedestrian detection using targeted pooling and Region Proposal Network (RPN) in the Faster R-CNN. Our method firstly rearranges the anchor from the RPN exploiting an optimal hyper-parameter setting called \"Elaborate Setup\". Secondly, it refines the granularity in the pooling operation from the ROI pooling layer. The experimental results demonstrate that the proposed RPN together with fine-grained pooling, which we call LRPD-R-CNN is able to achieve high average precision and robust performance on the VOC 2007 dataset. This method has great potential in commercial values and wide application prospect in the field of computer vision, security and intelligent city.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116067858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Classification of LiDAR Point Cloud based on Multiscale Features and PointNet
Zhao Zhongyang, Cheng Yinglei, Shi Xiaosong, Qin Xianxiang, Sun Li
Aiming at classifying features of LiDAR point cloud data in complex scenarios, this paper proposes a deep neural network model based on multi-scale features and PointNet. The method improves PointNet's local features and realizes automatic classification of LiDAR point clouds in complex scenes. First, a multi-scale network is added on top of the PointNet network to extract local features of the points. These local features at different scales are then composed into a multi-dimensional feature through a fully connected layer and combined with the global features extracted by PointNet; the score of each point's class is returned to complete the point cloud classification. The proposed deep neural network model is verified on the Semantic3D dataset and the Vaihingen dataset provided by ISPRS. The experimental results show that the proposed algorithm achieves higher classification accuracy than other neural networks used for point cloud classification.
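A simplified PyTorch sketch of the idea: per-point features pooled at several neighbourhood scales, concatenated with a PointNet-style global feature, and scored per point. Window max-pooling stands in for true radius/k-NN grouping, and the 8-class head matches Semantic3D.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePointNet(nn.Module):
    def __init__(self, num_classes=8, scales=(32, 128)):
        super().__init__()
        self.scales = scales
        self.local_mlp = nn.Conv1d(3, 64, 1)     # shared per-point MLP
        self.global_mlp = nn.Conv1d(3, 128, 1)
        self.head = nn.Conv1d(64 * len(scales) + 128, num_classes, 1)

    def forward(self, xyz):                      # xyz: (B, 3, N) coordinates
        n = xyz.shape[2]
        local = torch.relu(self.local_mlp(xyz))  # (B, 64, N)
        feats = []
        for s in self.scales:                    # window pooling stands in for
            pooled = F.max_pool1d(local, s, stride=1, padding=s // 2)[..., :n]
            feats.append(pooled)                 # radius/k-NN grouping
        glob = torch.relu(self.global_mlp(xyz)).max(dim=2, keepdim=True).values
        feats.append(glob.expand(-1, -1, n))     # broadcast global feature
        return self.head(torch.cat(feats, dim=1))  # (B, num_classes, N)

print(MultiScalePointNet()(torch.randn(2, 3, 1024)).shape)  # (2, 8, 1024)
```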
{"title":"Classification of LiDAR Point Cloud based on Multiscale Features and PointNet","authors":"Zhao Zhongyang, Cheng Yinglei, Shi Xiaosong, Qin Xianxiang, Sun Li","doi":"10.1109/IPTA.2018.8608120","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608120","url":null,"abstract":"Aiming at classifying the feature of LiDAR point cloud data in complex scenario, this paper proposed a deep neural network model based on multi-scale features and PointNet. The method improves the local feature of PointNet and realize automatic classification of LiDAR point cloud under the complex scene. Firstly, this paper adds multi-scale network on the basis of PointNet network to extract the local features of points. And then these local features of different scales are composed into a multi-dimensional feature through the fully connected layer, and combined with the global features extracted by PointNet, the scores of each point class are returned to complete the point cloud classification. The deep neural network model proposed in this paper is verified using the Semantic3D dataset and the Vaihingen dataset provided by ISPRS. The experimental results show that the proposed algorithm achieves higher classification accuracy compared with other neural networks used for point cloud classification.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126438592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Driver Drowsiness Detection in Facial Images
F. Dornaika, J. Reta, Ignacio Arganda-Carreras, A. Moujahid
Extracting effective fatigue features from images and videos is an open problem. This paper introduces a face image descriptor that can be used to discriminate driver fatigue in static frames. In this method, each facial image in the sequence is first represented by a pyramid whose levels are divided into non-overlapping blocks of the same size, and a hybrid image descriptor is employed to extract features from all blocks. The obtained descriptor is then filtered using feature selection. Finally, a non-linear SVM is applied to predict the drowsiness state of the subject in the image. The proposed method was tested on the public NTHU Drowsy Driver Detection (NTHU-DDD) dataset, which includes a wide range of human subjects of different genders, poses, and illuminations under real-life fatigue conditions. Experimental results show the effectiveness of the proposed method and indicate that the proposed hand-crafted feature compares favorably with several approaches based on deep Convolutional Neural Nets.
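A minimal sketch of the descriptor pipeline, with uniform LBP standing in for the paper's unspecified hybrid descriptor: non-overlapping blocks at several pyramid levels each contribute a histogram, and a non-linear SVM classifies the stacked feature.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def pyramid_descriptor(gray, levels=(1, 2, 4), bins=10):
    """Stack per-block uniform-LBP histograms over a spatial pyramid."""
    feats = []
    for level in levels:                     # level n -> n x n block grid
        for rows in np.array_split(gray, level, axis=0):
            for block in np.array_split(rows, level, axis=1):
                lbp = local_binary_pattern(block, P=8, R=1, method="uniform")
                hist, _ = np.histogram(lbp, bins=bins, range=(0, bins))
                feats.append(hist / max(hist.sum(), 1))  # normalised histogram
    return np.concatenate(feats)

# Hypothetical usage on pre-cropped face frames:
# X = np.stack([pyramid_descriptor(f) for f in train_frames])
# clf = SVC(kernel="rbf").fit(X, labels)    # drowsy vs. alert
```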
{"title":"Driver Drowsiness Detection in Facial Images","authors":"F. Dornaika, J. Reta, Ignacio Arganda-Carreras, A. Moujahid","doi":"10.1109/IPTA.2018.8608130","DOIUrl":"https://doi.org/10.1109/IPTA.2018.8608130","url":null,"abstract":"Extracting effective features of fatigue in images and videos is an open problem. This paper introduces a face image descriptor that can be used for discriminating driver fatigue in static frames. In this method, first, each facial image in the sequence is represented by a pyramid whose levels are divided into non-overlapping blocks of the same size, and hybrid image descriptor are employed to extract features in all blocks. Then the obtained descriptor is filtered out using feature selection. Finally, non-linear SVM is applied to predict the drowsiness state of the subject in the image. The proposed method was tested on the public dataset NTH Drowsy Driver Detection (NTHUDDD). This dataset includes a wide range of human subjects of different genders, poses, and illuminations in real-life fatigue conditions. Experimental results show the effectiveness of the proposed method. These results show that the proposed hand-crafted feature compare favorably with several approaches based on the use of deep Convolutional Neural Nets.","PeriodicalId":272294,"journal":{"name":"2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114182293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4