Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945886
Ammad Ul Islam, Muhammad Jaleed Khan, K. Khurshid, F. Shafait
Handwriting is a behavioral characteristic of human beings and one of the idiosyncrasies commonly used for litigation purposes. Writer identification is widely used in the forensic examination of questioned and specimen documents. Recent advancements in imaging and machine learning technologies have enabled the development of automated, intelligent and robust writer identification methods. Most existing methods, based on human-defined features and color imaging, have limited accuracy and robustness. However, the rich spectral information obtained from hyperspectral imaging (HSI), together with suitable spatio-spectral features extracted using deep learning, can significantly enhance the accuracy and robustness of writer identification. In this paper, we propose a novel writer identification method in which the spectral responses of text pixels in a hyperspectral document image are extracted and fed to a Convolutional Neural Network (CNN) for writer classification. Different CNN architectures, hyperparameters, spatio-spectral formats, train-test ratios and inks are used to evaluate the performance of the proposed system on the UWA Writing Inks Hyperspectral Images (WIHSI) database and to select the most suitable set of parameters for writer identification. The findings of this work open a new arena in forensic document analysis for writer identification using HSI and deep learning.
{"title":"Hyperspectral Image Analysis for Writer Identification using Deep Learning","authors":"Ammad Ul Islam, Muhammad Jaleed Khan, K. Khurshid, F. Shafait","doi":"10.1109/DICTA47822.2019.8945886","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945886","url":null,"abstract":"Handwriting is a behavioral characteristic of human beings that is one of the common idiosyncrasies utilized for litigation purposes. Writer identification is commonly used for forensic examination of questioned and specimen documents. Recent advancements in imaging and machine learning technologies have empowered the development of automated, intelligent and robust writer identification methods. Most of the existing methods based on human defined features and color imaging have limited performance in terms of accuracy and robustness. However, rich spectral information content obtained from hyperspectral imaging (HSI) and suitable spatio-spectral features extracted using deep learning can significantly enhance the performance of writer identification in terms of accuracy and robustness. In this paper, we propose a novel writer identification method in which spectral responses of text pixels in a hyperspectral document image are extracted and are fed to a Convolutional Neural Network (CNN) for writer classification. Different CNN architectures, hyperparameters, spatio-spectral formats, train-test ratios and inks are used to evaluate the performance of the proposed system on the UWA Writing Inks Hyperspectral Images (WIHSI) database and to select the most suitable set of parameters for writer identification. The findings of this work have opened a new arena in forensic document analysis for writer identification using HSI and deep learning.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"82 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73211382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945911
Xun Li, Geoff Bull, R. Coe, Sakda Eamkulworapong, J. Scarrow, Michael Salim, M. Schaefer, X. Sirault
With the development of computer vision technologies, the use of images acquired by aerial platforms to measure large-scale agricultural fields has been increasingly studied. To provide a more time-efficient, lightweight and low-cost solution, in this paper we present a highly automated processing pipeline that performs plant height estimation based on a dense point cloud generated from aerial RGB images, requiring only a single flight. A previously acquired terrain model is not required as input. The process extracts a segmented plant layer and a bare ground layer. Ground height estimation achieves sub-10 cm accuracy. High-throughput plant height estimation has been performed and the results are compared with LiDAR-based measurements.
{"title":"High-Throughput Plant Height Estimation from RGB Images Acquired with Aerial Platforms: A 3D Point Cloud Based Approach","authors":"Xun Li, Geoff Bull, R. Coe, Sakda Eamkulworapong, J. Scarrow, Michael Salim, M. Schaefer, X. Sirault","doi":"10.1109/DICTA47822.2019.8945911","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945911","url":null,"abstract":"With the development of computer vision technologies, using images acquired by aerial platforms to measure large scale agricultural fields has been increasingly studied. In order to provide a more time efficient, light weight and low cost solution, in this paper we present a highly automated processing pipeline that performs plant height estimation based on a dense point cloud generated from aerial RGB images, requiring only a single flight. A previously acquired terrain model is not required as input. The process extracts a segmented plant layer and bare ground layer. Ground height estimation achieves sub 10cm accuracy. High throughput plant height estimation has been performed and results are compared with LiDAR based measurements.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"42 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81285092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945806
K. Onoguchi
This paper presents a method for measuring the number and speed of passing vehicles from a traffic surveillance camera. In the bird's-eye view image obtained by inverse perspective mapping, each point of a vehicle traveling at constant speed moves at a constant apparent speed that depends on its height above the road surface. Using this property, the proposed method detects individual vehicles and calculates their speeds. In an image that captures the vehicle from behind, the vehicle appears roof first, and its rear view emerges gradually from top to bottom. For this reason, the proposed method detects the position of each horizontal edge segment in the bird's-eye view image and creates a time-series image in which these positions are arranged in order of frame number. When a vehicle moves at constant speed through the measurement area, the trajectory of a horizontal edge segment draws a straight line in the time-series image. Therefore, the slope of this line, that is, the speed of the horizontal edge segment, is calculated to separate vehicles. When the trajectory of a horizontal edge segment with a higher speed appears, it is determined that a new vehicle has entered the measurement area. At this point, the vehicle count is incremented and the speed of the previous vehicle is calculated from the slope of its trajectory. The proposed method is robust to overlap between vehicles and to sudden changes in brightness. The processing time per frame is also lower than the video frame interval.
{"title":"Measurement of Traffic Volume by Time Series Images Created from Horizontal Edge Segments","authors":"K. Onoguchi","doi":"10.1109/DICTA47822.2019.8945806","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945806","url":null,"abstract":"This paper presents the method to measure the number and speed of passing vehicles from the traffic surveillance camera. In the bird's-eye view image obtained by the inverse perspective mapping, vehicles traveling at constant speed move at constant speed which depends on the height from the road surface. Using this feature, the proposed method detects individual vehicles and calculates the vehicle speed. In the image taken from the back of the vehicle, the vehicle appears from the roof and the rear view of the vehicle appears gradually from the top to the bottom. For this reason, the proposed method detects the position of the horizontal edge segment in the bird's-eye view image and creates the time series image in which it's arranged in order of frame numbers. The trajectory of the position of the horizontal edge segment draws a straight line in the time series image when the vehicle moves at a constant speed in the measurement area. Therefore, the slope of the straight line, that is, the speed of the horizontal edge segment is calculated to separate vehicles. When the trajectory of the horizontal edge segment with higher speed appears, it's determined that a new vehicle has entered the measurement area. At this point, the number of vehicles is incremented and the speed of the vehicle is calculated from the slop of the previous trajectory. The proposed method is robust to overlap between vehicles and the sudden change in brightness. The processing speed is also lower than the video rate.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"67 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77232515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8946021
Sajib Saha, Y. Kanagasingam
Image registration is an important step in several retinal image analysis tasks. Robust detection, description and accurate matching of landmark points (also called keypoints) between images are crucial for successful registration of image pairs. This paper introduces a novel binary descriptor named Local Haar Pattern of Bifurcation point (LHPB), so that retinal keypoints can be described more precisely and matched more accurately. LHPB uses 32 patterns that are reminiscent of the Haar basis functions and relies on pixel intensity tests to form a 256-bit binary vector. LHPB descriptors are matched using the Hamming distance. Experiments are conducted on the publicly available retinal image registration dataset FIRE. The proposed descriptor is compared with the state-of-the-art method of Chen et al. and the ALOHA descriptor. Experiments show that the proposed LHPB descriptor is about 2% more accurate than ALOHA and 17% more accurate than Chen et al.'s method.
{"title":"Haar Pattern Based Binary Feature Descriptor for Retinal Image Registration","authors":"Sajib Saha, Y. Kanagasingam","doi":"10.1109/DICTA47822.2019.8946021","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946021","url":null,"abstract":"Image registration is an important step in several retinal image analysis tasks. Robust detection, description and accurate matching of landmark points (also called keypoints) between images are crucial for successful registration of image pairs. This paper introduces a novel binary descriptor named Local Haar Patter of Bifurcation point (LHPB), so that retinal keypoints can be described more precisely and matched more accurately. LHPB uses 32 patterns that are reminiscent of Haar basis function and relies on pixel intensity test to form 256 bit binary vector. LHPB descriptors are matched using Hamming distance. Experiments are conducted on publicly available retinal image registration dataset named FIRE. The proposed descriptor has been compared with the state-of-the art Chen et al.'s method and ALOHA descriptor. Experiments show that the proposed LHPB descriptor is about 2% more accurate than ALOHA and 17% more accurate than Chen et al.'s method.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"48 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78987956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8946028
Gemma Dianne, A. Wiliem, B. Lovell
There is considerable research effort directed toward ground-based cloud detection due to its many applications in air traffic control, cloud-track wind data monitoring, and solar-power forecasting, to name a few. Key challenges have been identified consistently in the literature, primarily glare, varied illumination, poorly defined boundaries, and thin wispy clouds. At this time there is one significant research database for cloud segmentation, the SWIMSEG database [1], which consists of 1013 images and the corresponding ground truths. While investigating the limitations around detecting thin cloud, we found significant ambiguity even within this high-quality, hand-labelled research dataset. This is to be expected, as the task of tracing cloud boundaries is subjective. We propose capitalising on these inconsistencies by utilising robust deep-learning techniques, which have recently been shown to be effective on this data. By implementing a two-stage training strategy, validated on the smaller HYTA dataset, we leverage the mistakes made in the first stage of training to refine class features in the second. This approach is based on the assumption that the majority of mistakes made in the first stage correspond to thin cloud pixels. The results of our experiments indicate that this assumption holds, with the two-stage process producing quality results while also proving robust when extended to unseen data.
{"title":"Deep-Learning from Mistakes: Automating Cloud Class Refinement for Sky Image Segmentation","authors":"Gemma Dianne, A. Wiliem, B. Lovell","doi":"10.1109/DICTA47822.2019.8946028","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946028","url":null,"abstract":"There is considerable research effort directed toward ground based cloud detection due to its many applications in Air traffic control, Cloud-track wind data monitoring, and Solar-power forecasting to name a few. There are key challenges that have been identified consistently in the literature being primarily: glare, varied illumination, poorly defined boundaries, and thin wispy clouds. At this time there is one significant research database for use in Cloud Segmentation; the SWIMSEG database [1] which consists of 1013 Images and the corresponding Ground Truths. While investigating the limitations around detecting thin cloud, we found significant ambiguity even within this high quality hand labelled research dataset. This is to be expected, as the task of tracing cloud boundaries is subjective. We propose capitalising on these inconsistencies by utilising robust deep-learning techniques, which have been recently shown to be effective on this data. By implementing a two-stage training strategy, validated on the smaller HYTA dataset, we plan to leverage the mistakes in the first stage of training to refine class features in the second. This approach is based on the assumption that the majority of mistakes made in the first stage will correspond to thin cloud pixels. The results of our experimentation indicate that this assumption is true, with this two-stage process producing quality results, while also proving to be robust when extended to unseen data.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88781305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945944
Y. George, M. Aldeen, R. Garnavi
The presence of nipples in human trunk images is a main problem in psoriasis image analysis. Existing segmentation methods fail to differentiate between psoriasis lesions and nipples due to their high degree of visual similarity. In this paper, we present an automated nipple detection method as an important component of severity assessment for psoriasis. First, edges are extracted using the Canny edge detector, where the smoothing sigma parameter is automatically customized for every image based on the psoriasis severity level. Then, the circular Hough transform (CHT) and local maximum filtering are applied for circle detection. This is followed by a nipple selection step in which we use two new nipple similarity measures, namely the Hough transform peak intensity value and the structural similarity index. Finally, nipple selection refinement is performed using location criteria for the selected nipples. The proposed method is evaluated on 72 trunk images with psoriasis lesions. The conducted experiments demonstrate that the proposed method performs very well even in the presence of heavy hair, severe and mild lesions, and various nipple sizes, with an overall nipple detection accuracy of 95.14% across the evaluation set.
{"title":"Automatic Nipple Detection Method for Digital Skin Images with Psoriasis Lesions","authors":"Y. George, M. Aldeen, R. Garnavi","doi":"10.1109/DICTA47822.2019.8945944","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945944","url":null,"abstract":"The presence of nipples in human trunk images is considered a main problem in psoriasis images. Existing segmentation methods fail to differentiate between psoriasis lesions and nipples due to the high degree of visual similarity. In this paper, we present an automated nipple detection method as an important component for severity assessment of psoriasis. First, edges are extracted using Canny edge detector where the smoothing sigma parameter is automatically customized for every image based on psoriasis severity level. Then, circular hough transform (CHT) and local maximum filtering are applied for circle detection. This is followed by a nipple selection method where we use two new nipple similarity measures, namely: hough transform peak intensity value and structure similarity index. Finally, nipple selection refinement is performed by using the location criteria for the selected nipples. The proposed method is evaluated on 72 trunk images with psoriasis lesions. The conducted experiments demonstrate that the proposed method performs very well even in the presence of heavy hair, severe and mild lesions, and various nipple sizes, with an overall nipple detection accuracy of 95.14% across the evaluation set.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"76 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91181031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945839
Takumi Sato, K. Hotta
EncapNet is a kind of capsule network that significantly mitigates the routing problem, which has been thought to be the main bottleneck of capsule networks. In this paper, we propose EncapNet-3D, which has a stronger connection between the master and aide branches, whereas the original EncapNet has only a single coefficient per capsule. We achieve this by adding 3D convolution and dropout layers to the connection between them; the 3D convolution makes the connection between capsules stronger. We also propose U-EncapNet, which uses the U-Net architecture to achieve high accuracy in semantic segmentation tasks. EncapNet-3D reduces the number of network parameters by a factor of 321 compared with U-EncapNet and by a factor of 52 compared with U-Net. We show results on a cell image segmentation problem: U-EncapNet improves cell mean IoU by 1.1% compared with U-Net, and EncapNet-3D achieves a 3% increase in cell membrane IoU compared with ResNet-6.
{"title":"EncapNet-3D and U-EncapNet for Cell Segmentation","authors":"Takumi Sato, K. Hotta","doi":"10.1109/DICTA47822.2019.8945839","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945839","url":null,"abstract":"EncapNet is a kind of Capsule network that significantly improved routing problems that has thought to be the main bottleneck of capsule network. In this paper, we propose EncapNet-3D that has stronger connection between master and aide branch, which original EncapNet has only single co-efficient per capsule. We achieved this by adding 3D convolution and Dropout layers to connection between them. 3D convolution makes connection between capsules stronger. We also propose U-EncapNet, which uses U-net architecture to achieve high accuracy in semantic segmentation task. EncapNet-3D has successfully accomplished to reduce network parameters 321 times smaller compared to U-EncapNet, 52 times smaller than U-net. We show the result on segmentation problem of cell images. U-EncapNet has advanced performance of 1.1% in cell mean IoU in comparison with U-net. EncapNet-3D has achieved 3% increase in comparison with ResNet-6 in cell membrane IoU.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"R-24 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84740116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945975
Taha Emara, H. A. E. Munim, Hazem M. Abbas
Semantic image segmentation plays a pivotal role in many vision applications, including autonomous driving and medical image analysis. Most previous approaches focus on enhancing accuracy with little regard for computational efficiency. In this paper, we introduce LiteSeg, a lightweight architecture for semantic image segmentation. We explore a new, deeper version of the Atrous Spatial Pyramid Pooling (ASPP) module and apply short and long residual connections and depthwise separable convolution, resulting in a faster and more efficient model. The LiteSeg architecture is introduced and tested with multiple backbone networks, such as Darknet19, MobileNet, and ShuffleNet, to provide multiple trade-offs between accuracy and computational cost. The proposed model, LiteSeg with MobileNetV2 as a backbone network, achieves a mean intersection over union of 67.81% at 161 frames per second with 640 × 360 resolution on the Cityscapes dataset.
{"title":"LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation","authors":"Taha Emara, H. A. E. Munim, Hazem M. Abbas","doi":"10.1109/DICTA47822.2019.8945975","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945975","url":null,"abstract":"Semantic image segmentation plays a pivotal role in many vision applications including autonomous driving and medical image analysis. Most of the former approaches move towards enhancing the performance in terms of accuracy with a little awareness of computational efficiency. In this paper, we introduce LiteSeg, a lightweight architecture for semantic image segmentation. In this work, we explore a new deeper version of Atrous Spatial Pyramid Pooling module (ASPP) and apply short and long residual connections, and depthwise separable convolution, resulting in a faster and efficient model. LiteSeg architecture is introduced and tested with multiple backbone networks as Darknet19, MobileNet, and ShuffleNet to provide multiple trade-offs between accuracy and computational cost. The proposed model LiteSeg, with MobileNetV2 as a backbone network, achieves an accuracy of 67.81% mean intersection over union at 161 frames per second with 640 × 360 resolution on the Cityscapes dataset.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"94 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82083019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945972
Junaid Younas, Syed Tahseen Raza Rizvi, M. I. Malik, F. Shafait, P. Lukowicz, Sheraz Ahmed
In this work, we present a novel and generic approach, the Figure and Formula Detector (FFD), to detect formulas and figures in document images. Our proposed method employs traditional computer vision approaches in addition to deep models. We transform input images by applying connected component analysis (CC), a distance transform, and a colour transform, which are stacked together to generate the input image for the network. The best results produced by FFD for figure and formula detection are F1-scores of 0.906 and 0.905, respectively. We also propose a new dataset for figure and formula detection to aid future research in this direction. The obtained results suggest that enhancing the input representation can simplify the subsequent optimization problem, resulting in significant gains over conventional counterparts.
{"title":"FFD: Figure and Formula Detection from Document Images","authors":"Junaid Younas, Syed Tahseen Raza Rizvi, M. I. Malik, F. Shafait, P. Lukowicz, Sheraz Ahmed","doi":"10.1109/DICTA47822.2019.8945972","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945972","url":null,"abstract":"In this work, we present a novel and generic approach, Figure and Formula Detector (FFD) to detect the formulas and figures from document images. Our proposed method employs traditional computer vision approaches in addition to deep models. We transform input images by applying connected component analysis (CC), distance transform, and colour transform, which are stacked together to generate an input image for the network. The best results produced by FFD for figure and formula detection are with F1-score of 0.906 and 0.905, respectively. We also propose a new dataset for figures and formulas detection to aid future research in this direction. The obtained results advocate that enhancing the input representation can simplify the subsequent optimization problem resulting in significant gains over their conventional counterparts.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"11 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84729837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01. DOI: 10.1109/DICTA47822.2019.8945880
T. Wakahara, Yukihiko Yamashita
This paper describes a new area-based image alignment technique, norm conserved GAT (Global Affine Transformation) correlation. The cutting-edge techniques of image alignment are mostly feature-based, including such well-known techniques as SIFT, SURF, ASIFT, and ORB. The proposed technique determines the affine parameters maximizing the ZNCC (zero-mean normalized cross-correlation) between warped and reference images. In experiments using artificially warped images subject to rotation, blur, random noise, a few kinds of general affine transformation, and a simple 2D projection transformation, we compare the proposed technique against the feature-based ORB (Oriented FAST and Rotated BRIEF), the competing area-based ECC (Enhanced Correlation Coefficient), the original GAT correlation, and the GPT (Global Projection Transformation) correlation techniques. We show the very promising ability of the proposed norm conserved GAT correlation by discussing the advantages and disadvantages of these techniques with respect to both image alignment ability and computational complexity.
{"title":"Image Alignment using Norm Conserved GAT Correlation","authors":"T. Wakahara, Yukihiko Yamashita","doi":"10.1109/DICTA47822.2019.8945880","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945880","url":null,"abstract":"This paper describes a new area-based image alignment technique, norm conserved GAT (Global Affine Transformation) correlation. The cutting-edge techniques of image alignment are mostly feature-based, such well-known techniques as SIFT, SURF, ASIFT, and ORB. The proposed technique determines affine parameters maximizing ZNCC (zero-means normalized cross-correlation) between warped and reference images. In experiments using artificially warped images subject to rotation, blur, random noise, a few kinds of general affine transformation, and a simple 2D projection transformation, we compare the proposed technique against the feature-based ORB (Oriented FAST and Rotated BRIEF), the competing areabased ECC (Enhanced Correlation Coefficient), the original GAT correlation, and the GPT (Global Projection Transformation) correlation techniques. We show a very promising ability of the proposed norm conserved GAT correlation by discussing the advantages and disadvantages of these techniques with respect to both ability of image alignment and computational complexity.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80782755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}