Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946008 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
F. T. Kurdi, M. Awrangjeb, Alan Wee-Chung Liew
Although much effort has been spent on developing a stable algorithm for 3D building modelling from Lidar data, this topic still attracts considerable attention in the literature. A key task in this problem is automatic building roof segmentation. Due to the great diversity of building typology and the noisiness and heterogeneity of point cloud data, the roof segmentation result needs to be verified and rectified against geometric constraints before it is used to generate the 3D building model; otherwise, the generated model may suffer from undesirable deformations. This paper suggests generating the 3D building model from Lidar data in two steps: automatic 2D building modelling, followed by automatic conversion of the 2D building model into a 3D model. This approach allows the 2D building model to be refined before 3D model generation starts, and it yields the 2D and 3D building models simultaneously. The first step of the proposed algorithm is the generation of the 2D building model. Then, after enhancing and fitting the roof planes, the roof plane boundaries are converted into 3D by analysing the relationships between neighbouring planes, followed by adjustment of the 3D roof vertices. Experiments indicate that the proposed algorithm is accurate and robust in generating 3D building models from Lidar data.
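Since the lift from 2D boundaries to 3D hinges on fitted roof planes, here is a minimal sketch of that ingredient, assuming a simple least-squares plane fit; it illustrates the idea only and is not the authors' implementation.

```python
# Sketch (not the paper's code): fit a roof plane to Lidar points by least
# squares, then lift a 2D boundary vertex onto the fitted plane.
import numpy as np

def fit_plane(points):
    """Fit z = a*x + b*y + c to an (N, 3) array of roof points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def lift_vertex(xy, plane):
    """Convert a 2D boundary vertex to 3D using the fitted plane."""
    a, b, c = plane
    x, y = xy
    return np.array([x, y, a * x + b * y + c])

# Toy example: a sloped roof facet sampled with noise.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(200, 2))
z = 0.3 * xy[:, 0] + 0.1 * xy[:, 1] + 5 + rng.normal(0, 0.05, 200)
plane = fit_plane(np.c_[xy, z])
print(lift_vertex((2.0, 3.0), plane))  # approximately [2, 3, 5.9]
```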
{"title":"Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data","authors":"F. T. Kurdi, M. Awrangjeb, Alan Wee-Chung Liew","doi":"10.1109/DICTA47822.2019.8946008","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946008","url":null,"abstract":"Although much effort has been spent in developing a stable algorithm for 3D building modelling from Lidar data, this topic still attracts a lot of attention in the literature. A key task of this problem is the automatic building roof segmentation. Due to the great diversity of building typology, and the noisiness and heterogeneity of point cloud data, the building roof segmentation result needs to be verified/rectified with some geometric constrains before it is used to generate the 3D building models. Otherwise, the generated building model may suffer from undesirable deformations. This paper suggests the generation of 3D building model from Lidar data in two steps. The first step is the automatic 2D building modelling and the second step is the automatic conversion of a 2D building model into 3D model. This approach allows the 2D building model to be refined before starting the 3D building model generation. Furthermore, this approach allows getting the 2D and 3D building models simultaneously. The first step of the proposed algorithm is the generation of the 2D building model. Then after enhancing and fitting the roof planes, the roof plane boundaries are converted into 3D by analysing the relationships between neighbouring planes. This is followed by the adjustment of the 3D roof vertices. Experiment indicated that the proposed algorithm is accurate and robust in generating 3D building models from Lidar data.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"102 8 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91324146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tree Log Identity Matching using Convolutional Correlation Networks
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945865 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
Mikko Vihlman, Jakke Kulovesi, A. Visala
Log identification is an important task in silviculture and forestry. It involves matching tree logs with each other and telling which of the known individuals a given specimen is. Forest harvesters can image logs and assess their quality while cutting trees in the forest. Identification allows each log to be traced back to the location where it was grown and enables efficient selection of logs of a specific quality at the sawmill. In this paper, a deep two-stream convolutional neural network is used to measure the likelihood that a pair of images represents the same part of a log. The similarity between the images is assessed from the cross-correlation of the convolutional feature maps at one or more levels of the network. The performance of the network is evaluated on two large datasets, containing spruce and pine logs respectively. The best architecture correctly identifies 99% of the test logs in the spruce dataset and 97% of the test logs in the pine dataset. The results show that the proposed model performs very well in relatively good conditions. The analysis forms a basis for future attempts to utilize deep networks for log identification in challenging real-world forestry applications.
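A minimal sketch of the core scoring idea, assuming a tiny shared backbone (the paper's two-stream network is deeper and correlates maps at one or more levels): score an image pair by correlating normalized convolutional feature maps at aligned positions.

```python
# Sketch under stated assumptions; not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationMatcher(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny stand-in backbone shared by both streams.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(1, 1)  # maps the correlation score to a logit

    def forward(self, img_a, img_b):
        fa = F.normalize(self.backbone(img_a), dim=1)
        fb = F.normalize(self.backbone(img_b), dim=1)
        # Channel-wise correlation at aligned positions, averaged spatially.
        corr = (fa * fb).sum(dim=1).mean(dim=(1, 2))
        return torch.sigmoid(self.head(corr.unsqueeze(1)))  # match likelihood

pair = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
print(CorrelationMatcher()(*pair).shape)  # torch.Size([2, 1])
```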
{"title":"Tree Log Identity Matching using Convolutional Correlation Networks","authors":"Mikko Vihlman, Jakke Kulovesi, A. Visala","doi":"10.1109/DICTA47822.2019.8945865","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945865","url":null,"abstract":"Log identification is an important task in silviculture and forestry. It involves matching tree logs with each other and telling which of the known individuals a given specimen is. Forest harvesters can image the logs and assess their quality while cutting trees in the forest. Identification allows each log to be traced back to the location it was grown in and efficiently choosing logs of specific quality in the sawmill. In this paper, a deep two-stream convolutional neural network is used to measure the likelihood that a pair of images represents the same part of a log. The similarity between the images is assessed based on the cross-correlation of the convolutional feature maps at one or more levels of the network. The performance of the network is evaluated with two large datasets, containing either spruce or pine logs. The best architecture identifies correctly 99% of the test logs in the spruce dataset and 97% of the test logs in the pine dataset. The results show that the proposed model performs very well in relatively good conditions. The analysis forms a basis for future attempts to utilize deep networks for log identification in challenging real-world forestry applications.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"25 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77322567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Logical Layout Analysis using Deep Learning
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946046 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-5
Annus Zulfiqar, A. Ul-Hasan, F. Shafait
Logical layout analysis plays an important part in document understanding. It can be a challenging task due to varying formats and layouts. Researchers have proposed different ways to solve this problem, mostly using visual information in some way within a complex pipeline. In this paper, we present a simple technique for labelling the logical structures in document images, using both visual and textual features to label zones. We utilize a recurrent neural network, specifically a two-layer LSTM, whose input is the text of the zone to be classified, given as a sequence of words together with each word's position normalized by the page width and height. Labels are assigned to zones by comparing the image under test with the known layouts. The zone labels are abstract, title, author names, and affiliation; the textual content itself also carries very important information for this task. The presented approach achieved an overall accuracy of 96.21% on the publicly available MARG dataset.
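A minimal sketch of the described zone classifier; the vocabulary size, embedding width, hidden size, and label order are assumptions, not values from the paper.

```python
# Sketch: two-layer LSTM over (word embedding + normalized x/y position).
import torch
import torch.nn as nn

class ZoneClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden=128, n_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each step: word embedding + (x, y) normalized by page width/height.
        self.lstm = nn.LSTM(embed_dim + 2, hidden, num_layers=2,
                            batch_first=True)
        self.out = nn.Linear(hidden, n_labels)  # abstract/title/authors/affil.

    def forward(self, word_ids, norm_xy):
        x = torch.cat([self.embed(word_ids), norm_xy], dim=-1)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])  # logits for the zone label

words = torch.randint(0, 5000, (8, 20))  # batch of 8 zones, 20 words each
positions = torch.rand(8, 20, 2)         # (x, y) in [0, 1]
print(ZoneClassifier()(words, positions).shape)  # torch.Size([8, 4])
```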
{"title":"Logical Layout Analysis using Deep Learning","authors":"Annus Zulfiqar, A. Ul-Hasan, F. Shafait","doi":"10.1109/DICTA47822.2019.8946046","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946046","url":null,"abstract":"Logical layout analysis plays an important part in document understanding. It can become a challenging task due to varying formats and layouts. Researchers have proposed different ways to solve this problem, mostly using visual information in some way and a complex pipeline. In this paper, we present a simple technique for labelling the logical structures in document images. We use visual and textual features from the document images to label zones. We utilize Recurrent Neural Networks, specifically 2 layers of LSTM, which input the text from the zone that we want to classify as sequences of words and the normalized position of each word with respect to the page width and height. Comparisons are made by comparing the image under test with the known layouts and labels are assigned to zones accordingly. The labels are abstract, title, author names, and affiliation; however, the text also contains very important information for the task at hand. The presented approach achieved an overall accuracy of 96.21% on publicly available MARG dataset.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75491786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal Brain Tumour Segmentation using Densely Connected 3D Convolutional Neural Network
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946023 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-5
M. Ghaffari, A. Sowmya, R. Oliver, Len Hamey
Reliable brain tumour segmentation from brain scans is essential for accurate diagnosis and treatment planning. In this paper, we propose a semantic segmentation method based on convolutional neural networks for brain tumour segmentation using multimodal brain scans. The proposed model is a modified version of the well-known U-net architecture. It benefits from DenseNet blocks between the encoder and decoder parts of the U-net, which transfer more semantic information from the input to the output. In addition, to speed up training, we employ deep supervision by adding segmentation blocks at the end of the decoder layers and summing their outputs to produce the final output of the network. We trained and evaluated the model on the BraTS 2018 dataset. Compared with a generic U-net, our model achieved higher segmentation accuracy in terms of the Dice score.
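A minimal sketch of the deep-supervision component only, with the encoder/decoder and DenseNet blocks stubbed out by plain 3D convolutions; the channel and class counts are assumptions.

```python
# Sketch: auxiliary segmentation heads on decoder stages are upsampled and
# summed into the final output (the "deep supervision" described above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedDecoder(nn.Module):
    def __init__(self, chans=(64, 32, 16), n_classes=4):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Conv3d(c, c, 3, padding=1) for c in chans])
        self.heads = nn.ModuleList(
            [nn.Conv3d(c, n_classes, 1) for c in chans])  # segmentation blocks

    def forward(self, feats):
        # feats: coarse-to-fine decoder features; sum upsampled head outputs.
        out = 0
        target = feats[-1].shape[2:]
        for f, stage, head in zip(feats, self.stages, self.heads):
            seg = head(stage(f))
            out = out + F.interpolate(seg, size=target, mode="trilinear",
                                      align_corners=False)
        return out

feats = [torch.randn(1, 64, 8, 8, 8), torch.randn(1, 32, 16, 16, 16),
         torch.randn(1, 16, 32, 32, 32)]
print(DeeplySupervisedDecoder()(feats).shape)  # torch.Size([1, 4, 32, 32, 32])
```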
{"title":"Multimodal Brain Tumour Segmentation using Densely Connected 3D Convolutional Neural Network","authors":"M. Ghaffari, A. Sowmya, R. Oliver, Len Hamey","doi":"10.1109/DICTA47822.2019.8946023","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946023","url":null,"abstract":"Reliable brain tumour segmentation methods from brain scans are essential for accurate diagnosis and treatment planning. In this paper, we propose a semantic segmentation method based on convolutional neural networks for brain tumour segmentation using multimodal brain scans. The proposed model is a modified version of the well-known U-net architecture. It gains from DenseNet blocks between the encoder and decoder parts of the U-net to transfer more semantic information from the input to the output. In addition, to speed up the training process, we employed deep supervision by adding segmentation blocks at the end of the decoder layers and summing up their outputs to generate the final output of the network. We trained and evaluated our model using the BraTS 2018 dataset. Comparing the results from the proposed model and a generic U-net, our model achieved higher segmentation accuracy in terms of the Dice score.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90159836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Corrosion Assessment for Electrical Transmission Towers
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945905 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6
Teng Zhang, Liangchen Liu, A. Wiliem, Stephen Connor, Zelkjo Ilich, Eddie Van Der Draai, B. Lovell
Galvanised steel transmission towers in electrical power grids suffer corrosion to different degrees depending on age and environment. To ensure the power grid can operate safely, significant resources are spent monitoring the corrosion level of towers. Photographs from helicopters, drones, and field staff are often used to capture tower condition; however, these images still require manual inspection to determine the corrosion level before maintenance work can be carried out. In this paper, we describe a framework employing multiple deep-neural-network classifiers and detectors to perform automatic image-based condition monitoring for steel transmission towers. Given an arbitrary set of images of a tower, the proposed framework first determines where on the structure each image was taken via a trained zone classifier. Then, fine-grained corrosion inspection is performed on both fasteners and structural members. In addition, an automatic zoom-in step is applied to high-resolution images captured from a long distance, ensuring detection performance on small objects on the tower. Finally, an overall corrosion status report for the tower is computed and generated automatically. We also release a subset of our data to contribute to this novel direction. Experiments show that our framework can assess towers efficiently.
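A minimal sketch of the described flow, with entirely hypothetical model callables and scoring: classify the zone, inspect fasteners and structural members, then aggregate into an overall report.

```python
# Sketch under stated assumptions; component names and scoring are made up.
from dataclasses import dataclass

@dataclass
class Detection:
    component: str    # "fastener" or "structural_member"
    corrosion: float  # per-component corrosion score in [0, 1]

def assess_tower(images, zone_classifier, fastener_model, member_model):
    report = {}
    for img in images:
        zone = zone_classifier(img)  # where on the tower this image was taken
        dets = fastener_model(img) + member_model(img)
        scores = [d.corrosion for d in dets]
        report.setdefault(zone, []).append(sum(scores) / max(len(scores), 1))
    # Overall status: mean corrosion score per zone.
    return {z: sum(v) / len(v) for z, v in report.items()}

toy = assess_tower(
    images=["img1", "img2"],
    zone_classifier=lambda im: "peak",
    fastener_model=lambda im: [Detection("fastener", 0.4)],
    member_model=lambda im: [Detection("structural_member", 0.2)],
)
print(toy)  # {'peak': ~0.3}
```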
{"title":"Deep Corrosion Assessment for Electrical Transmission Towers","authors":"Teng Zhang, Liangchen Liu, A. Wiliem, Stephen Connor, Zelkjo Ilich, Eddie Van Der Draai, B. Lovell","doi":"10.1109/DICTA47822.2019.8945905","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945905","url":null,"abstract":"Galvanised steel transmission towers in electrical power grids suffer from corrosion to different levels depending on age and environment. To ensure the power grid can operate safely, significant resources are spent to monitor the corrosion level of towers. Photographs from helicopters, drones, and a variety of staffs are often used to capture condition, however, these images still need manual inspection to determine the corrosion level before carrying out maintenance works. In this paper, we describe a framework employing multiple deep neural networks based classifiers and detectors to perform automatic image-based condition monitoring for steel transmission towers. Given a random variety of images of a tower, our proposed framework will first determine the location of the image on the structure via a trained zone classifier. Then, fine-grain corrosion inspection will be performed on both fasteners and structural members, respectively. In addition, an automatic zoomin functionality will be applied to images which have high resolution but are a long distance away. This step will ensure the detection performance on small objects on the tower. Finally, the overall corrosion status report for this tower will be calculated and generated automatically. Additionally, we released a subset of our data to contribute to this novel direction. Experiments show that our framework can assess the tower efficiently.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"71 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88857474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Block Pruning Based on Kernel and Feature Stablization
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946001 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6
Sheng Xu, Hanlin Chen, Kexin Liu, Jinhu Lii, Baochang Zhang
As computer vision research has developed, convolutional neural network architectures have become more and more complex in pursuit of state-of-the-art performance. Is the complexity of a model necessarily proportional to its accuracy? Prompted by this question, network compression has attracted much attention in academia and industry. Existing network pruning methods mostly rely on scoring the complexity or diversity of kernels to compress the network, and then rebuild the model after removing kernels by tuning or retraining on the input data. These methods are cumbersome and depend on a well-trained pre-trained model. In this paper, we propose an end-to-end block pruning method based on kernel and feature stability that prunes blocks efficiently. To accomplish this, we first introduce a mask that scales the output of the blocks, with an L1 regularization term governing the mask update. Second, we introduce the Center Loss to guarantee that features do not deviate greatly during learning. For fast convergence, we use the fast iterative shrinkage-thresholding algorithm (FISTA) to optimize the mask, yielding a faster and more reliable pruning process. We run experiments on several datasets, including CIFAR-10 and ImageNet ILSVRC2012, and achieve state-of-the-art accuracy throughout.
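A minimal sketch of one FISTA update for the block mask, following the standard accelerated proximal-gradient recipe for an L1-regularized objective; the learning rate, regularization weight, and toy data term are assumptions, not the paper's settings.

```python
# Sketch: soft-thresholding drives small mask entries exactly to zero,
# marking the corresponding blocks as prunable.
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def fista_step(mask, y, grad_y, lr, lam, t):
    """One accelerated proximal-gradient update of the block mask."""
    mask_new = soft_threshold(y - lr * grad_y, lr * lam)  # prox of L1 term
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y_new = mask_new + ((t - 1.0) / t_new) * (mask_new - mask)
    return mask_new, y_new, t_new

# Toy run with a quadratic stand-in for the data term.
mask = y = np.array([0.9, 0.05, -0.02, 0.7])
t = 1.0
for _ in range(50):
    grad = 2.0 * (y - np.array([1.0, 0.0, 0.0, 0.6]))
    mask, y, t = fista_step(mask, y, grad, lr=0.1, lam=0.05, t=t)
print(mask)  # small entries collapse to 0 -> their blocks can be removed
```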
{"title":"Efficient Block Pruning Based on Kernel and Feature Stablization","authors":"Sheng Xu, Hanlin Chen, Kexin Liu, Jinhu Lii, Baochang Zhang","doi":"10.1109/DICTA47822.2019.8946001","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946001","url":null,"abstract":"With the development of computer vision research, the architecture of convolutional neural network becomes more and more complex to reach the state-of-the-art performance. Is the complexity of the model necessarily proportional to its accuracy? To answer this, the compression of the network has attracted much attention in the academy and industry. Existing network pruning methods mostly rely on the scoring mechanism of complexity or diversity of kernels to compress the network, and then build the network model after removing the kernels by tuning or training on the input data. These methods are cumbersome and depend on a well-trained pre-trained model. In this paper, we propose an end-to-end block pruning method based on kernel and feature stability by pruning blocks efficiently. To accomplish this, we firstly introduce a mask to scale the output of the blocks, and the L1 regularization term to monitor the mask update. Second, we introduce the Center Loss to guarantee that the feature does not deviate greatly during learning. To converge fast, we introduce fast iterative shrinkage-thresholding algorithm (FISTA) to optimize the mask, by which a more fast and reliable pruning process is achieved. We implement experiments on different datasets, including CIFAR-10 and ImageNet ILSVRC2012. All the experiments have achieved the state-of-the-art accuracy.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80003359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OGaze: Gaze Prediction in Egocentric Videos for Attentional Object Selection
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945893 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
Mohammad Al-Naser, Shoaib Ahmed Siddiqui, Hiroki Ohashi, Sheraz Ahmed, Nakamura Katsuyki, Takuto Sato, A. Dengel
This paper proposes a novel gaze-estimation model for attentional object selection tasks. The key features of our model are two-fold: (i) the use of deformable convolutional layers to better capture the spatial dependencies of objects of different shapes and the background; (ii) the formulation of gaze estimation in two different ways, i.e., as a classification problem as well as a regression problem. We combine the two formulations using a joint loss that incorporates both the cross-entropy and the mean-squared error to train our model. The experimental results on two publicly available datasets indicate that our model not only achieves real-time performance (13–18 FPS) but also outperforms state-of-the-art models on the OSdataset, with comparable performance on the GTEA-plus dataset.
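A minimal sketch of such a joint objective, assuming a discretized gaze grid for the classification branch; the grid size and branch weighting are assumptions, not the paper's values.

```python
# Sketch: cross-entropy on a discretized gaze-grid cell plus MSE on the
# continuous gaze point, mixed with a weighting factor alpha.
import torch
import torch.nn.functional as F

def joint_gaze_loss(cls_logits, cell_target, xy_pred, xy_target, alpha=0.5):
    ce = F.cross_entropy(cls_logits, cell_target)  # classification branch
    mse = F.mse_loss(xy_pred, xy_target)           # regression branch
    return alpha * ce + (1.0 - alpha) * mse

logits = torch.randn(4, 49)            # 7x7 grid of candidate cells
cells = torch.randint(0, 49, (4,))
xy_target = torch.rand(4, 2)           # normalized gaze coordinates
print(joint_gaze_loss(logits, cells, torch.rand(4, 2), xy_target))
```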
{"title":"OGaze: Gaze Prediction in Egocentric Videos for Attentional Object Selection","authors":"Mohammad Al-Naser, Shoaib Ahmed Siddiqui, Hiroki Ohashi, Sheraz Ahmed, Nakamura Katsuyki, Takuto Sato, A. Dengel","doi":"10.1109/DICTA47822.2019.8945893","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945893","url":null,"abstract":"This paper proposes a novel gaze-estimation model for attentional object selection tasks. The key features of our model are two-fold: (i) usage of the deformable convolutional layers to better incorporate spatial dependencies of different shapes of objects and background, (ii) formulation of the gaze-estimation problem in two different ways, i.e. as a classification as well as a regression problem. We combine the two different formulations using a joint loss that incorporates both the cross-entropy as well as the mean-squared error in order to train our model. The experimental results on two publicly available datasets indicates that our model not only achieved real-time performance (13–18 FPS), but also outperformed the state-of-the-art models on the OSdataset along with comparable performance on GTEA-plus dataset.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"14 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87873428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Indian Sign Language Gesture Recognition using Image Processing and Deep Learning
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945850 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
N. Bhagat, Y. Vishnusai, G. Rathna
Speech-impaired people use hand gestures to communicate. Unfortunately, the vast majority of people are not aware of the semantics of these gestures. In an attempt to bridge this gap, we propose a real-time hand gesture recognition system based on data captured by the Microsoft Kinect RGB-D camera. Given that there is no one-to-one mapping between the pixels of the depth and RGB cameras, we used computer vision techniques such as 3D reconstruction and affine transformation. After achieving a one-to-one mapping, the hand gestures were segmented from the background noise. Convolutional neural networks (CNNs) were used to train 36 static gestures corresponding to Indian Sign Language (ISL) alphabets and numbers. The model achieved an accuracy of 98.81% when trained on 45,000 RGB images and 45,000 depth images. Convolutional LSTMs were then used to train 10 ISL dynamic word gestures, and an accuracy of 99.08% was obtained from training on 1,080 videos. The model showed accurate real-time performance in predicting ISL static gestures, leaving scope for further research on sentence formation through gestures. The model also adapted well to American Sign Language (ASL): transfer-learning the ISL model's weights to ASL yielded 97.71% accuracy.
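A minimal sketch of aligning a depth frame to RGB pixel coordinates with an affine warp; the 2x3 matrix below is made up, whereas in practice it would come from calibrating the Kinect's two cameras.

```python
# Sketch under stated assumptions; the affine matrix is illustrative only.
import numpy as np
import cv2

def align_depth_to_rgb(depth, affine, rgb_shape):
    """Warp the depth image into RGB pixel coordinates."""
    h, w = rgb_shape[:2]
    return cv2.warpAffine(depth, affine, (w, h),
                          flags=cv2.INTER_NEAREST)  # keep raw depth values

depth = np.random.randint(500, 4000, (424, 512), dtype=np.uint16)
affine = np.array([[1.08, 0.0, 42.0],   # scale + shift, assumed calibration
                   [0.0, 1.08, 28.0]], dtype=np.float32)
aligned = align_depth_to_rgb(depth, affine, rgb_shape=(1080, 1920, 3))
print(aligned.shape)  # (1080, 1920)
```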
{"title":"Indian Sign Language Gesture Recognition using Image Processing and Deep Learning","authors":"N. Bhagat, Y. Vishnusai, G. Rathna","doi":"10.1109/DICTA47822.2019.8945850","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945850","url":null,"abstract":"Speech impaired people use hand based gestures to communicate. Unfortunately, the vast majority of the people are not aware of the semantics of these gestures. In a attempt to bridge the same, we propose a real time hand gesture recognition system based on the data captured by the Microsoft Kinect RGB-D camera. Given that there is no one to one mapping between the pixels of the depth and the RGB camera, we used computer vision techniques like 3D contruction and affine transformation. After achieving one to one mapping, segmentation of the hand gestures was done from the background noise. Convolutional Neural Networks (CNNs) were utilised for training 36 static gestures relating to Indian Sign Language (ISL) alphabets and numbers. The model achieved an accuracy of 98.81% on training using 45,000 RGB images and 45,000 depth images. Further Convolutional LSTMs were used for training 10 ISL dynamic word gestures and an accuracy of 99.08% was obtained by training 1080 videos. The model showed accurate real time performance on prediction of ISL static gestures, leaving a scope for further research on sentence formation through gestures. The model also showed competitive adaptability to American Sign Language (ASL) gestures when the ISL models weights were transfer learned to ASL and it resulted in giving 97.71% accuracy.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86439365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radiography Contrast Enhancement: Smoothed LHE Filter a Practical Solution for Digital X-Rays with Mach Band
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946114 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
P. Ambalathankandy, Yafei Ou, Jyotsna Kochiyil, Shinya Takamaeda-Yamazaki, M. Motomura, T. Asai, M. Ikebe
In this paper, we analyze smoothed LHE (local histogram equalization) filters and propose their use for processing low-contrast images such as digital radiographs. Digital X-rays are known to exhibit optical illusions such as Mach bands and background contrast effects, which are caused by lateral inhibition phenomena. We observe that using multilayer (ML) methods with the latest edge-preserving filters for contrast enhancement in medical images can be problematic: uncontrolled texture boosting from user-defined gain settings exaggerates detail and could lead to faulty diagnoses. ML filters are also designed around a few subjectively selected kernel sizes, which can make output images look unnatural. We propose a smoothed LHE-like filter with adaptive gain control that is more robust and can enhance fine detail in digital X-rays while maintaining their intrinsic naturalness, an essential property for radiographic diagnostics. The proposed filter has O(1) complexity and is easily controlled and operated with a continuously varying kernel size; it functions like an active high-pass filter, amplifying all frequencies within the kernel.
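A minimal sketch in the spirit of the filter described above: a box-filtered (hence O(1) per pixel) local contrast boost whose gain adapts to local activity to avoid texture exaggeration. It illustrates the mechanism only and is not the paper's exact filter.

```python
# Sketch: local mean/variance via box filters, detail boosted with a gain
# that is damped where the neighbourhood is already busy.
import numpy as np
import cv2

def smoothed_local_enhance(img, ksize=31, base_gain=2.0, eps=0.03):
    x = img.astype(np.float32) / 255.0
    mean = cv2.boxFilter(x, -1, (ksize, ksize))          # O(1) local mean
    var = cv2.boxFilter(x * x, -1, (ksize, ksize)) - mean * mean
    std = np.sqrt(np.clip(var, 0, None))                 # local activity
    gain = base_gain / (1.0 + std / eps)                 # adaptive gain control
    out = mean + gain * (x - mean)                       # boost local detail
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

xray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
print(smoothed_local_enhance(xray).shape)  # (256, 256)
```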
{"title":"Radiography Contrast Enhancement: Smoothed LHE Filter a Practical Solution for Digital X-Rays with Mach Band","authors":"P. Ambalathankandy, Yafei Ou, Jyotsna Kochiyil, Shinya Takamaeda-Yamazaki, M. Motomura, T. Asai, M. Ikebe","doi":"10.1109/DICTA47822.2019.8946114","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946114","url":null,"abstract":"In this paper, we analyze and propose the usefulness of smoothed LHE (Local Histogram Equalization) filters for processing images with low contrast like digital radiographic images. Digital X-rays are known to have optical illusions like Mach bands and background contrast effects, which are caused by lateral inhibition phenomena. We observe that using multilayer (ML) methods with latest edge preserving filter for contrast enhancement in medical images can be problematic and could lead to faulty diagnosis from detail exaggeration which are caused by uncontrolled texture boosting from user defined gain settings. ML filters are designed with few subjectively selected filter kernel sizes, which can result in unnaturalness in output images. We propose a smoothed LHE-like filter with an adaptive gain control, that is more robust and can enhance fine details in digital X-rays while maintaining their intrinsic naturalness. Preserving naturalness in X-ray images are an essential feature for radiographic diagnostics. Our proposed filter has 0(1) complexity and can easily be controlled and operated with a continuously varying kernel size, which functions like an active high pass filter, amplifying all frequencies within the kernel.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86201287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature Engineering Meets Deep Learning: A Case Study on Table Detection in Documents
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945929 | 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6
M. Shahzad, Rabeya Noor, Sheraz Ahmad, A. Mian, F. Shafait
Traditional computer vision approaches relied heavily on hand-crafted features for tasks such as visual object detection and recognition. The recent success of deep learning in automatically extracting representative and powerful features from images has brought a paradigm shift in this area. As a side effect, decades of research into hand-crafted features are often considered outdated. In this paper, we present an approach to table detection in which a deep-learning-based table detection model is combined with hand-crafted features from a classical table detection method. We demonstrate that, given a suitable encoding of the hand-crafted features, the deep learning model performs better at the detection task. Experiments on the publicly available UNLV dataset show that the presented method achieves accuracy comparable with state-of-the-art deep learning methods without the need for extensive hyper-parameter tuning.
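One plausible encoding, offered as an assumption rather than the paper's scheme: render classic hand-crafted table cues as extra image channels so a standard deep detector can consume them alongside the raw page.

```python
# Sketch: stack the grayscale page with two classic table cues (edges and
# long horizontal runs) into a 3-channel detector input.
import numpy as np
import cv2

def encode_page(gray_page):
    edges = cv2.Canny(gray_page, 50, 150)     # ruling-line cue
    horiz = cv2.morphologyEx(                 # keep long horizontal structures
        edges, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1)))
    return np.dstack([gray_page, edges, horiz])  # H x W x 3 input

page = np.random.randint(0, 256, (512, 384), dtype=np.uint8)
print(encode_page(page).shape)  # (512, 384, 3)
```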
{"title":"Feature Engineering Meets Deep Learning: A Case Study on Table Detection in Documents","authors":"M. Shahzad, Rabeya Noor, Sheraz Ahmad, A. Mian, F. Shafait","doi":"10.1109/DICTA47822.2019.8945929","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945929","url":null,"abstract":"Traditional computer vision approaches heavily relied on hand-crafted features for tasks such as visual object detection and recognition. The recent success of deep learning in automatically extracting representative and powerful features from images has brought a paradigm shift in this area. As a side effect, decades of research into hand-crafted features is considered outdated. In this paper, we present an approach for table detection in which we leverage a deep learning based table detection model with hand-crafted features from a classical table detection method. We demonstrate that by using a suitable encoding of hand-crafted features, the deep learning model is able to perform better at the detection task. Experiments on publicly available UNLV dataset show that the presented method achieves an accuracy comparable with the state-of-the-art deep learning methods without the need of extensive hyper-parameter tuning.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81733879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}