
Latest publications from 2019 Digital Image Computing: Techniques and Applications (DICTA)

Hyperspectral Image Analysis for Writer Identification using Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945886
Ammad Ul Islam, Muhammad Jaleed Khan, K. Khurshid, F. Shafait
Handwriting is a behavioral characteristic of human beings and one of the common idiosyncrasies used for litigation purposes. Writer identification is commonly used for forensic examination of questioned and specimen documents. Recent advancements in imaging and machine learning technologies have enabled the development of automated, intelligent and robust writer identification methods. Most existing methods, based on human-defined features and color imaging, have limited accuracy and robustness. However, the rich spectral information obtained from hyperspectral imaging (HSI), together with suitable spatio-spectral features extracted using deep learning, can significantly enhance the accuracy and robustness of writer identification. In this paper, we propose a novel writer identification method in which the spectral responses of text pixels in a hyperspectral document image are extracted and fed to a Convolutional Neural Network (CNN) for writer classification. Different CNN architectures, hyperparameters, spatio-spectral formats, train-test ratios and inks are used to evaluate the performance of the proposed system on the UWA Writing Inks Hyperspectral Images (WIHSI) database and to select the most suitable set of parameters for writer identification. The findings of this work open a new arena in forensic document analysis for writer identification using HSI and deep learning.
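A minimal Python/Keras sketch of the core idea (not the authors' implementation): classify the writer from the spectral response vector of a single text pixel with a small 1D CNN. The band count, layer sizes, writer count and training data are illustrative placeholders.

```python
# Sketch only: per-pixel spectral response -> small 1D CNN -> writer class.
import numpy as np
import tensorflow as tf

NUM_BANDS = 33      # spectral bands per text pixel (assumed)
NUM_WRITERS = 40    # number of writer classes (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_BANDS, 1)),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_WRITERS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic stand-in for spectral responses extracted from text pixels.
x = np.random.rand(1024, NUM_BANDS, 1).astype("float32")
y = np.random.randint(0, NUM_WRITERS, size=1024)
model.fit(x, y, epochs=1, batch_size=64, verbose=0)
print(model.predict(x[:4]).argmax(axis=1))  # predicted writer IDs
```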
Citations: 15
High-Throughput Plant Height Estimation from RGB Images Acquired with Aerial Platforms: A 3D Point Cloud Based Approach
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945911
Xun Li, Geoff Bull, R. Coe, Sakda Eamkulworapong, J. Scarrow, Michael Salim, M. Schaefer, X. Sirault
With the development of computer vision technologies, using images acquired by aerial platforms to measure large-scale agricultural fields has been increasingly studied. In order to provide a more time-efficient, lightweight and low-cost solution, in this paper we present a highly automated processing pipeline that performs plant height estimation based on a dense point cloud generated from aerial RGB images, requiring only a single flight. A previously acquired terrain model is not required as input. The process extracts a segmented plant layer and a bare ground layer. Ground height estimation achieves sub-10 cm accuracy. High-throughput plant height estimation has been performed and the results are compared with LiDAR-based measurements.
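A short numpy sketch of the height-from-point-cloud step, not the paper's pipeline: ground is approximated by a low z-percentile per grid cell and canopy top by a high percentile. Cell size and percentiles are assumptions.

```python
# Illustrative only: per-cell plant height from a dense point cloud.
import numpy as np

def plant_heights(points, cell=0.5, ground_pct=5, canopy_pct=95):
    """points: (N, 3) array of x, y, z in metres."""
    ix = np.floor(points[:, 0] / cell).astype(int)
    iy = np.floor(points[:, 1] / cell).astype(int)
    heights = {}
    for key in set(zip(ix, iy)):
        z = points[(ix == key[0]) & (iy == key[1]), 2]
        ground = np.percentile(z, ground_pct)   # bare-ground estimate
        canopy = np.percentile(z, canopy_pct)   # top of the plant layer
        heights[key] = canopy - ground
    return heights

# Synthetic field: flat ground near z = 0 with plants up to ~0.8 m tall.
pts = np.random.rand(20000, 3) * [10.0, 10.0, 0.8]
print(sorted(plant_heights(pts).items())[:3])
```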
Citations: 1
Measurement of Traffic Volume by Time Series Images Created from Horizontal Edge Segments
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945806
K. Onoguchi
This paper presents a method to measure the number and speed of passing vehicles from a traffic surveillance camera. In the bird's-eye view image obtained by inverse perspective mapping, a vehicle traveling at constant speed appears to move at a constant image speed that depends on its height above the road surface. Using this feature, the proposed method detects individual vehicles and calculates vehicle speed. In an image taken from behind the vehicle, the roof appears first and the rear view then emerges gradually from top to bottom. For this reason, the proposed method detects the position of each horizontal edge segment in the bird's-eye view image and creates a time series image in which these positions are arranged in order of frame number. The trajectory of a horizontal edge segment's position draws a straight line in the time series image when the vehicle moves at a constant speed in the measurement area. Therefore, the slope of the straight line, that is, the speed of the horizontal edge segment, is calculated to separate vehicles. When the trajectory of a horizontal edge segment with a higher speed appears, it is determined that a new vehicle has entered the measurement area. At this point, the vehicle count is incremented and the vehicle's speed is calculated from the slope of the previous trajectory. The proposed method is robust to overlap between vehicles and to sudden changes in brightness. The processing time per frame is also shorter than the video frame interval.
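A small Python sketch of the core measurement only: the per-frame row of a horizontal edge segment in the bird's-eye image traces a straight line, so its image speed is the slope of a fitted line. Calibration constants and the trajectory data are synthetic assumptions.

```python
# Sketch: speed = slope of the edge-segment trajectory in the time-series image.
import numpy as np

frames = np.arange(60)                                  # frame numbers (time axis)
rows = 12.0 + 3.5 * frames + np.random.randn(60)        # edge row per frame (pixels)

slope, intercept = np.polyfit(frames, rows, 1)          # pixels per frame
PIXELS_PER_METRE = 20.0                                 # assumed bird's-eye calibration
FPS = 30.0
speed_mps = slope * FPS / PIXELS_PER_METRE
print(f"image speed: {slope:.2f} px/frame -> {speed_mps:.2f} m/s")

# A new vehicle would be counted when a trajectory with a clearly different
# (typically higher) slope appears in the time-series image.
```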
Citations: 1
Haar Pattern Based Binary Feature Descriptor for Retinal Image Registration
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946021
Sajib Saha, Y. Kanagasingam
Image registration is an important step in several retinal image analysis tasks. Robust detection, description and accurate matching of landmark points (also called keypoints) between images are crucial for successful registration of image pairs. This paper introduces a novel binary descriptor named Local Haar Pattern of Bifurcation point (LHPB), so that retinal keypoints can be described more precisely and matched more accurately. LHPB uses 32 patterns reminiscent of the Haar basis functions and relies on pixel intensity tests to form a 256-bit binary vector. LHPB descriptors are matched using the Hamming distance. Experiments are conducted on the publicly available retinal image registration dataset named FIRE. The proposed descriptor is compared with the state-of-the-art method of Chen et al. and the ALOHA descriptor. Experiments show that the proposed LHPB descriptor is about 2% more accurate than ALOHA and 17% more accurate than Chen et al.'s method.
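The exact 32 Haar-like patterns are not given in the abstract, so the Python sketch below only illustrates the general recipe: binary intensity tests in a patch around a keypoint produce a 256-bit descriptor, and descriptors are matched by Hamming distance. The random test pairs stand in for the LHPB patterns.

```python
# Generic binary-test descriptor + Hamming matching (not the LHPB patterns).
import numpy as np

rng = np.random.default_rng(0)
PATCH = 31                                # patch size around the keypoint (assumed)
pairs = rng.integers(0, PATCH, size=(256, 4))   # 256 intensity-test pairs

def describe(patch):
    """patch: (PATCH, PATCH) grayscale array -> 256-element bit vector."""
    a = patch[pairs[:, 0], pairs[:, 1]]
    b = patch[pairs[:, 2], pairs[:, 3]]
    return (a > b).astype(np.uint8)

def hamming(d1, d2):
    return int(np.count_nonzero(d1 != d2))

p1 = rng.random((PATCH, PATCH))
p2 = p1 + 0.01 * rng.random((PATCH, PATCH))   # nearly identical patch
p3 = rng.random((PATCH, PATCH))               # unrelated patch
print(hamming(describe(p1), describe(p2)), hamming(describe(p1), describe(p3)))
```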
Citations: 3
Deep-Learning from Mistakes: Automating Cloud Class Refinement for Sky Image Segmentation
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946028
Gemma Dianne, A. Wiliem, B. Lovell
There is considerable research effort directed toward ground-based cloud detection due to its many applications in air traffic control, cloud-track wind data monitoring, and solar-power forecasting, to name a few. Key challenges identified consistently in the literature are primarily glare, varied illumination, poorly defined boundaries, and thin wispy clouds. At this time there is one significant research database for use in cloud segmentation: the SWIMSEG database [1], which consists of 1013 images and the corresponding ground truths. While investigating the limitations around detecting thin cloud, we found significant ambiguity even within this high-quality hand-labelled research dataset. This is to be expected, as the task of tracing cloud boundaries is subjective. We propose capitalising on these inconsistencies by utilising robust deep-learning techniques, which have recently been shown to be effective on this data. By implementing a two-stage training strategy, validated on the smaller HYTA dataset, we leverage the mistakes made in the first stage of training to refine class features in the second. This approach is based on the assumption that the majority of mistakes made in the first stage will correspond to thin cloud pixels. The results of our experimentation indicate that this assumption is true, with this two-stage process producing quality results while also proving robust when extended to unseen data.
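A conceptual Python sketch of the two-stage idea only, using a simple classifier and synthetic per-pixel features instead of the paper's deep segmentation model: stage-one mistakes on cloud pixels are relabelled as a refined "thin cloud" class before a second training pass.

```python
# Sketch: relabel stage-one cloud mistakes as a new class, then retrain.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((5000, 8))                                  # per-pixel features (synthetic)
y = (X[:, 0] + 0.3 * rng.random(5000) > 0.6).astype(int)   # 0 = sky, 1 = cloud

# Stage one: deliberately weak model so that some mistakes remain.
stage1 = RandomForestClassifier(n_estimators=30, max_depth=3, random_state=0).fit(X, y)
pred = stage1.predict(X)

# Paper's assumption: most stage-one errors on cloud pixels are thin cloud,
# so promote them to a third class and retrain.
y_refined = y.copy()
y_refined[(y == 1) & (pred != 1)] = 2                      # 2 = thin cloud

stage2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_refined)
print(np.bincount(stage2.predict(X), minlength=3))         # pixels per class
```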
Citations: 3
Automatic Nipple Detection Method for Digital Skin Images with Psoriasis Lesions
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945944
Y. George, M. Aldeen, R. Garnavi
The presence of nipples in human trunk images is a major problem in psoriasis image analysis. Existing segmentation methods fail to differentiate between psoriasis lesions and nipples due to the high degree of visual similarity. In this paper, we present an automated nipple detection method as an important component for severity assessment of psoriasis. First, edges are extracted using the Canny edge detector, where the smoothing sigma parameter is automatically customized for every image based on psoriasis severity level. Then, the circular Hough transform (CHT) and local maximum filtering are applied for circle detection. This is followed by a nipple selection method in which we use two new nipple similarity measures, namely the Hough transform peak intensity value and the structural similarity index. Finally, nipple selection refinement is performed by using location criteria for the selected nipples. The proposed method is evaluated on 72 trunk images with psoriasis lesions. The conducted experiments demonstrate that the proposed method performs very well even in the presence of heavy hair, severe and mild lesions, and various nipple sizes, with an overall nipple detection accuracy of 95.14% across the evaluation set.
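A rough OpenCV sketch of the candidate-detection stage only: smooth with a severity-dependent sigma, then find circular candidates with the circular Hough transform. The sigma rule, Hough parameters and radii are assumptions, not the paper's values.

```python
# Sketch: severity-dependent smoothing + circular Hough candidates.
import cv2
import numpy as np

def nipple_candidates(gray, severity):
    sigma = 1.0 + 0.5 * severity                   # placeholder severity rule
    blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
    # HOUGH_GRADIENT applies a Canny edge step internally (param1 is its
    # upper threshold); param2 is the accumulator threshold.
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, 1, 60,
                               param1=120, param2=25, minRadius=5, maxRadius=30)
    return [] if circles is None else circles[0]   # each row: (x, y, radius)

# Synthetic test image with one bright disc.
img = np.zeros((240, 320), np.uint8)
cv2.circle(img, (160, 120), 15, 200, -1)
print(nipple_candidates(img, severity=2))
```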
Citations: 2
EncapNet-3D and U-EncapNet for Cell Segmentation
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945839
Takumi Sato, K. Hotta
EncapNet is a kind of capsule network that significantly alleviates the routing problem, which is thought to be the main bottleneck of capsule networks. In this paper, we propose EncapNet-3D, which has a stronger connection between the master and aide branches, whereas the original EncapNet has only a single coefficient per capsule. We achieve this by adding 3D convolution and Dropout layers to the connection between them; the 3D convolution makes the connection between capsules stronger. We also propose U-EncapNet, which uses the U-net architecture to achieve high accuracy in semantic segmentation tasks. EncapNet-3D reduces the number of network parameters by a factor of 321 compared to U-EncapNet and by a factor of 52 compared to U-net. We show results on the segmentation of cell images. U-EncapNet improves cell mean IoU by 1.1% compared to U-net, and EncapNet-3D achieves a 3% increase in cell membrane IoU compared to ResNet-6.
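The abstract only states that the master-to-aide connection uses 3D convolution and Dropout, so the Keras toy below illustrates just that wiring, not the real EncapNet-3D architecture; all tensor shapes and layer sizes are invented for illustration.

```python
# Toy sketch: connect a "master" capsule tensor to an "aide" branch through
# a 3D convolution and Dropout instead of a single coefficient per capsule.
import tensorflow as tf

# Capsules arranged as a 5D tensor: (batch, capsule dim, H, W, channels).
master = tf.keras.Input(shape=(8, 16, 16, 32), name="master_capsules")
aide = tf.keras.Input(shape=(8, 16, 16, 32), name="aide_capsules")

bridge = tf.keras.layers.Conv3D(32, kernel_size=3, padding="same",
                                activation="relu")(master)
bridge = tf.keras.layers.Dropout(0.3)(bridge)

fused = tf.keras.layers.Add()([aide, bridge])        # inject into the aide branch
out = tf.keras.layers.GlobalAveragePooling3D()(fused)
out = tf.keras.layers.Dense(10, activation="softmax")(out)

model = tf.keras.Model([master, aide], out)
model.summary()
```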
Citations: 0
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945975
Taha Emara, H. A. E. Munim, Hazem M. Abbas
Semantic image segmentation plays a pivotal role in many vision applications, including autonomous driving and medical image analysis. Most previous approaches aim to enhance accuracy with little regard for computational efficiency. In this paper, we introduce LiteSeg, a lightweight architecture for semantic image segmentation. We explore a new, deeper version of the Atrous Spatial Pyramid Pooling module (ASPP) and apply short and long residual connections and depthwise separable convolution, resulting in a faster and more efficient model. The LiteSeg architecture is introduced and tested with multiple backbone networks, namely Darknet19, MobileNet, and ShuffleNet, to provide multiple trade-offs between accuracy and computational cost. The proposed model, LiteSeg with MobileNetV2 as a backbone network, achieves a mean intersection over union of 67.81% at 161 frames per second with 640 × 360 resolution on the Cityscapes dataset.
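A minimal Keras sketch of two ingredients named in the abstract, a depthwise separable convolution and a small atrous (dilated) pyramid block; channel counts, dilation rates and the missing backbone are assumptions, not the published LiteSeg design.

```python
# Sketch: depthwise separable conv block + mini dilated pyramid head.
import tensorflow as tf
from tensorflow.keras import layers

def separable_block(x, filters):
    x = layers.DepthwiseConv2D(3, padding="same")(x)      # per-channel spatial conv
    x = layers.Conv2D(filters, 1, activation="relu")(x)   # pointwise channel mixing
    return x

def mini_aspp(x, filters=64, rates=(1, 6, 12, 18)):
    branches = [layers.Conv2D(filters, 3, padding="same", dilation_rate=r,
                              activation="relu")(x) for r in rates]
    return layers.Concatenate()(branches)

inp = tf.keras.Input(shape=(360, 640, 3))
x = separable_block(inp, 64)
x = mini_aspp(x)
x = layers.Conv2D(19, 1, activation="softmax")(x)         # 19 Cityscapes classes
model = tf.keras.Model(inp, x)
print(model.output_shape)                                 # (None, 360, 640, 19)
```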
Citations: 55
FFD: Figure and Formula Detection from Document Images
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945972
Junaid Younas, Syed Tahseen Raza Rizvi, M. I. Malik, F. Shafait, P. Lukowicz, Sheraz Ahmed
In this work, we present a novel and generic approach, the Figure and Formula Detector (FFD), to detect formulas and figures in document images. Our proposed method employs traditional computer vision approaches in addition to deep models. We transform input images by applying connected component analysis (CC), a distance transform, and a colour transform, which are stacked together to generate the input image for the network. The best results produced by FFD for figure and formula detection are F1-scores of 0.906 and 0.905, respectively. We also propose a new dataset for figure and formula detection to aid future research in this direction. The obtained results suggest that enhancing the input representation can simplify the subsequent optimization problem, resulting in significant gains over conventional counterparts.
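An OpenCV sketch of the input-representation step described above: connected components, a distance transform and a colour transform stacked into a multi-channel input. Thresholding choices and the specific colour channel are assumptions, not the paper's exact recipe.

```python
# Sketch: build a CC / distance-transform / colour-transform input stack.
import cv2
import numpy as np

def ffd_input(bgr_page):
    gray = cv2.cvtColor(bgr_page, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    _, labels = cv2.connectedComponents(binary)            # connected component analysis
    cc = labels.astype(np.float32) / max(labels.max(), 1)

    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 3)   # distance transform
    dist = dist / max(dist.max(), 1.0)

    hsv = cv2.cvtColor(bgr_page, cv2.COLOR_BGR2HSV)        # colour transform
    value = hsv[:, :, 2].astype(np.float32) / 255.0

    return np.dstack([cc, dist, value])                    # stacked network input

page = np.full((200, 150, 3), 255, np.uint8)
cv2.rectangle(page, (30, 40), (120, 160), (0, 0, 0), 2)    # fake figure outline
print(ffd_input(page).shape)                               # (200, 150, 3)
```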
Citations: 10
Image Alignment using Norm Conserved GAT Correlation
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945880
T. Wakahara, Yukihiko Yamashita
This paper describes a new area-based image alignment technique, norm conserved GAT (Global Affine Transformation) correlation. The cutting-edge techniques of image alignment are mostly feature-based, with well-known examples such as SIFT, SURF, ASIFT, and ORB. The proposed technique determines the affine parameters maximizing the ZNCC (zero-mean normalized cross-correlation) between warped and reference images. In experiments using artificially warped images subject to rotation, blur, random noise, a few kinds of general affine transformation, and a simple 2D projection transformation, we compare the proposed technique against the feature-based ORB (Oriented FAST and Rotated BRIEF), the competing area-based ECC (Enhanced Correlation Coefficient), the original GAT correlation, and the GPT (Global Projection Transformation) correlation techniques. We show the very promising ability of the proposed norm conserved GAT correlation by discussing the advantages and disadvantages of these techniques with respect to both image alignment ability and computational complexity.
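A Python sketch of the alignment objective only, not the GAT optimisation itself: evaluate ZNCC between a reference image and affinely warped candidates over a coarse grid of rotation angles. Image content and the search range are synthetic assumptions.

```python
# Sketch: maximize ZNCC over a small grid of rotation angles.
import cv2
import numpy as np

def zncc(a, b):
    a = a.astype(np.float32).ravel() - a.mean()
    b = b.astype(np.float32).ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

ref = np.zeros((128, 128), np.uint8)
cv2.rectangle(ref, (40, 30), (90, 100), 255, -1)           # synthetic reference

true_angle = 12.0                                          # unknown in practice
M = cv2.getRotationMatrix2D((64, 64), true_angle, 1.0)
observed = cv2.warpAffine(ref, M, (128, 128))

angles = np.arange(-20.0, 20.5, 0.5)
scores = [zncc(ref, cv2.warpAffine(observed,
               cv2.getRotationMatrix2D((64, 64), a, 1.0), (128, 128)))
          for a in angles]
best = angles[int(np.argmax(scores))]
print(f"best correction: {best:.1f} deg (applied rotation was {true_angle} deg)")
```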
Citations: 0