Fruit Detection in the Wild: The Impact of Varying Conditions and Cultivar
Pub Date: 2020-11-29 | DOI: 10.1109/DICTA51227.2020.9363407
Michael Halstead, S. Denman, C. Fookes, C. McCool
Agricultural robotics is a rapidly evolving research field due to advances in computer vision, machine learning, and robotics, and to increased agricultural demand. However, a considerable gap remains between farming requirements and available technology because of the large differences between cropping environments, creating a pressing need for models with greater generalisability. We explore the issue of generalisability by considering a fruit (sweet pepper) that is grown using different cultivars (sub-species) and in different environments (field vs. glasshouse). To investigate these differences, we publicly release three novel datasets captured across different domains, cultivars, cameras, and geographic locations. We exploit these new datasets both singly and in combination (to promote generalisation) to evaluate sweet pepper (fruit) detection and classification in the wild. For detection we employ Faster-RCNN, owing to the ease with which it can be extended to incorporate multi-task learning via the Mask-RCNN framework (instance-based segmentation). This multi-task learning technique is shown to increase the cross-dataset detection F1-score from 0.323 to 0.700, demonstrating the potential to reduce the need for new annotations through improved model generalisation. We further extend the Faster-RCNN architecture to include both super- and sub-classes (fruit and ripeness, respectively) by incorporating a parallel classification layer. For sub-class classification, measured as the percentage of correct detections, we achieve an accuracy of 0.900 in a cross-domain evaluation. In our experiments we find that intra-environmental inference is generally inferior; however, combining datasets increases performance through greater diversity in the training data. Overall, the introduction of these three novel and diverse datasets demonstrates the potential of multi-task learning to improve cross-dataset generalisability, while also highlighting the importance of diverse data for adequately training and evaluating real-world systems.
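The abstract gives no implementation details, but the pattern it describes — Faster-RCNN extended to multi-task learning through the Mask-RCNN instance-segmentation branch — can be sketched with torchvision's detection API. This is a minimal illustration under our own assumptions (class count, hidden size, and predictor names follow the standard torchvision fine-tuning recipe), not the authors' code.

```python
# Hedged sketch: multi-task detection via Mask-RCNN (torchvision). The mask
# branch supplies the auxiliary instance-segmentation task alongside box
# detection. num_classes/hidden are illustrative assumptions, not the paper's.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_multitask_detector(num_classes=2, hidden=256):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box-classification head for the new class set.
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
    # Replace the mask head so the segmentation task is trained jointly.
    in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, hidden, num_classes)
    return model
```

During training, torchvision returns a dictionary of box and mask losses whose sum drives the shared backbone, which is what makes the setup multi-task.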
{"title":"Fruit Detection in the Wild: The Impact of Varying Conditions and Cultivar","authors":"Michael Halstead, S. Denman, C. Fookes, C. McCool","doi":"10.1109/DICTA51227.2020.9363407","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363407","url":null,"abstract":"Agricultural robotics is a rapidly evolving research field due to advances in computer vision, machine learning, robotics, and increased agricultural demand. However, there is still a considerable gap between farming requirements and available technology due to the large differences between cropping environments. This creates a pressing need for models with greater generalisability. We explore the issue of generalisability by considering a fruit (sweet pepper) that is grown using different cultivar (sub-species) and in different environments (field vs glasshouse). To investigate these differences, we publicly release three novel datasets captured with different domains, cultivar, cameras, and geographic locations. We exploit these new datasets in a singular and combined (to promote generalisation) manner to evaluate sweet pepper (fruit) detection and classification in the wild. For evaluation, we employ Faster-RCNN for detection due to the ease in which it can be expanded to incorporate multitask learning by utilising the Mask-RCNN framework (instance-based segmentation). This multi-task learning technique is shown to increase the cross dataset detection F1-Score from 0.323 to 0.700, demonstrating the potential to reduce the requirements of new annotations through improved generalisation of the model. We further exploit the Faster-RCNN architecture to include both super- and sub-classes, fruit and ripeness respectively, by incorporating a parallel classification layer. For sub-class classification considering the percentage of correct detections, we are able to achieve an accuracy score of 0.900 in a cross domain evaluation. In our experiments, we find that intra-environmental inference is generally inferior, however, diversifying the data by using a combination of datasets increases performance through greater diversity in the training data. Overall, the introduction of these three novel and diverse datasets demonstrates the potential for multi-task learning to improve cross-dataset generalisability while also highlighting the importance of diverse data to adequately train and evaluate real-world systems.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128529112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MixCaps: Capsules With Iteration Free Routing
Pub Date: 2020-11-29 | DOI: 10.1109/DICTA51227.2020.9363386
Ifty Mohammad Rezwan, Mirza Belal Ahmed, S. Sourav, Ezab Quader, Arafat Hossain, Nabeel Mohammed
In this paper, we propose MixCaps, a new variant of Capsule Networks whose architecture significantly decreases the compute required to run capsule networks. Owing to the nature of our modules, we propose a new routing algorithm that does not require multiple iterations; all routing schemes prior to this architecture use multiple iterations. This decreases our model's memory requirements by a significant margin compared with previous methods, and also allows the use of both matrix and vector poses, enabling the model to learn more complex representations. Despite this reduced cost, we show that our model performs on par with prior capsule architectures on complex datasets such as CIFAR-10 and CIFAR-100.
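The abstract does not specify the MixCaps routing rule, so the sketch below is not their algorithm; it only illustrates what "iteration-free" routing can look like: a single softmax over vote agreements replaces the iterative coupling-coefficient loop of dynamic routing.

```python
# Illustrative single-pass routing — a stand-in, NOT the MixCaps algorithm.
# One softmax over output capsules replaces the iterative agreement updates.
import torch
import torch.nn.functional as F

def single_pass_routing(votes):
    # votes: (batch, n_in, n_out, dim) predictions from lower-level capsules.
    coupling = F.softmax(votes.norm(dim=-1), dim=2)    # no routing iterations
    s = (coupling.unsqueeze(-1) * votes).sum(dim=1)    # weighted sum over inputs
    n = s.norm(dim=-1, keepdim=True)
    return (n ** 2 / (1 + n ** 2)) * (s / (n + 1e-8))  # squash non-linearity
```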
{"title":"MixCaps: Capsules With Iteration Free Routing","authors":"Ifty Mohammad Rezwan, Mirza Belal Ahmed, S. Sourav, Ezab Quader, Arafat Hossain, Nabeel Mohammed","doi":"10.1109/DICTA51227.2020.9363386","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363386","url":null,"abstract":"In this paper, we propose a new variant of Capsule Networks called MixCaps. It is a new architecture that significantly decreases the compute capability required to run capsule networks. Due to the nature of our modules, we propose a new routing algorithm that does not require multiple iterations. All routing models prior to this architecture uses multiple iterations. This decreases our model's memory requirements by a significant margin unlike previous methods. This also provides us with the advantage to use both Matrix and Vector Poses. The model learns better complex representations as an aftereffect. Despite all this, we also show that our model performs on par with all prior capsule architectures on complex datasets such as Cifar-10 and Cifar-100.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128782368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Furrow Mapping of Sugarcane Billet Density Using Deep Learning and Object Detection
Pub Date: 2020-11-29 | DOI: 10.1109/DICTA51227.2020.9363394
J. Scott, Andrew Busch
Australia's sugar industry is currently undergoing significant hardship due to global market contractions from COVID-19, increased crop forecasts from larger global producers, and falling oil prices. Current planting practices rely on inefficient mass-flow planting techniques, and to date no attempt has been made to map the seed using machine vision in order to understand the underlying problems. This paper investigates the feasibility of creating a labelled sugarcane billet dataset using a readily-available camera positioned beneath a planter, and of analysing it with a YOLOv3 network. The network achieved a high mean average precision at an intersection-over-union threshold of 0.5 (mAP50) of 0.852 on test images, and was used to provide planting metrics by generating a furrow map.
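The paper's mapping step is not described beyond "generating a furrow map"; one plausible reading is binning per-billet detections along the direction of travel into a density profile. The sketch below does exactly that, with bin size, furrow length, and the detection format as our assumptions.

```python
# Hypothetical post-processing: turn billet detections into a furrow density
# profile. Detection format and all parameters are assumptions for illustration.
import numpy as np

def furrow_density(detections, bin_m=0.5, furrow_len_m=100.0, conf_thresh=0.5):
    # detections: iterable of (distance_along_furrow_m, confidence) per billet,
    # e.g. derived from YOLOv3 boxes plus planter odometry.
    bins = np.zeros(int(furrow_len_m / bin_m))
    for dist_m, conf in detections:
        if conf > conf_thresh:
            bins[min(int(dist_m / bin_m), len(bins) - 1)] += 1
    return bins  # billets per bin; plotting this gives a furrow map
```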
{"title":"Furrow Mapping of Sugarcane Billet Density Using Deep Learning and Object Detection","authors":"J. Scott, Andrew Busch","doi":"10.1109/DICTA51227.2020.9363394","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363394","url":null,"abstract":"Australia's sugar industry is currently undergoing significant hardships, due to global market contractions from COVID-19, increased crop forecasts from larger global producers, and falling oil prices. Current planting practices utilize inefficient mass-flow planting techniques, and no attempt to map the seed using machine vision has been made, to date, in order to understand the underlying problems. This paper investigates the plausibility of creating a labeled sugarcane billet dataset using a readily-available camera positioned beneath a planter and analysing this using a YOLOv3 network. This network resulted in a high mean average precision at intersect over union of 0.5 (mAP50) of 0.852 on test images, and was used to provide planting metrics by generating a furrow map.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123467616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monocular Rotational Odometry with Incremental Rotation Averaging and Loop Closure
Pub Date: 2020-10-05 | DOI: 10.1109/DICTA51227.2020.9363388
Chee-Kheng Chng, Álvaro Parra, Tat-Jun Chin, Y. Latif
Estimating absolute camera orientations is essential for attitude estimation tasks. An established approach is to first carry out visual odometry (VO) or visual SLAM (V-SLAM), and retrieve the camera orientations (3 DOF) from the camera poses (6 DOF) estimated by VO or V-SLAM. One drawback of this approach, besides the redundancy of estimating full 6 DOF camera poses, is the dependency on estimating a map (3D scene points) jointly with the 6 DOF poses due to the basic structure-and-motion constraint. To simplify the task of absolute orientation estimation, we formulate the monocular rotational odometry problem and devise a fast algorithm to accurately estimate camera orientations from 2D-2D feature matches alone. Underpinning our system is a new incremental rotation averaging method for fast, constant-time iterative updating. Furthermore, our system maintains a view-graph that 1) allows loop closures to be solved to remove camera orientation drift, and 2) can be used to warm-start a V-SLAM system. We conduct extensive quantitative experiments on real-world datasets to demonstrate the accuracy of our incremental camera orientation solver. Finally, we showcase the benefit of our algorithm to V-SLAM: 1) solving the known-rotation problem to estimate the trajectory of the camera and the surrounding map, and 2) enabling V-SLAM systems to track pure rotational motions.
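For orientation purposes, a rotation-only pipeline from 2D-2D matches can be sketched with standard OpenCV calls, as below. This naive version simply chains frame-to-frame rotations; the paper's contributions — incremental rotation averaging over a view-graph and loop closure — are precisely what this sketch lacks, and it will drift accordingly.

```python
# Minimal rotation-only odometry sketch from 2D-2D feature matches (OpenCV).
# No rotation averaging or loop closure: drift accumulates, which is the
# problem the paper's view-graph machinery is designed to correct.
import cv2
import numpy as np

def relative_rotation(pts1, pts2, K):
    # pts1, pts2: (N, 2) matched feature coordinates; K: 3x3 camera intrinsics.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, _, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R

def integrate(R_abs, R_rel):
    # Compose the new relative rotation onto the running absolute orientation.
    return R_rel @ R_abs
```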
{"title":"Monocular Rotational Odometry with Incremental Rotation Averaging and Loop Closure","authors":"Chee-Kheng Chng, Álvaro Parra, Tat-Jun Chin, Y. Latif","doi":"10.1109/DICTA51227.2020.9363388","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363388","url":null,"abstract":"Estimating absolute camera orientations is essential for attitude estimation tasks. An established approach is to first carry out visual odometry (VO) or visual SLAM (V-SLAM), and retrieve the camera orientations (3 DOF) from the camera poses (6 DOF) estimated by VO or V-SLAM. One drawback of this approach, besides the redundancy in estimating full 6 DOF camera poses, is the dependency on estimating a map (3D scene points) jointly with the 6 DOF poses due to the basic constraint on structure-and-motion. To simplify the task of absolute orientation estimation, we formulate the monocular rotational odometry problem and devise a fast algorithm to accurately estimate camera orientations with 2D-2D feature matches alone. Underpinning our system is a new incremental rotation averaging method for fast and constant time iterative updating. Furthermore, our system maintains a view-graph that 1) allows solving loop closure to remove camera orientation drift, and 2) can be used to warm start a V-SLAM system. We conduct extensive quantitative experiments on real-world datasets to demonstrate the accuracy of our incremental camera orientation solver. Finally, we showcase the benefit of our algorithm to V-SLAM: 1) solving the known rotation problem to estimate the trajectory of the camera and the surrounding map, and 2) enabling V-SLAM systems to track pure rotational motions.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129109636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One-Shot learning based classification for segregation of plastic waste
Pub Date: 2020-09-29 | DOI: 10.1109/DICTA51227.2020.9363374
Shivaank Agarwal, R. Gudi, Paresh Saxena
The problem of segregating recyclable waste is daunting for many countries. This article presents an approach for image-based classification of plastic waste using one-shot learning techniques. The proposed approach exploits discriminative features generated via Siamese and triplet-loss convolutional neural networks to differentiate between five types of plastic waste based on their resin codes. The approach achieves an accuracy of 99.74% on the WaDaBa database [1].
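The core mechanism named in the abstract — a triplet-loss embedding network used for one-shot classification — can be sketched as follows. The backbone, embedding size, and margin are our assumptions, not the authors' configuration.

```python
# Sketch of a triplet-loss embedding network for one-shot classification,
# assuming a ResNet-18 backbone and 128-d embeddings (illustrative choices).
import torch
import torch.nn as nn
import torchvision

class EmbeddingNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = torchvision.models.resnet18(weights="DEFAULT")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, dim)

    def forward(self, x):
        # L2-normalised embeddings so distances are comparable.
        return nn.functional.normalize(self.backbone(x), dim=1)

net = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)
# loss = triplet_loss(net(anchor), net(positive), net(negative))
# At test time a query image is assigned the resin code of its nearest
# reference embedding — one labelled example per class suffices (one-shot).
```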
{"title":"One-Shot learning based classification for segregation of plastic waste","authors":"Shivaank Agarwal, R. Gudi, Paresh Saxena","doi":"10.1109/DICTA51227.2020.9363374","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363374","url":null,"abstract":"The problem of segregating recyclable waste is fairly daunting for many countries. This article presents an approach for image based classification of plastic waste using one-shot learning techniques. The proposed approach exploits discriminative features generated via the siamese and triplet loss convolutional neural networks to help differentiate between 5 types of plastic waste based on their resin codes. The approach achieves an accuracy of 99.74% on the WaDaBa Database [1].","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"91 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126026663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PS8-Net: A Deep Convolutional Neural Network to Predict the Eight-State Protein Secondary Structure
Pub Date: 2020-09-22 | DOI: 10.1109/DICTA51227.2020.9363393
Md. Aminur Rab Ratul, M. T. Elahi, M. Mozaffari, Won-Sook Lee
Protein secondary structure is crucial to creating an information bridge between the primary and tertiary structures. Precise prediction of eight-state protein secondary structure (PSS) is heavily utilized in the structural and functional analysis of proteins. Deep learning techniques have recently been applied in this area and have raised eight-state (Q8) prediction accuracy remarkably. Nevertheless, from a theoretical standpoint there is still much room for improvement, specifically in eight-state PSS prediction. In this study, we present a new deep convolutional neural network, PS8-Net, to enhance the accuracy of eight-class PSS prediction. The input to this architecture is a carefully constructed feature matrix built from the protein's sequence and profile features. We introduce a new PS8 module with a skip connection to extract long-term inter-dependencies from higher layers, obtain local contexts in earlier layers, and gather global information during secondary structure prediction. This architecture enables efficient processing of local and global interdependencies between amino acids to make an accurate prediction for each class. To the best of our knowledge, our experimental results demonstrate that PS8-Net outperforms all state-of-the-art methods on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets.
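The general shape of such a model — a 1D CNN over a per-residue feature matrix with skip connections, emitting eight logits per residue — can be sketched as below. The feature dimensionality (e.g. one-hot sequence plus PSSM profile) and all layer sizes are assumptions; the actual PS8 module is defined in the paper.

```python
# Illustrative residue-wise Q8 classifier with skip connections — a generic
# stand-in for the paper's PS8 module, with assumed sizes throughout.
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        # Skip connection mixes earlier local context with deeper features.
        return torch.relu(x + self.conv2(torch.relu(self.conv1(x))))

class Q8Net(nn.Module):
    def __init__(self, in_feat=50, ch=64, n_blocks=4, n_classes=8):
        super().__init__()
        self.inp = nn.Conv1d(in_feat, ch, kernel_size=1)
        self.blocks = nn.Sequential(*[ResidualBlock1D(ch) for _ in range(n_blocks)])
        self.out = nn.Conv1d(ch, n_classes, kernel_size=1)

    def forward(self, x):  # x: (batch, in_feat, seq_len)
        return self.out(self.blocks(self.inp(x)))  # per-residue Q8 logits
```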
{"title":"PS8-Net: A Deep Convolutional Neural Network to Predict the Eight-State Protein Secondary Structure","authors":"Md. Aminur Rab Ratul, M. T. Elahi, M. Mozaffari, Won-Sook Lee","doi":"10.1109/DICTA51227.2020.9363393","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363393","url":null,"abstract":"Protein secondary structure is crucial to creating an information bridge between the primary and tertiary structures. Precise prediction of eight-state protein secondary structure (PSS) has been significantly utilized in the structural and functional analysis of proteins. Deep learning techniques have been recently applied in this area and raised the eight-state (Q8) protein secondary structure prediction accuracy remarkably. Nevertheless, from a theoretical standpoint, there are still many rooms for improvement, specifically in the eight-state PSS prediction. In this study, we have presented a new deep convolutional neural network called PS8- Net, to enhance the accuracy of eight-class PSS prediction. The input of this architecture is a carefully constructed feature matrix from the proteins sequence features and profile features. We introduce a new PS8 module with skip connection to extracting the long-term inter-dependencies from higher layers, obtaining local contexts in earlier layers, and achieving global information during secondary structure prediction. This architecture enables the efficient processing of local and global interdependencies between amino acids to make an accurate prediction of each class. To the best of our knowledge, our proposed PS8-Net experiment results demonstrate that it outperforms all the state-of-the-art methods on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128507113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring Intensity Invariance in Deep Neural Networks for Brain Image Registration
Pub Date: 2020-09-21 | DOI: 10.1109/DICTA51227.2020.9363409
Hassan Mahmood, Asim Iqbal, S. Islam
Image registration is a widely-used technique for analysing large-scale datasets captured through various biomedical imaging modalities and techniques such as MRI and X-ray. These datasets are typically collected from multiple sites, under different imaging protocols, and with a variety of scanners. Such heterogeneity in the data collection process causes inhomogeneity, i.e. variation in intensity (brightness) and noise distribution. These variations are detrimental to the performance of image registration, segmentation, and detection algorithms. Classical image registration methods are computationally expensive but handle these artifacts relatively well; deep learning-based techniques, by contrast, are computationally efficient for automated brain registration but sensitive to intensity variations. In this study, we investigate the effect of variation in intensity distribution between input image pairs on deep learning-based image registration methods. We find that these models degrade when presented with brain image pairs that have different intensity distributions, even when the underlying structures are similar. To overcome this limitation, we incorporate a structural similarity-based loss function into a deep neural network and test its performance both on a validation split held out before training and on a completely unseen dataset. We report that models trained with the structural similarity-based loss perform better on both datasets. This investigation highlights a possible performance-limiting factor in deep learning-based registration models and suggests a potential solution that accounts for intensity distribution variation in the input image pairs. Our code and models are available at https://github.com/hassaanmahmood/DeepIntense.
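The abstract names a structural similarity-based loss but does not reproduce it; a simplified SSIM loss (box window in place of the usual Gaussian, constants from the standard SSIM formulation) might look as follows. The authors' exact loss is in their repository.

```python
# Simplified SSIM loss sketch: box-filter means/variances approximate the
# Gaussian-windowed statistics of standard SSIM. Inputs assumed in [0, 1].
import torch.nn.functional as F

def ssim_loss(x, y, win=11, C1=0.01 ** 2, C2=0.03 ** 2):
    # x, y: (batch, 1, H, W) image tensors (e.g. warped moving vs. fixed).
    mu_x = F.avg_pool2d(x, win, 1, win // 2)
    mu_y = F.avg_pool2d(y, win, 1, win // 2)
    var_x = F.avg_pool2d(x * x, win, 1, win // 2) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, win // 2) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, win // 2) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return 1.0 - ssim.mean()  # minimising this maximises structural similarity
```

Because SSIM compares local means, variances, and covariances rather than raw intensities, it is less sensitive to a global brightness shift between the image pair, which is the intuition behind using it here.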
{"title":"Exploring Intensity Invariance in Deep Neural Networks for Brain Image Registration","authors":"Hassan Mahmood, Asim Iqbal, S. Islam","doi":"10.1109/DICTA51227.2020.9363409","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363409","url":null,"abstract":"Image registration is a widely-used technique in analysing large scale datasets that are captured through various imaging modalities and techniques in biomedical imaging such as MRI, X-Rays, etc. These datasets are typically collected from various sites and under different imaging protocols using a variety of scanners. Such heterogeneity in the data collection process causes inhomogeneity or variation in intensity (brightness) and noise distribution. These variations play a detrimental role in the performance of image registration, segmentation and detection algorithms. Classical image registration methods are computationally expensive but are able to handle these artifacts relatively better. However, deep learning-based techniques are shown to be computationally efficient for automated brain registration but are sensitive to the intensity variations. In this study, we investigate the effect of variation in intensity distribution among input image pairs for deep learning-based image registration methods. We find a performance degradation of these models when brain image pairs with different intensity distribution are presented even with similar structures. To overcome this limitation, we incorporate a structural similarity-based loss function in a deep neural network and test its performance on the validation split separated before training as well as on a completely unseen new dataset. We report that the deep learning models trained with structure similarity-based loss seems to perform better for both datasets. This investigation highlights a possible performance limiting factor in deep learning-based registration models and suggests a potential solution to incorporate the intensity distribution variation in the input image pairs. Our code and models are available at https://github.com/hassaanmahmood/DeepIntense.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128023528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-species Seagrass Detection and Classification from Underwater Images
Pub Date: 2020-09-18 | DOI: 10.1109/DICTA51227.2020.9363371
Scarlett Raine, R. Marchant, Peyman Moghadam, F. Maire, B. Kettle, Brano Kusy
Underwater surveys conducted by divers or robots equipped with customized camera payloads can generate large numbers of images. Manual review of these images to extract ecological data is prohibitive in terms of time and cost, providing a strong incentive to automate the process with machine learning. In this paper, we introduce a multi-species detector and classifier for seagrasses based on a deep convolutional neural network, which achieves an overall accuracy of 92.4%. We also introduce a simple method to semi-automatically label image patches, thereby minimizing the manual labelling requirement. We publicly release the dataset collected in this study, along with the code and pre-trained models needed to replicate our experiments, at: https://github.com/csiro-robotics/deepseagrass
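The patch-based setup the abstract implies — tiling each survey image and classifying every patch with a CNN — can be sketched as below. Patch size and the classifier are assumptions; the released repository contains the real pipeline.

```python
# Sketch of patch-based classification: tile an image into a grid of patches,
# then run each through a CNN. Patch size (224) is an assumed value.
import torch

def extract_patches(image, patch=224):
    # image: (3, H, W) tensor -> (N, 3, patch, patch) non-overlapping grid.
    c, h, w = image.shape
    tiles = image.unfold(1, patch, patch).unfold(2, patch, patch)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

# patches = extract_patches(img)
# preds = cnn(patches).argmax(dim=1)   # one species label per patch
```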
{"title":"Multi-species Seagrass Detection and Classification from Underwater Images","authors":"Scarlett Raine, R. Marchant, Peyman Moghadam, F. Maire, B. Kettle, Brano Kusy","doi":"10.1109/DICTA51227.2020.9363371","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363371","url":null,"abstract":"Underwater surveys conducted using divers or robots equipped with customized camera payloads can generate a large number of images. Manual review of these images to extract ecological data is prohibitive in terms of time and cost, thus providing strong incentive to automate this process using machine learning solutions. In this paper, we introduce a multi-species detector and classifier for seagrasses based on a deep convolutional neural network (achieved an overall accuracy of 92.4%). We also introduce a simple method to semi-automatically label image patches and therefore minimize manual labelling requirement. We describe and release publicly the dataset collected in this study as well as the code and pre-trained models to replicate our experiments at: https://github.com/csiro-robotics/deepseagrass","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131040689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature-Extracting Functions for Neural Logic Rule Learning
Pub Date: 2020-08-14 | DOI: 10.1109/DICTA51227.2020.9363415
Shashank Gupta, A. Robles-Kelly
In this paper, we present a method for integrating domain knowledge, abstracted as logic rules, into the predictive behaviour of a neural network using feature-extracting functions. We combine declarative first-order logic rules, which represent human knowledge in a logically-structured format akin to that introduced in [1], with feature-extracting functions that act as the decision rules presented in [2]. These functions are embodied as programming functions which can represent, in a straightforward manner, the applicable domain knowledge as a set of logical instructions, and which provide a cumulative set of probability distributions over the input data. These distributions can then be used during training in a mini-batch strategy. We illustrate the utility of our method for sentiment analysis and compare our results to those obtained using a number of alternatives from the literature.
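To make the idea concrete, here is one hypothetical feature-extracting function in the spirit of the abstract: a sentiment logic rule ("in an 'A but B' sentence, sentiment follows clause B") encoded as an ordinary function that emits a probability distribution over labels. The rule choice, word lists, and soft scores are our own illustrative assumptions, not the paper's functions.

```python
# Hypothetical feature-extracting function: encode an "A but B" sentiment rule
# as a function returning a distribution over (negative, positive).
POSITIVE = frozenset({"good", "great", "excellent"})
NEGATIVE = frozenset({"bad", "poor", "awful"})

def but_rule(sentence):
    tokens = sentence.lower().split()
    # The rule: if "but" appears, only the clause after it carries sentiment.
    clause = tokens[tokens.index("but") + 1:] if "but" in tokens else tokens
    pos = sum(t in POSITIVE for t in clause)
    neg = sum(t in NEGATIVE for t in clause)
    if pos + neg == 0:
        return (0.5, 0.5)                    # rule abstains: uniform distribution
    return (neg / (pos + neg), pos / (pos + neg))

# e.g. but_rule("the plot was bad but the acting was great") -> (0.0, 1.0)
# Per mini-batch, such distributions can be mixed into the training loss to
# nudge the network toward rule-consistent predictions.
```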
{"title":"Feature-Extracting Functions for Neural Logic Rule Learning","authors":"Shashank Gupta, A. Robles-Kelly","doi":"10.1109/DICTA51227.2020.9363415","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363415","url":null,"abstract":"In this paper, we present a method aimed at integrating domain knowledge abstracted as logic rules into the predictive behaviour of a neural network using feature extracting functions. We combine the declarative first-order logic rules which represents the human knowledge in a logically-structured format akin to that introduced in [1] with feature-extracting functions which act as the decision rules presented in [2]. These functions are embodied as programming functions which can represent, in a straightforward manner, the applicable domain knowledge as a set of logical instructions and provide a cumulative set of probability distributions of the input data. These distributions can then be used during the training process in a mini-batch strategy. We also illustrate the utility of our method for sentiment analysis and compare our results to those obtained using a number of alternatives elsewhere in the literature.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114152513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent Data Augmentation Strategies for Deep Learning in Plant Phenotyping and Their Significance
Pub Date: 2020-08-04 | DOI: 10.1109/DICTA51227.2020.9363383
D. Gomes, Lihong Zheng
Plant phenotyping concerns the study of plant traits resulting from the interaction between plants and their environment. Computer vision (CV) techniques represent promising, non-invasive approaches for related tasks such as counting leaves, measuring leaf area, and tracking plant growth. Among potential CV techniques, deep learning has been prevalent over the last couple of years, an increase in interest driven mainly by the release of a dataset of rosette plants together with objective metrics for benchmarking solutions. This paper discusses an interesting aspect of the recent best-performing works in this field: their main contribution comes from novel data augmentation techniques rather than model improvements. Moreover, experiments are set up to highlight the significance of data augmentation practices for limited datasets with narrow distributions. This paper reviews these ingenious techniques for generating synthetic data to augment training and presents evidence of their potential importance.
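As a point of reference for the kind of augmentation the survey discusses, a generic torchvision pipeline for plant images might look like the following; the specific transforms and parameters are illustrative examples, not drawn from any one reviewed paper.

```python
# Generic image-augmentation pipeline (illustrative parameters only).
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=180),  # rosettes have no canonical orientation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# Each epoch then sees a different random view of every training image,
# widening a narrow data distribution without any new annotations.
```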
{"title":"Recent Data Augmentation Strategies for Deep Learning in Plant Phenotyping and Their Significance","authors":"D. Gomes, Lihong Zheng","doi":"10.1109/DICTA51227.2020.9363383","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363383","url":null,"abstract":"Plant phenotyping concerns the study of plant traits resulted from their interaction with their environment. Computer vision (CV) techniques represent promising, noninvasive approaches for related tasks such as leaf counting, defining leaf area, and tracking plant growth. Between potential CV techniques, deep learning has been prevalent in the last couple of years. Such an increase in interest happened mainly due to the release of a data set containing rosette plants that defined objective metrics to benchmark solutions. This paper discusses an interesting aspect of the recent best-performing works in this field: the fact that their main contribution comes from novel data augmentation techniques, rather than model improvements. Moreover, experiments are set to highlight the significance of data augmentation practices for limited data sets with narrow distributions. This paper intends to review the ingenious techniques to generate synthetic data to augment training and display evidence of their potential importance.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116882627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}