Quality Classification and Segmentation of Sugarcane Billets Using Machine Vision
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034561
Machine learning is widely used in agriculture to optimize practices such as planting, crop detection, and harvesting. The sugar industry is a major contributor to the global economy, and sugarcane is valuable both as a food source and as a sustainable crop with useful byproducts. This paper presents three machine vision algorithms capable of performing quality classification and segmentation of raw sugarcane billets, developing a proof-of-concept for implementation at our industry partner's mill in NSW. Such a system has the potential to improve quality and reduce costs associated with an essential yet labor-intensive, inefficient, and unreliable process. Two recent iterations of the popular You Only Look Once (YOLO) algorithm, YOLOR and YOLOX, are trained for classification, with the state-of-the-art Mask R-CNN network used for segmentation. The best-performing classification model, YOLOX, achieves a classification mAP50:95 of 90.1% across 7 classes in real time, with an average inference speed of 19.36 ms per image. Segmentation accuracy of 70.8% AP50 and 83.5% AR50:95 was achieved using the Mask R-CNN network.
{"title":"Quality Classification and Segmentation of Sugarcane Billets Using Machine Vision","authors":"","doi":"10.1109/DICTA56598.2022.10034561","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034561","url":null,"abstract":"Machine learning is widely used in agriculture to optimize practices such as planting, crop detection, and harvesting. The sugar industry is a major contributor to the global economy, valuable both as a food source and as a sustainable crop with useful byproducts. This paper presents three machine vision algorithms capable of performing quality classification and segmentation of raw sugarcane billets, developing a proof-of-concept for implementation at our industry partner's mill in NSW. Such a system has the potential to improve quality and reduce costs associated with an essential yet labor-intensive, inefficient, and unreliable process. Two recent iterations of the popular You Only Look Once (YOLO) algorithm, YOLOR and YOLOX, are trained for classification, with the state-of-the-art Mask R-CNN network used for segmentation. The best performing classification model, YOLOX, achieves a classification mAP50:95 of 90.1% across 7 classes in real time, with an average inference speed of 19.36 ms per image. Segmentation accuracy of AP50 of 70.8% and AR50-95 of 83.5% was achieved using the Mask CNN-R network.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120999309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic point cloud compression using slicing focusing on self-occluded points
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034563
Realistic digital representations of 3D objects and surroundings have recently become possible, thanks to advances in computer graphics that allow real-time, realistic interactions between users and the physical world [1], [2]. Emerging technologies enable real-world objects, persons, and scenes to move dynamically and convincingly across users' views using a 3D point cloud [3]–[5]. A point cloud is an unordered set of individual 3D points with no explicit relationships between them in 3D space [1], [6]. Each point has a 3D position but can also carry other attributes (e.g., texture, reflectance, colour, and normal), creating a realistic visual representation model for static and dynamic 3D objects [3], [7]. This is desirable for many applications such as geographic information systems, cultural heritage, immersive 3D telepresence, telehealth, disabled access, telecommunication, autonomous driving, gaming and robotics, virtual reality (VR), and augmented reality (AR) [2], [8]. Point clouds are also required in the Metaverse, for example when creating avatars or content and for object-based interaction. The Metaverse is a virtual world that creates a network where anyone can interact through their avatars [9]. Therefore, it is critical to present the 3D virtual world as close to the real world as possible, with high resolution and minimal noise and blur.
{"title":"Dynamic point cloud compression using slicing focusing on self-occluded points","authors":"","doi":"10.1109/DICTA56598.2022.10034563","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034563","url":null,"abstract":"Realistic digital representations of 3D objects and surroundings have been recently made possible. This is due to recent advances in computer graphics allowing real-time and realistic physical world interactions with users [1], [2]. Emerging technologies enable real-world objects, persons, and scenes to move dynamically across users' views convincingly using a 3D point cloud [3]–[5]. A point cloud is a set of individual 3D points that are not organized and without any relationship in the 3D space [1], [6]. Each point has a 3D position but can also contain some other attributes (e.g., texture, reflectance, colour, and normal), creating a realistic visual representation model for static and dynamic 3D objects [3], [7]. This is desirable for many applications such as geographic information systems, cultural heritage, immersive telepresence, telehealth, disabled access, 3D telepresence, telecommunication, autonomous driving, gaming and robotics, virtual reality (VR), and augmented reality (AR) [2], [8]. Even the use of point cloud in Metaverse when creating an avatar or content in Metaverse and object-based interaction is required. The Metaverse is a virtual world that creates a network where anyone can interact through their avatars [9]. Therefore, it is critical to present the 3D virtual world as close to the real world as possible, with high-resolution and minimal noise and blur.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123255024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ComicLib: A New Large-Scale Comic Dataset for Sketch Understanding
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034579
Sketches are essential in everyday communication and have received much attention in the computer vision community. In general, researchers use learning-based approaches to study sketch-based algorithms. These methods rely on large-scale data to train complex models to achieve satisfactory performance. Most existing datasets are drawn by unskilled users in a closed environment and are of low complexity, which limits the information deep learning models can extract from them. This paper proposes a new large-scale comic sketch dataset called ComicLib for sketch understanding. We scan 181,354 comic sketch images from the comic library and annotate them through a crowdsourcing annotation platform that we developed. In total, we obtain a dataset of millions of comic objects in 17 categories. We conduct comparative experiments on sketch recognition, retrieval, detection, generation, and colorization using a number of deep learning algorithms, providing benchmark performance on the ComicLib dataset. We hope that ComicLib can contribute to the field of sketch-based research.
{"title":"ComicLib: A New Large-Scale Comic Dataset for Sketch Understanding","authors":"","doi":"10.1109/DICTA56598.2022.10034579","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034579","url":null,"abstract":"The sketch is essential in everyday communication and has received much attention in the computer vision community. In general, researchers use learning-based approaches to study sketch-based algorithms. These methods rely on large-scale data to train complex models to achieve satisfactory performance. Most existing datasets are drawn by unskilled users in a closed environment. These datasets are of low complexity, making deep learning models unable to extract more information. This paper proposes a new large-scale comic sketch dataset called ComicLib for sketch understanding. We scan 181,354 comic sketch images from the comic library and annotate them through a crowdsourcing annotation platform developed by ourselves. Finally, we obtain a dataset of millions of comic objects in 17 categories. We conduct comparative experiments on sketch recognition, retrieval, detection, generation and colorization using a number of deep learning algorithms. These experiments provide the benchmark performance of the ComicLib dataset. We hope that ComicLib can contribute to the field of sketch-based research.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126599200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salient Face Prediction without Bells and Whistles
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034571
Salient face prediction in multiple-face videos is a fundamental task in machine vision, used in applications such as video editing and human-machine interaction. The field has seen significant progress in recent years, backed by large datasets consisting specifically of multi-face videos. As our first contribution, we demonstrate the promise of a visual-only baseline, achieving state-of-the-art results for salient face prediction. Our work motivates a reconsideration of sophisticated multimodal, multi-stream architectures. We further show that a simple upstream task like active speaker detection can provide a reasonable baseline and match prior tailored models for detecting salient faces. Moreover, we bring to light inconsistencies in evaluation strategies, highlighting a need for standardization, and propose a ranking-based evaluation for the task. Overall, our work motivates a fundamental course correction before re-initiating the search for novel architectures and frameworks.
{"title":"Salient Face Prediction without Bells and Whistles","authors":"","doi":"10.1109/DICTA56598.2022.10034571","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034571","url":null,"abstract":"Salient face prediction in multiple-face videos is a fundamental task in machine vision. It finds usage in various applications like video editing and human-machine interactions. The field has seen significant progress in recent years, backed by large datasets comprising specifically of multi-face videos. As the first contribution, we present promise in a visual-only baseline, achieving state-of-the-art results for salient face prediction. Our work motivates reconsideration towards sophisticated multimodal, multi-stream architectures. We further show that a simple upstream task like active speaker detection can give a reasonable baseline and match prior tailored models for detecting salient faces. Moreover, we bring to light the inconsistencies in evaluation strategies, highlighting a need for standardization. We propose using a ranking-based evaluation for the task. Overall, our work motivates a fundamental course correction before re-initiating the search for novel architectures and frameworks.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116030595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FootSeg: Automatic Anatomical Segmentation of Foot Bones from Weight-Bearing Cone Beam CT Scans
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034620
Weight-bearing cone beam CT (CBCT), which provides high-resolution scanning in the natural weight-bearing position, is an emerging technique in orthopedic research. The high-quality scans from CBCT machines have greatly facilitated the diagnosis and treatment of the human foot [1], including foot alignment [2] and foot surgery [3], [4]. In these clinical practices, an essential step in analyzing a CBCT foot scan is the anatomical segmentation of the foot bones, which provides an overall understanding of the patient's condition.
{"title":"FootSeg: Automatic Anatomical Segmentation of Foot Bones from Weight-Bearing Cone Beam CT Scans","authors":"","doi":"10.1109/DICTA56598.2022.10034620","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034620","url":null,"abstract":"Weight-bearing cone beam CT (CBCT), which provides high-resolution scanning in the natural weight-bearing position, is an emerging technique in orthopedic research. The high quality scans from CBCT machines have greatly facilitated the treatment and diagnosis of human foot [1], such as foot align [2] and foot surgery [3] [4]. In these clinical practices, an essential step to analyze the CBCT foot scan is the anatomical segmentation of foot bones which provides an overall understanding of the patient's situation.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114200833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SimplestNet-Drone: An Efficient and Accurate Object Detection Algorithm for Drone Aerial Image Analytics
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034564
Objects in images captured by drones are extremely difficult to detect due to varying camera angles, distances, object sizes, and environmental conditions, making it challenging to accurately detect an object from a height. Nonetheless, object detection plays a crucial role in computer vision and has seen significant improvements on images captured by drones. We apply the YOLOv5 framework with modified feature extraction and focus detection. The main problems with aerial images are object size and viewing angle from a high altitude, so we propose a single-stage object detection model called “SimplestNet-Drone”. We include a fourth prediction head to improve detection of the smallest objects and to increase detection speed. The algorithm's prediction accuracy is improved by adding an attention mechanism, which detects attention regions in the scene and suppresses unnecessary information. The model was trained and tested on the VisDrone dataset and compared with other object detection models. The model shows great improvement, with a mean average precision of 63.72%, and improves on the YOLO architecture. A real-time demonstration of our model can be watched in the following YouTube video: https://youtu.be/De8t4tjtb6w
{"title":"SimplestNet-Drone: An efficient and Accurate Object Detection Algorithm for Drone Aerial Image Analytics","authors":"","doi":"10.1109/DICTA56598.2022.10034564","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034564","url":null,"abstract":"Images captured by drones are extremely difficult to detect due to varying camera angles, distances, sizes, and environmental conditions, making it challenging to accurately detect an object from a height. Nonetheless, object detection plays a crucial role in computer vision and has made significant improvements to images captured by drones. We apply the YOLOv5 framework with modified feature extraction and focus detection. The problem with aerial images is object size and viewing angle from a high altitude, so we proposed a single-stage object detection model called “SimplestNet-Drone”. We included a fourth prediction head to improve the object detection on the smallest objects and improve the detection speed. The algorithm's prediction accuracy is improved by adding an attention model mechanism, which detects attention regions in environments and suppresses unnecessary information. The model was trained and tested on the VisDorne dataset and compared with other object detection models. The model shows great improvement, with a mean average precision of 63.72%, and has improved the Yolo architecture. A real-time implementation of our model can be watched in the following YouTube video: https://youtu.be/De8t4tjtb6w","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124190405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Co-Graph Convolution for Instance Segmentation
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034643
Segmenting diverse instances in diverse contexts with a common model is a challenge for instance segmentation. In this paper, we address this problem by capturing rich relationship information and propose our Co-Graph Convolution Network (CGC-Net). Based on Mask R-CNN, we propose a co-graph convolution mask head. Specifically, we decouple the mask head into two mask heads and append a graph convolution layer to each to capture the corresponding relationship information. One focuses on the relationships between appearance features at each position of the instance itself, while the other pays more attention to the semantic relationships between the channels of the corresponding instance's features. In addition, we add a co-relationship module to each graph convolution layer to share similar relationships between instances of the same category in an image. We integrate the outputs of the two mask heads by element-wise multiplication to improve the feature representation for the final instance segmentation prediction. Compared with other state-of-the-art instance segmentation methods, experiments on the MS COCO and Cityscapes datasets demonstrate our method's competitiveness. Furthermore, to verify the generalization of CGC-Net, we also add it to other instance segmentation networks, and the experimental results show that our method still obtains stable performance gains.
{"title":"Co-Graph Convolution for Instance Segmentation","authors":"","doi":"10.1109/DICTA56598.2022.10034643","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034643","url":null,"abstract":"Segmenting various instances in various contexts with a common model is a challenge for instance segmentation. In this paper, we address this problem by capturing rich relationship information and propose our Co-Graph Convolution Network (CGC-Net). Based on Mask R-CNN, we propose our co-graph convolution mask head. Specifically, we decouple the mask head into two mask heads. For each mask head, we append a graph convolution layer to capture the corresponding relationship information. One focuses on the relationship information between appearance features for each position of the instance itself, while the other pays more attention to the semantic relationship between each channel for the corresponding instance's features. In addition, we add a co-relationship module to each graph convolution layer to share similar relationships between instances with the same category in an image. We integrate the outputs of two mask heads by element-wise multiplication to improve feature representation for final instance segmentation prediction. Compared with other state-of-the-art instance segmentation methods, experiments on MS COCO and Cityscapes datasets demonstrate our method's competitiveness. Besides, in order to verify the generalization of our CGC-Net, we also add our CGC-Net to other instance segmentation networks, and the experiment results show our method still can obtain stable gains in performance.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130273688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prostate Cancer Diagnosis from Structured Clinical Biomarkers with Deep Learning
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034567
Prostate cancer (PC) is one of the most aggressive cancers. Early detection of PC is indispensable for treatment. Biopsies are often carried out to determine the Gleason score of PC, which helps to predict its aggressiveness. As biopsies carry considerable risk, especially for elderly patients, machine learning can be used to predict the PC Gleason grade from clinical biomarkers. These biomarkers are typically structured in a table. In this paper, we propose to use advanced tabular deep neural network architectures, such as TabNet and TabTransformer, to grade PC. We also perform a comparative study of various machine learning approaches, including traditional methods, tree-based classifiers, and shallow neural networks, for this purpose. Our experimental results demonstrate the superior performance of the TabNet deep learning method.
{"title":"Prostate Cancer Diagnosis from Structured Clinical Biomarkers with Deep Learning: Anonymous Authors","authors":"","doi":"10.1109/DICTA56598.2022.10034567","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034567","url":null,"abstract":"Prostate cancer (PC) is one of the most aggressive cancers that exist. Early detection of PC is indispensable for treatment. Biopsies are often carried out to determine the Gleason score of PC which helps to predict the aggressiveness of PC. As biopsies have considerable associated risk, especially for old people, machine learning can be used to predict the PC Gleason grade from clinical biomarkers. These biomarkers are typically structured in a table. In this paper, we propose to use advanced tabular deep neural network architectures, like TabNet and TabTransformer, to grade PC. We also perform a comparative study of various machine learning approaches, including traditional methods, tree-based classifiers, and shallow neural networks, for this purpose. Our experimental results demonstrate the superior performance of the TabNet deep learning method.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128919707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image-based Detection of Dyslexic Readers from 2-D Scan Path using an Enhanced Deep Transfer Learning Paradigm
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034577
Dyslexia is a learning disorder, common among children, that causes poor reading and comprehension skills despite normal intelligence, and it is particularly prevalent among school children. Dyslexia is associated with a wide range of factors and its exact cause is still unclear, which makes it difficult to develop a generalized dyslexia detection model. Feature engineering to extract the major features that give a classifier good generalization capability is a significant challenge when developing a classification model for dyslexia. Conventional approaches to predicting dyslexia rely on psychological assessments and on imaging methods such as magnetic resonance imaging (MRI), functional MRI, and electroencephalogram (EEG) signals, which are not usually preferred for disorders such as dyslexia, especially in children, due to their invasiveness and potential adverse effects. To overcome these problems, this work adopts an image-based technique for predicting dyslexia from eye-gaze points recorded while reading. Eye-movement tracking is non-invasive and provides rich indices of brain function and cognitive processing. The eye-gaze points recorded during reading are represented as 2-D scan path images. The work proposes an enhanced DenseNet deep transfer learning solution for feature engineering and classification: a deep learning model is first built from the 2-D scan path images, and this pre-trained model is then used to classify dyslexia via deep transfer learning. The proposed system exploits the key strengths of deep learning and transfer learning and achieves a high accuracy of 96.36%, outperforming existing state-of-the-art machine learning models. The results demonstrate that the enhanced deep transfer learning model performs well at identifying significant features and classifying dyslexia from 2-D scan path images.
{"title":"Image-based Detection of Dyslexic Readers from 2-D Scan path using an Enhanced Deep Transfer Learning Paradigm","authors":"","doi":"10.1109/DICTA56598.2022.10034577","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034577","url":null,"abstract":"Dyslexia is a learning syndrome commonly found in children that causes poor reading and comprehending skills even though they have normal intelligence. Dyslexia is more prevalent among school children. Dyslexia is caused by wide range of features and the exact cause is still unclear which makes it difficult for developing a generalized dyslexia detection model. Feature engineering to extract major features that contribute for generalized capability of the classifier is a significant challenge while developing a classification model for dyslexia. Conventional models for prediction of dyslexia based on psychological assessments, Imaging methods such as Magnetic Resonance Images, functional MRI images and Electroencephalogram (EEG) signals are not usually preferred for clinical disorders such as dyslexia especially on children due to adverse radioactive effects. To overcome these problems, this research work adapts an image-based technique for prediction of dyslexia based on eye gaze points while reading. Eye movement tracking methods are non-invasive and rich indices of brain study and cognitive processing. The eye gaze point while reading is tracked and represented as 2-D scan path images. The work also proposes an enhanced Dense Net deep transfer learning solution for feature engineering and classification of dyslexia. A new approach of enhanced Dense Net deep transfer learning is proposed where a deep learning model is built from 2d-scanpath images of dyslexia. This pre-trained model is used further to classify dyslexia using deep transfer learning. The proposed system uses the key characteristics of deep learning and transfer learning and has shown high performance when compared to existing state-of-the-art machine learning models with a high accuracy rate of 96.36 %. The results demonstrate that the enhanced deep transfer learning model performed well in identifying significant features and classification of dyslexia using 2-D scan path images.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130918195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking Decoupled Training with Bag of Tricks for Long-Tailed Recognition
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034607
Learning from imbalanced datasets remains a significant challenge for real-world applications. Among existing approaches for long-tailed recognition, decoupled training appears to achieve the best performance. Moreover, there are simple and effective tricks that can further improve decoupled learning and help models trained on long-tailed datasets become more robust to the class imbalance problem. However, if used inappropriately, these tricks can result in lower-than-expected recognition accuracy. Unfortunately, there is a lack of comprehensive empirical studies that provide guidelines on how to combine these tricks appropriately. In this paper, we explore existing long-tailed visual recognition tricks and perform extensive experiments to provide a detailed analysis of the impact of each trick and to arrive at an effective combination of these tricks for decoupled training. Furthermore, we introduce a new loss function called hard mining loss (HML), which is better suited to training the model to discriminate between head and tail classes. In addition, unlike previous work, we introduce a new learning scheme for decoupled training that follows an end-to-end process. We conducted our evaluation experiments on the CIFAR10, CIFAR100, and iNaturalist 2018 datasets. The results show that our method outperforms existing methods that address the class imbalance issue in image classification tasks (code will be made available). We believe that our approach will serve as a solid foundation for addressing class imbalance problems in many other computer vision tasks.
{"title":"Rethinking Decoupled Training with Bag of Tricks for Long-Tailed Recognition","authors":"","doi":"10.1109/DICTA56598.2022.10034607","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034607","url":null,"abstract":"Learning from imbalanced datasets remains a significant challenge for real-world applications. The decoupled training approach seems to achieve better performance among existing approaches for long-tail recognition. Moreover, there are simple and effective tricks that can be used to further improve the performance of decoupled learning and help models trained on long-tailed datasets to be more robust to the class imbalance problem. However, if used inappropriately, these tricks can result in lower than expected recognition accuracy. Unfortunately, there is a lack of comprehensive empirical studies that provide guidelines on how to combine these tricks appropriately. In this paper, we explore existing long-tail visual recognition tricks and perform extensive experiments to provide a detailed analysis of the impact of each trick and come up with an effective combination of these tricks for decoupled training. Furthermore, we introduce a new loss function called hard mining loss (HML), which is more suitable to learn the model to better discriminate head and tail classes. In addition, unlike previous work, we introduce a new learning scheme for decoupled training following an end-to-end process. We conducted our evaluation experiments on the CIFAR10, CIFAR100 and iNaturalist 2018 datasets. The results11Code is available at the link will be made available. show that our method outperforms existing methods that address class imbalance issue for image classification tasks. We believe that our approach will serve as a solid foundation for improving class imbalance problems in many other computer vision tasks.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"855 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122353844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}