
Latest publications from the 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Quality Classification and Segmentation of Sugarcane Billets Using Machine Vision
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034561
Machine learning is widely used in agriculture to optimize practices such as planting, crop detection, and harvesting. The sugar industry is a major contributor to the global economy, valuable both as a food source and as a sustainable crop with useful byproducts. This paper presents three machine vision algorithms capable of performing quality classification and segmentation of raw sugarcane billets, developing a proof-of-concept for implementation at our industry partner's mill in NSW. Such a system has the potential to improve quality and reduce costs associated with an essential yet labor-intensive, inefficient, and unreliable process. Two recent iterations of the popular You Only Look Once (YOLO) algorithm, YOLOR and YOLOX, are trained for classification, with the state-of-the-art Mask R-CNN network used for segmentation. The best-performing classification model, YOLOX, achieves a classification mAP50:95 of 90.1% across 7 classes in real time, with an average inference speed of 19.36 ms per image. The Mask R-CNN network achieves segmentation accuracy of 70.8% AP50 and 83.5% AR50-95.
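The abstract reports an average inference speed of 19.36 ms per image. A minimal sketch of how such a per-image latency figure can be measured is shown below, assuming a PyTorch detector; the model and input shapes are illustrative placeholders rather than the authors' pipeline.

```python
import time
import torch

def average_inference_ms(model, images, warmup=5):
    """Measure mean per-image inference latency in milliseconds.

    `model` is any torch.nn.Module detector; `images` is a list of
    preprocessed input tensors (e.g. shape [1, 3, 640, 640]).
    On GPU, wrap the timed region with torch.cuda.synchronize() calls
    so asynchronous kernels do not distort the measurement.
    """
    model.eval()
    with torch.no_grad():
        # Warm-up runs so one-off initialisation does not skew the timing.
        for img in images[:warmup]:
            model(img)
        start = time.perf_counter()
        for img in images:
            model(img)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)

if __name__ == "__main__":
    dummy = torch.nn.Conv2d(3, 16, 3, padding=1)   # stand-in for a detector
    batch = [torch.randn(1, 3, 640, 640) for _ in range(20)]
    print(f"{average_inference_ms(dummy, batch):.2f} ms per image")
```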
Citations: 0
Dynamic point cloud compression using slicing focusing on self-occluded points
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034563
Realistic digital representations of 3D objects and surroundings have recently been made possible by advances in computer graphics that allow real-time, realistic physical-world interactions with users [1], [2]. Emerging technologies enable real-world objects, persons, and scenes to move convincingly and dynamically across users' views using 3D point clouds [3]–[5]. A point cloud is an unorganized set of individual 3D points with no explicit relationships between them in 3D space [1], [6]. Each point has a 3D position but can also carry other attributes (e.g., texture, reflectance, colour, and normal), creating a realistic visual representation model for static and dynamic 3D objects [3], [7]. This is desirable for many applications such as geographic information systems, cultural heritage, immersive telepresence, telehealth, disabled access, 3D telepresence, telecommunication, autonomous driving, gaming and robotics, virtual reality (VR), and augmented reality (AR) [2], [8]. Point clouds are also required in the Metaverse, for example when creating avatars or content and for object-based interaction. The Metaverse is a virtual world that creates a network where anyone can interact through their avatars [9]. It is therefore critical to present the 3D virtual world as close to the real world as possible, with high resolution and minimal noise and blur.
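Since the abstract defines a point cloud as an unorganized set of 3D positions, each optionally carrying attributes such as colour or normal, a minimal sketch of that representation follows; the array layout, attribute choice, and the generic axis-aligned slice are illustrative assumptions and do not reproduce the paper's slicing strategy for self-occluded points.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class PointCloud:
    """An unorganized set of 3D points with optional per-point attributes."""
    positions: np.ndarray               # (N, 3) float32 x, y, z coordinates
    colors: np.ndarray | None = None    # (N, 3) uint8 RGB, if available
    normals: np.ndarray | None = None   # (N, 3) float32 unit normals, if available

    def __len__(self) -> int:
        return self.positions.shape[0]

    def slice_along_axis(self, axis: int, lo: float, hi: float) -> PointCloud:
        """Return the points whose coordinate on `axis` lies in [lo, hi).

        A generic axis-aligned slice for illustration only; the paper's
        slicing of self-occluded points is not reproduced here.
        """
        mask = (self.positions[:, axis] >= lo) & (self.positions[:, axis] < hi)
        return PointCloud(
            self.positions[mask],
            None if self.colors is None else self.colors[mask],
            None if self.normals is None else self.normals[mask],
        )


# Example: 10,000 random coloured points, sliced along the z axis.
cloud = PointCloud(
    positions=np.random.rand(10_000, 3).astype(np.float32),
    colors=np.random.randint(0, 256, (10_000, 3), dtype=np.uint8),
)
front = cloud.slice_along_axis(axis=2, lo=0.0, hi=0.5)
print(len(cloud), len(front))
```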
Citations: 2
ComicLib: A New Large-Scale Comic Dataset for Sketch Understanding
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034579
Sketches are essential in everyday communication and have received much attention in the computer vision community. In general, researchers use learning-based approaches to study sketch-based algorithms. These methods rely on large-scale data to train complex models to achieve satisfactory performance. Most existing datasets are drawn by unskilled users in a closed environment. These datasets are of low complexity, which prevents deep learning models from extracting richer information. This paper proposes a new large-scale comic sketch dataset called ComicLib for sketch understanding. We scan 181,354 comic sketch images from a comic library and annotate them through a crowdsourcing annotation platform developed by ourselves. Finally, we obtain a dataset of millions of comic objects in 17 categories. We conduct comparative experiments on sketch recognition, retrieval, detection, generation and colorization using a number of deep learning algorithms. These experiments provide the benchmark performance of the ComicLib dataset. We hope that ComicLib can contribute to the field of sketch-based research.
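A dataset of this kind is usually consumed through a simple image/label loader. The sketch below assumes a hypothetical annotation CSV with image_path,category rows; this layout is an illustrative assumption, not ComicLib's actual release format.

```python
import csv
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class ComicSketchDataset(Dataset):
    """Loads (sketch image, category index) pairs from an annotation CSV.

    The CSV layout (image_path,category per row) is assumed for
    illustration; adapt it to the dataset's actual annotation format.
    """

    def __init__(self, root: str, annotation_csv: str, transform=None):
        self.root = Path(root)
        self.transform = transform
        with open(annotation_csv, newline="") as f:
            rows = list(csv.DictReader(f))
        self.categories = sorted({r["category"] for r in rows})
        self.cat_to_idx = {c: i for i, c in enumerate(self.categories)}
        self.samples = [(r["image_path"], self.cat_to_idx[r["category"]]) for r in rows]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        rel_path, label = self.samples[idx]
        image = Image.open(self.root / rel_path).convert("L")  # sketches as grayscale
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```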
Citations: 0
Salient Face Prediction without Bells and Whistles
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034571
Salient face prediction in multiple-face videos is a fundamental task in machine vision. It finds usage in various applications such as video editing and human-machine interaction. The field has seen significant progress in recent years, backed by large datasets consisting specifically of multi-face videos. As our first contribution, we demonstrate the promise of a visual-only baseline, achieving state-of-the-art results for salient face prediction. Our work motivates reconsideration of sophisticated multimodal, multi-stream architectures. We further show that a simple upstream task like active speaker detection can give a reasonable baseline and match prior tailored models for detecting salient faces. Moreover, we bring to light the inconsistencies in evaluation strategies, highlighting a need for standardization, and propose a ranking-based evaluation for the task. Overall, our work motivates a fundamental course correction before re-initiating the search for novel architectures and frameworks.
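The abstract does not spell out the proposed ranking-based metric, so the sketch below illustrates one common ranking-based evaluation for salient face prediction, the mean reciprocal rank of the annotated salient face among the per-frame predicted scores; this formulation is an assumption, not necessarily the authors' metric.

```python
def mean_reciprocal_rank(frames):
    """Ranking-based evaluation for salient face prediction.

    `frames` is a list of (scores, salient_idx) pairs, where `scores` holds
    the predicted saliency score of each detected face in a frame and
    `salient_idx` is the index of the ground-truth salient face.
    Returns the mean reciprocal rank over all frames (1.0 is perfect).
    """
    total = 0.0
    for scores, salient_idx in frames:
        # Rank of the ground-truth face when faces are sorted by descending score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        rank = order.index(salient_idx) + 1
        total += 1.0 / rank
    return total / len(frames)


# Example: two frames with three detected faces each.
frames = [
    ([0.9, 0.4, 0.1], 0),   # salient face ranked 1st -> reciprocal rank 1.0
    ([0.2, 0.7, 0.5], 2),   # salient face ranked 2nd -> reciprocal rank 0.5
]
print(mean_reciprocal_rank(frames))   # 0.75
```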
Citations: 0
FootSeg: Automatic Anatomical Segmentation of Foot Bones from Weight-Bearing Cone Beam CT Scans
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034620
Weight-bearing cone beam CT (CBCT), which provides high-resolution scanning in the natural weight-bearing position, is an emerging technique in orthopedic research. The high-quality scans from CBCT machines have greatly facilitated the diagnosis and treatment of the human foot [1], for example in foot alignment [2] and foot surgery [3], [4]. In these clinical practices, an essential step in analyzing a CBCT foot scan is the anatomical segmentation of foot bones, which provides an overall understanding of the patient's situation.
Citations: 0
SimplestNet-Drone: An efficient and Accurate Object Detection Algorithm for Drone Aerial Image Analytics
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034564
Objects in images captured by drones are extremely difficult to detect due to varying camera angles, distances, sizes, and environmental conditions, making it challenging to accurately detect an object from a height. Nonetheless, object detection plays a crucial role in computer vision and has brought significant improvements to the analysis of drone imagery. We apply the YOLOv5 framework with modified feature extraction and focus detection. Because the main challenges in aerial images are object size and the high-altitude viewing angle, we propose a single-stage object detection model called “SimplestNet-Drone”. We include a fourth prediction head to improve detection of the smallest objects and to improve detection speed. The algorithm's prediction accuracy is improved by adding an attention mechanism, which detects attention regions in the environment and suppresses unnecessary information. The model was trained and tested on the VisDrone dataset and compared with other object detection models. The model shows great improvement, with a mean average precision of 63.72%, and improves on the YOLO architecture. A real-time demonstration of our model can be watched in the following YouTube video: https://youtu.be/De8t4tjtb6w
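The abstract mentions an attention mechanism that highlights informative regions and suppresses unnecessary information but does not specify the module, so the sketch below shows a generic squeeze-and-excitation-style channel attention block of the kind often inserted into YOLO backbones; it is an illustrative stand-in, not SimplestNet-Drone's actual module.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention.

    Globally pools each channel, passes the pooled vector through a small
    bottleneck MLP, and rescales the feature map so uninformative channels
    are suppressed. A generic block, not the paper's exact design.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))   # (B, C) channel descriptors
        return x * weights.view(b, c, 1, 1)     # rescale the feature map


# Example: attach to a feature map from a backbone stage.
feat = torch.randn(2, 256, 40, 40)
att = ChannelAttention(256)
print(att(feat).shape)   # torch.Size([2, 256, 40, 40])
```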
Citations: 0
Co-Graph Convolution for Instance Segmentation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034643
Segmenting diverse instances in diverse contexts with a common model is a challenge for instance segmentation. In this paper, we address this problem by capturing rich relationship information and propose our Co-Graph Convolution Network (CGC-Net). Based on Mask R-CNN, we propose a co-graph convolution mask head. Specifically, we decouple the mask head into two mask heads. To each mask head, we append a graph convolution layer to capture the corresponding relationship information. One focuses on the relationships between appearance features at each position of the instance itself, while the other pays more attention to the semantic relationships between the channels of the corresponding instance's features. In addition, we add a co-relationship module to each graph convolution layer to share similar relationships between instances of the same category in an image. We integrate the outputs of the two mask heads by element-wise multiplication to improve the feature representation for the final instance segmentation prediction. Experiments on the MS COCO and Cityscapes datasets demonstrate our method's competitiveness against other state-of-the-art instance segmentation methods. Furthermore, to verify the generalization of CGC-Net, we also add it to other instance segmentation networks, and the experimental results show that our method still obtains stable gains in performance.
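A minimal sketch of the fusion idea described above is given below: two parallel mask heads, one followed by a position-wise relation layer and the other by a channel-wise relation layer, with their outputs merged by element-wise multiplication. The relation layers are simplified to generic affinity-weighted aggregations, and the exact graph convolution design and the co-relationship module are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialRelation(nn.Module):
    """Aggregates features across spatial positions via their pairwise affinity."""
    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.flatten(2)                     # (B, C, HW)
        affinity = torch.softmax(feats.transpose(1, 2) @ feats, dim=-1)  # (B, HW, HW)
        out = feats @ affinity.transpose(1, 2)   # (B, C, HW)
        return x + out.view(b, c, h, w)          # residual connection


class ChannelRelation(nn.Module):
    """Aggregates features across channels via their pairwise affinity."""
    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.flatten(2)                     # (B, C, HW)
        affinity = torch.softmax(feats @ feats.transpose(1, 2), dim=-1)  # (B, C, C)
        out = affinity @ feats                   # (B, C, HW)
        return x + out.view(b, c, h, w)


class TwoBranchMaskHead(nn.Module):
    """Two parallel mask heads fused by element-wise multiplication."""
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        def branch(relation):
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
                relation,
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.spatial_branch = branch(SpatialRelation())
        self.channel_branch = branch(ChannelRelation())
        self.predictor = nn.Conv2d(in_channels, num_classes, 1)

    def forward(self, roi_feats):                # roi_feats: (N, C, 14, 14) RoI features
        fused = self.spatial_branch(roi_feats) * self.channel_branch(roi_feats)
        return self.predictor(F.interpolate(fused, scale_factor=2))  # (N, K, 28, 28) logits


rois = torch.randn(4, 256, 14, 14)
print(TwoBranchMaskHead()(rois).shape)   # torch.Size([4, 80, 28, 28])
```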
Citations: 0
Prostate Cancer Diagnosis from Structured Clinical Biomarkers with Deep Learning: Anonymous Authors
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034567
Prostate cancer (PC) is one of the most aggressive cancers. Early detection of PC is indispensable for treatment. Biopsies are often carried out to determine the Gleason score of PC, which helps to predict its aggressiveness. As biopsies carry considerable risk, especially for elderly patients, machine learning can be used to predict the PC Gleason grade from clinical biomarkers. These biomarkers are typically structured in a table. In this paper, we propose to use advanced tabular deep neural network architectures, such as TabNet and TabTransformer, to grade PC. For this purpose, we also perform a comparative study of various machine learning approaches, including traditional methods, tree-based classifiers, and shallow neural networks. Our experimental results demonstrate the superior performance of the TabNet deep learning method.
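The comparative study described above, covering traditional methods, tree-based classifiers, and shallow neural networks on tabular data, can be set up along the lines of the sketch below using scikit-learn and a synthetic stand-in for the clinical biomarker table; the data are placeholders, and TabNet/TabTransformer would be trained separately with their own libraries.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of clinical biomarkers (placeholder data).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "shallow MLP": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```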
Citations: 0
Image-based Detection of Dyslexic Readers from 2-D Scan path using an Enhanced Deep Transfer Learning Paradigm
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034577
Dyslexia is a learning disorder commonly found in children that causes poor reading and comprehension skills even though the children have normal intelligence, and it is particularly prevalent among school children. Dyslexia is associated with a wide range of factors and its exact cause is still unclear, which makes it difficult to develop a generalized dyslexia detection model. Feature engineering to extract the major features that contribute to the generalization capability of the classifier is a significant challenge when developing a classification model for dyslexia. Conventional approaches to predicting dyslexia rely on psychological assessments or on imaging methods such as magnetic resonance imaging (MRI), functional MRI, and electroencephalogram (EEG) signals; these methods are not usually preferred for clinical disorders such as dyslexia, especially in children, because of their potential adverse effects. To overcome these problems, this work adopts an image-based technique for predicting dyslexia from eye gaze points recorded while reading. Eye movement tracking is non-invasive and provides rich indices of brain function and cognitive processing. The eye gaze points during reading are tracked and represented as 2-D scan path images. The work also proposes an enhanced DenseNet deep transfer learning solution for feature engineering and classification of dyslexia: a deep learning model is built from 2-D scan path images, and this pre-trained model is then used to classify dyslexia via deep transfer learning. The proposed system exploits the key characteristics of deep learning and transfer learning and shows high performance compared to existing state-of-the-art machine learning models, with an accuracy of 96.36%. The results demonstrate that the enhanced deep transfer learning model performs well in identifying significant features and classifying dyslexia from 2-D scan path images.
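A hedged sketch of the transfer-learning step described above follows: an ImageNet-pretrained DenseNet-121 whose classifier is replaced with a two-way head (dyslexic vs. non-dyslexic) for scan-path images. The freezing policy, input size, and optimizer settings are illustrative assumptions, not the paper's exact enhanced configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained DenseNet-121 as the transfer-learning backbone.
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)

# Optionally freeze the pretrained feature extractor and train only the head.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the ImageNet classifier with a 2-way head (dyslexic vs. non-dyslexic).
model.classifier = nn.Linear(model.classifier.in_features, 2)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 2-D scan-path images
# rendered as 3-channel 224x224 tensors (placeholder data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss = {loss.item():.4f}")
```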
Citations: 0
Rethinking Decoupled Training with Bag of Tricks for Long-Tailed Recognition
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034607
Learning from imbalanced datasets remains a significant challenge for real-world applications. Among existing approaches to long-tail recognition, decoupled training achieves some of the best performance. Moreover, simple and effective tricks can further improve the performance of decoupled learning and help models trained on long-tailed datasets become more robust to the class imbalance problem. However, if used inappropriately, these tricks can result in lower-than-expected recognition accuracy. Unfortunately, there is a lack of comprehensive empirical studies providing guidelines on how to combine these tricks appropriately. In this paper, we explore existing long-tail visual recognition tricks and perform extensive experiments to analyze the impact of each trick in detail, arriving at an effective combination of these tricks for decoupled training. Furthermore, we introduce a new loss function called hard mining loss (HML), which is better suited to training the model to discriminate head and tail classes. In addition, unlike previous work, we introduce a new learning scheme for decoupled training that follows an end-to-end process. We conducted evaluation experiments on the CIFAR10, CIFAR100 and iNaturalist 2018 datasets. The results (code will be made available) show that our method outperforms existing methods that address the class imbalance issue in image classification tasks. We believe that our approach will serve as a solid foundation for addressing class imbalance problems in many other computer vision tasks.
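The decoupled scheme the paper builds on is commonly implemented in two stages: representation learning with ordinary instance-balanced sampling, followed by classifier retraining on a frozen backbone with class-balanced sampling. The sketch below outlines that generic recipe; the proposed hard mining loss and the paper's end-to-end variant are not reproduced, and the backbone, classifier, and dataset objects are placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler


def class_balanced_sampler(labels):
    """Sample each class with equal probability by weighting examples
    inversely to their class frequency."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels)
    weights = 1.0 / counts[labels].float()
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)


def decoupled_training(backbone, classifier, train_set, labels, epochs=(90, 10)):
    criterion = nn.CrossEntropyLoss()

    # Stage 1: learn representations with the usual instance-balanced sampling.
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    opt = torch.optim.SGD(list(backbone.parameters()) + list(classifier.parameters()),
                          lr=0.1, momentum=0.9)
    for _ in range(epochs[0]):
        for x, y in loader:
            opt.zero_grad()
            criterion(classifier(backbone(x)), y).backward()
            opt.step()

    # Stage 2: freeze the backbone and retrain only the classifier with
    # class-balanced sampling so tail classes are seen as often as head classes.
    for p in backbone.parameters():
        p.requires_grad = False
    balanced = DataLoader(train_set, batch_size=128,
                          sampler=class_balanced_sampler(labels))
    opt = torch.optim.SGD(classifier.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs[1]):
        for x, y in balanced:
            opt.zero_grad()
            criterion(classifier(backbone(x)), y).backward()
            opt.step()
    return backbone, classifier
```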
Citations: 1