Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10281-4
Adrian Pekar, Laszlo Arpad Makara, Gergely Biczok
This paper presents a comparative analysis of federated learning (FL) and centralized learning (CL) models for multi-class traffic flow classification in network applications, a timely study given increasing privacy preservation concerns. Unlike existing literature, which often omits detailed class-wise performance evaluation as well as consistent data handling and feature selection approaches, our study addresses these gaps by implementing a feed-forward neural network and assessing FL performance under both independent and identically distributed (IID) and non-IID conditions, with a particular focus on incremental training. In our cross-silo experimental setup involving five clients per round, FL models exhibit notable adaptability. Under IID conditions, the accuracy of the FL model peaked at 96.65%, demonstrating its robustness. Moreover, despite the challenges presented by non-IID environments, our FL models demonstrated significant resilience, adapting incrementally over rounds to optimize performance; in most scenarios, our FL models performed comparably to the idealistic CL model on multiple well-established metrics. Through a comprehensive traffic flow classification use case, this work (i) contributes to a better understanding of the capabilities and limitations of FL, offering valuable insights for the real-world deployment of FL, and (ii) provides a novel, large, carefully curated traffic flow dataset for the research community.
{"title":"Incremental federated learning for traffic flow classification in heterogeneous data scenarios","authors":"Adrian Pekar, Laszlo Arpad Makara, Gergely Biczok","doi":"10.1007/s00521-024-10281-4","DOIUrl":"https://doi.org/10.1007/s00521-024-10281-4","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
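The abstract mentions a cross-silo setup with five clients per round but does not state the aggregation rule. As a rough illustration only, the sketch below assumes the widely used FedAvg-style rule, in which the server averages client parameters weighted by local sample counts. The function name, the flat-list parameter format, and the example numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of flat client parameter vectors.

    Assumption: FedAvg-style aggregation; the paper's actual rule may differ.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its local data size.
            averaged[i] += w * n_samples / total
    return averaged


# Hypothetical round with five clients holding equally sized local datasets.
clients = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [5.0, 4.0]]
sizes = [100, 100, 100, 100, 100]
global_weights = federated_average(clients, sizes)
```

With equal client sizes the result reduces to a plain mean. Under non-IID splits the sizes, and therefore each client's influence, would typically differ.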
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10142-0
Harry Rogers, Beatriz De La Iglesia, Tahmina Zebin, Grzegorz Cielniak, Ben Magri
Modern agriculture relies heavily on the precise application of chemicals such as fertilisers, herbicides, and pesticides, which directly affect both crop yield and environmental footprint. Therefore, it is crucial to assess the accuracy of precision sprayers regarding the spatial location of spray deposits. However, there is currently no fully automated evaluation method for this. In this study, we collected a novel dataset from a precision spot spraying system to enable us to classify and detect spray deposits on target weeds and non-target crops. We employed multiple deep convolutional backbones for this task and subsequently proposed a robustness testing methodology for evaluation purposes. We experimented with two novel data augmentation techniques, subtraction and thresholding, which enhanced the classification accuracy and robustness of the developed models. On average, across nine different tests and four distinct convolutional neural networks, subtraction improves robustness by 50.83% and thresholding improves it by 42.26% relative to a baseline. Additionally, we present the results from a novel weakly supervised object detection task using our dataset, establishing a baseline Intersection over Union score of 42.78%. Our proposed pipeline includes an explainable artificial intelligence stage and provides insights not only into the spatial location of the spray deposits but also into the specific filtering methods within that spatial location utilised for classification.
{"title":"Advancing precision agriculture: domain-specific augmentations and robustness testing for convolutional neural networks in precision spraying evaluation","authors":"Harry Rogers, Beatriz De La Iglesia, Tahmina Zebin, Grzegorz Cielniak, Ben Magri","doi":"10.1007/s00521-024-10142-0","DOIUrl":"https://doi.org/10.1007/s00521-024-10142-0","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
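The Intersection over Union (IoU) score reported above is a standard overlap measure between a predicted and a ground-truth region. The following minimal sketch shows the usual definition for axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function name are assumptions made for illustration, and the paper may compute IoU over other region types.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
score = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```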
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10286-z
Lucas O. Teixeira, Diego Bertolini, Luiz S. Oliveira, George D. C. Cavalcanti, Yandre M. G. Costa
A primary challenge in pattern recognition is imbalanced datasets, which result in skewed and biased predictions. This problem is exacerbated by limited data availability, increasing the reliance on expensive expert data labeling. The study introduces a novel method called contrastive dissimilarity, which combines dissimilarity-based representation with contrastive learning to improve classification performance in imbalanced and data-scarce scenarios. Based on pairwise sample differences, dissimilarity representation excels in situations with numerous overlapping classes and limited samples per class. Unlike traditional methods that use fixed distance functions like Euclidean or cosine, our proposal employs metric learning with contrastive loss to estimate a custom dissimilarity function. We conducted extensive evaluations on 13 databases across multiple training–test splits. The results showed that this approach outperforms traditional models like SVM, random forest, and Naive Bayes, particularly in settings with limited training data.
{"title":"Contrastive dissimilarity: optimizing performance on imbalanced and limited data sets","authors":"Lucas O. Teixeira, Diego Bertolini, Luiz S. Oliveira, George D. C. Cavalcanti, Yandre M. G. Costa","doi":"10.1007/s00521-024-10286-z","DOIUrl":"https://doi.org/10.1007/s00521-024-10286-z","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
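To make the idea of a contrastive loss concrete, the sketch below uses the common pairwise margin formulation: same-class pairs are penalised by their squared distance, and different-class pairs are penalised only when they lie closer than a margin. This is an assumed textbook form, not necessarily the exact loss used in the paper. The inputs stand for already-embedded samples, and the function name and margin value are hypothetical.

```python
import math
from typing import Sequence


def contrastive_loss(z1: Sequence[float], z2: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Pairwise margin-based contrastive loss on two embeddings.

    Assumption: standard margin formulation; the paper's loss may differ.
    """
    d = math.dist(z1, z2)  # Euclidean distance in the embedding space
    if same_class:
        # Pull same-class pairs together.
        return d ** 2
    # Push different-class pairs apart until they are at least `margin` away.
    return max(0.0, margin - d) ** 2


near_negative = contrastive_loss((0.0, 0.0), (0.5, 0.0), same_class=False)
far_negative = contrastive_loss((0.0, 0.0), (3.0, 4.0), same_class=False)
positive = contrastive_loss((0.0, 0.0), (3.0, 4.0), same_class=True)
```

In a metric-learning setting the embeddings would come from a trainable network, so minimising this loss shapes the learned dissimilarity rather than relying on a fixed distance in the input space.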
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10237-8
Alberto Manastarla, Leandro A. Silva
In dynamic ensemble selection (DES) techniques, the competence level of each classifier is estimated from a pool of classifiers, and only the most competent ones are selected to classify a specific test sample and predict its class labels. A significant challenge in DES is efficiently estimating classifier competence for accurate prediction, especially when these techniques employ the K-Nearest Neighbors (KNN) algorithm to define the competence region of a test sample based on a validation set (known as the dynamic selection dataset or DSEL). This challenge is exacerbated when the DSEL does not accurately reflect the original data distribution or contains noisy data. Such conditions can reduce the precision of the system, induce unexpected behaviors, and compromise stability. To address these issues, this paper introduces the self-generating prototype ensemble selection (SGP.DES) framework, which combines meta-learning with prototype selection. The proposed meta-classifier of SGP.DES supports multiple classification algorithms and utilizes meta-features from prototypes derived from the original training set, enhancing the selection of the best classifiers for a test sample. The method improves the efficiency of KNN in defining competence regions by generating a reduced and noise-free DSEL set that preserves the original data distribution. Furthermore, the SGP.DES framework facilitates tailored optimization for specific classification challenges through the use of hyperparameters that control prototype selection and the meta-classifier operation mode to select the most appropriate classification algorithm for dynamic selection. Empirical evaluations of twenty-four classification problems have demonstrated that SGP.DES outperforms state-of-the-art DES methods as well as traditional single-model and ensemble methods in terms of accuracy, confirming its effectiveness across a wide range of classification contexts.
{"title":"Enhancing dynamic ensemble selection: combining self-generating prototypes and meta-classifier for data classification","authors":"Alberto Manastarla, Leandro A. Silva","doi":"10.1007/s00521-024-10237-8","DOIUrl":"https://doi.org/10.1007/s00521-024-10237-8","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
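The KNN-based competence estimation described above can be illustrated with a generic dynamic selection sketch: the k nearest DSEL samples to a query define the region of competence, each classifier is scored by its accuracy in that region, and the top scorers are kept. This is a simplified local-accuracy scheme written for illustration, not the SGP.DES meta-classifier. The toy classifiers, data, and function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]           # (features, label)
Classifier = Callable[[Sequence[float]], int]  # maps features to a label


def region_of_competence(query: Sequence[float], dsel: Sequence[Sample],
                         k: int) -> List[Sample]:
    """The k DSEL samples closest to the query (Euclidean distance)."""
    return sorted(dsel, key=lambda s: math.dist(query, s[0]))[:k]


def select_competent(query: Sequence[float], dsel: Sequence[Sample],
                     pool: Sequence[Classifier], k: int = 3) -> List[int]:
    """Indices of the classifiers with the highest local accuracy."""
    region = region_of_competence(query, dsel, k)
    if not region:
        raise ValueError("DSEL must contain at least one sample")
    scores = [sum(clf(x) == y for x, y in region) / len(region) for clf in pool]
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


# Toy one-dimensional DSEL and a pool of three hypothetical classifiers.
dsel = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((1.0,), 1), ((1.1,), 1)]
pool = [lambda x: 0, lambda x: int(x[0] > 0.5), lambda x: 1]
chosen_low = select_competent((0.05,), dsel, pool, k=3)
chosen_high = select_competent((1.05,), dsel, pool, k=2)
```

Because the scores depend entirely on the neighbours found in the DSEL, noisy or unrepresentative DSEL samples directly distort the selection, which is the weakness the abstract targets.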
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10049-w
Farshad Farahbod
The distillation tower is a crucial component of the refining process. Its energy efficiency has been a major area of research, especially following the oil crisis. This study focuses on optimizing energy consumption in the Shiraz refinery’s distillation unit. The unit is simulated using ASPEN-HYSYS software. Simulation results are validated against real data to ensure model accuracy. The operational data aligns well with model predictions. Following the creation of a data bank using HYSYS software, the tower’s operating conditions are optimized using neural networks and MATLAB software. In this study, a neural network model is developed for the distillation tower. This modeling approach is cost-effective, does not require complex theories, and does not rely on prior system knowledge. Additionally, real-time modeling is achievable through parallel distributed processing. The findings indicate that the optimal feed tray is 9 and the optimal feed temperature is 283.5°C. Furthermore, the optimized number of trays in the distillation tower is 47. Results show that in optimal conditions, cold and hot energy consumption are reduced by approximately 9.7% and 10.8%, respectively. Moreover, implementing optimal conditions results in a reduction of hot energy consumption in the reboiler by 60,000 MW and a reduction of cold energy consumption in the condenser by 30,000 MW.
{"title":"Optimization of energy consumption of oil refinery reboiler and condenser using neural network","authors":"Farshad Farahbod","doi":"10.1007/s00521-024-10049-w","DOIUrl":"https://doi.org/10.1007/s00521-024-10049-w","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10236-9
Youmin Zhang, Lei Sun, Ye Wang, Qun Liu, Li Liu
One-shot knowledge graph completion (KGC) aims to infer unseen facts when only one support entity pair is available for a particular relationship. Prior studies learn reference representations from one support pair for matching query pairs. This strategy can be challenging, particularly when multiple relationships hold between identical support pairs, which results in indistinguishable reference representations. To this end, we propose a disentangled representation learning framework for one-shot KGC. Specifically, to learn sufficient representations, we construct an entity encoder with a fine-grained attention mechanism to explicitly model the input and output neighbors. We adopt an orthogonal regularizer to promote the independence of learned factors in entity representation, enabling the matching processor with max pooling to adaptively identify the semantic roles associated with a particular relation. Subsequently, the one-shot KGC is accomplished by seamlessly integrating the aforementioned modules in an end-to-end learning manner. Extensive experiments on real-world datasets demonstrate the superior performance of the proposed framework.
{"title":"One-shot knowledge graph completion based on disentangled representation learning","authors":"Youmin Zhang, Lei Sun, Ye Wang, Qun Liu, Li Liu","doi":"10.1007/s00521-024-10236-9","DOIUrl":"https://doi.org/10.1007/s00521-024-10236-9","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
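An orthogonal regularizer of the kind mentioned above is often written as the squared Frobenius norm of F F^T - I, where the rows of F are the normalised factor vectors of one entity. The sketch below computes that penalty in plain Python. It is one common form and is assumed here for illustration, since the abstract does not give the exact expression. The function name and example factors are hypothetical.

```python
import math
from typing import List, Sequence


def orthogonality_penalty(factors: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm of (F F^T - I) for L2-normalised factor rows.

    Assumption: a common orthogonality penalty, not the paper's exact term.
    """
    normed: List[List[float]] = []
    for f in factors:
        norm = math.sqrt(sum(v * v for v in f))
        if norm == 0.0:
            raise ValueError("factor vectors must be non-zero")
        normed.append([v / norm for v in f])
    penalty = 0.0
    for i, fi in enumerate(normed):
        for j, fj in enumerate(normed):
            dot = sum(a * b for a, b in zip(fi, fj))
            target = 1.0 if i == j else 0.0  # entries of the identity matrix
            penalty += (dot - target) ** 2
    return penalty


independent = orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]])  # orthogonal factors
entangled = orthogonality_penalty([[1.0, 0.0], [2.0, 0.0]])    # parallel factors
```

The penalty is zero when the factors are mutually orthogonal and grows as they align, so adding it to the training loss discourages different factors from encoding the same information.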
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10266-3
Ujwalla Gawande, Kamal Hajari, Yogesh Golhar, Punit Fulzele
In this paper, we propose a novel keyframe extraction method based on the gray wolf optimization (GWO) algorithm, addressing the challenge of information loss in traditional methods due to redundant and similar frames. The proposed method, GWOKConvLSTM, prioritizes speed, accuracy, and compression efficiency while preserving semantic information. Inspired by wolf behavior, we construct a fitness function that minimizes reconstruction error and achieves optimal compression ratios below 8%. Compared to traditional methods, our GWO method achieves the lowest reconstruction error for a given compression rate, providing a concise and visually coherent summary of keyframes while maintaining consistency across similar motions. Additionally, we propose a template-based method for video classification tasks, achieving the highest accuracy when combined with pre-trained CNNs and ConvLSTM. Our method effectively prevents dynamic background noise from affecting keyframe selection, leading to significantly improved video classification performance using deep neural networks.
{"title":"A Novel gray wolf optimization-based key frame extraction method for video classification using ConvLSTM","authors":"Ujwalla Gawande, Kamal Hajari, Yogesh Golhar, Punit Fulzele","doi":"10.1007/s00521-024-10266-3","DOIUrl":"https://doi.org/10.1007/s00521-024-10266-3","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
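For readers unfamiliar with gray wolf optimization, the sketch below implements the generic continuous GWO update, in which every wolf moves toward the average of positions suggested by the three best wolves (alpha, beta, delta) while an exploration coefficient decays from 2 toward 0. It minimises a toy sphere function as a stand-in. The paper's actual fitness function (reconstruction error and compression ratio over keyframes) and its solution encoding are not given in the abstract, so all names and parameters here are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def gray_wolf_optimize(fitness: Callable[[List[float]], float], dim: int,
                       bounds: Tuple[float, float], n_wolves: int = 12,
                       n_iters: int = 60, seed: int = 0) -> Tuple[List[float], float]:
    """Generic GWO minimiser; returns the best position seen and its fitness."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos: List[float] = list(wolves[0])
    best_fit = fitness(best_pos)
    for t in range(n_iters):
        # Copy the three fittest wolves so in-place moves do not alter them.
        leaders = [list(w) for w in sorted(wolves, key=fitness)[:3]]
        lead_fit = fitness(leaders[0])
        if lead_fit < best_fit:
            best_pos, best_fit = list(leaders[0]), lead_fit
        a = 2.0 * (1.0 - t / n_iters)  # exploration factor, decays from 2 toward 0
        for wolf in wolves:
            for d in range(dim):
                pulled = 0.0
                for leader in leaders:
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    pulled += leader[d] - A * abs(C * leader[d] - wolf[d])
                # Average the three suggestions and clamp to the search bounds.
                wolf[d] = min(hi, max(lo, pulled / 3.0))
    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best_pos, best_fit = list(final), fitness(final)
    return best_pos, best_fit


def sphere(x: List[float]) -> float:
    """Toy objective standing in for a keyframe fitness function."""
    return sum(v * v for v in x)


best_position, best_value = gray_wolf_optimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

For keyframe extraction, a position would have to be decoded into a set of selected frames before evaluating the fitness. That decoding step is problem-specific and is deliberately left out of this sketch.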
In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as an input to a 2D-like Siamese SOT method. The pseudo image is created in Bird's-eye View (BEV) coordinates, and therefore the objects in it have constant size. Thus, only the object rotation can change in the new coordinate system, not the object scale. For this reason, we replace multi-scale search with a multi-rotation search, where differently rotated search regions are compared against a single target representation to predict both the position and the rotation of the object. Experiments on the KITTI [1] Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values. Applying a SOT method in a real-world scenario runs into limitations such as the lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip certain data frames if the inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.
{"title":"Vpit: real-time embedded single object 3D tracking using voxel pseudo images","authors":"Illia Oleksiienko, Paraskevi Nousi, Nikolaos Passalis, Anastasios Tefas, Alexandros Iosifidis","doi":"10.1007/s00521-024-10259-2","DOIUrl":"https://doi.org/10.1007/s00521-024-10259-2","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
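The pillar-based structuring step can be pictured with a much simplified sketch: points are binned into vertical columns on a BEV grid, and each cell stores the number of points that fall into it. The actual VPIT pseudo image is built from richer per-pillar features, so the point-count image, the grid parameters, and the function name below are illustrative assumptions only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, assumed layout


def bev_count_image(points: Sequence[Point], x_range: Tuple[float, float],
                    y_range: Tuple[float, float], nx: int, ny: int) -> List[List[int]]:
    """Point-count BEV image: one cell per pillar, indexed as image[iy][ix]."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    image = [[0] * nx for _ in range(ny)]
    for x, y, _z in points:
        # Points outside the grid footprint are dropped; height is ignored.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        ix = min(nx - 1, int((x - x_min) / (x_max - x_min) * nx))
        iy = min(ny - 1, int((y - y_min) / (y_max - y_min) * ny))
        image[iy][ix] += 1
    return image


cloud = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.5, 0.5, 0.0), (5.0, 5.0, 0.0)]
pseudo_image = bev_count_image(cloud, (0.0, 2.0), (0.0, 2.0), nx=2, ny=2)
```

Because every cell covers a fixed metric area, an object occupies the same number of cells regardless of its distance from the sensor, which is why the abstract can replace multi-scale search with multi-rotation search.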
Pub Date: 2024-08-12, DOI: 10.1007/s00521-024-10179-1
Swash Sami Mohammed, Hülya Gökalp Clarke
The availability of comprehensive datasets is a crucial challenge for developing artificial intelligence (AI) models in various applications and fields. The lack of large and diverse public fabric defect datasets forms a major obstacle to properly and accurately developing and training AI models for detecting and classifying fabric defects in real-life applications. Models trained on limited datasets struggle to identify underrepresented defects, reducing their practicality. To address these issues, this study suggests using a conditional generative adversarial network (cGAN) for fabric defect data augmentation. The proposed image-to-image translator GAN features a conditional U-Net generator and a 6-layered PatchGAN discriminator. The conditional U-Network (U-Net) generator can produce highly realistic synthetic defective samples and offers the ability to control various characteristics of the generated samples by taking two input images: a segmented defect mask and a clean fabric image. The segmented defect mask provides information about various aspects of the defects to be added to the clean fabric sample, including their type, shape, size, and location. By augmenting the training dataset with diverse and realistic synthetic samples, the AI models can learn to identify a broader range of defects more accurately. This technique helps overcome the limitations of small or unvaried datasets, leading to improved defect detection accuracy and generalizability. Moreover, this proposed augmentation method can find applications in other challenging fields, such as generating synthetic samples for medical imaging datasets related to brain and lung tumors.
{"title":"Conditional image-to-image translation generative adversarial network (cGAN) for fabric defect data augmentation","authors":"Swash Sami Mohammed, Hülya Gökalp Clarke","doi":"10.1007/s00521-024-10179-1","DOIUrl":"https://doi.org/10.1007/s00521-024-10179-1","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
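The cGAN abstract above mentions a 6-layered PatchGAN discriminator but gives no kernel sizes, strides, or input resolution. As a rough illustration of what "patch" means in such a discriminator, the following sketch computes the receptive field and the output grid size for a stack of convolutions. The layer configuration (4x4 kernels, four stride-2 layers followed by two stride-1 layers, padding 1, 256-pixel input) and the function names are assumptions made for this example, not details taken from the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel, stride) pairs ordered from input to output.
    """
    r = 1
    # Walk backwards from the output: each layer widens the field.
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


def output_size(size, layers, padding=1):
    """Spatial size of the discriminator's output grid for a square input."""
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# Hypothetical 6-layer configuration (NOT taken from the paper).
six_layer = [(4, 2)] * 4 + [(4, 1)] * 2

# Well-known 5-layer configuration of the "70x70" PatchGAN, for comparison.
five_layer = [(4, 2)] * 3 + [(4, 1)] * 2

print(receptive_field(five_layer))  # 70
print(receptive_field(six_layer))   # 142
print(output_size(256, six_layer))  # 14
```

Under these assumptions, each of the 14 x 14 output scores judges whether a 142 x 142 region of the input looks real, so the discriminator penalizes local texture errors instead of emitting a single score for the whole image.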
Entity alignment (EA) aims to discover equivalent entities in different knowledge graphs (KGs) and plays an important role in knowledge engineering. Recently, EA with dangling entities has been proposed as a more realistic setting, which assumes that not all entities have corresponding equivalent entities. In this paper, we focus on this setting. Some work has explored this problem by leveraging translation APIs, pre-trained word embeddings, and other off-the-shelf tools. However, these approaches over-rely on side information (e.g., entity names) and fail when that information is absent. Moreover, they still insufficiently exploit the most fundamental information in a KG: its graph structure. To improve the exploitation of structural information, we propose a novel entity alignment framework called Structure-aware Wasserstein Graph Contrastive Learning (SWGCL), which is refined along three dimensions: (i) Model. We propose a novel Gated Graph Attention Network to capture local and global graph structure attention. (ii) Training. Two learning objectives, contrastive learning and optimal transport learning, are designed to obtain distinguishable entity representations. (iii) Inference. In the inference phase, a PageRank-based method, HOSS (Higher-Order Structural Similarity), is proposed to calculate higher-order graph structural similarity. Extensive experiments on two dangling benchmarks demonstrate that our SWGCL outperforms the current state-of-the-art methods with pure structural information in both traditional (relaxed) and dangling (consolidated) settings.
{"title":"Advancing entity alignment with dangling cases: a structure-aware approach through optimal transport learning and contrastive learning","authors":"Jin Xu, Yangning Li, Xiangjin Xie, Niu Hu, Yinghui Li, Hai-Tao Zheng, Yong Jiang","doi":"10.1007/s00521-024-10276-1","DOIUrl":"https://doi.org/10.1007/s00521-024-10276-1","journal":{"name":"Neural Computing and Applications"},"publicationDate":"2024-08-12","publicationTypes":"Journal Article"}
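The entity alignment abstract above names optimal transport learning as one of its training objectives without describing the formulation. As general background only, the sketch below shows entropy-regularized optimal transport solved with Sinkhorn iterations on a toy cost matrix, followed by a simple rule that leaves an entity unmatched when no candidate receives most of its transport mass. This is a generic illustration, not SWGCL's objective or its HOSS inference step; the function names `sinkhorn` and `align`, the parameters `reg`, `n_iters`, and `min_share`, and the cost values are all assumptions made for this example.

```python
import math


def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan between uniform marginals.

    cost: n x m list of lists, e.g. 1 - cosine similarity between
    entity embeddings of two knowledge graphs (toy values here).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each source entity
    b = [1.0 / m] * m  # mass on each target entity
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(plan, min_share=0.6):
    """Pick the best target per source entity, or None if the match is weak.

    min_share is a hypothetical threshold: a source entity whose best
    candidate receives less than this share of its mass is left unmatched.
    """
    matches = []
    for row in plan:
        best = max(range(len(row)), key=row.__getitem__)
        matches.append(best if row[best] / sum(row) >= min_share else None)
    return matches


# Toy cost matrix: entity i in the first KG is closest to entity i in the second.
cost = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
plan = sinkhorn(cost)
print(align(plan))  # [0, 1, 2]
```

The `None` branch in `align` is one simple way to leave low-confidence entities unmatched, which is the situation the dangling setting describes; a real system would need a principled criterion for it.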