Asghar Amir, Tariqullah Jan, Mohammad Haseeb Zafar, Shadan Khan Khattak
This paper introduces a novel ensemble deep learning (DL)-based Multi-Label Retinal Disease Classification (MLRDC) system designed for high accuracy and efficiency. By utilising a stacking ensemble approach that integrates DenseNet201, EfficientNetB4, EfficientNetB3 and EfficientNetV2S models, the system achieves exceptional performance in retinal disease classification. The proposed MLRDC model, leveraging DL as the meta-model, outperforms individual base detectors, with DenseNet201 and EfficientNetV2S achieving an accuracy of 96.5%, precision of 98.6%, recall of 97.1%, and F1 score of 97.8%. Weighted multilabel classifiers in the ensemble exhibit an average accuracy of 90.6%, precision of 98.3%, recall of 91.2%, and F1 score of 94.6%, whereas unweighted models achieve an average accuracy of 90%, precision of 98.6%, recall of 93.1%, and F1 score of 95.7%. Employing Logistic Regression (LR) as the meta-model, the proposed MLRDC system achieves an accuracy of 93.5%, precision of 98.2%, recall of 93.9%, and F1 score of 96%, with a minimal loss of 0.029. These results highlight the superiority of the proposed model over benchmark state-of-the-art ensembles, emphasising its practical applicability in medical image classification.
{"title":"Sophisticated Ensemble Deep Learning Approaches for Multilabel Retinal Disease Classification in Medical Imaging","authors":"Asghar Amir, Tariqullah Jan, Mohammad Haseeb Zafar, Shadan Khan Khattak","doi":"10.1049/cit2.70012","DOIUrl":"10.1049/cit2.70012","url":null,"abstract":"<p>This paper introduces a novel ensemble Deep learning (DL)-based Multi-Label Retinal Disease Classification (MLRDC) system, known for its high accuracy and efficiency. Utilising a stacking ensemble approach, and integrating DenseNet201, EfficientNetB4, EfficientNetB3 and EfficientNetV2S models, exceptional performance in retinal disease classification is achieved. The proposed MLRDC model, leveraging DL as the meta-model, outperforms individual base detectors, with DenseNet201 and EfficientNetV2S achieving an accuracy of 96.5%, precision of 98.6%, recall of 97.1%, and F1 score of 97.8%. Weighted multilabel classifiers in the ensemble exhibit an average accuracy of 90.6%, precision of 98.3%, recall of 91.2%, and F1 score of 94.6%, whereas unweighted models achieve an average accuracy of 90%, precision of 98.6%, recall of 93.1%, and F1 score of 95.7%. Employing Logistic Regression (LR) as the meta-model, the proposed MLRDC system achieves an accuracy of 93.5%, precision of 98.2%, recall of 93.9%, and F1 score of 96%, with a minimal loss of 0.029. These results highlight the superiority of the proposed model over benchmark state-of-the-art ensembles, emphasising its practical applicability in medical image classification.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 4","pages":"1159-1173"},"PeriodicalIF":7.3,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144910072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pengpeng Liu, Zhi Zeng, Qisheng Wang, Min Chen, Guixuan Zhang
Realistic human reconstruction supports an extensive range of applications as depth sensors advance. However, current state-of-the-art methods with RGB-D input still suffer from artefacts, such as noisy surfaces, non-human shapes, and depth ambiguity, especially for the invisible parts. The authors observe that the main issue is a lack of geometric semantics caused by not fully exploiting depth input priors. This paper focuses on improving the representation ability of the implicit function, exploring a method to utilise depth-related semantics effectively and efficiently. The proposed geometry-enhanced implicit function enriches the geometric semantics with extra voxel-aligned features from point clouds, promoting the completion of missing parts in unseen regions while preserving the local details of the input. To incorporate multi-scale pixel-aligned and voxel-aligned features, the authors use Squeeze-and-Excitation attention to capture and fully exploit channel interdependencies. For multi-view reconstruction, the proposed depth-enhanced attention explicitly encourages the network to “sense” the geometric structure for a more reasonable feature aggregation. Experiments show that the proposed method outperforms current RGB- and depth-based SOTA methods on challenging data from Twindom and Thuman3.0, achieving detailed and complete human reconstruction while balancing performance and efficiency well.
{"title":"Geometry-Enhanced Implicit Function for Detailed Clothed Human Reconstruction With RGB-D Input","authors":"Pengpeng Liu, Zhi Zeng, Qisheng Wang, Min Chen, Guixuan Zhang","doi":"10.1049/cit2.70009","DOIUrl":"10.1049/cit2.70009","url":null,"abstract":"<p>Realistic human reconstruction embraces an extensive range of applications as depth sensors advance. However, current state-of-the-art methods with RGB-D input still suffer from artefacts, such as noisy surfaces, non-human shapes, and depth ambiguity, especially for the invisible parts. The authors observe the main issue is the lack of geometric semantics without using depth input priors fully. This paper focuses on improving the representation ability of implicit function, exploring an effective method to utilise depth-related semantics effectively and efficiently. The proposed geometry-enhanced implicit function enhances the geometric semantics with the extra voxel-aligned features from point clouds, promoting the completion of missing parts for unseen regions while preserving the local details on the input. For incorporating multi-scale pixel-aligned and voxel-aligned features, the authors use the Squeeze-and-Excitation attention to capture and fully use channel interdependencies. For the multi-view reconstruction, the proposed depth-enhanced attention explicitly excites the network to “sense” the geometric structure for a more reasonable feature aggregation. Experiments and results show that our method outperforms current RGB and depth-based SOTA methods on the challenging data from Twindom and Thuman3.0, and achieves a detailed and completed human reconstruction, balancing performance and efficiency well.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 3","pages":"858-870"},"PeriodicalIF":7.3,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144502992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zan Hongying, Arifa Javed, Muhammad Abdullah, Javed Rashid, Muhammad Faheem
Neural machine translation (NMT) has advanced with deep learning and large-scale multilingual models, yet low-resource languages often lack sufficient training data, which leads to hallucinations: translated content that diverges significantly from the source text. This research proposes a refined Contrastive Decoding (CD) algorithm that dynamically adjusts the weights of the log probabilities from a strong expert model and a weak amateur model to mitigate hallucinations in low-resource NMT and improve translation quality. Advanced large language NMT models, including ChatGLM and LLaMA, are fine-tuned and implemented for their superior contextual understanding and cross-lingual capabilities. The refined CD algorithm evaluates multiple candidate translations using BLEU score, semantic similarity, and Named Entity Recognition accuracy. Extensive experimental results show substantial improvements in translation quality and a significant reduction in hallucination rates. The fine-tuned models achieve higher evaluation metrics than both baseline and state-of-the-art models. An ablation study confirms the contribution of each methodological component and highlights the effectiveness of the refined CD algorithm and advanced models in mitigating hallucinations. Notably, the refined methodology increased the BLEU score by approximately 30% compared to baseline models.
{"title":"Large Language Models With Contrastive Decoding Algorithm for Hallucination Mitigation in Low-Resource Languages","authors":"Zan Hongying, Arifa Javed, Muhammad Abdullah, Javed Rashid, Muhammad Faheem","doi":"10.1049/cit2.70004","DOIUrl":"10.1049/cit2.70004","url":null,"abstract":"<p>Neural machine translation (NMT) has advanced with deep learning and large-scale multilingual models, yet translating low-resource languages often lacks sufficient training data and leads to hallucinations. This often results in translated content that diverges significantly from the source text. This research proposes a refined Contrastive Decoding (CD) algorithm that dynamically adjusts weights of log probabilities from strong expert and weak amateur models to mitigate hallucinations in low-resource NMT and improve translation quality. Advanced large language NMT models, including ChatGLM and LLaMA, are fine-tuned and implemented for their superior contextual understanding and cross-lingual capabilities. The refined CD algorithm evaluates multiple candidate translations using BLEU score, semantic similarity, and Named Entity Recognition accuracy. Extensive experimental results show substantial improvements in translation quality and a significant reduction in hallucination rates. Fine-tuned models achieve higher evaluation metrics compared to baseline models and state-of-the-art models. An ablation study confirms the contributions of each methodological component and highlights the effectiveness of the refined CD algorithm and advanced models in mitigating hallucinations. Notably, the refined methodology increased the BLEU score by approximately 30% compared to baseline models.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 4","pages":"1104-1117"},"PeriodicalIF":7.3,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144909954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning’s widespread dependence on large datasets raises privacy concerns due to the potential presence of sensitive information. Differential privacy stands out as a crucial method for preserving privacy, garnering significant interest for its ability to offer robust and verifiable privacy safeguards during training. However, classic differentially private learning introduces the same level of noise into the gradients across all training iterations, which affects the trade-off between model utility and privacy guarantees. To address this issue, this paper proposes an adaptive differential privacy mechanism that dynamically adjusts the privacy budget at the layer level as training progresses to resist membership inference attacks. Specifically, an equal privacy budget is initially allocated to each layer. Subsequently, as training advances, the privacy budget for layers closer to the output is reduced (adding more noise), while the budget for layers closer to the input is increased. The adjustment magnitude depends on the training progress and is automatically determined from the iteration count. This dynamic allocation provides a simple process for adjusting privacy budgets, relieving users of manual parameter tuning and ensuring that the privacy-preservation strategy aligns with training progress. Extensive experiments on five well-known datasets indicate that the proposed method outperforms competing methods in terms of accuracy and resilience against membership inference attacks.
{"title":"Layer-Level Adaptive Gradient Perturbation Protecting Deep Learning Based on Differential Privacy","authors":"Zhang Xiangfei, Zhang Qingchen, Jiang Liming","doi":"10.1049/cit2.70008","DOIUrl":"10.1049/cit2.70008","url":null,"abstract":"<p>Deep learning’s widespread dependence on large datasets raises privacy concerns due to the potential presence of sensitive information. Differential privacy stands out as a crucial method for preserving privacy, garnering significant interest for its ability to offer robust and verifiable privacy safeguards during data training. However, classic differentially private learning introduces the same level of noise into the gradients across training iterations, which affects the trade-off between model utility and privacy guarantees. To address this issue, an adaptive differential privacy mechanism was proposed in this paper, which dynamically adjusts the privacy budget at the layer-level as training progresses to resist member inference attacks. Specifically, an equal privacy budget is initially allocated to each layer. Subsequently, as training advances, the privacy budget for layers closer to the output is reduced (adding more noise), while the budget for layers closer to the input is increased. The adjustment magnitude depends on the training iterations and is automatically determined based on the iteration count. This dynamic allocation provides a simple process for adjusting privacy budgets, alleviating the burden on users to tweak parameters and ensuring that privacy preservation strategies align with training progress. Extensive experiments on five well-known datasets indicate that the proposed method outperforms competing methods in terms of accuracy and resilience against membership inference attacks.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 3","pages":"929-944"},"PeriodicalIF":7.3,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144502987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jing Wang, Zhikang Wang, Xiaojie Wang, Fangxiang Feng, Bo Yang
Referring expression comprehension (REC) aims to locate a specific region in an image described by a natural-language expression. Existing two-stage methods generate multiple candidate proposals in the first stage and then select one of these proposals as the grounding result in the second stage. Nevertheless, the number of candidate proposals generated in the first stage significantly exceeds the number of ground-truth objects, and the recall of critical objects is inadequate, severely limiting overall network performance. To address these issues, the authors propose an innovative method termed Separate Non-Maximum Suppression (Sep-NMS) for two-stage REC. In particular, Sep-NMS models information from the two stages independently and collaboratively, ultimately achieving an overall improvement in the comprehension and identification of target objects. Specifically, the authors propose a Ref-Relatedness module that rigorously filters referent proposals, decreasing their redundancy. A