Pub Date: 2024-05-05 | DOI: 10.1007/s12559-024-10285-1
Sunder Ali Khowaja, Parus Khuwaja, Kapal Dev, Weizheng Wang, Lewis Nkenyereye
ChatGPT is a large language model (LLM) widely available to consumers on their devices; owing to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have examined the effectiveness, efficiency, integration, and sentiment of ChatGPT and other LLMs. In contrast, this study focuses on important aspects that are mostly overlooked, i.e., sustainability, privacy, digital divide, and ethics, and suggests that not only ChatGPT but every subsequent entry in the category of conversational bots should undergo Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This paper discusses in detail the issues and concerns raised over ChatGPT in line with the aforementioned characteristics. We also briefly discuss the recent EU AI Act in light of the SPADE evaluation. We support our hypothesis with preliminary data collection and visualizations, along with hypothesized facts. We suggest mitigations and recommendations for each of the concerns. Furthermore, we propose policies and recommendations for the EU AI Act concerning ethics, the digital divide, and sustainability.
Title: "ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review" (Cognitive Computation, Journal Article, published 2024-05-05).
Pub Date: 2024-05-04 | DOI: 10.1007/s12559-024-10289-x
Haider Ali, Mingzhao Wang, Juanying Xie
Thyroid nodules (TYNs) are a potentially life-threatening condition commonly observed among adults worldwide. Applications of deep learning in computer-aided diagnosis (CAD) systems for diagnosing thyroid nodules have attracted attention among clinical professionals for their potentially significant role in reducing missed diagnoses. However, most techniques for segmenting thyroid nodules rely on U-Net structures or deep convolutional neural networks, which are limited in capturing varied context information because of the diversity of nodule shapes and sizes, ambiguous boundaries, and heterogeneous structure. To address these challenges, we present an encoder-decoder-based architecture (CIL-Net) for boosting TYN segmentation. CIL-Net makes three contributions. First, the encoder is built with dense connectivity for efficient feature extraction and a triplet attention block (TAB) for highlighting essential feature maps. Second, we design a feature improvement block (FIB) that uses dilated convolutions and attention mechanisms to capture global context and build robust feature maps between the encoder and decoder branches. Third, we introduce a residual context block (RCB), which leverages residual units (ResUnits) to accumulate context information across the multiple decoder blocks. We assess segmentation quality with six evaluation metrics on two standard TYN datasets (DDTI and TN3K) and demonstrate competitive performance against advanced state-of-the-art methods. We consider that the proposed method advances TYN region localization and segmentation, which rely heavily on an accurate assessment of different context information. This advancement is primarily attributable to the combined use of dense connectivity, TAB, FIB, and RCB, which capture both broad and fine-grained contextual details. We anticipate that the approach's reliability and visual explainability make it a valuable tool with the potential to enhance clinical practice by offering reliable predictions that support cognitive and healthcare decision-making.
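As a side note on the six evaluation metrics mentioned above, two standard segmentation measures (the Dice coefficient and IoU) can be sketched as follows; this is a generic illustrative implementation, not code from CIL-Net:

```python
import numpy as np

def dice_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for a pair of binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)

# Toy 2x3 masks: two pixels overlap out of three predicted / three true.
d, i = dice_iou(np.array([[0, 1, 1], [0, 1, 0]]),
                np.array([[0, 1, 0], [0, 1, 1]]))
```

Dice weights the overlap against the two mask sizes, while IoU divides by their union, so Dice is always at least as large as IoU on the same pair of masks.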
Title: "CIL-Net: Densely Connected Context Information Learning Network for Boosting Thyroid Nodule Segmentation Using Ultrasound Images" (Cognitive Computation, Journal Article, published 2024-05-04).
Pub Date: 2024-05-03 | DOI: 10.1007/s12559-024-10262-8
Radwa El Shawi
Machine learning models are used across industries and fields, particularly those that demand a significant degree of accountability and transparency. With the introduction of the General Data Protection Regulation (GDPR), it has become imperative for machine learning predictions to be both plausible and verifiable. One approach to explaining these predictions assigns an importance score to each input element. Another category of methods quantifies the importance of human-understandable concepts to explain global and local model behaviour. However, the way concepts are constructed in such concept-based explanation techniques lacks inherent interpretability, and the number and diversity of the discovered concepts make it difficult for machine learning practitioners to comprehend and make sense of the concept space. To this end, we introduce ConceptGlassbox, a novel local explanation framework that learns high-level, transparent concept definitions. Our approach leverages human knowledge and feedback to facilitate the acquisition of concepts with minimal human labelling effort. ConceptGlassbox learns concepts consistent with the user's understanding of a concept's meaning. It then dissects the evidence for a prediction by identifying the key concepts the black-box model uses to arrive at its decision for the instance being explained. Additionally, ConceptGlassbox produces counterfactual explanations, proposing the smallest changes to the instance's concept-based explanation that would result in a counterfactual decision as specified by the user. Our systematic experiments confirm that ConceptGlassbox discovers relevant and comprehensible concepts that are important to neural network predictions.
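The counterfactual step described above (finding the smallest change to an instance's concept-based explanation that flips the decision) can be illustrated with a toy sketch. The linear "black box", the binary concept encoding, and the greedy search below are assumptions for illustration only, not ConceptGlassbox's actual algorithm:

```python
import numpy as np

def counterfactual_concepts(weights, bias, concepts):
    """Return indices of binary concepts to flip, chosen greedily by how
    strongly each flip pushes a toy linear model toward the opposite class."""
    x = concepts.astype(float).copy()
    score = float(weights @ x + bias)
    original = score > 0
    deltas = weights * (1.0 - 2.0 * x)   # score change if concept i flips
    # Flip concepts that push hardest toward the opposite class first.
    order = np.argsort(deltas) if original else np.argsort(-deltas)
    flipped = []
    for i in order:
        if (score > 0) != original:       # decision already flipped
            break
        score += deltas[i]
        x[i] = 1.0 - x[i]
        flipped.append(int(i))
    return flipped

# Hypothetical concepts, e.g. [striped, has_wheels, metallic]; the model
# initially predicts the positive class for the instance [1, 0, 1].
w = np.array([2.0, -0.5, 1.0])
flips = counterfactual_concepts(w, -0.5, np.array([1, 0, 1]))
```

Here the search flips the two most influential active concepts before the toy model's decision changes, mirroring the idea of a minimal concept-level counterfactual.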
Title: "ConceptGlassbox: Guided Concept-Based Explanation for Deep Neural Networks" (Cognitive Computation, Journal Article, published 2024-05-03).
Document-level relation extraction aims to extract relations between entities in a document. In contrast to sentence-level extraction, document-level relation extraction requires reasoning over multiple sentences to extract complex relational triples. Recent work has found that adding an auxiliary evidence-extraction task and using the extracted evidence sentences to aid prediction can improve document-level relation extraction. However, these approaches still model the interactions between entity pairs inadequately. In this paper, drawing on a review of human cognitive processes, we propose HIMAC, a hybrid network applied to the entity-pair feature matrix, in which a multi-head attention sub-module fuses global entity-pair information along a specific inference path, while a convolution sub-module captures local information from adjacent entity pairs. We then incorporate the contextual interaction information learned by the entity pairs into the relation prediction and evidence extraction tasks. Finally, the extracted evidence sentences are used to further correct the relation extraction results. Extensive experiments on two document-level relation extraction benchmarks (DocRED/Re-DocRED) demonstrate that our method achieves state-of-the-art performance (62.84/75.89 F1).
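The reported F1 scores are computed over extracted relational triples. A minimal sketch of triple-level micro precision/recall/F1 (generic evaluation code, not the paper's) might look like:

```python
def triple_f1(predicted, gold):
    """Micro precision, recall, and F1 over (head, relation, tail) triples."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)          # exact-match true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: one of two predicted triples matches the gold set.
p, r, f = triple_f1(
    [("A", "born_in", "B"), ("A", "works_at", "C")],
    [("A", "born_in", "B"), ("D", "capital_of", "E")],
)
```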
Title: "Enhancing Document-Level Relation Extraction with Attention-Convolutional Hybrid Networks and Evidence Extraction" by Feiyu Zhang, Ruiming Hu, Guiduo Duan, Tianxi Huang (Cognitive Computation, Journal Article, published 2024-05-02, DOI: 10.1007/s12559-024-10269-1).
Pub Date: 2024-05-02 | DOI: 10.1007/s12559-024-10268-2
Md. Easin Arafat, Md. Wakil Ahmad, S. M. Shovan, Towhid Ul Haq, Nazrul Islam, Mufti Mahmud, M. Shamim Kaiser
Methylation is considered one of the most important post-translational modifications (PTMs) of proteins. Plasticity and cellular dynamics are among the many traits regulated by methylation. Currently, methylation sites are identified using experimental approaches; however, these methods are time-consuming and expensive. Computational modelling can identify methylation sites quickly and accurately, providing valuable information for further trials and investigation. In this study, we propose a new machine-learning model called MeSEP to predict methylation sites that incorporates both evolutionary and structural information. To build this model, we first extract evolutionary and structural features from the PSSM and SPD2 profiles, respectively. We then employ Extreme Gradient Boosting (XGBoost) as the classification model to predict methylation sites. To address the issue of imbalanced data and bias towards negative samples, we use the SMOTETomek-based hybrid sampling method. MeSEP was validated on an independent test set (ITS) and with 10-fold cross-validation (TCV) using lysine methylation sites. The method achieved an accuracy of 82.9% (ITS) and 84.6% (TCV); a precision of 0.92 (ITS) and 0.94 (TCV); an area under the curve of 0.90 (ITS) and 0.92 (TCV); an F1 score of 0.81 (ITS) and 0.83 (TCV); and an MCC of 0.67 (ITS) and 0.70 (TCV). MeSEP significantly outperformed previous studies in the literature. MeSEP is available as a standalone toolkit, with all source code, at https://github.com/arafatro/MeSEP.
Title: "Accurate Prediction of Lysine Methylation Sites Using Evolutionary and Structural-Based Information" (Cognitive Computation, Journal Article, published 2024-05-02).
Unmanned aerial vehicles (UAVs) have become essential in disaster management owing to their ability to provide real-time situational awareness and support decision-making. Visual servoing, a technique that uses visual feedback to control the motion of a robotic system, has been used to improve the precision and accuracy of UAVs in disaster scenarios. This review examines the integration of visual servoing with recent advances in deep learning to enhance UAV precision. Such integration improves the precision and efficiency of disaster response by enabling UAVs to navigate complex environments, identify critical areas for intervention, and deliver actionable insights to decision-makers in real time. The review covers disaster-management aspects such as search and rescue, damage assessment, and situational awareness, and analyzes the challenges of integrating visual servoing and deep learning into UAVs, including the need for high computational power, data processing, and communication capabilities. It underscores the potential benefits and challenges of integrating these technologies, emphasizing their significance in improving disaster response and recovery through enhanced situational awareness and decision-making.
Title: "The Duo of Visual Servoing and Deep Learning-Based Methods for Situation-Aware Disaster Management: A Comprehensive Review" by Senthil Kumar Jagatheesaperumal, Mohammad Mehedi Hassan, Md. Rafiul Hassan, Giancarlo Fortino (Cognitive Computation, Journal Article, published 2024-05-01, DOI: 10.1007/s12559-024-10290-4).
Pub Date: 2024-04-30 | DOI: 10.1007/s12559-024-10288-y
Jing Wang, Nasir Saleem, Teddy Surya Gunawan
Long short-term memory (LSTM) networks have proven effective in modeling sequential data, but they may struggle to capture long-term temporal dependencies accurately. LSTM plays a central role in speech enhancement by modeling and capturing temporal dependencies in speech signals. This paper introduces a variable-neurons-based LSTM designed to capture long-term temporal dependencies by reducing neuron representation in successive layers without loss of information. A skip connection between nonadjacent layers is added to prevent vanishing gradients, and an attention mechanism in these connections highlights important features and spectral components. The proposed LSTM is inherently causal, making it well suited for real-time processing without relying on future information. Training uses combined acoustic feature sets for improved performance, and the models estimate two time-frequency masks: the ideal ratio mask (IRM) and the ideal binary mask (IBM). Comprehensive evaluation using perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) showed that the proposed architecture enhances speech intelligibility and perceptual quality. Composite measures covering residual noise distortion (Cbak) and speech distortion (Csig) further substantiated this performance. The proposed model achieved a 16.21% improvement in STOI and a 0.69 improvement in PESQ on the TIMIT database; on the LibriSpeech database, STOI and PESQ improved by 16.41% and 0.71 over noisy mixtures. The architecture outperforms deep neural networks (DNNs) under various stationary and nonstationary background noise conditions. To train an automatic speech recognition (ASR) system on enhanced speech, the Kaldi toolkit was used to evaluate word error rate (WER); with the proposed LSTM at the front end, WER dropped markedly, reaching 15.13% across different noisy backgrounds.
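The two training targets named above have standard definitions: the IRM is the square root of the speech-to-mixture power ratio in each time-frequency bin, and the IBM thresholds the local SNR. A minimal NumPy sketch of these generic formulas (not the paper's code):

```python
import numpy as np

def irm(speech_mag, noise_mag):
    """Ideal ratio mask from speech and noise magnitude spectrograms."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + 1e-12))   # in [0, 1] per T-F bin

def ibm(speech_mag, noise_mag, snr_db_threshold=0.0):
    """Ideal binary mask: 1 where the local SNR exceeds the threshold."""
    snr_db = 20.0 * np.log10((speech_mag + 1e-12) / (noise_mag + 1e-12))
    return (snr_db > snr_db_threshold).astype(float)

# One frame, two frequency bins: speech dominates the first bin,
# noise dominates the second.
speech = np.array([[1.0, 0.1]])
noise = np.array([[0.1, 1.0]])
soft_mask = irm(speech, noise)
hard_mask = ibm(speech, noise)
```

Applying either mask to the noisy mixture's spectrogram attenuates noise-dominated bins; the IRM does so gradually, the IBM all-or-nothing.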
Title: "Towards Efficient Recurrent Architectures: A Deep LSTM Neural Network Applied to Speech Enhancement and Recognition" (Cognitive Computation, Journal Article, published 2024-04-30).
Pub Date: 2024-04-29 | DOI: 10.1007/s12559-024-10282-4
Nishu Bansal, Ankit Vidyarthi
Diabetic foot ulcers (DFUs) are a prevalent and serious complication of diabetes, often leading to severe morbidity and even amputation if not diagnosed and managed in time. The increasing prevalence of DFUs poses a significant challenge to healthcare systems worldwide, and accurate, timely classification is crucial for effective treatment and the prevention of complications. In this paper, we present DFootNet, an innovative and comprehensive classification framework for the accurate assessment of diabetic foot ulcers using a dense neural network architecture. Our approach leverages deep learning to automatically extract relevant features from diverse clinical DFU images. The proposed model comprises a multi-layered dense neural network designed to handle the intricate patterns and variations present across different stages and types of DFUs. The network architecture integrates convolutional and fully connected layers, allowing hierarchical feature extraction and robust feature representation. To evaluate the efficacy of DFootNet, we conducted experiments on a large and diverse dataset of diabetic foot ulcers. DFootNet achieves an accuracy of 98.87%, precision of 99.01%, recall of 98.73%, F1-score of 98.86%, and AUC-ROC of 98.13%, outperforming existing methods in distinguishing ulcer from non-ulcer images. Moreover, our framework offers transparency and interpretability through attention mechanisms that highlight important regions within ulcer images. We also present a comparative analysis of DFootNet against other popular deep learning models, showcasing its robustness and adaptability across various scenarios.
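AUC-ROC, one of the metrics reported above, can be computed directly from classifier scores via the Mann-Whitney statistic: it equals the probability that a randomly chosen positive (ulcer) image scores higher than a randomly chosen negative one. A generic sketch (not DFootNet code):

```python
def auc_roc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative score pairs ranked correctly (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores: 5 of 6 pairs are ranked correctly.
auc = auc_roc([0.9, 0.8, 0.4], [0.3, 0.5])
```

This pairwise formulation avoids choosing any decision threshold, which is why AUC complements the thresholded accuracy/precision/recall figures.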
{"title":"DFootNet: A Domain Adaptive Classification Framework for Diabetic Foot Ulcers Using Dense Neural Network Architecture","authors":"Nishu Bansal, Ankit Vidyarthi","doi":"10.1007/s12559-024-10282-4","DOIUrl":"https://doi.org/10.1007/s12559-024-10282-4","url":null,"abstract":"<p>Diabetic foot ulcers (DFUs) are a prevalent and serious complication of diabetes, often leading to severe morbidity and even amputations if not timely diagnosed and managed. The increasing prevalence of DFUs poses a significant challenge to healthcare systems worldwide. Accurate and timely classification of DFUs is crucial for effective treatment and prevention of complications. In this paper, we present “DFootNet”, an innovative and comprehensive classification framework for the accurate assessment of diabetic foot ulcers using a dense neural network architecture. Our proposed approach leverages the power of deep learning to automatically extract relevant features from diverse clinical DFU images. The proposed model comprises a multi-layered dense neural network designed to handle the intricate patterns and variations present in different stages and types of DFUs. The network architecture integrates convolutional and fully connected layers, allowing for hierarchical feature extraction and robust feature representation. To evaluate the efficacy of DFootNet, we conducted experiments on a large and diverse dataset of diabetic foot ulcers. Our results demonstrate that DFootNet achieves a remarkable accuracy of 98.87%, precision—99.01%, recall—98.73%, F1-score as 98.86%, and AUC-ROC as 98.13%, outperforming existing methods in distinguishing between ulcer and non-ulcer images. Moreover, our framework provides insights into the decision-making process, offering transparency and interpretability through attention mechanisms that highlight important regions within ulcer images. 
We also present a comparative analysis of DFootNet’s performance against other popular deep learning models, showcasing its robustness and adaptability across various scenarios.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"31 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140828536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
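The accuracy, precision, recall, and F1 figures reported in the abstract all derive from the binary confusion matrix (here, label 1 for "ulcer" and 0 for "non-ulcer"). A minimal illustrative computation of these metrics, not the authors' evaluation code:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted ulcers, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of real ulcers, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)            # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Note that AUC-ROC, unlike the four metrics above, is computed from the classifier's raw scores rather than from hard 0/1 predictions.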
A Novel Method for Human-Vehicle Recognition Based on Wireless Sensing and Deep Learning Technologies
Pub Date : 2024-04-18  DOI: 10.1007/s12559-024-10276-2
Liangliang Lou, Ruyin Cai, Mingan Lu, Mingmin Wang, Guang Chen
Human-vehicle recognition (HVR) methods are currently applied in road monitoring, congestion control, and safety protection. However, traditional vision-based HVR methods suffer from high construction costs and low robustness in scenarios with insufficient lighting, so a low-cost, highly robust HVR method is needed for intelligent street light systems (ISLSs). A well-designed HVR method can aid brightness adjustment in ISLSs that operate exclusively at night, lowering power consumption and carbon emissions. This paper proposes a novel wireless sensing-based human-vehicle recognition (WsHVR) method built on deep learning technologies, applicable to ISLSs equipped with wireless sensor networks (WSNs). To address the limited recognition ability of wireless sensing, a deep feature extraction model combining multi-scale convolution and an attention mechanism is proposed, in which the received signal strength (RSS) features of road users are extracted by multi-scale convolution. WsHVR integrates an adaptive registration convolutional attention mechanism (ARCAM) for further feature extraction and classification, and the final normalized classification result is obtained with a softmax function. Experiments show that the proposed WsHVR outperforms existing methods with an accuracy of 99.07%. The dataset and source code related to the paper have been published at https://github.com/TZ-mx/WiParam and https://github.com/TZ-mx/WsHVR, respectively. The method’s strong performance in human-vehicle recognition can provide valuable guidance for the design of intelligent street light systems within intelligent transportation systems.
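The softmax normalization mentioned in the abstract maps a classifier's raw output scores (logits) to a probability distribution over the classes. A minimal, numerically stable sketch, illustrative only and not the WsHVR implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtracting the max logit before
    exponentiating avoids overflow without changing the result."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax([2.0, 1.0, 0.1])` yields probabilities that sum to 1, with the largest mass on the first class; the predicted class is simply the index of the maximum probability.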
An eXplainable Artificial Intelligence Methodology on Big Data Architecture
Pub Date : 2024-04-11  DOI: 10.1007/s12559-024-10272-6
Valerio La Gatta, Vincenzo Moscato, Marco Postiglione, Giancarlo Sperlì
Although artificial intelligence has become part of everyday life, a trust crisis around such systems is emerging, increasing the need to explain black-box predictions, especially in the military, medical, and financial domains. Modern eXplainable Artificial Intelligence (XAI) techniques focus on benchmark datasets, but the cognitive applicability of such solutions in big data settings remains unclear due to memory and computation constraints. In this paper, we extend a model-agnostic XAI methodology, Cluster-Aided Space Transformation for Local Explanation (CASTLE), to handle high-volume datasets. CASTLE aims to explain the black-box behavior of predictive models by combining local information (based on the input sample) with global information (based on the model’s whole scope for action). In particular, the local explanation provides a rule-based explanation for the prediction of a target instance, together with the directions along which to update the likelihood of the predicted class. Our extension leverages modern big data technologies (e.g., Apache Spark) to handle the high volume, variety, and velocity of huge datasets. We evaluated the framework on five datasets in terms of temporal efficiency, explanation quality, and model significance. Our results indicate that the proposed approach retains the high-quality explanations associated with CASTLE while efficiently handling large datasets. Importantly, it exhibits a sub-linear, rather than exponential, dependence on dataset size, making it a scalable solution for massive datasets and big data scenarios generally.
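CASTLE's own algorithm is beyond the scope of an abstract, but the general idea of model-agnostic local explanation can be illustrated with a simple perturbation-based attribution sketch. This is a generic stand-in rather than CASTLE itself; the function names and parameters below are hypothetical:

```python
import random

def local_attribution(predict, x, n_samples=200, noise=0.5, seed=0):
    """Generic perturbation-based local feature attribution: perturb one
    feature at a time and measure how much the model's output changes on
    average around the instance x. Larger score = locally more important."""
    rng = random.Random(seed)
    base = predict(x)
    scores = []
    for i in range(len(x)):
        diffs = []
        for _ in range(n_samples):
            z = list(x)
            z[i] += rng.gauss(0.0, noise)  # perturb only feature i
            diffs.append(abs(predict(z) - base))
        scores.append(sum(diffs) / n_samples)
    return scores
```

Because the explainer only calls `predict`, it is model-agnostic in the same sense as CASTLE; the paper's contribution is making this kind of querying scale to big data volumes via Apache Spark, which this toy loop does not attempt.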