This paper presents a design methodology for an artificial intelligent pancreas. Accurate regulation of blood glucose levels in type 1 diabetic patients is critical, particularly when sensor measurements may be faulty. The paper therefore considers blood glucose regulation with a type 3 fuzzy predictive controller in the presence of sensor faults. The proposed structure consists of a main control structure and a virtual dynamic: the main structure comprises a fuzzy identifier, a predictive controller, and an adaptive compensator, while the virtual structure is used to identify sensor faults. Glucose is treated as unknown in the dynamics of type 1 diabetes and is estimated online using a type 3 fuzzy system. Lyapunov stability analysis is used to design the adaptive compensator so that the closed-loop system is stable. The proposed methodology is evaluated on Bergman's minimal model for different patients under various parametric uncertainties and disturbances.
{"title":"Artificial intelligent pancreas for type 1 diabetic patients using adaptive type 3 fuzzy fault tolerant predictive control","authors":"Arman Khani , Peyman Bagheri , Mahdi Baradarannia , Ardashir Mohammadzadeh","doi":"10.1016/j.engappai.2024.109627","DOIUrl":"10.1016/j.engappai.2024.109627","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109627"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
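The evaluation setting named in the artificial pancreas abstract above, Bergman's minimal model, can be sketched in a few lines. This is only a hedged illustration: the equations follow the commonly cited three-state form (glucose G, remote insulin action X, plasma insulin I), all parameter values are illustrative assumptions rather than values from the paper, and the insulin law is a toy proportional rule standing in for the type 3 fuzzy predictive controller, which is not reproduced here.

```python
def simulate_bergman(minutes=600, dt=0.1, g0=200.0, gain=0.01):
    """Forward-Euler simulation of Bergman's minimal model with a toy insulin law.

    All numeric values are illustrative assumptions, not taken from the paper.
    """
    p1, p2, p3, n = 0.0, 0.025, 1.3e-5, 0.09  # assumed model parameters
    g_b, i_b = 80.0, 7.0                      # assumed basal glucose and insulin
    g, x, i = g0, 0.0, i_b                    # initial state
    trace = [g]
    for _ in range(round(minutes / dt)):
        # Placeholder controller: insulin infusion proportional to glucose excess.
        u = max(0.0, gain * (g - g_b))
        dg = -p1 * (g - g_b) - x * g          # glucose dynamics
        dx = -p2 * x + p3 * (i - i_b)         # remote insulin action
        di = -n * (i - i_b) + u               # plasma insulin
        g, x, i = g + dt * dg, x + dt * dx, i + dt * di
        trace.append(g)
    return trace


if __name__ == "__main__":
    glucose = simulate_bergman()
    print(f"start: {glucose[0]:.1f}, end: {glucose[-1]:.1f}")
```

A sensor fault could be emulated in such a sketch by feeding the controller a corrupted copy of `g`; how the paper's virtual dynamic identifies that corruption is not specified in the abstract.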
This research introduces the Interval-Valued T-Spherical Fuzzy Graph (IVTSFG), a novel extension of fuzzy graph theory designed to address imprecision in decision-making processes, network analysis, and Computer Communication Networks (CCNs). By integrating four types of degrees (membership, non-membership, abstinence, and hesitancy), the IVTSFG framework enhances the ability to model and analyze complex systems with uncertain data. The study develops the theories of domination and double domination for IVTSFGs and presents new methods for evaluating network resilience and optimization. Key findings include techniques for applying domination and double domination in IVTSFGs, which demonstrate improved performance in managing CCNs. A comparative analysis with existing fuzzy graph models highlights the advantages of IVTSFGs, particularly in capturing nuanced relationships within network structures. The research provides practical examples and empirical comparisons that showcase the framework's effectiveness in various decision-making scenarios.
{"title":"Analysis of computer communication networks based on evaluation of domination and double domination for interval-valued T-spherical fuzzy graphs and their applications in decision-making problems","authors":"Sami Ullah Khan , Fiaz Hussain , Tapan Senapati , Shoukat Hussain , Zeeshan Ali , Domokos Esztergár-Kiss , Sarbast Moslem","doi":"10.1016/j.engappai.2024.109650","DOIUrl":"10.1016/j.engappai.2024.109650","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109650"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
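For readers unfamiliar with the domination concepts in the IVTSFG abstract above, the following sketch checks domination and double domination on an ordinary (crisp) graph. It assumes the closed-neighbourhood definition, where a set D is k-dominating if every vertex has at least k members of D in its closed neighbourhood; the paper's interval-valued, fuzzy definitions are not given in the abstract and may differ. The four-node ring network is a hypothetical example.

```python
def closed_neighbourhood(adj, v):
    """Return v together with its neighbours."""
    return {v} | adj[v]


def is_k_dominating(adj, candidate, k=1):
    """True if every vertex sees at least k members of `candidate`
    within its closed neighbourhood (k=1: domination, k=2: double domination)."""
    chosen = set(candidate)
    return all(len(closed_neighbourhood(adj, v) & chosen) >= k for v in adj)


if __name__ == "__main__":
    # Hypothetical ring network a-b-c-d-a given as an adjacency map.
    ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
    print(is_k_dominating(ring, {"a", "c"}, k=1))       # domination
    print(is_k_dominating(ring, {"a", "c"}, k=2))       # double domination
    print(is_k_dominating(ring, {"a", "b", "c"}, k=2))  # double domination
```

In a network reading, a double dominating set keeps every node covered even if one of its covering nodes fails, which is one way to interpret the resilience claim.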
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109553
Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Yue Zhang, Ren Wang, Xiaoshuang Shi, Kaidi Xu
Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence (AI) interaction systems, particularly in the domain of healthcare engineering. However, a robust and general uncertainty measure for free-form answers has not been well established in open-ended medical question-answering (QA) tasks, where generative inequality introduces a large number of irrelevant words and sequences into the generated set used for uncertainty quantification (UQ), which can lead to biases. This paper proposes Word-Sequence Entropy (WSE), which calibrates uncertainty at both the word and sequence levels based on semantic relevance, highlighting keywords and enlarging the generative probability of trustworthy responses when performing UQ. We compare WSE with six baseline methods on five free-form medical QA datasets, using seven popular large language models (LLMs), and demonstrate that WSE exhibits superior performance in accurate UQ under two standard criteria for correctness evaluation. Additionally, regarding the potential for real-world medical QA applications, we achieve a significant enhancement in the performance of LLMs (e.g., a 6.36% improvement in model accuracy on the COVID-QA dataset) when the responses that WSE identifies as having lower uncertainty are used as final answers, without requiring additional task-specific fine-tuning or architectural modifications.
{"title":"Word-Sequence Entropy: Towards uncertainty estimation in free-form medical question answering applications and beyond","authors":"Zhiyuan Wang , Jinhao Duan , Chenxi Yuan , Qingyu Chen , Tianlong Chen , Yue Zhang , Ren Wang , Xiaoshuang Shi , Kaidi Xu","doi":"10.1016/j.engappai.2024.109553","DOIUrl":"10.1016/j.engappai.2024.109553","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109553"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
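To make the uncertainty quantification setting of the Word-Sequence Entropy abstract above more concrete, the sketch below computes a generic length-normalised predictive entropy over several sampled answers. This is a plain baseline of the kind such methods are usually compared against, not WSE itself: the relevance-based word and sequence weighting described in the abstract is not reproduced, and the token log-probabilities are hypothetical inputs.

```python
def length_normalised_entropy(token_logprobs_per_answer):
    """Monte Carlo estimate of predictive entropy over sampled answers.

    Each answer is a list of token log-probabilities; averaging over tokens
    keeps long answers from being penalised simply for their length.
    """
    per_answer = [sum(lps) / len(lps) for lps in token_logprobs_per_answer]
    return -sum(per_answer) / len(per_answer)


if __name__ == "__main__":
    # Hypothetical log-probabilities for two sets of sampled answers.
    confident = [[-0.1, -0.2], [-0.1, -0.1, -0.1]]
    uncertain = [[-2.0, -3.0], [-1.5]]
    print(length_normalised_entropy(confident))
    print(length_normalised_entropy(uncertain))
```

A higher value indicates that the model assigns low probability to its own sampled answers, which is the signal a selection rule would use to prefer lower-uncertainty responses.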
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109665
Maryam Imani
Polarimetric synthetic aperture radar (PolSAR) images, which contain polarimetric, scattering and contextual features, are useful radar data for ground surface classification. Extracting and fusing appropriate features from a small set of available labeled samples is an important and challenging task. Several transformers with a self-attention mechanism have recently achieved great success in PolSAR image classification. While almost all methods exploit only the self-attention features from the PolSAR cube, the feature fusion method proposed in this work, called the attention-based scattering and contextual (ASC) network, uses polarimetric self-attention alongside two cross-attention blocks. The cross-attention blocks extract the polarimetric-scattering dependencies and the polarimetric-contextual interactions, respectively. The proposed ASC network takes three inputs: the PolSAR cube, the scattering feature maps obtained by clustering the entropy-alpha features, and the segmentation maps obtained by a super-pixel generation algorithm. The features extracted by the self- and cross-attention blocks are fused together, and residual learning improves the feature learning. While transformers and attention-based networks usually need large training sets, the proposed ASC network performs well with a relatively small number of training samples on various real and synthetic PolSAR images. For example, on the Flevoland PolSAR image containing 15 classes, acquired by AIRSAR in L-band, using 100 training samples per class (less than 1% of the labeled samples), the ASC network achieves an overall accuracy of 99.51%, which is statistically preferable to that of the self-attention-based network according to McNemar's test.
{"title":"Attention based network for fusion of polarimetric and contextual features for polarimetric synthetic aperture radar image classification","authors":"Maryam Imani","doi":"10.1016/j.engappai.2024.109665","DOIUrl":"10.1016/j.engappai.2024.109665","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109665"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
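The statistical comparison mentioned in the PolSAR abstract above relies on McNemar's test, which looks only at the samples on which two classifiers disagree. The sketch below uses the z-score form that is common in remote sensing accuracy assessment; the label vectors are hypothetical and are not data from the paper, and the 1.96 threshold assumes a two-sided test at the 5% level.

```python
import math


def mcnemar_z(pred_a, pred_b, truth):
    """z-score form of McNemar's test for two classifiers on the same samples."""
    # b: samples classifier A gets right and classifier B gets wrong.
    b = sum(a == t and p != t for a, p, t in zip(pred_a, pred_b, truth))
    # c: samples classifier A gets wrong and classifier B gets right.
    c = sum(a != t and p == t for a, p, t in zip(pred_a, pred_b, truth))
    if b + c == 0:
        return 0.0  # the classifiers never disagree on correctness
    return (b - c) / math.sqrt(b + c)


if __name__ == "__main__":
    truth = [1] * 20
    pred_a = [1] * 20            # hypothetical classifier A: all correct
    pred_b = [1] * 4 + [0] * 16  # hypothetical classifier B: 16 errors
    z = mcnemar_z(pred_a, pred_b, truth)
    print(z, abs(z) > 1.96)
```

A positive z favours the first classifier; samples that both classifiers get right or both get wrong do not affect the statistic.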
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109561
Ceca Kraišniković, Robert Harb, Markus Plass, Wael Al Zoughbi, Andreas Holzinger, Heimo Müller
Integrating large language models (LLMs) to retrieve targeted medical knowledge from electronic health records enables significant advancements in medical research. However, recognizing the challenges associated with using LLMs in healthcare is essential for successful implementation. One challenge is that medical records combine unstructured textual information with highly sensitive personal data. This, in turn, highlights the need for explainable Artificial Intelligence (XAI) methods to better understand how LLMs function in the medical domain. In this study, we propose a novel XAI tool to accelerate data-driven cancer research. We apply the Bidirectional Encoder Representations from Transformers (BERT) model to German-language pathology reports, examining the effects of domain-specific language adaptation and fine-tuning. We demonstrate our model on a real-world pathology dataset, analyzing the contextual representations of diagnostic reports. By illustrating decisions made by fine-tuned models, we provide decision values that can be applied in medical research. To address interpretability, we evaluate the classifications generated by our fine-tuned model with the help of an expert pathologist. In domains such as medicine, inspecting the medical knowledge map in conjunction with expert evaluation reveals valuable information about how contextual representations of key disease features are categorized. This ultimately benefits data structuring and labeling and paves the way for even more advanced approaches to XAI that combine text with other input modalities, such as images, and are then applicable to various engineering problems.
{"title":"Fine-tuning language model embeddings to reveal domain knowledge: An explainable artificial intelligence perspective on medical decision making","authors":"Ceca Kraišniković , Robert Harb , Markus Plass , Wael Al Zoughbi , Andreas Holzinger , Heimo Müller","doi":"10.1016/j.engappai.2024.109561","DOIUrl":"10.1016/j.engappai.2024.109561","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109561"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109656
Jianwei Wang, Xiaofan Jin, Xuchu Liu, Ze He, Jiachen Chai, Pengfa Liu, Yuqing Wang, Wei Cai, Rui Guo
To address the low accuracy of current motion response prediction models for the floating platform tensioner system, this paper proposes an online prediction method that integrates Empirical Mode Decomposition (EMD), Kernel Principal Component Analysis (KPCA), and Long Short-Term Memory (LSTM). EMD is employed to decompose the sequence of environmental factors, reducing their non-stationarity. KPCA is then used to extract the key influencing factors and reduce the input dimensionality. Finally, LSTM neural networks are applied to capture long-term dependencies in the features and make accurate predictions. The model is validated on motion response data from the tensioner platform device under two scenarios, with and without internal waves, and is compared against other models. The results show that the EMD-KPCA-LSTM model has high prediction accuracy in both scenarios. In particular, compared with the Convolutional Neural Network (CNN) model, the mean Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) of the displacement and tension of the system decreased by 52.97%, 55.14%, 56.31%, 68.97%, 71.02%, and 57.60%, respectively, and R-squared (R²) increased by 7.14% and 12.37%. In summary, the model fits the data well, achieves high prediction accuracy, and has important practical value.
{"title":"Online prediction of hydro-pneumatic tensioner system of floating platform under internal waves","authors":"Jianwei Wang , Xiaofan Jin , Xuchu Liu , Ze He , Jiachen Chai , Pengfa Liu , Yuqing Wang , Wei Cai , Rui Guo","doi":"10.1016/j.engappai.2024.109656","DOIUrl":"10.1016/j.engappai.2024.109656","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109656"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
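The four error measures reported in the tensioner prediction abstract above (RMSE, MAE, MAPE and R²) have standard definitions, sketched here in plain Python. The sample values are hypothetical and unrelated to the paper's data; the helper `relative_reduction` is an assumed name showing how a statement such as "RMSE decreased by 52.97%" is typically computed against a baseline.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R-squared for paired sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a true value is zero.
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    scores = regression_metrics([2.0, 4.0, 6.0], [3.0, 5.0, 7.0])
    print(scores)
    print(relative_reduction(2.0, scores["rmse"]))
```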
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109634
Zili Wang, Jie Li, Xiaojian Liu, Shuyou Zhang, Yaochen Lin, Jianrong Tan
Demand for small-batch bent-tube production is growing, but traditional bending dies must be customized separately for each tube size, which leads to long design cycles and high costs. To meet the bending requirements of tubes with different diameters using a single mandrel, a novel diameter-adjustable mechanism (DAM) and its optimization design method are proposed. First, the DAM, based on a planetary bevel gear-screw transmission set, is developed for bending tubes of varying diameters. Next, a domain knowledge-integrated optimization design framework is introduced. To reduce the cost of acquiring training samples for surrogate models, a monotonicity-constrained neural network based on a cascade boosting architecture (CB-MCNN) is introduced, which improves prediction accuracy while maintaining monotonicity. To improve the optimization speed and quality of Evolutionary Algorithms (EAs), a domain knowledge-guided EA (DK-EA) method is proposed, which incorporates domain knowledge into the population initialization phase. The results indicate that: (1) CB-MCNN outperforms traditional methods and performs well on small-sample datasets. (2) DK-EA accelerates the optimization process and produces better outcomes. As a result, the domain knowledge-integrated optimization design framework enables the DAM to achieve a wider diameter variation range and enhanced reliability. The optimized DAM is able to bend tubes with diameters of 46–60 mm.
{"title":"Diameter-adjustable mandrel for thin-wall tube bending and its domain knowledge-integrated optimization design framework","authors":"Zili Wang , Jie Li , Xiaojian Liu , Shuyou Zhang , Yaochen Lin , Jianrong Tan","doi":"10.1016/j.engappai.2024.109634","DOIUrl":"10.1016/j.engappai.2024.109634","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109634"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
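The idea in the mandrel abstract above of feeding domain knowledge into the population initialization of an evolutionary algorithm can be illustrated generically. The sketch below is an assumption-laden example, not the paper's DK-EA: part of the initial population is drawn near known-good designs (the hypothetical `seeds`), and the rest is sampled uniformly within the bounds. The function name, the 50% seeding fraction and the jitter scale are all illustrative choices.

```python
import random


def init_population(size, bounds, seeds, seed_fraction=0.5, jitter=0.05, rng=None):
    """Build an initial EA population, partly from perturbed known-good designs.

    bounds: list of (low, high) per design variable.
    seeds:  list of designs suggested by domain knowledge (hypothetical here).
    """
    rng = rng or random.Random(0)
    n_seeded = int(size * seed_fraction) if seeds else 0
    population = []
    for k in range(size):
        if k < n_seeded:
            base = seeds[k % len(seeds)]
            # Perturb the seed slightly and clip back into the allowed range.
            individual = [
                min(hi, max(lo, value + rng.gauss(0.0, jitter * (hi - lo))))
                for value, (lo, hi) in zip(base, bounds)
            ]
        else:
            # Plain uniform sampling keeps some diversity in the population.
            individual = [rng.uniform(lo, hi) for lo, hi in bounds]
        population.append(individual)
    return population


if __name__ == "__main__":
    bounds = [(0.0, 1.0), (10.0, 20.0)]
    for individual in init_population(6, bounds, seeds=[[0.5, 15.0]]):
        print(individual)
```

Starting the search near plausible designs usually shortens convergence, while the uniformly sampled remainder guards against being trapped around the seeds.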
Pub Date: 2024-11-15
DOI: 10.1016/j.engappai.2024.109654
Renkai Wu, Pengchen Liang, Yinghao Liu, Yiqi Huang, Wangyan Li, Qing Chang
Three-dimensional (3D) reconstruction of laparoscopic surgical scenes is a key task for future surgical navigation and automated robotic minimally invasive surgery. Binocular laparoscopy with stereo matching enables 3D reconstruction. Stereo matching models designed for natural images, such as those used in autonomous driving, tend to be less suitable for laparoscopic environments because laparoscopic images offer few samples, complex textures, and uneven illumination. In addition, current stereo matching modules use 3D convolutions and transformers in the spatial domain as their base module, so their learning ability is limited to the spatial domain. In this paper, we propose a model for laparoscopic stereo matching that combines a 3D Fourier Transform with Full Multi-scale Features (FT-FMF Net). Specifically, the proposed Full Multi-scale Fusion Module (FMFM) fuses the full multi-scale feature information from the feature extractor into the stereo matching block. This block uses the proposed Dense Fourier Transform Module (DFTM) to densely learn the parallax features and the FMFM fusion information in the frequency domain. We validated the proposed method on both a laparoscopic dataset (SCARED) and an endoscopic dataset (SERV-CT). Compared with other popular and advanced deep learning models currently available, FT-FMF Net achieves state-of-the-art stereo matching performance. On the public SCARED and SERV-CT datasets, the End-Point-Error (EPE) was 0.7265 and 2.3119, and the Root Mean Square Error Depth (RMSE Depth) was 4.00 mm and 3.69 mm, respectively. In addition, the inference time is only 0.17 s. Our project code is available at https://github.com/wurenkai/FT-FMF.
{"title":"Laparoscopic stereo matching using 3-Dimensional Fourier transform with full multi-scale features","authors":"Renkai Wu , Pengchen Liang , Yinghao Liu , Yiqi Huang , Wangyan Li , Qing Chang","doi":"10.1016/j.engappai.2024.109654","DOIUrl":"10.1016/j.engappai.2024.109654","url":null,"PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109654"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
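The two evaluation quantities in the laparoscopic stereo matching abstract above, End-Point-Error and depth error, can be sketched as follows. The code assumes the usual stereo conventions: EPE is the mean absolute disparity error over valid pixels, and depth follows from disparity through Z = f·B/d for a rectified camera pair. The pixel values, focal length and baseline are hypothetical, and treating non-positive ground truth as invalid is an assumed masking convention; the exact evaluation protocol of the paper is not given in the abstract.

```python
def end_point_error(pred_disparity, gt_disparity):
    """Mean absolute disparity error over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred_disparity, gt_disparity) if g > 0]
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def disparity_to_depth(disparity_px, focal_px, baseline_mm):
    """Depth in mm from disparity in pixels for a rectified stereo pair."""
    return focal_px * baseline_mm / disparity_px


if __name__ == "__main__":
    # Flattened hypothetical disparity maps; 0.0 marks a pixel without ground truth.
    predicted = [10.0, 20.0, 5.0]
    ground_truth = [11.0, 18.0, 0.0]
    print(end_point_error(predicted, ground_truth))
    print(disparity_to_depth(20.0, focal_px=1000.0, baseline_mm=4.0))
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error for distant surfaces, which is why both measures are usually reported together.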
Pub Date: 2024-11-15, DOI: 10.1016/j.engappai.2024.109612
Yuzhen Niu , Yuqi He , Rui Xu , Yuezhou Li , Yuzhong Chen
Defocus deblurring using dual-pixel sensors has garnered significant attention in recent years. However, current methods have not adequately addressed the defocus disparity between the dual views, which leads to suboptimal recovery of detail from severely defocused pixels. To address this limitation, we introduce a parallax-aware dual-view feature enhancement and adaptive detail compensation network (PA-Net), tailored specifically to the dual-pixel defocus deblurring task. PA-Net uses an encoder–decoder architecture with skip connections that first extracts distinct features from the left and right views. At the network’s bottleneck, we introduce a parallax-aware dual-view feature enhancement module based on Transformer blocks, which aligns and enhances the extracted dual-pixel features and aggregates them into a unified feature. Furthermore, taking into account the disparity and the rich details embedded in the encoder features, we design an adaptive detail compensation module that adaptively incorporates dual-view encoder features into image reconstruction to help restore image details. Experimental results demonstrate that PA-Net achieves superior performance and visual quality on the real-world dataset.
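One generic way to align features between two views that differ by a horizontal shift is row-wise cross-view attention: each position in the left view attends to every position in the same row of the right view. The sketch below shows that general technique in plain Python. It is not the PA-Net enhancement block, whose internals the abstract does not specify; the function names, the scaled dot-product scoring, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_view_row_attention(left_row, right_row):
    """Blend right-view features into alignment with each left-view position.

    left_row, right_row: lists of feature vectors, one per horizontal position
    in the same image row. Returns one aligned vector per left position.
    """
    dim = len(right_row[0])
    scale = math.sqrt(len(left_row[0]))
    aligned = []
    for query in left_row:
        weights = softmax([dot(query, key) / scale for key in right_row])
        aligned.append([
            sum(w * feat[d] for w, feat in zip(weights, right_row))
            for d in range(dim)
        ])
    return aligned

# Hypothetical toy row: one left position, two right positions.
aligned = cross_view_row_attention([[1.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]])
```

Restricting attention to a single row reflects the assumption that disparity between the two views is purely horizontal.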
{"title":"Parallax-aware dual-view feature enhancement and adaptive detail compensation for dual-pixel defocus deblurring","authors":"Yuzhen Niu , Yuqi He , Rui Xu , Yuezhou Li , Yuzhong Chen","doi":"10.1016/j.engappai.2024.109612","DOIUrl":"10.1016/j.engappai.2024.109612","url":null,"abstract":"<div><div>Defocus deblurring using dual-pixel sensors has gathered significant attention in recent years. However, current methodologies have not adequately addressed the challenge of defocus disparity between dual views, resulting in suboptimal performance in recovering details from severely defocused pixels. To counteract this limitation, we introduce in this paper a parallax-aware dual-view feature enhancement and adaptive detail compensation network (PA-Net), specifically tailored for dual-pixel defocus deblurring task. Our proposed PA-Net leverages an encoder–decoder architecture augmented with skip connections, designed to initially extract distinct features from the left and right views. A pivotal aspect of our model lies at the network’s bottleneck, where we introduce a parallax-aware dual-view feature enhancement based on Transformer blocks, which aims to align and enhance extracted dual-pixel features, aggregating them into a unified feature. Furthermore, taking into account the disparity and the rich details embedded in encoder features, we design an adaptive detail compensation module to adaptively incorporate dual-view encoder features into image reconstruction, aiding in restoring image details. 
Experimental results demonstrate that our proposed PA-Net exhibits superior performance and visual effects on the real-world dataset.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109612"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aspect sentiment triplet extraction aims to analyze aspect-level sentiment in the form of triplets, which involves extracting aspect-opinion pairs and predicting the sentiment polarity of each pair. Many recent works rely on syntactic information (e.g., part-of-speech tags and syntactic dependency relations) to handle this semantic task, which overlooks uncommon part-of-speech items and can match semantically unrelated words. To overcome these drawbacks, we propose a SenticNet and Abstract Meaning Representation (AMR) driven Attention-Gate semantic framework (SAAG), which introduces the semantic sentiment knowledge of SenticNet and the semantic structure of AMR as semantic information to replace syntactic information. To highlight the affective meanings of words, an affective-driven attention mechanism is designed to emphasize sentiment intent within word representations. To match semantically related words, an AMR-driven gate mechanism balances the word pair expressions under varying semantic contexts. Extensive experiments on two public datasets demonstrate the effectiveness of our approach.
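A gate that "balances" two representations is commonly implemented as a sigmoid-weighted blend: a scalar gate g in (0, 1) is computed from both inputs, and the output is g times the first plus (1 - g) times the second. The sketch below shows that generic pattern only. The abstract does not describe how SAAG's AMR-driven gate is parameterized, so the function names, the single weight vector, and the toy inputs are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_blend(repr_a, repr_b, weights, bias=0.0):
    """Blend two equal-length vectors with a learned scalar gate.

    weights has length len(repr_a) + len(repr_b) and scores the
    concatenation of the two inputs. Returns (blended_vector, gate).
    """
    concatenated = repr_a + repr_b
    score = sum(w * x for w, x in zip(weights, concatenated)) + bias
    gate = sigmoid(score)
    blended = [gate * a + (1.0 - gate) * b for a, b in zip(repr_a, repr_b)]
    return blended, gate

# With all-zero weights the gate is exactly 0.5, so the blend is a plain average.
blended, gate = gated_blend([1.0, 3.0], [3.0, 1.0], weights=[0.0] * 4)
```

In a trained model the weights would be learned, so the gate would shift toward whichever representation is more informative in a given context.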
{"title":"SenticNet and Abstract Meaning Representation driven Attention-Gate semantic framework for aspect sentiment triplet extraction","authors":"Xiaowen Sun, Jiangtao Qi, Zhenfang Zhu, Meng Li, Hongli Pei, Jing Meng","doi":"10.1016/j.engappai.2024.109625","DOIUrl":"10.1016/j.engappai.2024.109625","url":null,"abstract":"<div><div>Aspect sentiment triplet extraction aims to analyze aspect-level sentiment in the form of triplets, including extracting aspect-opinion pairs and predicting the sentiment polarities of these pairs. Many recent works rely on syntactic information (e.g. part-of-speech and syntactic dependency relation) to handle this semantic task, which ignores uncommon part-of-speech items and matches semantically unrelated words. To overcome these drawbacks, we propose a SenticNet and Abstract Meaning Representation (AMR) driven Attention-Gate semantic framework (SAAG), which introduces semantic sentiment knowledge SenticNet and semantic structure AMR as semantic information to replace syntactic information. To highlight the affective meanings in words, an affective-driven attention mechanism is designed to emphasizes sentiment intent within word representations. To match semantically related words, the designed AMR-driven gate mechanism balances the word pair expressions under varying semantic contexts. 
Extensive experiments on two public datasets demonstrate the effectiveness of our approach.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109625"},"PeriodicalIF":7.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}