
Expert Systems with Applications: Latest Publications

A deep-learning-based approach for simulating pedestrian turning flow
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125706
Nan Jiang , Eric Wai Ming Lee , Lizhong Yang , Richard Kwok Kit Yuen , Chunjie Zhai
The application of artificial intelligence technology in pedestrian and evacuation dynamics research has achieved gratifying progress in recent years. Benefiting from the non-linear fitting ability of deep learning algorithms, such learning-based methods may model individual micro-behaviors better than traditional pedestrian and evacuation dynamics models. Hence, this paper proposes a deep-learning-based pedestrian dynamics model that can simulate pedestrian flow in right-angled corridors. Training is conducted with a deep learning framework composed of two functional layers, namely a Scene Perception layer (SP layer) and a Motion Dynamic layer (MD layer). The input features of the SP layer and MD layer are obtained from a defined ‘sense field’ that captures information about walking-facility structures and neighbors. A dataset generated from twelve groups of pedestrian turning-flow experiments is used for training. The same experiments are then used to evaluate the model at both the qualitative and quantitative level against pedestrian motion data simulated by the trained model. Qualitatively, the simulation results align with the corresponding experiments in terms of fundamental diagrams in different measurement areas and the headway distance–velocity relationship, demonstrating realistic motion characteristics, proper reactions to changing walking-facility structures, and collision-avoidance tendencies of agents driven by our model. For quantitative evaluation of model precision, two indicators that compute duration and trajectory disparities, respectively, are introduced; both yield relatively small values. Moreover, eight groups of external experiments, completely independent of the training data, are introduced to validate the generalization ability of our model; the simulation results match reality well without prior knowledge.
The proposed framework is a successful trial for simulating pedestrian turning flow and has the potential to be adapted to different scenarios. The outcomes provide beneficial guidance for engineering applications such as performance-based fire design and crowd management.
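The abstract does not give the exact formulas for the two quantitative indicators; the following minimal sketch (function names and the mean-Euclidean-deviation choice are assumptions, not the paper's definitions) illustrates how duration and trajectory disparities between simulated and experimental motion data might be computed:

```python
import numpy as np

def trajectory_disparity(sim_traj, exp_traj):
    """Mean Euclidean deviation between a simulated and an experimental
    trajectory sampled at the same time steps (illustrative metric)."""
    sim = np.asarray(sim_traj, dtype=float)
    exp = np.asarray(exp_traj, dtype=float)
    return float(np.mean(np.linalg.norm(sim - exp, axis=1)))

def duration_disparity(sim_duration, exp_duration):
    """Relative difference between simulated and experimental traversal
    durations (illustrative metric)."""
    return abs(sim_duration - exp_duration) / exp_duration
```

With identical sampling, a simulated trajectory that parallels the experimental one at a constant 1 m offset would score a disparity of 1.0.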
Expert Systems with Applications, Volume 262, Article 125706.
Citations: 0
Bi-directional information interaction for multi-modal 3D object detection in real-world traffic scenes
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125651
Yadong Wang , Shuqin Zhang , Yongqiang Deng , Juanjuan Li , Yanlong Yang , Kunfeng Wang
Multimodal 3D object detection methods are poorly adapted to real-world traffic scenes due to the sparse distribution of point clouds and the misalignment of multimodal data during actual collection. Existing methods focus on high-quality open-source datasets, with performance relying on accurate structural representation of point clouds and a precise mapping relationship between point clouds and images. To address these challenges, this paper proposes a multimodal feature-level fusion method based on bi-directional interaction between image and point cloud. To overcome the sparsity issue in asynchronous multimodal data, a point cloud densification scheme based on visual guidance and point cloud density guidance is proposed; this scheme can generate object-level virtual point clouds even when the point cloud and image are misaligned. To deal with the misalignment between point cloud and image, a bi-directional interaction module is proposed, based on image-guided interaction with key points of the point cloud and point-cloud-guided interaction with image context information. It achieves effective feature fusion even when the point cloud and image are misaligned. Experiments on the VANJEE and KITTI datasets demonstrate the effectiveness of the proposed method, with average precision improvements of 6.20% and 1.54% over the baseline.
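The bi-directional interaction described above resembles cross-attention applied in both directions between the two modalities. The sketch below works under that assumption only; it is an illustration in plain NumPy, not the paper's actual module, and all names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """One direction of interaction: each query feature gathers context
    from the other modality via scaled dot-product attention."""
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return attn @ keys_values

def bidirectional_fusion(point_feats, image_feats):
    """Illustrative bi-directional interaction: point features attend to
    image context, image features attend to point keypoints, and each
    stream is updated residually."""
    p2i = point_feats + cross_attend(point_feats, image_feats)
    i2p = image_feats + cross_attend(image_feats, point_feats)
    return p2i, i2p
```

Because attention matches features by content rather than by a fixed projection, such a scheme can tolerate some geometric misalignment between the two inputs.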
Expert Systems with Applications, Volume 262, Article 125651.
Citations: 0
Performance and sustainability of BERT derivatives in dyadic data
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125647
Miguel Escarda, Carlos Eiras-Franco, Brais Cancela, Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos
In recent years, the Natural Language Processing (NLP) field has experienced a revolution in which numerous models based on the Transformer architecture have emerged to process the ever-growing volume of online text data. This architecture has been the basis for the rise of Large Language Models (LLMs), enabling their application to many diverse tasks in which they excel after just a fine-tuning process following a vast pre-training phase. However, their sustainability is often overlooked, especially regarding computational and environmental costs. Our research compares various BERT derivatives in the context of a dyadic data task while also drawing attention to the growing need for sustainable AI solutions. To this end, we evaluate a selection of transformer models in an explainable recommendation setting, modeled as a multi-label classification task originating from a social-network context in which users, restaurants, and reviews interact.
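When a BERT derivative is fine-tuned for multi-label classification, the encoder's pooled output is typically mapped through independent sigmoid units (one per label) rather than a softmax, so several labels can fire at once. A minimal NumPy sketch of such a head and its binary cross-entropy objective; all names and shapes here are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_head(pooled, W, b, threshold=0.5):
    """Maps a pooled [CLS]-style embedding to independent per-label
    probabilities; every label above the threshold is predicted."""
    probs = sigmoid(pooled @ W + b)
    return probs, (probs >= threshold).astype(int)

def bce_loss(probs, targets, eps=1e-12):
    """Binary cross-entropy averaged over labels, the usual objective
    when fine-tuning an encoder for multi-label classification."""
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))
```

The sustainability angle of the paper enters elsewhere (model size and energy during fine-tuning); the head itself is identical across BERT derivatives.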
Expert Systems with Applications, Volume 262, Article 125647.
Citations: 0
Finding representative group fairness metrics using correlation estimations
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125652
Hadis Anahideh, Nazanin Nezami, Abolfazl Asudeh
It is of critical importance to be aware of the historical discrimination embedded in the data and to consider a fairness measure to reduce bias throughout the predictive modeling pipeline. Given the various notions of fairness defined in the literature, investigating the correlation and interaction among metrics is vital for addressing unfairness. Practitioners and data scientists should be able to comprehend each metric and examine their impact on one another given the context, use case, and regulations. Exploring the combinatorial space of different metrics for such an examination is burdensome. To alleviate the burden of selecting fairness notions for consideration, we propose a framework that estimates the correlation among fairness notions and consequently identifies a set of diverse, semantically distinct metrics as representative of a given context. We propose a Monte Carlo sampling technique for computing the correlations between fairness metrics by indirect and efficient perturbation in the model space. Using the estimated correlations, we then find a subset of representative metrics. The paper proposes a generic method that can be generalized to any arbitrary set of fairness metrics. We showcase the validity of the proposal using comprehensive experiments on real-world benchmark datasets.
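A rough sketch of the Monte Carlo idea, under two simplifying assumptions that are mine rather than the paper's: "perturbation in the model space" is approximated by sampling random prediction vectors, and the two metrics compared are demographic parity difference and equal opportunity difference:

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)|."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_diff(y_pred, y_true, group):
    """|TPR(g=0) - TPR(g=1)|, computed on the true-positive subsets."""
    m0 = (group == 0) & (y_true == 1)
    m1 = (group == 1) & (y_true == 1)
    return abs(y_pred[m0].mean() - y_pred[m1].mean())

def estimate_metric_correlation(y_true, group, n_samples=500, seed=0):
    """Monte Carlo sketch: sample many perturbed classifiers (here plain
    random prediction vectors), evaluate both fairness metrics on each,
    and return the Pearson correlation between the two metric series."""
    rng = np.random.default_rng(seed)
    dp, eo = [], []
    for _ in range(n_samples):
        y_pred = rng.integers(0, 2, size=y_true.size)
        dp.append(demographic_parity_diff(y_pred, group))
        eo.append(equal_opportunity_diff(y_pred, y_true, group))
    return float(np.corrcoef(dp, eo)[0, 1])
```

Highly correlated metrics are then redundant, and one representative per correlated cluster suffices.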
Expert Systems with Applications, Volume 262, Article 125652.
Citations: 0
Hybrid genetic algorithm with Wiener process for multi-scale colored balanced traveling salesman problem
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125610
Xueshi Dong , Liwen Ma , Xin Zhao , Yongchang Shan , Jie Wang , Zhenghao Xu
The colored traveling salesman problem (CTSP) can be applied to multi-machine engineering systems (MES) in industry. The colored balanced traveling salesman problem (CBTSP) is a variant of CTSP that can model optimization problems with partially overlapped workspaces, such as planning optimization (for example, process planning, assembly planning, and production scheduling). Traditional algorithms have been used to solve CBTSP; however, they are limited in both solution quality and solving speed, and the problem scale they can handle is also restricted. Moreover, traditional algorithms lack theoretical support from mathematical physics. To improve on these points, this paper proposes a novel hybrid genetic algorithm (NHGA) based on the Wiener process (ITÖ process) and generating neighborhood solutions (GNS) to solve the multi-scale CBTSP problem. NHGA first uses dual-chromosome coding to construct solutions of CBTSP, which are then updated by the crossover operator, the mutation operator, and GNS. The crossover length of the crossover operator and the city number of the mutation operator are controlled by an activity intensity based on the ITÖ process, while the city-keeping probability of GNS can be learned or obtained via the Wiener process. Experiments show that NHGA improves on state-of-the-art algorithms for multi-scale CBTSP in terms of solution quality.
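The abstract does not specify how the activity intensity maps to operator parameters; the sketch below simulates a standard Wiener process and derives a crossover segment length from its current value. The sigmoid squashing map is an assumption made purely for illustration:

```python
import numpy as np

def wiener_path(n_steps, dt=0.01, seed=0):
    """Standard Wiener process W_t simulated by cumulative Gaussian
    increments with variance dt (the stochastic driver assumed here
    for the ITO-style activity intensity)."""
    rng = np.random.default_rng(seed)
    return np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n_steps))])

def activity_to_crossover_length(w_t, tour_len, lo=2):
    """Maps the current activity intensity to a crossover segment length
    in [lo, tour_len] via a sigmoid squashing function (illustrative)."""
    intensity = 1.0 / (1.0 + np.exp(-w_t))  # squash W_t into (0, 1)
    return int(lo + intensity * (tour_len - lo))
```

As the process drifts, the crossover length fluctuates between exploratory long segments and exploitative short ones, which is the general role such intensity controls play in hybrid genetic algorithms.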
Expert Systems with Applications, Volume 262, Article 125610.
Citations: 0
Semantic layout-guided diffusion model for high-fidelity image synthesis in ‘The Thousand Li of Rivers and Mountains’
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-05 DOI: 10.1016/j.eswa.2024.125645
Rui Yang , Kaoru Ota , Mianxiong Dong , Xiaojun Wu
“The Thousand Li of Rivers and Mountains” (TLRM) is one of the most famous traditional paintings. Even experienced artists face significant challenges in replicating it. Utilizing semantic layout-guided diffusion models offers a novel approach to this task; however, synthesizing high-quality images with consistent semantics and layout is difficult due to TLRM’s unique characteristics. In this paper, we propose a novel diffusion-model-based framework guided by semantic layout to generate images tailored to TLRM’s distinct features. We introduce layout-enhanced-map and latent-layout-injection strategies, which improve semantic fidelity and color distribution. These innovations are integrated into our semantic latent diffusion model for effective semantically guided image generation. To address the training challenges posed by a single large-scale image, we employ specialized data augmentation techniques, facilitating the generation of one-hot semantic layout representations and ensuring continuous training until convergence. Additionally, we developed the TLRM dataset for semantic image synthesis, which enhances visual quality and semantic consistency. Experimental results and user surveys demonstrate that our model can produce high-fidelity and diversified results, offering significant potential for art education, digital preservation, and cultural heritage. Our code is publicly available at https://github.com/rane7/TLRM-SLDM.
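One ingredient named in the abstract, the one-hot semantic layout representation, can be sketched directly. The function below converts an integer label map into per-class binary planes of the kind a layout-conditioned diffusion model would consume; the (H, W, C) shape convention is an assumption, not taken from the paper:

```python
import numpy as np

def layout_to_onehot(label_map, n_classes):
    """Converts an integer semantic layout map of shape (H, W) into a
    one-hot representation of shape (H, W, n_classes): exactly one
    channel is set to 1 at each pixel."""
    label_map = np.asarray(label_map)
    onehot = np.zeros(label_map.shape + (n_classes,), dtype=np.float32)
    h, w = np.indices(label_map.shape)
    onehot[h, w, label_map] = 1.0
    return onehot
```

The one-hot form lets the conditioning network weight each semantic class (e.g. mountain, river, sky) independently instead of treating class indices as ordered magnitudes.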
Expert Systems with Applications, Volume 263, Article 125645.
Citations: 0
GCENet: A geometric correspondence estimation network for tracking and loop detection in visual–inertial SLAM
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-04 DOI: 10.1016/j.eswa.2024.125659
Jichao Zhou, Jiwei Shen, Shujing Lyu, Yue Lu
Establishing robust and effective data correlation has been one of the core problems in visual SLAM (Simultaneous Localization and Mapping). In this paper, we propose a geometric correspondence estimation network, GCENet, tailored for visual tracking and loop detection in visual–inertial SLAM. GCENet considers both local and global correlation across frames, enabling deep feature matching in scenarios involving noticeable displacement. Building upon this, we introduce a tightly coupled visual–inertial state estimation system. To address challenges in extreme environments, such as strong illumination and weak texture, where manual feature matching tends to fail, a compensatory deep optical flow tracker is incorporated into our system. In such cases, our approach uses GCENet for dense optical flow tracking, replacing manual pipelines for visual tracking. Furthermore, a deep loop detector based on GCENet is constructed, which uses the estimated flow to represent scene similarity. Spatial-consistency discrimination on candidate loops is conducted with GCENet to establish long-term data association, effectively suppressing false negatives and false positives in loop closure. Dedicated experiments on the EuRoC drone, TUM-4Seasons, and private robot datasets evaluate the proposed method. The results demonstrate that our system exhibits superior robustness and accuracy in extreme environments compared to state-of-the-art methods.
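Scene similarity derived from estimated flow could, in its simplest form, be a cosine similarity between flow fields. The sketch below is a stand-in for that idea, not GCENet's learned detector; it retains loop candidates above a similarity threshold for the later spatial-consistency check:

```python
import numpy as np

def flow_similarity(flow_a, flow_b):
    """Cosine similarity between two flattened dense flow fields, used
    here as a simple proxy for a scene-similarity score between a query
    frame and a loop candidate."""
    a, b = np.ravel(flow_a), np.ravel(flow_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_loop_candidates(query_flow, keyframe_flows, threshold=0.8):
    """Keeps keyframes whose flow similarity to the query exceeds the
    threshold; survivors would then undergo spatial-consistency
    discrimination before a loop closure is accepted."""
    return [i for i, f in enumerate(keyframe_flows)
            if flow_similarity(query_flow, f) >= threshold]
```

The two-stage design (cheap similarity screen, then geometric verification) is what suppresses both false positives and false negatives in loop closure.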
Expert Systems with Applications, Volume 262, Article 125659.
Citations: 0
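GCENet's loop detector scores scene similarity from estimated flow and then checks spatial consistency. As a toy illustration of the general retrieval step only (not the paper's network), the sketch below finds loop-closure candidates by cosine similarity between whole-frame descriptors; the function names, the 0.8 threshold, and the recent-frame exclusion window are illustrative assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened frame descriptors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def detect_loop_candidates(descriptors, query, threshold=0.8, exclude_recent=10):
    """Return indices of past frames whose descriptor similarity to the query
    exceeds the threshold, skipping the most recent frames so the current
    neighbourhood does not trivially match itself."""
    past = descriptors[:-exclude_recent] if exclude_recent else descriptors
    return [i for i, d in enumerate(past)
            if cosine_similarity(d, query) >= threshold]
```

A real system would follow this retrieval step with a geometric consistency check (as GCENet does with estimated flow) before accepting a loop closure.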
YOLO-ResTinyECG: ECG-based lightweight embedded AI arrhythmia small object detector with pruning methods
IF 7.5 CAS Tier 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-04 DOI: 10.1016/j.eswa.2024.125691
You-Liang Xie, Che-Wei Lin

Objective

This study presents YOLO-ResTinyECG, a novel model for small-object detection in electrocardiogram (ECG) images, emphasizing longer time-window lengths to improve the throughput of arrhythmia detection. Methods: The proposed ResTinyECG backbone consists of four-stage Res-blocks (2/4/3/2) with input ECG image sizes of 320 × 320 × 1 and 160 × 160 × 1 pixels, a parameter reduction of around 20 %/95 %/74 % compared to existing backbones such as Shufflenet-v2, CSPDarknet53-tiny, and the latest YOLOv7-tiny structure. This study also applies magnitude-based filter pruning and dependency graph (DepGraph) pruning for parameter optimization. The ECG images are processed with time-window lengths of 5/10/15/20 s and min–max normalization; weighted loss is applied for class balance during learning, and non-maximum suppression with edge removal computes and selects the final output bounding boxes of ECG heartbeats. Experiments: Experiments conducted on the PhysioNet MIT-BIH arrhythmia ECG database focus on nine classes: normal (N), atrial premature beat (A), ventricular premature beat (V), left bundle branch block (L), right bundle branch block (R), fusion of ventricular beat (FVN), paced beat (P), fusion of paced and normal beat (FPN), and others (remaining heartbeats). Results: The proposed ResTinyECG-320 achieves mean Average Precision (mAP) scores of 94.76 %/94.85 %/93.96 %/87.12 % in 6-class detection and 92.35 %/91.96 %/90.58 %/83.34 % in 9-class detection across the different time-window lengths. The model shows only a marginal 7 %–9 % decrease in performance at longer time-window lengths. After magnitude-based filter pruning, ResTinyECG can be reduced to a minimal 0.1 million parameters. 
Furthermore, ResTinyECG-160 achieves a competitive mAP of 93.92 % in 6-class detection and a faster processing speed of 100.6 ECG segments per second (SPS) on a PC compared to Shufflenet-v2 (77.3 SPS). In conclusion, YOLO-ResTinyECG surpasses existing backbones, exhibiting superior mAP in both 6/9-class detection scenarios, and its deployment on an embedded AI platform validates its real-time detection capabilities.
{"title":"YOLO-ResTinyECG: ECG-based lightweight embedded AI arrhythmia small object detector with pruning methods","authors":"You-Liang Xie ,&nbsp;Che-Wei Lin","doi":"10.1016/j.eswa.2024.125691","DOIUrl":"10.1016/j.eswa.2024.125691","url":null,"abstract":"<div><h3>Objective</h3><div>This study presents a YOLO-ResTinyECG, a novel model for small object detection in electrocardiogram (ECG) images, emphasizing longer time-window lengths for improving the throughput of arrhythmia detection. <strong><em>Methods</em></strong>: The proposed ResTinyECG backbone consists of four-stage Res-blocks (2/4/3/2) with input ECG image sizes of 320 × 320 × 1 pixels and 160 × 160 × 1 pixels, with a significant parameter reduction of around 20 %/95 %/74 % compared to existing backbones like Shufflenet-v2/CSPDarknet53-tiny and the latest YOLOv7-tiny structure. Moreover, this study applied magnitude-based filter pruning and dependency graph (DepGraph) pruning for parameter optimization. The ECG images undergo processing with time-window lengths of 5/10/15/20 s, accompanied by min–max normalization, and weighted loss is applied for class balance during learning, and non-maximum suppression with edge removal is used to compute and select the final output bounding boxes of ECG heartbeats. <strong><em>Experiments</em></strong>: Experiments conducted on the PhysioNet MIT-BIH arrhythmia ECG database focus on nine classes, including normal (N), atrial premature beat (A), ventricular premature beat (V), left bundle branch block (L), right bundle branch block (R), fusion of ventricular beat (FVN), paced beat (P), fusion of paced and normal beat (FPN), and others (remaining heartbeats). <strong><em>Results</em></strong>: The proposed ResTinyECG-320 exhibits impressive mean Average Precision (mAP) scores of 94.76 %/94.85 %/93.96 %/87.12 % in 6-class detection and 92.35 %/91.96 %/90.58 %/83.34 % in 9-class detection across different time-window lengths. 
The model demonstrates only a marginal 7 %∼9% decrease in performance when utilizing longer time-window lengths. After applying magnitude-based filter pruning, ResTinyECG can be reduced to a minimal 0.1 million parameters. Furthermore, ResTinyECG-160 achieves a competitive mAP of 93.92 % in 6-class detection and a faster processing speed of 100.6 ECG segments per second (SPS) on a PC compared to Shufflenet-v2 (77.3 SPS). In conclusion, YOLO-ResTinyECG surpasses existing backbones, exhibiting superior mAP in both 6/9-class detection scenarios, and its deployment on an embedded AI platform validates its real-time detection capabilities.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"263 ","pages":"Article 125691"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142662277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
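The preprocessing described in the abstract above segments the ECG signal into fixed time windows and applies min–max normalization per window. A minimal sketch of those two steps is below; the function name and sampling-rate argument are illustrative assumptions, not the authors' code.

```python
import numpy as np

def segment_and_normalize(signal, fs, window_s):
    """Split a 1-D ECG signal (sampled at fs Hz) into non-overlapping
    windows of window_s seconds and min-max normalize each window to [0, 1].
    A trailing partial window is dropped."""
    win = int(fs * window_s)
    n_windows = len(signal) // win
    segments = []
    for k in range(n_windows):
        seg = np.asarray(signal[k * win:(k + 1) * win], dtype=float)
        lo, hi = seg.min(), seg.max()
        # Guard against a flat window to avoid division by zero.
        segments.append((seg - lo) / (hi - lo) if hi > lo else np.zeros_like(seg))
    return segments
```

For example, a 100-second recording at 360 Hz with a 20 s window yields five 7200-sample segments, each scaled to [0, 1] before rendering as a model input image.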
CE-DCVSI: Multimodal relational extraction based on collaborative enhancement of dual-channel visual semantic information
IF 7.5 CAS Tier 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-04 DOI: 10.1016/j.eswa.2024.125608
Yunchao Gong, Xueqiang Lv, Zhu Yuan, Feng Hu, Zangtai Cai, Yuzhong Chen, Zhaojun Wang, Xindong You
Visual information implied by the images in multimodal relation extraction (MRE) usually contains details that are difficult to describe in text sentences. Integrating textual and visual information is the mainstream method to enhance the understanding and extraction of relations between entities. However, existing MRE methods neglect the semantic gap caused by data heterogeneity. In addition, some approaches map the relations between target objects in image scene graphs to text, but massive invalid visual relations introduce noise. To alleviate the above problems, we propose a novel multimodal relation extraction method based on collaborative enhancement of dual-channel visual semantic information (CE-DCVSI). Specifically, to mitigate the semantic gap between modalities, we realize fine-grained semantic alignment between entities and target objects through multimodal heterogeneous graphs, aligning feature representations of different modalities into the same semantic space using the heterogeneous graph Transformer, thus promoting the consistency and accuracy of feature representations. To eliminate the effect of useless visual relations, we perform multi-scale feature fusion between different levels of visual information and textual representations to increase the complementarity between features, improving the comprehensiveness and robustness of the multimodal representation. Finally, we utilize the information bottleneck principle to filter out invalid information from the multimodal representation to mitigate the negative impact of irrelevant noise. Experiments demonstrate that the method achieves an F1 score of 86.08% on the publicly available MRE dataset, outperforming other baseline methods.
{"title":"CE-DCVSI: Multimodal relational extraction based on collaborative enhancement of dual-channel visual semantic information","authors":"Yunchao Gong ,&nbsp;Xueqiang Lv ,&nbsp;Zhu Yuan ,&nbsp;Feng Hu ,&nbsp;Zangtai Cai ,&nbsp;Yuzhong Chen ,&nbsp;Zhaojun Wang ,&nbsp;Xindong You","doi":"10.1016/j.eswa.2024.125608","DOIUrl":"10.1016/j.eswa.2024.125608","url":null,"abstract":"<div><div>Visual information implied by the images in multimodal relation extraction (MRE) usually contains details that are difficult to describe in text sentences. Integrating textual and visual information is the mainstream method to enhance the understanding and extraction of relations between entities. However, existing MRE methods neglect the semantic gap caused by data heterogeneity. Besides, some approaches map the relations between target objects in image scene graphs to text, but massive invalid visual relations introduce noise. To alleviate the above problems, we propose a novel multimodal relation extraction method based on cooperative enhancement of dual-channel visual semantic information (CE-DCVSI). Specifically, to mitigate the semantic gap between modalities, we realize fine-grained semantic alignment between entities and target objects through multimodal heterogeneous graphs, aligning feature representations of different modalities into the same semantic space using the heterogeneous graph Transformer, thus promoting the consistency and accuracy of feature representations. To eliminate the effect of useless visual relations, we perform multi-scale feature fusion between different levels of visual information and textual representations to increase the complementarity between features, improving the comprehensiveness and robustness of the multimodal representation. Finally, we utilize the information bottleneck principle to filter out invalid information from the multimodal representation to mitigate the negative impact of irrelevant noise. 
The experiments demonstrate that the method achieves 86.08% of the F1 score on the publicly available MRE dataset, which outperforms other baseline methods.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"262 ","pages":"Article 125608"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
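The abstract above fuses features from multiple levels into one representation. As a hedged, minimal stand-in for that idea (not CE-DCVSI's heterogeneous-graph Transformer pipeline), the sketch below resamples feature vectors from different scales to a common length and concatenates them; the function name and linear-interpolation choice are assumptions for illustration.

```python
import numpy as np

def fuse_multiscale(features, target_len):
    """Resample each 1-D feature vector to target_len by linear
    interpolation over a normalized coordinate, then concatenate,
    so features extracted at different scales share one representation."""
    fused = []
    for f in features:
        f = np.asarray(f, dtype=float)
        x_old = np.linspace(0.0, 1.0, num=len(f))
        x_new = np.linspace(0.0, 1.0, num=target_len)
        fused.append(np.interp(x_new, x_old, f))
    return np.concatenate(fused)
```

In a real multimodal model this resampling would be replaced by learned projections, but the shape bookkeeping (per-scale vectors mapped to a common length, then concatenated) is the same.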
Identification of gene regulatory networks associated with breast cancer patient survival using an interpretable deep neural network model
IF 7.5 CAS Tier 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-04 DOI: 10.1016/j.eswa.2024.125632
Xue Wang, Vivekananda Sarangi, Daniel P. Wickland, Shaoyu Li, Duan Chen, E. Aubrey Thompson, Garrett Jenkinson, Yan W. Asmann
Artificial neural networks have recently gained significant attention in biomedical research. However, their utility in survival analysis still faces many challenges. In addition to designing models for high accuracy, it is essential to optimize models that provide biologically meaningful insights. With these considerations in mind, we developed a deep neural network model, MaskedNet, to identify genes and pathways whose expression at the time of diagnosis is associated with overall survival. MaskedNet was trained using TCGA breast cancer transcriptome and clinical data, and the model’s final output was the predicted logarithm of the hazard ratio for death. The trained model was interpreted using SHapley Additive exPlanations (SHAP), a technique grounded in robust mathematical principles that assigns importance scores to input features. Compared to traditional Cox proportional hazards regression, MaskedNet had higher accuracy, as measured by Harrell’s C-index. We also found that aggregating outputs from several model runs identified multiple genes and pathways associated with overall survival, including IFNG and PIK3CA genes, along with their related pathways. To further elucidate the role of the IFNG gene, tumors were partitioned into two groups based on low and high IFNG SHAP values, respectively. Tumors with lower IFNG SHAP values exhibited higher IFNG expression and better overall survival, which were linked to more abundant presence of M1 macrophages and activated CD4+ and CD8+ T cells in the tumor microenvironment. The association of the IFNG pathway with overall survival was validated in the trastuzumab arm of the NCCTG-N9831 trial, an independent breast cancer study.
{"title":"Identification of gene regulatory networks associated with breast cancer patient survival using an interpretable deep neural network model","authors":"Xue Wang ,&nbsp;Vivekananda Sarangi ,&nbsp;Daniel P. Wickland ,&nbsp;Shaoyu Li ,&nbsp;Duan Chen ,&nbsp;E. Aubrey Thompson ,&nbsp;Garrett Jenkinson ,&nbsp;Yan W. Asmann","doi":"10.1016/j.eswa.2024.125632","DOIUrl":"10.1016/j.eswa.2024.125632","url":null,"abstract":"<div><div>Artificial neural networks have recently gained significant attention in biomedical research. However, their utility in survival analysis still faces many challenges. In addition to designing models for high accuracy, it is essential to optimize models that provide biologically meaningful insights. With these considerations in mind, we developed a deep neural network model, MaskedNet, to identify genes and pathways whose expression at the time of diagnosis is associated with overall survival. MaskedNet was trained using TCGA breast cancer transcriptome and clinical data, and the model’s final output was the predicted logarithm of the hazard ratio for death. The trained model was interpreted using SHapley Additive exPlanations (SHAP), a technique grounded in robust mathematical principles that assigns importance scores to input features. Compared to traditional Cox proportional hazards regression, MaskedNet had higher accuracy, as measured by Harrell’s C-index. We also found that aggregating outputs from several model runs identified multiple genes and pathways associated with overall survival, including <em>IFNG</em> and <em>PIK3CA</em> genes<em>,</em> along with their related pathways. To further elucidate the role of the <em>IFNG</em> gene, tumors were partitioned into two groups based on low and high <em>IFNG</em> SHAP values, respectively. 
Tumors with lower <em>IFNG</em> SHAP values exhibited higher <em>IFNG</em> expression and better overall survival, which were linked to more abundant presence of M1 macrophages and activated CD4+ and CD8+ T cells in the tumor microenvironment. The association of the <em>IFNG</em> pathway with overall survival was validated in the trastuzumab arm of the NCCTG-N9831 trial, an independent breast cancer study.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"262 ","pages":"Article 125632"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
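The abstract above measures accuracy with Harrell's C-index. The standard definition (the fraction of comparable pairs where the subject with the shorter survival time has the higher predicted risk, counting risk ties as 0.5) can be computed as below. This is a textbook sketch, not the authors' implementation, and it skips tied survival times for simplicity.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index. A pair is comparable when the subject
    with the earlier time has an observed event (events[i] truthy); it is
    concordant when that subject also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # ensure i indexes the earlier time
        if times[i] == times[j] or not events[i]:
            continue  # tied times or censored earlier subject: not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A C-index of 1.0 means the predicted log hazard ratios order every comparable pair correctly, 0.5 is chance level, and censored subjects only contribute as the later member of a pair.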