
Information Processing & Management: Latest Articles

A hybrid feature fusion deep learning framework for multi-source medical image analysis
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-20 | DOI: 10.1016/j.ipm.2024.103934
Qiang Cao, Xian Cheng
Despite the widespread adoption of deep learning to enhance image classification, significant obstacles remain. First, multisource data with diverse sizes and formats pose a great challenge for most current deep learning models. Second, the lack of manually labeled data for model training limits the application of deep learning. Third, the widely used CNN-based methods show limitations in extracting global features and yield poor performance on image topology. To address these issues, we propose a Hybrid Feature Fusion Deep Learning (HFFDL) framework for image classification. This framework consists of an automated image segmentation module, a two-stream backbone module, and a classification module. The automatic image segmentation module uses the U-Net model and transfer learning to detect regions of interest (ROIs) in multisource images; the two-stream backbone module integrates the Swin Transformer architecture with the Inception CNN, with the aim of simultaneously extracting local and global features for efficient representation learning. We evaluate the performance of the HFFDL framework on two publicly available image datasets: one for identifying COVID-19 through chest X-ray scans (30,386 images), and another for multiclass skin cancer screening using dermoscopy images (25,331 images). The HFFDL framework outperformed many cutting-edge models, achieving AUC scores of 0.9835 and 0.8789, respectively. Furthermore, a practical application study conducted in a hospital, identifying viable embryos from medical images, found that the HFFDL framework outperformed embryologists.
Citations: 0
Examining communication network behaviors, structure and dynamics in an organizational hierarchy: A social network analysis approach
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-19 | DOI: 10.1016/j.ipm.2024.103927
Tao Wen, Yu-wang Chen, Tahir Abbas Syed, Darminder Ghataoura
Effectively understanding and enhancing communication flows among employees within an organizational hierarchy is crucial for optimizing operational and decision-making efficiency. To address the research gap on this topic, we propose a systematic and comprehensive social network analysis approach, coupled with a newly formulated communication vector and matrix, to examine communication behaviors and dynamics in an organizational hierarchy. We use the Enron email dataset, consisting of 619,499 emails, as an illustrative example to bridge the micro-macro divide in organizational communication research. A series of centrality measures is employed to evaluate the influence of individual employees, revealing that influence declines down the hierarchy and that communication behaviors vary by level. By identifying community structure and applying the proposed communication matrix, we also find that employees tend to communicate within the same functional teams. Furthermore, the emergent dynamics of organizational communication during a crisis are examined through a time-segmented dataset, showcasing the progressive absence of the legal team, the responsibility of top management, and the presence of hierarchy. By considering both individual and organizational perspectives, our work provides a systematic, data-driven approach to understanding how the organizational communication network emerges dynamically from individual communication behaviors within the hierarchy, which has the potential to enhance operational and decision-making efficiency within organizations.
Citations: 0
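As a rough illustration of the kind of analysis the abstract above describes, the sketch below computes normalised in/out-degree centrality from sender-recipient pairs and a team-by-team message count matrix. The toy `emails`, the `team_of` lookup, and both function names are hypothetical. The count matrix is only a simple stand-in, since the abstract does not define the authors' communication vector or matrix.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (sender, recipient) pairs and a team lookup.
emails = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "alice"),
    ("carol", "dave"), ("dave", "carol"), ("alice", "bob"),
]
team_of = {"alice": "legal", "bob": "legal", "carol": "trading", "dave": "trading"}


def degree_centrality(edges):
    """Return {node: (in_degree, out_degree)}, each divided by (n - 1).

    Degrees count distinct contacts, so repeated emails between the
    same pair are only counted once.
    """
    nodes = {n for edge in edges for n in edge}
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for sender, recipient in edges:
        out_nb[sender].add(recipient)
        in_nb[recipient].add(sender)
    denom = max(len(nodes) - 1, 1)
    return {n: (len(in_nb[n]) / denom, len(out_nb[n]) / denom) for n in sorted(nodes)}


def team_matrix(edges, team_of):
    """Count messages sent from each team (rows) to each team (columns)."""
    counts = Counter((team_of[s], team_of[r]) for s, r in edges)
    teams = sorted(set(team_of.values()))
    return teams, [[counts[(a, b)] for b in teams] for a in teams]


centrality = degree_centrality(emails)
teams, matrix = team_matrix(emails, team_of)
```

A matrix whose diagonal dominates its off-diagonal entries would be consistent with the within-team communication pattern the abstract reports. On real data a graph library would normally replace the hand-rolled centrality.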
Quantifying the degree of scientific innovation breakthrough: Considering knowledge trajectory change and impact
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-19 | DOI: 10.1016/j.ipm.2024.103933
Lin Runhui, Li Yalin, Ji Ze, Xie Qiqi, Chen Xiaoyu
Scientific breakthroughs have the potential to reshape the trajectory of knowledge flow and significantly impact later research. The aim of this study is to introduce the Degree of Innovation Breakthrough (DIB) metric to quantify the extent of scientific breakthroughs more accurately. The DIB metric takes into account changes in the trajectory of knowledge flow, as well as the depth and breadth of impact, and it modifies the traditional assumption of equal citation contributions by assigning weighted citation counts. The effectiveness of the DIB metric is assessed using ROC curves and AUC, demonstrating its ability to differentiate between high and low scientific breakthroughs with high sensitivity and minimal false positives. Based on ROC curves, this study proposes a method to calculate the threshold for a high scientific breakthrough, reducing subjectivity. The effectiveness of the proposed method is demonstrated on a dataset of 1108 award-winning computer science papers and 9832 matched control papers, showing that the DIB metric surpasses single-dimensional metrics. The study also performs a granular analysis of the innovation breakthrough degree of non-award-winning papers, categorizing them into four types based on originality and impact through 2D histogram visualization, and suggests tailored management strategies. Adopting this refined classification strategy can optimize the management of innovation practices and ultimately improve innovative research outcomes. The quantitative tools introduced in this paper offer guidance for researchers in the fields of science intelligence mining and science trend prediction.
Citations: 0
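The abstract above mentions deriving a breakthrough threshold from ROC curves but does not say which criterion is used. The sketch below shows one common choice, Youden's J statistic (TPR minus FPR), purely as an assumption, together with a rank-based AUC. The scores and labels are made-up toy values, not DIB results.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) per distinct score.

    An item is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


def youden_threshold(scores, labels):
    """Threshold maximising J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])[0]


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = award-winning paper, 0 = matched control (hypothetical).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 1, 0, 0]

threshold = youden_threshold(scores, labels)
area = auc(scores, labels)
```

Picking the cut-off from the curve rather than by hand is what removes the subjectivity the abstract refers to. Other criteria, such as the point closest to the top-left corner, would fit the same code shape.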
Triple Sparse Denoising Discriminantive Least Squares Regression for image classification
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-19 | DOI: 10.1016/j.ipm.2024.103922
Jinjin Zhang, Qimeng Fan, Dingan Wang, Pu Huang, Zhangjing Yang
Discriminantive Least Squares Regression (DLSR) is an algorithm that employs the ɛ-dragging technique to enhance intra-class similarity. However, it overlooks the fact that an increase in intra-class closeness may simultaneously decrease the distance between similar but different classes. To address this issue, we propose a new approach called Triple Sparse Denoising Discriminantive Least Squares Regression (TSDDLSR), which combines three sparsity constraints: sparsity constraints between classes to amplify the growth of the distance between similar classes; sparsity constraints on relaxation matrices to capture more local structure; and sparsity constraints on noise matrices to minimize the effect of outliers. In addition, we strategically position the matrix decomposition step in the label space with the objective of enhancing denoising capabilities, safeguarding it from potential degradation, and preserving its underlying manifold structure. Our experiments evaluate the classification performance of the method on face recognition tasks (AR, CMU PIE, Extended Yale B, Georgia Tech, FERET datasets), biometric recognition tasks (PolyU Palmprint dataset), and object recognition tasks (COIL-20, ImageNet datasets). The results show that TSDDLSR significantly improves classification performance compared to existing methods.
Citations: 0
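For readers unfamiliar with ɛ-dragging, the sketch below shows the construction as it is usually presented for DLSR: one-hot targets Y are relaxed to Y + B ⊙ M, where B holds +1 for the true class and -1 elsewhere and M is a non-negative dragging matrix. It covers only that baseline idea, not the three sparsity constraints of TSDDLSR. All function names and the toy predictions `P` are hypothetical.

```python
def one_hot(labels, n_classes):
    """Standard 0/1 regression targets, one row per sample."""
    return [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]


def drag_directions(labels, n_classes):
    """B[i][c] = +1 for the true class (drag up), -1 otherwise (drag down)."""
    return [[1.0 if c == y else -1.0 for c in range(n_classes)] for y in labels]


def relaxed_targets(Y, B, M):
    """Relaxed targets T = Y + B * M (elementwise), assuming every M entry >= 0."""
    return [[y + b * m for y, b, m in zip(yr, br, mr)]
            for yr, br, mr in zip(Y, B, M)]


def best_dragging(P, Y, B):
    """Entrywise minimiser of (P - Y - B*M)^2 subject to M >= 0.

    Because each B entry is +1 or -1, the solution is max(B * (P - Y), 0).
    P holds the current model predictions and is treated as fixed.
    """
    return [[max(b * (p - y), 0.0) for p, y, b in zip(pr, yr, br)]
            for pr, yr, br in zip(P, Y, B)]


labels = [0, 2]                      # two samples, three classes
Y = one_hot(labels, 3)
B = drag_directions(labels, 3)
T = relaxed_targets(Y, B, [[0.5] * 3, [0.5] * 3])

P = [[1.2, 0.1, -0.3], [0.4, 0.0, 0.7]]   # hypothetical predictions
M = best_dragging(P, Y, B)
```

With a dragging amount of 0.5 the true-class target rises to 1.5 and the others fall to -0.5, which widens the margin between the correct entry and the rest.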
Unsupervised feature selection using sparse manifold learning: Auto-encoder approach
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-18 | DOI: 10.1016/j.ipm.2024.103923
Amir Moslemi, Mina Jamshidi
Feature selection techniques are widely used as a preprocessing step when training machine learning algorithms, to circumvent the curse of dimensionality, overfitting, and long computation times. Projection-based methods are frequently employed in feature selection, leveraging linear relationships among features; however, they do not extract nonlinear information among features. While auto-encoder-based techniques have recently gained traction for feature selection, their focus remains primarily on the encoding phase, since the selected features are derived from it. The subtle point is that the ability of an auto-encoder to obtain the most discriminative features is also significantly affected by the decoding phase. To address these challenges, we propose a novel auto-encoder-based feature selection method that not only extracts nonlinear information among features but also regularizes the decoding phase to enhance performance. We define a new auto-encoder model that keeps the topological information of the reconstructed data close to that of the input data. The geometric structure of the input data is preserved in the projected space using a Laplacian graph, and the geometry of the projected space is preserved in the reconstructed space through a suitable term in the optimization problem (the abstract Laplacian graph of the reconstructed data). We show experimentally that keeping the abstract Laplacian graph of the reconstructed data close to the Laplacian graph of the input data affects feature selection performance. We then present an effective approach for solving the corresponding objective. Since this approach is mainly intended for clustering, we conducted experiments on ten benchmark datasets and assessed the proposed method using clustering accuracy and normalized mutual information (NMI). Our method showed considerable superiority over recent state-of-the-art techniques in terms of NMI and accuracy.
Citations: 0
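To make the geometry-preservation idea in the abstract above concrete, the sketch below builds a graph Laplacian L = D - W from pairwise heat-kernel weights and evaluates the usual smoothness term tr(Zᵀ L Z), which is small when nearby inputs stay nearby in the embedding Z. The dense graph, the kernel width `sigma`, and the toy data are assumptions. The abstract does not give the paper's graph construction or its full objective.

```python
import math


def heat_kernel_weights(X, sigma=1.0):
    """Symmetric affinity matrix W with zero diagonal."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def laplacian(W):
    """Unnormalised graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(Z, L):
    """tr(Z^T L Z); equals 0.5 * sum_ij W_ij * ||z_i - z_j||^2 for symmetric W."""
    n, d = len(Z), len(Z[0])
    return sum(Z[i][k] * L[i][j] * Z[j][k]
               for i in range(n) for j in range(n) for k in range(d))


# Toy inputs (three 2-D points) and a hypothetical 1-D embedding of them.
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
Z = [[0.1], [0.2], [0.9]]

W = heat_kernel_weights(X)
L = laplacian(W)
penalty = smoothness(Z, L)
```

Added to a reconstruction loss with a weighting coefficient, a term like `penalty` discourages embeddings that pull apart points the input graph considers close.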
EvoPath: Evolutionary meta-path discovery with large language models for complex heterogeneous information networks
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-18 | DOI: 10.1016/j.ipm.2024.103920
Shixuan Liu, Haoxiang Cheng, Yunfei Wang, Yue He, Changjun Fan, Zhong Liu
Heterogeneous Information Networks (HINs) encapsulate diverse entity and relation types, with meta-paths providing essential meta-level semantics for knowledge reasoning, although their utility is constrained by discovery challenges. While Large Language Models (LLMs) offer new prospects for meta-path discovery due to their extensive knowledge encoding and efficiency, their adaptation faces challenges such as corpora bias, lexical discrepancies, and hallucination. This paper pioneers the mitigation of these challenges by presenting EvoPath, an innovative framework that leverages LLMs to efficiently identify high-quality meta-paths. EvoPath is carefully designed, with each component aimed at addressing issues that could lead to potential knowledge conflicts. With a minimal subset of HIN facts, EvoPath iteratively generates and evolves meta-paths by dynamically replaying meta-paths in the buffer with prioritization based on their scores. Comprehensive experiments on three large, complex HINs with hundreds of relations demonstrate that our framework, EvoPath, enables LLMs to generate high-quality meta-paths through effective prompting, confirming its superior performance in HIN reasoning tasks. Further ablation studies validate the effectiveness of each module within the framework.
Citations: 0
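The generate-and-evolve loop described in the abstract above can be pictured with a score-prioritised replay buffer. The sketch below is a minimal stand-in under stated assumptions: `toy_propose` replaces the LLM call, `toy_score` replaces the real meta-path quality measure, and the relation names, capacity, and sampling scheme are all hypothetical rather than taken from EvoPath.

```python
import random


class ReplayBuffer:
    """Keeps the best-scoring meta-paths and samples them in proportion to score."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # meta-path (tuple of relation names) -> score

    def add(self, path, score):
        self.items[path] = score
        if len(self.items) > self.capacity:
            worst = min(self.items, key=self.items.get)
            del self.items[worst]  # evict the lowest-scoring path

    def sample(self, k, rng):
        """Draw up to k exemplars with replacement, weighted by score (> 0)."""
        paths = list(self.items)
        weights = [self.items[p] for p in paths]
        return rng.choices(paths, weights=weights, k=min(k, len(paths)))


def evolve(seed_paths, propose, score, rounds, rng, capacity=4, k=2):
    """Replay high-scoring paths as exemplars, propose new ones, keep the best."""
    buf = ReplayBuffer(capacity)
    for path in seed_paths:
        buf.add(path, score(path))
    for _ in range(rounds):
        for candidate in propose(buf.sample(k, rng)):
            buf.add(candidate, score(candidate))
    return sorted(buf.items.items(), key=lambda kv: kv[1], reverse=True)


RELATIONS = ("writes", "cites", "publishedIn")  # hypothetical relation names


def toy_propose(exemplars):
    """Stand-in for the LLM call: extend each exemplar by every known relation."""
    return [ex + (rel,) for ex in exemplars for rel in RELATIONS]


def toy_score(path):
    """Stand-in quality score in (0, 1]; prefers paths of length 3."""
    return 1.0 / (1.0 + abs(len(path) - 3))


ranked = evolve([("writes",)], toy_propose, toy_score, rounds=3, rng=random.Random(0))
```

Sampling in proportion to score keeps strong meta-paths in front of the generator more often while still letting weaker ones be revisited occasionally, which is one plausible reading of "prioritization based on their scores".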
A diachronic language model for long-time span classical Chinese
IF 7.4 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-10-16 | DOI: 10.1016/j.ipm.2024.103925
Yuting Wei, Meiling Li, Yangfu Zhu, Yuanxing Xu, Yuqing Li, Bin Wu
Classical Chinese literature, with its long history spanning thousands of years, serves as an invaluable resource for historical and humanistic studies. Previous classical Chinese language models have achieved significant progress in semantic understanding. However, they largely neglected the dynamic evolution of language across different historical eras. In this paper, we introduce a novel diachronic pre-trained language model tailored for classical Chinese texts. This model utilizes a time-based transformer architecture that captures the continuous evolution of semantics over time. Moreover, it adeptly balances the contextual and temporal information, minimizing semantic ambiguities from excessive time-related inputs. A high-quality diachronic corpus for classical Chinese is developed for training. This corpus spans from the pre-Qin dynasty to the Qing dynasty and includes a diverse array of genres. We validate its effectiveness by enriching a well-known classical Chinese word sense disambiguation dataset with additional temporal annotations. The results demonstrate the state-of-the-art performance of our model in discerning classical Chinese word meanings across different historical periods. Our research helps linguists to rapidly grasp the extent of semantic changes across different periods from vast corpora.
Citations: 0
mm-FERP: An effective method for human personality prediction via mm-wave radar using facial sensing
IF 7.4 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-10-16 DOI: 10.1016/j.ipm.2024.103919
Naveed Imran, Jian Zhang, Zheng Yang, Jehad Ali
mm-FERP (millimeter-wave Facial Expression Recognition for Personality) explores the use of mm-wave radar technology, specifically the TI IWR1443, to assess personality traits based on the OCEAN model through facial expression analysis. This research combines psychological profiling with state-of-the-art sensing technology to predict the OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) personality traits by analyzing facial muscle movements captured by mm-wave radar alongside detailed questionnaire analysis. The mm-FERP system uses mm-wave radar to detect and analyze facial expressions in a manner that is both non-intrusive and privacy-centric, addressing the ethical and privacy concerns associated with traditional camera-based methods. Using a convolutional neural network (CNN), mm-FERP analyzes the complex patterns in mm-wave signals. This approach enables the transfer of model knowledge from extensive image-based (scalogram) datasets to mm-wave radar signals, significantly enhancing the model’s predictive accuracy and efficiency in identifying personality traits via emotional behavior. Our evaluation shows that mm-FERP can predict personality traits through emotion recognition (Neutral, Smile, Angry, Sad, Amazed) with an accuracy of 97% at distances up to 0.47 m. We experiment in a controlled environment with more than 50 participants across age groups (18–35), including males and females from different continents, to train our model on varied facial symmetry. Each participant provides 50 samples (10 for each expression), for a total of 2,500 samples. We also collected a self-assessment report of 64 questions related to psychological behavior from the same participants, to validate personality by correlating it with radar signal features based on question value weights (0.5–1.5). mm-FERP achieves average scores of 97.8% in precision, 97.2% in recall, and 97.2% in F1. These results show mm-FERP’s potential as an innovative approach for psychological behavioral analysis through mm-wave emotion recognition, improving user experience design and paving the way for interactive technologies that are both personalized and psychologically insightful.
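The abstract reports average precision, recall, and F1 over the five expression classes but does not say how the average is taken. The sketch below assumes macro averaging (an unweighted mean of per-class scores) computed from a confusion matrix. The function name `macro_scores` and the matrix layout are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

# The five expression classes named in the abstract.
EXPRESSIONS = ["Neutral", "Smile", "Angry", "Sad", "Amazed"]


def macro_scores(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1).

    confusion[t][p] counts samples whose true class is t and whose
    predicted class is p. Macro averaging is an assumption here; the
    abstract only says "average score".
    """
    k = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

Under this averaging the mean F1 need not equal the harmonic mean of the mean precision and mean recall, which is one way a reported F1 can sit at or below both of the other two averages.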
Citations: 0
CGCN: Context graph convolutional network for few-shot temporal action localization
IF 7.4 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-10-15 DOI: 10.1016/j.ipm.2024.103926
Shihui Zhang, Houlin Wang, Lei Wang, Xueqiang Han, Qing Tian
Localizing human actions in videos has attracted extensive attention from industry and academia. Few-Shot Temporal Action Localization (FS-TAL) aims to detect human actions in untrimmed videos using a limited number of training samples. Existing FS-TAL methods usually ignore the semantic context between video snippets, making it difficult to detect actions during the query process. In this paper, we propose a novel FS-TAL method named Context Graph Convolutional Network (CGCN), which employs multi-scale graph convolution to aggregate semantic context between video snippets in addition to exploiting their temporal context. Specifically, CGCN constructs a graph for each scale of a video, where each video snippet is a node and the relationships between snippets are edges. There are three types of edges: sequence edges, intra-action edges, and inter-action edges. CGCN establishes sequence edges to enhance temporal expression. Intra-action edges utilize hyperbolic space to encapsulate context among video snippets within each action, while inter-action edges leverage Euclidean space to capture similar semantics between different actions. Through graph convolution on each scale, CGCN acquires richer, context-aware video representations. Experiments demonstrate that CGCN outperforms the second-best method by 4.5%/0.9% and 4.3%/0.9% mAP on the ActivityNet and THUMOS14 datasets in one-shot/five-shot scenarios, respectively, at [email protected]. The source code is available at https://github.com/mugenggeng/CGCN.git.
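The three edge types can be illustrated with a small sketch. It assumes each snippet carries a hypothetical action-segment id (`None` for background) and connects every qualifying pair. The paper may select edges differently, for example by semantic similarity, and the hyperbolic and Euclidean geometry it mentions is not modelled here. `build_context_edges` and `segment_ids` are illustrative names, not identifiers from the CGCN repository.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]  # (i, j) with i < j, indices into the snippet list


def build_context_edges(segment_ids: List[Optional[int]]) -> Dict[str, Set[Edge]]:
    """Build sequence, intra-action, and inter-action edge sets for one scale.

    segment_ids[i] is a hypothetical action-segment id for snippet i,
    or None when the snippet is background.
    """
    n = len(segment_ids)
    # Sequence edges: link temporally adjacent snippets.
    sequence: Set[Edge] = {(i, i + 1) for i in range(n - 1)}
    intra: Set[Edge] = set()
    inter: Set[Edge] = set()
    for i, j in combinations(range(n), 2):
        a, b = segment_ids[i], segment_ids[j]
        if a is None or b is None:
            continue  # background snippets only get sequence edges in this sketch
        if a == b:
            intra.add((i, j))  # both snippets lie inside the same action
        else:
            inter.add((i, j))  # snippets come from different actions
    return {"sequence": sequence, "intra_action": intra, "inter_action": inter}
```

In a multi-scale setting, a function like this would be called once per temporal scale, with each scale supplying its own snippet list.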
Citations: 0
IterSum: Iterative summarization based on document topological structure
IF 7.4 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-10-15 DOI: 10.1016/j.ipm.2024.103918
Shuai Yu, Wei Gao, Yongbin Qin, Caiwei Yang, Ruizhang Huang, Yanping Chen, Chuan Lin
Document structure plays a crucial role in understanding and analyzing document information. However, effectively encoding document structural features into the Transformer architecture faces significant challenges. This is primarily because different types of documents require the model to adopt varying structural encoding strategies, leading to a lack of a unified framework that can broadly adapt to different document types to leverage their structural properties. Despite the diversity of document types, sentences within a document are interconnected through semantic relationships, forming a topological semantic network. This topological structure is essential for integrating and summarizing information within the document. In this work, we introduce IterSum, a versatile text summarization framework applicable to various types of text. In IterSum, we utilize the document’s topological structure to divide the text into multiple blocks, first generating a summary for the initial block, then combining the current summary with the content of the next block to produce the subsequent summary, and continuing in this iterative manner until the final summary is generated. We validated our model on nine different types of public datasets, including news, knowledge bases, legal documents, and guidelines. Both quantitative and qualitative analyses were conducted, and the experimental results show that our model achieves state-of-the-art performance on all nine datasets measured by ROUGE scores. We also explored low-resource summarization, finding that even with only 10 or 100 samples in multiple datasets, top-notch results were obtained. Finally, we conducted human evaluations to further validate the superiority of our model.
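The iterative procedure described above can be sketched as a simple fold over blocks. This is a minimal illustration, not the authors' implementation: it assumes the document has already been split into blocks (the topology-based splitting is not reproduced), and `summarize` is a hypothetical callable standing in for whatever summarization model is used. `truncate_words` is a toy stand-in so the example runs without a model.

```python
from typing import Callable, List


def iterative_summary(blocks: List[str], summarize: Callable[[str], str]) -> str:
    """Fold a list of text blocks into one summary, one block at a time."""
    if not blocks:
        return ""
    # Step 1: summarize the initial block on its own.
    summary = summarize(blocks[0])
    # Later steps: combine the running summary with the next block
    # and summarize the combination.
    for block in blocks[1:]:
        summary = summarize(summary + "\n" + block)
    return summary


def truncate_words(text: str, limit: int = 4) -> str:
    """Toy summarizer (assumption): keep only the first `limit` words."""
    return " ".join(text.split()[:limit])
```

With the toy summarizer, `iterative_summary(["a b c", "d e"], truncate_words)` first yields `"a b c"` for the initial block and then `"a b c d"` after folding in the second block. Each call sees only the running summary plus one block, so the input length per step stays bounded regardless of document length.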
Citations: 0