AI Open Latest Articles - Book学术

PM2.5 forecasting under distribution shift: A graph learning approach

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2023.11.001

Yachuan Liu, Jiaqi Ma, Paramveer Dhillon, Qiaozhu Mei

We present a new benchmark task for graph-based machine learning: predicting future air quality (PM2.5 concentration) observed by a geographically distributed network of environmental sensors. While prior work has successfully applied Graph Neural Networks (GNNs) to a wide family of spatio-temporal prediction tasks, the benchmark introduced here brings a technical challenge that has been less studied in graph-based spatio-temporal learning: distribution shift across a long period of time. An important goal of this paper is to understand the behavior of spatio-temporal GNNs under distribution shift. We conduct a comprehensive comparative study of graph-based and non-graph-based machine learning models under two data split methods, one that results in distribution shift and one that does not. Our empirical results suggest that GNN models tend to suffer more from distribution shift than non-graph-based models, which calls for special attention when deploying spatio-temporal GNNs in practice.
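The abstract does not say how the two data splits are built. As a minimal sketch, assume the split that induces distribution shift is chronological (train on early readings, test on later ones) and the other is a random shuffle; the record fields, function names, and synthetic drift below are illustrative assumptions, not the benchmark's actual format.

```python
import random

def chronological_split(records, train_frac=0.8):
    """Train on the earliest records, test on the latest (may induce shift)."""
    ordered = sorted(records, key=lambda r: r["time"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def random_split(records, train_frac=0.8, seed=0):
    """Shuffle before splitting so train and test cover the same period."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def mean_pm25(rows):
    return sum(r["pm25"] for r in rows) / len(rows)

# Synthetic readings with an upward drift over time (hypothetical values).
records = [{"time": t, "pm25": 10.0 + 0.5 * t} for t in range(100)]

chrono_train, chrono_test = chronological_split(records)
rand_train, rand_test = random_split(records)

# Gap between train and test means: a crude indicator of distribution shift.
chrono_gap = abs(mean_pm25(chrono_train) - mean_pm25(chrono_test))
rand_gap = abs(mean_pm25(rand_train) - mean_pm25(rand_test))
```

On drifting data like this, the chronological split leaves a much larger gap between train and test statistics than the random split, which is the kind of contrast the comparative study relies on.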


Volume 5, Pages 23-29

Citations: 0

MindLLM: Lightweight large language model pre-training, evaluation and domain application

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.08.001

Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Yang Gao, Heyan Huang

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is pursued by developing increasingly large-scale models, another branch could develop lightweight custom models that better serve certain domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates such burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large model development, covering every step of the process, including data construction, model architecture, evaluation, and applications. We hope these insights are valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other, larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.


Volume 5, Pages 1-26

Citations: 0

Adaptive negative representations for graph contrastive learning

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2023.10.005

Qi Zhang, Cheng Yang, Chuan Shi

Graph contrastive learning (GCL) has emerged as a promising paradigm for learning graph representations. Recently, the idea of hard negatives has been introduced to GCL, providing more challenging self-supervised objectives and alleviating over-fitting. These methods use different graphs in the same mini-batch as negative examples and assign larger weights to true hard negatives. However, the influence of such weighting strategies is limited in practice, since a small mini-batch may not contain any sufficiently challenging negative examples. In this paper, we aim to offer a more flexible way to control the hardness of negatives by directly manipulating their representations. Assuming that (1) good negative representations should not deviate far from the representations of real graph samples, and (2) the computation process of the graph encoder may introduce biases into graph representations, we first design a negative representation generator (NRG) which (1) employs real graphs as prototypes to perturb, and (2) introduces parameterized perturbations through the feed-forward computation of the graph encoder to match the biases. We then design a generation loss to train the parameters of the NRG and adaptively generate negative representations for more challenging contrastive objectives. Experiments on eight benchmark datasets show that our proposed framework, ANGCL, achieves a 1.6% relative improvement over the best baseline and can be successfully integrated with three types of graph augmentations. Ablation studies and hyper-parameter experiments further demonstrate the effectiveness of ANGCL.
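To make the weighting idea that the abstract criticises concrete, here is a minimal sketch of an InfoNCE-style loss in which in-batch negatives closer to the anchor receive larger weights. The weighting rule (softmax over similarities, controlled by a hypothetical `beta`) and all names are illustrative assumptions; this is not ANGCL or the scheme of any specific prior method.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def weighted_contrastive_loss(anchor, positive, negatives, tau=0.5, beta=1.0):
    """InfoNCE-style loss where negatives closer to the anchor get larger weights.

    beta = 0 gives uniform weights, i.e. the plain InfoNCE loss.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    sims = [cosine(anchor, n) for n in negatives]
    raw = [math.exp(beta * s) for s in sims]
    total = sum(raw)
    # Normalise so the weights average to 1, keeping the loss scale comparable.
    weights = [len(raw) * r / total for r in raw]
    neg = sum(w * math.exp(s / tau) for w, s in zip(weights, sims))
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_batch = [[-1.0, 0.0], [0.0, 1.0]]  # negatives far from the anchor
hard_batch = [[0.8, 0.6], [0.0, 1.0]]   # one negative close to the anchor

loss_easy = weighted_contrastive_loss(anchor, positive, easy_batch)
loss_hard = weighted_contrastive_loss(anchor, positive, hard_batch)
```

Re-weighting can only emphasise negatives that are already in the batch: with `easy_batch` no choice of weights produces a hard objective, which is the limitation that motivates generating negative representations directly.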


Volume 5, Pages 79-86

Citations: 0

Improving trajectory classification through Kramers–Moyal coefficients

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.06.001

G. Viera-López, J.J. Morgado-Vega, A. Reyes, E. Altshuler, Yudivián Almeida-Cruz, Giorgio Manganini

Trajectory classification focuses on predicting the class or category of a moving object based on its observed movement over time. Classifying trajectory data with classical approaches can be challenging because some trajectories have arbitrary and relatively long lengths. To overcome this, trajectories are often mapped into vector representations that aim to encode their most significant features in a fixed number of dimensions. Here we propose a novel vector representation for trajectories that combines previously employed features with new ones derived from the computation of the Kramers–Moyal coefficients (KMCs). Because KMCs originate from a Taylor expansion that progressively encapsulates more information about a stochastic process, it is reasonable to expect them to be effective in trajectory classification. We evaluated our representation using different classifiers and several benchmark datasets previously used for trajectory classification. With the addition of features extracted from KMCs, our results indicate a reliable increase in classification accuracy and F1 score of around 4% across all datasets and models used for evaluation. Moreover, we observed an increase in accuracy of up to 20% and an increase in F1 score of up to 23% in some scenarios.
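As a rough illustration of how such coefficients can yield a fixed-length feature vector, the sketch below estimates the n-th coefficient of a one-dimensional trajectory as the mean n-th power of its increments divided by n! and the time step. Two simplifications are assumed here and may differ from the paper: the estimate is not conditioned on the current state, and the 1/n! convention is one of several in use. The function name is hypothetical.

```python
from math import factorial

def km_features(xs, dt, max_order=4):
    """Average the n-th powers of the increments, scaled by 1 / (n! * dt).

    Returns max_order values regardless of how long the trajectory is.
    """
    increments = [b - a for a, b in zip(xs, xs[1:])]
    feats = []
    for n in range(1, max_order + 1):
        moment = sum(d ** n for d in increments) / len(increments)
        feats.append(moment / (factorial(n) * dt))
    return feats

# Constant-velocity toy trajectory: x(t) = 2 t sampled every 0.5 time units.
xs = [2.0 * 0.5 * k for k in range(11)]
features = km_features(xs, dt=0.5, max_order=2)  # first value recovers the velocity, 2.0
```

Because the output length depends only on `max_order`, trajectories of any length map to vectors of the same size and can be concatenated with other hand-crafted features before being passed to a classifier.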


Volume 5, Pages 87-93

Citations: 0

Mining contacts from spatio-temporal trajectories

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.10.002

Adikarige Randil Sanjeewa Madanayake, Kyungmi Lee, Ickjai Lee

Contact mining is the discovery of objects that come into close proximity during their movements, in order to reveal possible interactions, infections, collisions or contacts. This process can be significantly beneficial during the spread of an infectious disease, where it can identify potential victims of a known infected human or animal, especially when the victims are asymptomatic. Movements of objects are captured by spatio-temporal trajectories represented by a series of geospatial locations and corresponding timestamps. A large amount of spatio-temporal trajectory data is being gathered by various location-acquiring sensor devices that track the movement behaviours of people, animals, vehicles and natural events. Trajectory data mining techniques have been proposed to discover useful patterns to understand the behaviours of spatio-temporal trajectories. One unexplored pattern is the identification of contacts of a targeted trajectory in spatio-temporal trajectories, which is defined as contact mining. The aim of this study is to investigate contact mining from spatio-temporal trajectories. The approach begins by preprocessing spatio-temporal data and then investigates a robust contact mining framework to efficiently and effectively mine contacts of a trajectory of interest from a given set of trajectories. Experimental results demonstrate the efficiency, effectiveness and scalability of our approach. In addition, parameter sensitivity analysis reveals the robustness of our framework and its insensitivity to parameter settings.
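The core idea of contact mining can be sketched with a brute-force baseline: report every object that has at least one point close to the trajectory of interest in both space and time. The thresholds `max_dist` and `max_dt`, the `(t, x, y)` point layout, and planar Euclidean distance are assumptions made for illustration; the paper's framework and its preprocessing are not reproduced here.

```python
import math

def find_contacts(target, others, max_dist, max_dt):
    """Return {object_id: [(t_target, t_other), ...]} for points close in space and time."""
    contacts = {}
    for obj_id, traj in others.items():
        hits = [
            (t1, t2)
            for (t1, x1, y1) in target
            for (t2, x2, y2) in traj
            if abs(t1 - t2) <= max_dt and math.hypot(x1 - x2, y1 - y2) <= max_dist
        ]
        if hits:
            contacts[obj_id] = hits
    return contacts

# Each point is (timestamp, x, y) in planar coordinates (toy data).
target = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0)]
others = {
    "a": [(0, 5.0, 5.0), (1, 1.0, 0.5), (2, 6.0, 6.0)],  # near the target at t = 1
    "b": [(0, 9.0, 9.0), (1, 9.0, 8.0), (2, 9.0, 7.0)],  # never near the target
    "c": [(10, 2.0, 0.0)],                               # same place, much later
}
contacts = find_contacts(target, others, max_dist=1.0, max_dt=0)
```

This baseline compares every pair of points, so its cost grows with the product of the trajectory lengths; real latitude/longitude data would also need a geodesic distance rather than `math.hypot`.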


Volume 5, Pages 197-207

Citations: 0

Enhancing neural network classification using fractional-order activation functions

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2023.12.003

Meshach Kumar, Utkal Mehta, Giansalvo Cirrincione

In this paper, a series of novel activation functions is presented, derived using the improved Riemann–Liouville conformable fractional derivative ($^{RL}$CFD). This study investigates the use of fractional activation functions in Multilayer Perceptron (MLP) models and their impact on the performance of classification tasks, verified using the IRIS, MNIST and FMNIST datasets. Fractional activation functions introduce a non-integer power exponent, allowing complex patterns and representations to be captured more effectively. The experiment compares MLP models employing fractional activation functions, such as fractional sigmoid, hyperbolic tangent and rectified linear units, against traditional models using standard activation functions, their improved versions and existing fractional functions. The numerical studies confirm the theoretical observations made in the paper. The findings highlight the potential of the new functions as a valuable tool for deep-learning classification. The study suggests that incorporating fractional activation functions in MLP architectures can lead to superior accuracy and robustness.


Volume 5, Pages 10-22

Citations: 0

Few-shot Named Entity Recognition via encoder and class intervention

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.01.005

Long Ding, Chunping Ouyang, Yongbin Liu, Zhihua Tao, Yaping Wan, Zheng Gao

In the real world, the large and complex nature of text increases the difficulty of tagging and results in a limited amount of tagged text. Few-shot Named Entity Recognition (NER) uses only a small amount of annotated data to identify and classify entities, which avoids these problems. Few-shot learning methods usually rely on prior knowledge to achieve good results. However, prior knowledge may become a confounding factor affecting the relation between sample features and real labels, leading to bias and difficulty in accurately capturing classes. To solve this problem, a new causality-based model, Few-shot Named Entity Recognition via Encoder and Class Intervention, is proposed. We show that we can steer the model to perform interventions on the encoder and the class, reducing the interference of confounding factors. Specifically, cross-sample attention perturbation is used in the encoder layer, while a practical causal relation between feature and classification label is developed in the class layer. This approach is an attempt to apply causal methodology to the few-shot NER task, and it improves the discrimination ability of the NER classifier. Experimental results demonstrate that our model outperforms baseline models in both 5-way and 10-way settings on two NER datasets.
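The "5-way" and "10-way" settings are read here in the usual few-shot sense: each evaluation episode contains N entity classes with K labelled support examples per class. The sketch below samples such an episode from a toy pool of mentions; the data layout and function name are hypothetical, and real few-shot NER benchmarks typically sample whole sentences rather than isolated mentions.

```python
import random

def sample_episode(examples_by_class, n_way, k_shot, n_query, seed=None):
    """Draw an N-way K-shot episode: a support set to adapt on, a query set to evaluate on."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query never share an example.
        picked = rng.sample(examples_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy pool: 6 entity types with 4 mentions each (placeholder strings).
data = {f"TYPE_{i}": [f"mention_{i}_{j}" for j in range(4)] for i in range(6)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=2, seed=0)
```

A 10-way episode is drawn the same way with `n_way=10`, provided the pool holds at least ten classes; more classes per episode generally makes the task harder.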


Volume 5, Pages 39-45

Citations: 0

CPT: Colorful Prompt Tuning for pre-trained vision-language models

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.01.004

Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Vision-Language Pre-training (VLP) models have shown promising capabilities in grounding natural language in image data, facilitating a broad range of cross-modal tasks. However, we note that there exists a significant gap between the objective forms of model pre-training and fine-tuning, resulting in a need for large amounts of labeled data to stimulate the visual grounding capability of VLP models for downstream tasks. To address this challenge, we present Color-based Prompt Tuning (CPT), a novel paradigm for tuning VLP models, which reformulates visual grounding into a fill-in-the-blank problem with color-based co-referential markers in image and text, maximally mitigating the gap. In this way, CPT enables strong few-shot and even zero-shot visual grounding capabilities of VLP models. Comprehensive experimental results show that CPT achieves state-of-the-art performance on zero/few-shot visual grounding (e.g., 75.1 zero-shot accuracy in RefCOCO evaluation), outperforming fine-tuned and other prompt-tuned models by a large margin. Moreover, CPT can also be easily extended to achieve promising zero/few-shot performance on other vision-language tasks, such as visual relation detection, visual commonsense reasoning and visual question answering. We make the data and code publicly available at https://github.com/thunlp/CPT.
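The fill-in-the-blank reformulation can be sketched as follows: each candidate image region is tagged with a distinct color, the query is wrapped in a template with a masked color word, and the color the model prefers for the mask selects the region. The template wording, color list, and function names are illustrative assumptions, and `fake_scores` is a stub standing in for a real VLP model's masked-token scores; see the linked repository for the authors' implementation.

```python
COLORS = ["red", "green", "blue", "yellow"]

def build_prompt(query, mask_token="[MASK]"):
    """Wrap the referring expression in a fill-in-the-blank template."""
    return f"{query} is in {mask_token} color"

def assign_colors(region_ids, colors=COLORS):
    """Give each candidate region its own color marker."""
    if len(region_ids) > len(colors):
        raise ValueError("more regions than available colors")
    return {color: region for color, region in zip(colors, region_ids)}

def ground(query, region_ids, score_color):
    """Pick the region whose color the scorer prefers for the masked slot.

    score_color(prompt, color) -> float is a placeholder for the
    masked-token score a vision-language model would produce.
    """
    color_to_region = assign_colors(region_ids)
    prompt = build_prompt(query)
    best = max(color_to_region, key=lambda c: score_color(prompt, c))
    return color_to_region[best]

def fake_scores(prompt, color):
    # Stub scores; a real system would query the model with the colored image.
    return {"red": 0.1, "green": 0.7, "blue": 0.2}.get(color, 0.0)

region = ground("the horse watched by the woman", ["box_0", "box_1", "box_2"], fake_scores)
```

Because the answer is a color word predicted in a masked slot, the downstream task takes the same form as masked language modeling during pre-training, which is the gap the abstract says the method is designed to close.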


Volume : 5 Pages : 30-38

Citations: 0

An ecosystem for personal knowledge graphs: A survey and research roadmap

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.01.003

Martin G. Skjæveland, Krisztian Balog, Nolwenn Bernard, Weronika Łajewska, Trond Linjordet

This paper presents an ecosystem for personal knowledge graphs (PKGs), commonly defined as resources of structured information about entities related to an individual, their attributes, and the relations between them. PKGs are a key enabler of secure and sophisticated personal data management and personalized services. However, there are challenges that need to be addressed before PKGs can achieve widespread adoption. One of the fundamental challenges is the very definition of what constitutes a PKG, as there are multiple interpretations of the term. We propose our own definition of a PKG, emphasizing the aspects of (1) data ownership by a single individual and (2) the delivery of personalized services as the primary purpose. We further argue that a holistic view of PKGs is needed to unlock their full potential, and propose a unified framework for PKGs, where the PKG is a part of a larger ecosystem with clear interfaces towards data services and data sources. A comprehensive survey and synthesis of existing work is conducted, with a mapping of the surveyed work into the proposed unified ecosystem. Finally, we identify open challenges and research opportunities for the ecosystem as a whole, as well as for the specific aspects of PKGs, which include population, representation and management, and utilization.

Volume : 5 Pages : 55-69

Citations: 0

Generating graph perturbations to enhance the generalization of GNNs

AI Open

Pub Date : 2024-01-01 DOI: 10.1016/j.aiopen.2024.10.001

Sofiane Ennadir , Giannis Nikolentzos , Michalis Vazirgiannis , Henrik Boström

Graph neural networks (GNNs) have become the standard approach for performing machine learning on graphs. Such models need large amounts of training data; however, in several graph classification and regression tasks, only limited training data is available. Unfortunately, due to the complex nature of graphs, common augmentation strategies employed in other settings, such as computer vision, do not apply to graphs. This work aims to improve the generalization ability of GNNs by increasing the size of the training set of a given problem. The new samples are generated using an iterative contrastive learning procedure that augments the dataset during training, in a task-relevant manner, by manipulating the graph topology. The proposed approach is general, assumes no knowledge about the underlying architecture, and can thus be applied to any GNN. We provide a theoretical analysis of the equivalence of the proposed approach to a regularization technique. We demonstrate instances of our framework on popular GNNs and evaluate them on several real-world benchmark graph classification datasets. The experimental results show that the proposed approach, in several cases, enhances the generalization of the underlying prediction models, reaching state-of-the-art performance on some datasets.
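To make "manipulating the graph topology" concrete, the sketch below generates a new training sample by flipping a fixed number of node pairs (adding the edge if absent, removing it if present). This is a plain random perturbation substituted for illustration; it is not the iterative contrastive procedure the abstract describes, and the function name `perturb_edges` and its `budget` and `seed` parameters are hypothetical.

```python
import random
from itertools import combinations
from typing import Set, Tuple

Edge = Tuple[int, int]


def perturb_edges(num_nodes: int, edges: Set[Edge], budget: int, seed: int = 0) -> Set[Edge]:
    """Return a perturbed copy of an undirected edge set.

    Exactly `budget` distinct node pairs are flipped: an existing edge is
    removed, a missing edge is added. The input set is left unchanged.
    """
    rng = random.Random(seed)
    # All undirected node pairs, stored with the smaller index first.
    all_pairs = list(combinations(range(num_nodes), 2))
    if budget > len(all_pairs):
        raise ValueError("budget exceeds the number of node pairs")
    # Normalize edge orientation so (2, 1) and (1, 2) are the same edge.
    perturbed = {(min(u, v), max(u, v)) for u, v in edges}
    for pair in rng.sample(all_pairs, budget):
        if pair in perturbed:
            perturbed.remove(pair)
        else:
            perturbed.add(pair)
    return perturbed
```

A task-relevant method would choose which pairs to flip based on the model and the label rather than uniformly at random; the uniform choice here only shows the shape of the augmentation step.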

Volume : 5 Pages : 216-223

Citations: 0