We present a new benchmark task for graph-based machine learning: predicting future air quality (PM2.5 concentration) observed by a geographically distributed network of environmental sensors. While prior work has successfully applied Graph Neural Networks (GNNs) to a wide family of spatio-temporal prediction tasks, the benchmark introduced here poses a technical challenge that has been less studied in the context of graph-based spatio-temporal learning: distribution shift over long periods of time. A central goal of this paper is to understand the behavior of spatio-temporal GNNs under distribution shift. We conduct a comprehensive comparative study of graph-based and non-graph-based machine learning models under two data split methods, one of which induces distribution shift and one of which does not. Our empirical results suggest that GNN models tend to suffer more from distribution shift than non-graph-based models, which calls for special attention when deploying spatio-temporal GNNs in practice.
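To make the two split methods concrete, the sketch below contrasts a random split (train and test drawn from the same time range, so their distributions largely coincide) with a temporal split (test on the last stretch of time, where seasonal and long-term trends induce distribution shift). This is a minimal illustration, not the benchmark's actual splitting code; the dataframe layout and the "timestamp" column name are our own assumptions.

```python
import numpy as np
import pandas as pd

def random_split(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Random split: train and test cover the same time range,
    so the PM2.5 distributions of the two sets largely coincide."""
    idx = np.random.default_rng(seed).permutation(len(df))
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[idx[:cut]], df.iloc[idx[cut:]]

def temporal_split(df: pd.DataFrame, test_frac: float = 0.2):
    """Temporal split: test on the final stretch of time, so seasonal
    and long-term trends shift the test distribution away from training."""
    df = df.sort_values("timestamp")  # hypothetical column name
    cut = int(len(df) * (1 - test_frac))
    return df.iloc[:cut], df.iloc[cut:]
```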
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress towards general artificial intelligence is typically pursued by scaling up models, a complementary direction is to develop lightweight custom models that better serve particular domains, given the high cost of training and deploying LLMs and the scarcity of computational resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, alleviating these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during model development, covering every step of the process, including data construction, model architecture, evaluation, and applications; we hope these insights prove valuable to fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
Graph contrastive learning (GCL) has emerged as a promising paradigm for learning graph representations. Recently, the idea of hard negatives has been introduced into GCL, providing more challenging self-supervised objectives and alleviating overfitting. These methods use different graphs in the same mini-batch as negative examples and assign larger weights to the truly hard ones. However, the influence of such weighting strategies is limited in practice, since a small mini-batch may not contain any sufficiently challenging negative examples. In this paper, we offer a more flexible way to control the hardness of negatives by directly manipulating their representations. Assuming that (1) good negative representations should not deviate far from the representations of real graph samples, and (2) the computation performed by the graph encoder may introduce biases into graph representations, we first design a negative representation generator (NRG) which (1) employs real graphs as prototypes to perturb, and (2) introduces parameterized perturbations through the feed-forward computation of the graph encoder to match these biases. We then design a generation loss to train the parameters of the NRG and adaptively generate negative representations for more challenging contrastive objectives. Experiments on eight benchmark datasets show that our proposed framework, ANGCL, achieves a 1.6% relative improvement over the best baseline and can be successfully integrated with three types of graph augmentations. Ablation studies and hyper-parameter experiments further demonstrate the effectiveness of ANGCL.
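The following sketch illustrates the perturbation idea only: real graph representations serve as prototypes, and a learned, bounded offset moves them to produce harder negatives that stay close to the data. The module name, the form of the perturbation network, and the bound eps are our own illustrative assumptions; the paper's NRG injects its parameterized perturbations through the graph encoder's feed-forward computation, which is not reproduced here.

```python
import torch
import torch.nn as nn

class NegativeGenerator(nn.Module):
    """Illustrative NRG-style module: perturbs real graph representations
    (used as prototypes) with a learned, bounded offset."""
    def __init__(self, dim: int, eps: float = 0.5):
        super().__init__()
        self.perturb = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.eps = eps  # bounds the deviation from the real samples

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim) representations of real graphs in the mini-batch.
        # Tanh keeps the offset in (-1, 1), so negatives stay within eps of z.
        return z + self.eps * self.perturb(z)
```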
Trajectory classification focuses on predicting the class or category of a moving object based on its observed movement over time. Classifying trajectory data with classical approaches can be challenging due to the arbitrary and often long length of trajectories. To overcome this, trajectories are commonly mapped into fixed-dimensional vector representations designed to encode their most significant features. Here we propose a novel vector representation for trajectories that combines previously employed features with new ones derived from the Kramers–Moyal coefficients (KMCs). Because KMCs originate from a Taylor expansion that progressively encapsulates more information about a stochastic process, it is reasonable to expect them to be effective for trajectory classification. We evaluated our representation using different classifiers on several benchmark datasets previously used for trajectory classification. With the addition of features extracted from KMCs, our results show a consistent increase in classification accuracy and F1 score of around 4% across all datasets and models used for evaluation, with increases of up to 20% in accuracy and up to 23% in F1 score in some scenarios.
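For intuition, the sketch below estimates the first two KMCs (drift and diffusion) of a one-dimensional trajectory via binned conditional moments of its increments, D_n(x) ≈ ⟨(Δx)^n | x⟩ / (n! Δt). The binning scheme and function names are our own choices, and the paper's feature set built on top of these coefficients is not specified here; summary statistics of the coefficient profiles would be one plausible way to obtain fixed-length features.

```python
import numpy as np
from math import factorial

def km_coefficients(x: np.ndarray, dt: float, n_bins: int = 20,
                    orders: tuple = (1, 2)) -> dict:
    """Estimate Kramers-Moyal coefficients D_n(x) of a 1-D trajectory x
    sampled at interval dt, via binned conditional moments of increments."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    which = np.clip(np.digitize(x[:-1], edges) - 1, 0, n_bins - 1)
    coeffs = {}
    for n in orders:
        d = np.full(n_bins, np.nan)
        for b in range(n_bins):
            mask = which == b
            if mask.any():
                # conditional moment of order n, normalized by n! * dt
                d[b] = (dx[mask] ** n).mean() / (factorial(n) * dt)
        coeffs[n] = d  # D_n evaluated on each state bin
    return coeffs
```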
In this paper, a series of novel activation functions is presented, derived using the improved Riemann–Liouville conformable fractional derivative (CFD). The study investigates the use of fractional activation functions in Multilayer Perceptron (MLP) models and their impact on classification performance, verified on the IRIS, MNIST, and FMNIST datasets. Fractional activation functions introduce a non-integer power exponent, allowing complex patterns and representations to be captured more effectively. The experiments compare MLP models employing fractional activation functions, such as the fractional sigmoid, hyperbolic tangent, and rectified linear unit, against traditional models using standard activation functions, their improved versions, and existing fractional functions. The numerical studies confirm the theoretical observations made in the paper. The findings highlight the potential of the new functions as a valuable tool for classification tasks in deep learning, and suggest that incorporating fractional activation functions into MLP architectures can lead to superior accuracy and robustness.
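As a minimal sketch of how such a function can arise, the standard conformable derivative satisfies T_α f(x) = x^(1-α) f'(x); applying it to the softplus (whose ordinary derivative is the sigmoid) yields an activation with a non-integer power factor that recovers the standard sigmoid at α = 1. This construction is our own illustration based on the standard conformable-derivative identity; the paper's improved Riemann–Liouville CFD may yield a different functional form, and the |x| + eps regularization below is our own choice to keep the power defined at and below zero.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def fractional_sigmoid(x: np.ndarray, alpha: float = 0.9,
                       eps: float = 1e-6) -> np.ndarray:
    """Conformable-derivative activation: T_alpha applied to softplus,
    using T_alpha f(x) = x**(1 - alpha) * f'(x). Since softplus' = sigmoid,
    alpha = 1 recovers the standard sigmoid exactly."""
    return (np.abs(x) + eps) ** (1.0 - alpha) * sigmoid(x)
```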

