Pub Date: 2024-09-06 | DOI: 10.1007/s10115-024-02216-1
Xiao Li, Sichen Liu, Yin Zhu, Gong Cheng
After recent gains achieved by large language models (LLMs) on numerical reasoning tasks, it has become of interest to have LLMs teach small models to improve on numerical reasoning. Instructing LLMs to generate chains of thought to fine-tune small models is an established approach. However, small models are passive in this line of work and may not be able to exploit the provided training data. In this paper, we propose a novel targeted training strategy that matches the LLM’s assistance with the small model’s capacities. The small model proactively requests the LLM’s assistance when it sifts out confusing training data. The LLM then refines such data by successively revising reasoning steps and reducing question complexity before feeding it to the small model. Experiments show that this targeted training approach remarkably improves the performance of small models on a range of numerical reasoning datasets by 12–25%, making small models even competitive with some LLMs.
Title: Targeted training for numerical reasoning with large language models
Journal: Knowledge and Information Systems (IF 2.7)
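The sift-and-refine loop described in the abstract above can be sketched as follows; `model_confidence` and `llm_refine` are hypothetical stand-ins for the paper's components, not the authors' API.

```python
def sift_confusing(dataset, model_confidence, threshold=0.5):
    """Split training items by the small model's confidence score."""
    confusing, clear = [], []
    for item in dataset:
        (confusing if model_confidence(item) < threshold else clear).append(item)
    return confusing, clear

def targeted_training_round(dataset, model_confidence, llm_refine):
    """One round: the small model sifts out confusing data, the LLM refines it."""
    confusing, clear = sift_confusing(dataset, model_confidence)
    # Only the flagged data is sent to the LLM, which revises reasoning
    # steps and reduces question complexity before it is returned.
    refined = [llm_refine(item) for item in confusing]
    return clear + refined  # fine-tuning mix for the small model
```

The key design point is that the small model, not the LLM, decides which data needs assistance.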
Pub Date: 2024-09-04 | DOI: 10.1007/s10115-024-02212-5
Anastasia Zhukova, Lukas von Sperl, Christian E. Matt, Bela Gipp
User experience (UX) is a part of human–computer interaction research and focuses on increasing intuitiveness, transparency, simplicity, and trust for system users. Most UX research for machine learning or natural language processing (NLP) follows a data-driven methodology and engages domain users mainly for usability evaluation. Moreover, typical UX methods tailor systems toward usability rather than first learning about user needs. This paper proposes a new methodology for integrating generative UX research into the development of domain NLP applications. Generative UX research involves domain users at the initial stages of prototype development, i.e., ideation and concept evaluation, and at the last stage, evaluating system usefulness and user utility. The methodology emerged from, and is evaluated on, a case study of the full-cycle prototype development of a domain-specific semantic search for daily operations in the process industry. A key finding of our case study is that involving domain experts increases their interest and trust in the final NLP application. The combined UX+NLP research of the proposed method efficiently weighs data- and user-driven opportunities and constraints, which can be crucial for developing NLP applications.
Title: Generative user-experience research for developing domain-specific natural language processing applications
Commercially applicable recommendation systems (RS) exploit multi-criteria rating-based user–item interactions to learn and personalize user preferences through a multi-criteria recommendation system (MCRS). Existing MCRS techniques use similarity- or aggregation-function-based modeling to improve prediction accuracy. However, these methods do not investigate item-aspect-based latent user preferences or criteria-based implicit user–item relationships. Prediction reliability also suffers from highly sparse user–item interactions and from ignoring auxiliary information. Hence, this study proposes an ensembled approach that jointly develops a similarity-based and an aggregation-function-based MCRS model (SimAgg-MCRS) and aggregates their predicted user–item preferences into a cumulative preference matrix to generate the final recommendation. First, the proposed model develops a deep neural network (DNN)-based model to aggregate the criteria-based similarity and predicts the overall rating from the aggregated similarity by merging user- and item-based predictions. Second, the preference-relation-based aggregation function approach develops deep autoencoder-based modeling to exploit the latent relationship among criteria, obtaining a user’s overall preference over an item by aggregating criteria-wise preferences. Finally, the third phase develops a DNN-based ensemble model that integrates the preference matrices of the similarity and aggregation function approaches into the overall aggregated matrix used for recommendation. SimAgg-MCRS integrates user and item side information to learn user preferences better. Experimental and prediction-accuracy-based comparative evaluations on the Yahoo! Movies and Trip Advisor multi-criteria datasets validate the proposed model’s performance over baseline MCRS methods.
Title: Deep ensembled multi-criteria recommendation system for enhancing and personalizing the user experience on e-commerce platforms
Authors: Rahul Shrivastava, Dilip Singh Sisodia, Naresh Kumar Nagwani
Pub Date: 2024-09-04 | DOI: 10.1007/s10115-024-02187-3
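The final ensemble stage described above merges two user–item prediction matrices into one cumulative preference matrix. The paper learns this fusion with a DNN; the sketch below substitutes a single learned convex weight as a simplifying assumption, fitted against observed ratings only.

```python
import numpy as np

def ensemble_predictions(P_sim, P_agg, ratings, mask):
    """Blend similarity- and aggregation-based predictions into one matrix.

    A grid search picks the convex weight w minimizing squared error on
    the observed (mask == 1) entries; the paper's DNN ensemble replaces
    this scalar weight with a learned nonlinear combination.
    """
    ws = np.linspace(0.0, 1.0, 101)
    errs = [np.sum(mask * (w * P_sim + (1 - w) * P_agg - ratings) ** 2) for w in ws]
    w = ws[int(np.argmin(errs))]
    return w * P_sim + (1 - w) * P_agg
```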
Pub Date: 2024-09-04 | DOI: 10.1007/s10115-024-02188-2
Ziyue Yu, Jiayi Wang, Wuman Luo, Rita Tse, Giovanni Pau
Patient representation learning based on electronic health records (EHR) is a critical task for disease prediction. This task aims to effectively extract useful information on dynamic features. Although various existing works have achieved remarkable progress, model performance can be further improved by fully extracting the trends, the variations, and the correlation between trends and variations in dynamic features. In addition, sparse visit records limit the performance of deep learning models. To address these issues, we propose the multi-perspective patient representation extractor (MPRE) for disease prediction. Specifically, we propose the frequency transformation module (FTM) to extract trend and variation information of dynamic features in the time–frequency domain, which enhances the feature representation. In the 2D multi-extraction network (2D MEN), we form a 2D temporal tensor based on trend and variation; the correlations between them are then captured by the proposed dilated operation. Moreover, we propose the first-order difference attention mechanism (FODAM) to adaptively calculate the contributions of differences in adjacent variations to the disease diagnosis. To evaluate the performance of MPRE and baseline methods, we conduct extensive experiments on two real-world public datasets. The experiment results show that MPRE outperforms state-of-the-art baseline methods in terms of AUROC and AUPRC.
Title: Multi-perspective patient representation learning for disease prediction on electronic health records
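A parameter-free sketch of the FODAM idea above: attention weights are derived from the first-order differences of adjacent variation values. This is an assumption-laden simplification — the actual FODAM learns its scoring function rather than using raw difference magnitudes.

```python
import numpy as np

def fodam_weights(variations):
    """Attention weights over first-order differences of adjacent variations.

    The magnitude of each adjacent difference is softmax-normalized, so
    sharper changes in the dynamic feature receive larger weights.
    """
    diffs = np.diff(np.asarray(variations, dtype=float))  # adjacent differences
    scores = np.abs(diffs)
    e = np.exp(scores - scores.max())                     # numerically stable softmax
    return e / e.sum()
```

For a series of n variation values this yields n - 1 weights summing to 1, one per adjacent pair.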
Recommendation systems are ubiquitous in various domains, facilitating users in finding relevant items according to their preferences. Identifying pertinent items that meet their preferences enables users to target the right items. To predict ratings for more accurate forecasts, recommender systems often apply collaborative filtering (CF) approaches to sparse user-rated item matrices. Due to a lack of knowledge regarding newly formed entities, the data sparsity of the user-rated item matrix has an enormous effect on collaborative filtering algorithms, which frequently face lazy learning issues. Real-world datasets with exponentially increasing users and reviews make this situation worse. Matrix factorization (MF) stands out as a key strategy in recommender systems, especially for CF tasks. This paper presents a neural network matrix factorization (NNMF) model to overcome data sparsity challenges. This approach aims to enhance recommendation quality while mitigating the impact of data sparsity, a common issue in CF algorithms. A thorough comparative analysis was conducted on the well-known MovieLens dataset, spanning from 1.6 to 9.6 M records. The outcomes consistently favored the NNMF algorithm, showcasing superior performance compared to the state-of-the-art methods in this domain in terms of precision, recall, F1-score, MAE, and RMSE.
Title: Lazy learning and sparsity handling in recommendation systems
Authors: Suryanshi Mishra, Tinku Singh, Manish Kumar, Satakshi
Pub Date: 2024-09-02 | DOI: 10.1007/s10115-024-02218-z
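The core mechanism — latent factors reconstructing a sparse rating matrix — can be shown with plain gradient-descent matrix factorization. This is a minimal stand-in, not the paper's neural NNMF architecture.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.02, epochs=2000, seed=0):
    """Gradient-descent matrix factorization fitted on observed entries only.

    Latent user factors U and item factors V are trained so that U @ V.T
    matches R wherever mask == 1; the resulting dense product fills in
    the unrated (sparse) cells.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(epochs):
        E = mask * (R - U @ V.T)   # error restricted to observed ratings
        U += lr * (E @ V)
        V += lr * (E.T @ U)
    return U @ V.T
```

An NNMF-style model replaces the inner product `U @ V.T` with a neural network over the concatenated embeddings.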
Pub Date: 2024-09-02 | DOI: 10.1007/s10115-024-02215-2
Bo Xing, Yuting Tan, Junfeng Zhou, Ming Du
Given an uncertain graph, community search is used to return dense subgraphs that contain the query vertex and satisfy the probability constraint. With the proliferation of uncertain graphs in practical applications, community search has become increasingly important for helping users make decisions in advertising recommendation, conference organization, etc. However, existing approaches for community search still suffer from two problems. First, they may return subgraphs that cannot meet users’ expectations on structural cohesiveness, due to the existence of cut-vertices/edges. Second, they use floating-point division to update the probability of each edge during computation, resulting in inaccurate results. In this paper, we study community search on uncertain graphs and propose efficient algorithms to address the above two problems. We first propose a novel community model, namely the triangle-connected (k, γ)-truss community, to return communities with enhanced cohesiveness. Then, we propose an online algorithm that uses a batch-recalculation strategy to guarantee accuracy. To improve the performance of community search, we propose an index-based approach. This index organizes all the triangle-connected (k, γ)-truss communities in a forest structure and maintains the mapping from vertices in the uncertain graph to communities in the index. Based on this index, we can obtain community search results easily, without the costly computation required by the online approach. Finally, we conduct rich experiments on 10 real-world graphs. The experimental results verified the effectiveness and efficiency of our approaches.
Title: Truss community search in uncertain graphs
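The deterministic backbone of the community model above is the k-truss: iteratively peel edges whose support (number of triangles through the edge) drops below k - 2. The sketch below omits the paper's probabilistic γ constraint and triangle-connectivity grouping.

```python
def edge_support(edges, adj):
    """Support of an edge = number of triangles it participates in."""
    return {e: len(adj[e[0]] & adj[e[1]]) for e in edges}

def k_truss(edges, k):
    """Return the maximal subgraph where every edge has support >= k - 2."""
    edges = {tuple(sorted(e)) for e in edges}
    while True:
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        sup = edge_support(edges, adj)
        keep = {e for e in edges if sup[e] >= k - 2}
        if keep == edges:
            return keep
        edges = keep  # peeling may lower other edges' support, so iterate
```

For example, a complete graph on four vertices is a 4-truss (every edge lies in two triangles), while a pendant edge attached to it is peeled away even at k = 3.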
Pub Date: 2024-08-31 | DOI: 10.1007/s10115-024-02217-0
Ensieh Davoodijam, Mohsen Alambardar Meybodi
Automatic text summarization is the process of shortening a large document into a summary text that preserves the main concepts and key points of the original document. Due to the wide applications of text summarization, many studies have been conducted on it, but evaluating the quality of generated summaries poses significant challenges. Selecting appropriate evaluation metrics to capture various aspects of summarization quality, including content, structure, coherence, readability, novelty, and semantic relevance, plays a crucial role in text summarization applications. To address this challenge, the main focus of this study is on gathering and investigating a comprehensive set of evaluation metrics. Analyzing these metrics can enhance understanding of evaluation methods and help select appropriate evaluation schemes for text summarization systems in the future. After a short review of various automatic text summarization methods, we thoroughly analyze 42 prominent metrics, categorizing them into six distinct categories to provide insights into their strengths, limitations, and applicability.
Title: Evaluation metrics on text summarization: comprehensive survey
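To make the metric families concrete, here is a sketch of one widely used lexical-overlap metric, ROUGE-N recall (reference n-grams found in the candidate summary). Production implementations add stemming, stopword handling, and multi-reference aggregation, which are omitted here.

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: fraction of reference n-grams matched in the candidate."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return overlap / max(sum(ref.values()), 1)
```

For instance, the candidate "the cat sat" against the reference "the cat sat on the mat" matches 3 of 6 reference unigrams, giving a ROUGE-1 recall of 0.5.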
Pub Date: 2024-08-30 | DOI: 10.1007/s10115-024-02203-6
Zhongyuan Chen, Yongji Wang
Social networks, characterized by their dynamic and continually evolving nature, present challenges for effective link prediction (LP) due to the constant addition of nodes and connections. In response to this, we propose a novel approach to LP in social networks through Node Embedding and Ensemble Learning (LP-NEEL). Our method constructs a transition matrix from the network’s adjacency matrix and computes similarity measures between node pairs. Utilizing node2vec embedding, we extract features from nodes and generate edge embeddings by computing the inner product of node embeddings for each edge. This process yields a well-labeled dataset suitable for LP tasks. To mitigate overfitting, we balance the dataset by ensuring an equal number of positive and negative edge samples during both the training and testing phases. Leveraging this balanced dataset, we employ the XGBoost machine learning algorithm for final link prediction. Extensive experimentation across six social network datasets validates the efficacy of our approach, demonstrating improved predictive performance compared to existing methods.
Title: Enhancing link prediction through node embedding and ensemble learning
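The edge-labeling step described above — inner products of endpoint embeddings plus balanced positive/negative labels — can be sketched as follows. The identity matrix stands in for learned node2vec embeddings, and the downstream XGBoost classifier is not shown.

```python
import numpy as np

def edge_features(emb, edges):
    """One scalar feature per edge: inner product of endpoint embeddings."""
    return np.array([emb[u] @ emb[v] for u, v in edges])

def build_edge_dataset(emb, pos_edges, neg_edges):
    """Labeled edge dataset for a downstream classifier.

    Passing equally sized positive and negative edge lists reproduces the
    balancing the paper applies to mitigate overfitting.
    """
    X = np.concatenate([edge_features(emb, pos_edges),
                        edge_features(emb, neg_edges)])
    y = np.array([1] * len(pos_edges) + [0] * len(neg_edges))
    return X, y
```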
Pub Date: 2024-08-29 | DOI: 10.1007/s10115-024-02207-2
Yanjin Li, Linchuan Xu, Kenji Yamanishi
Graph data augmentation (GDA), which manipulates graph structure and/or attributes, has been demonstrated as an effective method for improving the generalization of graph neural networks on semi-supervised node classification. As a data augmentation technique, label preservation is critical, that is, node labels should not change after data manipulation. However, most existing methods overlook the label preservation requirement. Determining the label-preserving nature of a GDA method is highly challenging, owing to the non-Euclidean nature of the graph structure. In this study, for the first time, we formulate a label-preserving problem (LPP) in the context of GDA. The LPP is formulated as an optimization problem in which, given a fixed augmentation budget, the objective is to find an augmented graph with minimal difference in data distribution compared to the original graph. To solve the LPP, we propose GMMDA, a generative data augmentation (DA) method based on Gaussian mixture modeling (GMM) of a graph in a latent space. We designed a novel learning objective that jointly learns a low-dimensional graph representation and estimates the GMM. The learning is followed by sampling from the GMM, and the samples are converted back to the graph as additional nodes. To uphold label preservation, we designed a minimum description length (MDL)-based method to select a set of samples that produces the minimum shift in the data distribution captured by the GMM. Through experiments, we demonstrate that GMMDA can improve the performance of graph convolutional networks on Cora, Citeseer and Pubmed by as much as 7.75%, 8.75% and 5.87%, respectively, significantly outperforming the state-of-the-art methods.
Title: GMMDA: Gaussian mixture modeling of graph in latent space for graph data augmentation
Pub Date : 2024-08-24 DOI: 10.1007/s10115-024-02211-6
Saumya Singh, Smriti Srivastava
Deep learning models (DLMs), such as recurrent neural networks (RNNs), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and gated recurrent units (GRUs), excel at sequential data analysis owing to their ability to learn complex patterns. This paper proposes enhancing the performance of these models by applying fuzzy c-means (FCM) clustering to sequential data from a nonlinear plant and the stock market. FCM clustering organizes the data into clusters based on similarity, which improves model performance. The resulting fuzzy c-means recurrent neural network (FCM-RNN), fuzzy c-means long short-term memory (FCM-LSTM), fuzzy c-means bidirectional long short-term memory (FCM-Bi-LSTM), and fuzzy c-means gated recurrent unit (FCM-GRU) models achieved better prediction results than the RNN, LSTM, Bi-LSTM, and GRU models, respectively. This improvement is validated using performance metrics such as root-mean-square error and mean absolute error, and is further illustrated by scatter plots comparing actual and predicted values on the training, validation, and testing data. The experimental results confirm that integrating FCM clustering with DLMs yields superior models.
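The FCM preprocessing step can be sketched as a minimal hand-rolled fuzzy c-means in NumPy. This is our own illustrative sketch, not the paper's code: the function `fcm` and its defaults are assumptions, and the soft membership matrix it returns is what would be combined with the raw sequence features before training a DLM.

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means on X (n_samples x n_features).
    Returns (centers, U) where U[i, j] is sample i's degree of
    membership in cluster j; each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # random soft partition
    for _ in range(n_iter):
        Um = U ** m                            # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances of every sample to every center (n x c)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        inv = d ** (-2.0 / (m - 1.0))          # standard FCM update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

The membership matrix can then be appended to the inputs, e.g. `X_aug = np.hstack([X, U])`, before windowing the sequence for the RNN/LSTM/Bi-LSTM/GRU.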
{"title":"Enhancing the performance of deep learning models with fuzzy c-means clustering","authors":"Saumya Singh, Smriti Srivastava","doi":"10.1007/s10115-024-02211-6","DOIUrl":"https://doi.org/10.1007/s10115-024-02211-6","url":null,"abstract":"<p>Deep learning models (DLMs), such as recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and gated recurrent unit (GRU), are superior for sequential data analysis due to their ability to learn complex patterns. This paper proposes enhancing performance of these models by applying fuzzy c-means (FCM) clustering on sequential data from a nonlinear plant and the stock market. FCM clustering helps to organize the data into clusters based on similarity, which improves the performance of the models. Thus, the proposed fuzzy c-means recurrent neural network (FCM-RNN), fuzzy c-means long short-term memory (FCM-LSTM), fuzzy c-means bidirectional long short-term memory (FCM-Bi-LSTM), and fuzzy c-means gated recurrent unit (FCM-GRU) models showed enhanced prediction results than RNN, LSTM, Bi-LSTM, and GRU models, respectively. This enhancement is validated using performance metrics such as root-mean-square error and mean absolute error and is further illustrated by scatter plots comparing actual versus predicted values for training, validation, and testing data. The experiment results confirm that integrating FCM clustering with DLMs shows the superiority of the proposed models.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"36 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}