EIGP: document-level event argument extraction with information enhancement generated based on prompts
Pub Date: 2024-08-23 | DOI: 10.1007/s10115-024-02213-4
Kai Liu, Hui Zhao, Zicong Wang, Qianxi Hou
The event argument extraction (EAE) task aims to identify event arguments and their specific roles within a given event. Existing generation-based EAE models, including recent ones focused on document-level extraction, emphasize the construction of prompt templates and entity representations. However, they overlook the model's limited comprehension of document context structure and the impact of arguments that span long distances, which reduces extraction accuracy. In this paper, we propose a prompt-based generative EAE model that enhances document structure information for the document-level EAE task. Specifically, we use sentence-level abstract meaning representation (AMR) to represent the contextual structure of the document, then prune redundant parts of this structure through constraints to obtain a constraint graph carrying the document information. Finally, we use an encoder to convert the graph into dense vectors and inject these structure-aware vectors into the prompt-based generation EAE model as a prefix. When the contextual information and the prompt template interact at the model's attention layers, the injected structural information improves generation by shaping attention. We conducted experiments on the RAMS and WIKIEVENTS datasets, and the results show that our model achieves excellent results compared with current state-of-the-art generative EAE models.
{"title":"EIGP: document-level event argument extraction with information enhancement generated based on prompts","authors":"Kai Liu, Hui Zhao, Zicong Wang, Qianxi Hou","doi":"10.1007/s10115-024-02213-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02213-4","url":null,"abstract":"<p>The event argument extraction (EAE) task primarily aims to identify event arguments and their specific roles within a given event. Existing generation-based event argument extraction models, including the recent ones focused on document-level event argument extraction, emphasize the construction of prompt templates and entity representations. However, they overlook the inadequate comprehension of model in document context structure information and the impact of arguments spanning a wide range on event argument extraction. Consequently, this results in reduced model detection accuracy. In this paper, we propose a prompt-based generation event argument extraction model with the ability of document structure information enhancement for document-level event argument extraction task based on prompt generation. Specifically, we use sentence abstract meaning representation (AMR) to represent the contextual structural information of the document, and then remove the redundant parts of the structural information through constraints to obtain the constraint graph with the document information. Finally, we use the encoder to convert the graph into the corresponding dense vector. We inject these vectors with contextual structural information into the prompt-based generation EAE model in a prefixed manner. When contextual information and prompt templates interact at the attention layer of the model, the generated structural information improves the generation by affecting attention. We conducted experiments on RAMS and WIKIEVENTS datasets, and the results show that our model achieves excellent results compared with the current advanced generative EAE model.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"8 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computing marginal and conditional divergences between decomposable models with applications in quantum computing and earth observation
Pub Date: 2024-08-22 | DOI: 10.1007/s10115-024-02191-7
Loong Kuan Lee, Geoffrey I. Webb, Daniel F. Schmidt, Nico Piatkowski
The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications, but doing so naively is intractable. Computing the αβ-divergence, a family of divergences that includes the Kullback–Leibler divergence and the Hellinger distance, between the joint distributions of two decomposable models, i.e., chordal Markov networks, can be done in time exponential in the treewidth of these models. Extending this result, we propose an approach to compute the exact αβ-divergence between any marginal or conditional distributions of two decomposable models. To do so tractably, we provide a decomposition over the marginal and conditional distributions of decomposable models. We then show how our method can be used to analyze distributional changes, first applying it to the benchmark image dataset QMNIST and to a dataset of observations from various areas of the Roosevelt National Forest and their cover type. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers.
{"title":"Computing marginal and conditional divergences between decomposable models with applications in quantum computing and earth observation","authors":"Loong Kuan Lee, Geoffrey I. Webb, Daniel F. Schmidt, Nico Piatkowski","doi":"10.1007/s10115-024-02191-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02191-7","url":null,"abstract":"<p>The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications, but doing so naively is intractable. Computing the <span>(alpha beta )</span>-divergence—a family of divergences that includes the Kullback–Leibler divergence and Hellinger distance—between the joint distribution of two decomposable models, i.e., chordal Markov networks, can be done in time exponential in the treewidth of these models. Extending this result, we propose an approach to compute the exact <span>(alpha beta )</span>-divergence between any marginal or conditional distribution of two decomposable models. In order to do so tractably, we provide a decomposition over the marginal and conditional distributions of decomposable models. We then show how our method can be used to analyze distributional changes by first applying it to the benchmark image dataset QMNIST and a dataset containing observations from various areas at the Roosevelt Nation Forest and their cover type. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"97 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GoSum: extractive summarization of long documents by reinforcement learning and graph-organized discourse state
Pub Date: 2024-08-22 | DOI: 10.1007/s10115-024-02195-3
Junyi Bian, Xiaodi Huang, Hong Zhou, Tianyang Huang, Shanfeng Zhu
Summarizing extensive documents involves selecting sentences, with the organizational structure of document sections playing a pivotal role. However, effectively utilizing discourse information for summary generation poses a significant challenge, especially given the inconsistency between training and evaluation in extractive summarization. In this paper, we introduce GoSum, a novel extractive summarizer that integrates a graph-based model with reinforcement learning techniques to summarize long documents. Specifically, GoSum utilizes a graph neural network to encode sentence states, constructing a heterogeneous graph that represents each document at various discourse levels. The edges of this graph capture hierarchical relationships between different document sections. Furthermore, GoSum incorporates offline reinforcement learning, enabling the model to receive ROUGE score feedback on diverse training samples, thereby enhancing the quality of summary generation. On the two scientific article datasets PubMed and arXiv, GoSum achieved the highest performance among extractive models. On the PubMed dataset in particular, GoSum outperformed the other models, with ROUGE-1 and ROUGE-L scores higher by 0.45 and 0.26 points, respectively.
{"title":"GoSum: extractive summarization of long documents by reinforcement learning and graph-organized discourse state","authors":"Junyi Bian, Xiaodi Huang, Hong Zhou, Tianyang Huang, Shanfeng Zhu","doi":"10.1007/s10115-024-02195-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02195-3","url":null,"abstract":"<p>Summarizing extensive documents involves selecting sentences, with the organizational structure of document sections playing a pivotal role. However, effectively utilizing discourse information for summary generation poses a significant challenge, especially given the inconsistency between training and evaluation in extractive summarization. In this paper, we introduce GoSum, a novel extractive summarizer that integrates a graph-based model with reinforcement learning techniques to summarize long documents. Specifically, GoSum utilizes a graph neural network to encode sentence states, constructing a heterogeneous graph that represents each document at various discourse levels. The edges of this graph capture hierarchical relationships between different document sections. Furthermore, GoSum incorporates offline reinforcement learning, enabling the model to receive ROUGE score feedback on diverse training samples, thereby enhancing the quality of summary generation. On the two scientific article datasets PubMed and arXiv, GoSum achieved the highest performance among extractive models. Particularly on the PubMed dataset, GoSum outperformed other models with ROUGE-1 and ROUGE-L scores surpassing by 0.45 and 0.26, respectively.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"10 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probabilistic temporal semantic graph: a holistic framework for event detection in twitter
Pub Date: 2024-08-22 | DOI: 10.1007/s10115-024-02208-1
Hadis Bashiri, Hassan Naderi
Event detection on social media platforms, especially Twitter, poses significant challenges due to the dynamic nature and high volume of data. The rapid flow of tweets and the varied ways users express thoughts complicate the identification of relevant events. Accurately identifying and interpreting events from this noisy and fast-paced environment is crucial for various applications, including crisis management and market analysis. This paper presents a novel unsupervised framework for event detection on social media, designed to enhance the accuracy and efficiency of identifying significant events from Twitter data. The framework incorporates several innovative techniques, including dynamic bandwidth adjustment based on local data density, Mahalanobis distance integration, adaptive kernel density estimation, and an improved Louvain-MOMR method for community detection. Additionally, a new scoring system is implemented to accurately extract trending words that evoke strong emotions, improving the identification of event-related keywords. The proposed framework demonstrates robust performance across three diverse datasets: FACup, Super Tuesday, and US Election, showcasing its effectiveness in capturing temporal and semantic patterns within tweets.
{"title":"Probabilistic temporal semantic graph: a holistic framework for event detection in twitter","authors":"Hadis Bashiri, Hassan Naderi","doi":"10.1007/s10115-024-02208-1","DOIUrl":"https://doi.org/10.1007/s10115-024-02208-1","url":null,"abstract":"<p>Event detection on social media platforms, especially Twitter, poses significant challenges due to the dynamic nature and high volume of data. The rapid flow of tweets and the varied ways users express thoughts complicate the identification of relevant events. Accurately identifying and interpreting events from this noisy and fast-paced environment is crucial for various applications, including crisis management and market analysis. This paper presents a novel unsupervised framework for event detection on social media, designed to enhance the accuracy and efficiency of identifying significant events from Twitter data. The framework incorporates several innovative techniques, including dynamic bandwidth adjustment based on local data density, Mahalanobis distance integration, adaptive kernel density estimation, and an improved Louvain-MOMR method for community detection. Additionally, a new scoring system is implemented to accurately extract trending words that evoke strong emotions, improving the identification of event-related keywords. The proposed framework demonstrates robust performance across three diverse datasets: FACup, Super Tuesday, and US Election, showcasing its effectiveness in capturing temporal and semantic patterns within tweets.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"93 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
REDAffectiveLM: leveraging affect enriched embedding and transformer-based neural language model for readers’ emotion detection
Pub Date: 2024-08-19 | DOI: 10.1007/s10115-024-02194-4
Anoop Kadan, P. Deepak, Manjary P. Gangan, Sam Savitha Abraham, V. L. Lajish
Technological advancements in web platforms allow people to express and share emotions toward textual write-ups written and shared by others. This gives rise to two interesting domains of analysis: the emotion expressed by the writer and the emotion elicited from the readers. In this paper, we propose a novel approach for readers’ emotion detection from short-text documents using a deep learning model called REDAffectiveLM. Within state-of-the-art NLP tasks, it is well understood that utilizing context-specific representations from transformer-based pre-trained language models helps achieve improved performance. Within this affective computing task, we explore how incorporating affective information can further enhance performance. Toward this, we leverage context-specific and affect enriched representations by using a transformer-based pre-trained language model in tandem with an affect enriched Bi-LSTM+Attention network. For empirical evaluation, we procure a new dataset, REN-20k, besides using RENh-4k and SemEval-2007. We evaluate the performance of our REDAffectiveLM rigorously across these datasets against a vast set of state-of-the-art baselines, where our model consistently outperforms the baselines and obtains statistically significant results. Our results establish that utilizing affect enriched representations along with context-specific representations within a neural architecture can considerably enhance readers’ emotion detection. Since the impact of affect enrichment specifically on readers’ emotion detection is not well explored, we conduct a detailed analysis of the affect enriched Bi-LSTM+Attention component using qualitative and quantitative model behavior evaluation techniques. We observe that, compared to conventional semantic embeddings, affect enriched embeddings increase the ability of the network to effectively identify and assign weightage to the key terms responsible for readers’ emotion, improving prediction.
{"title":"REDAffectiveLM: leveraging affect enriched embedding and transformer-based neural language model for readers’ emotion detection","authors":"Anoop Kadan, P. Deepak, Manjary P. Gangan, Sam Savitha Abraham, V. L. Lajish","doi":"10.1007/s10115-024-02194-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02194-4","url":null,"abstract":"<p>Technological advancements in web platforms allow people to express and share emotions toward textual write-ups written and shared by others. This brings about different interesting domains for analysis, emotion expressed by the writer and emotion elicited from the readers. In this paper, we propose a novel approach for readers’ emotion detection from short-text documents using a deep learning model called <i>REDAffectiveLM</i>. Within state-of-the-art NLP tasks, it is well understood that utilizing context-specific representations from transformer-based pre-trained language models helps achieve improved performance. Within this affective computing task, we explore how incorporating affective information can further enhance performance. Toward this, we leverage context-specific and affect enriched representations by using a transformer-based pre-trained language model in tandem with affect enriched Bi-LSTM+Attention. For empirical evaluation, we procure a new dataset REN-20k, besides using RENh-4k and SemEval-2007. We evaluate the performance of our <i>REDAffectiveLM</i> rigorously across these datasets, against a vast set of state-of-the-art baselines, where our model consistently outperforms baselines and obtains statistically significant results. Our results establish that utilizing affect enriched representation along with context-specific representation within a neural architecture can considerably enhance readers’ emotion detection. Since the impact of affect enrichment specifically in readers’ emotion detection isn’t well explored, we conduct a detailed analysis over affect enriched Bi-LSTM+Attention using qualitative and quantitative model behavior evaluation techniques. We observe that compared to conventional semantic embedding, affect enriched embedding increases the ability of the network to effectively identify and assign weightage to the key terms responsible for readers’ emotion detection to improve prediction.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"29 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aspect-based sentiment analysis: approaches, applications, challenges and trends
Pub Date: 2024-08-14 | DOI: 10.1007/s10115-024-02200-9
Deena Nath, Sanjay K. Dwivedi
Sentiment analysis (SA) is a technique that employs natural language processing to methodically mine, extract, analyse and comprehend people’s thoughts, feelings, opinions and perceptions, as well as their reactions and attitudes toward various subjects such as topics, commodities and other products and services. However, SA only reveals the overall sentiment. Unlike SA, aspect-based sentiment analysis (ABSA) divides a text into distinct components and determines the appropriate sentiment for each, which makes its predictions more reliable. Hence, ABSA is essential for studying and breaking texts down into their various service elements, assigning the appropriate sentiment polarity (positive, negative or neutral) to every aspect. In this paper, the main task is to critically review research outcomes and examine the various techniques, methods and features used for ABSA. After a brief introduction to SA that establishes a clear relationship between SA and ABSA, we focus on approaches, applications, challenges and trends in ABSA research.
{"title":"Aspect-based sentiment analysis: approaches, applications, challenges and trends","authors":"Deena Nath, Sanjay K. Dwivedi","doi":"10.1007/s10115-024-02200-9","DOIUrl":"https://doi.org/10.1007/s10115-024-02200-9","url":null,"abstract":"<p>Sentiment analysis (SA) is a technique that employs natural language processing to determine the function of mining methodically, extract, analyse and comprehend people’s thoughts, feelings, personal opinions and perceptions as well as their reactions and attitude regarding various subjects such as topics, commodities and various other products and services. However, it only reveals the overall sentiment. Unlike SA, the aspect-based sentiment analysis (ABSA) study categorizes a text into distinct components and determines the appropriate sentiment, which is more reliable in its predictions. Hence, ABSA is essential to study and break down texts into various service elements. It then assigns the appropriate sentiment polarity (positive, negative or neutral) for every aspect. In this paper, the main task is to critically review the research outcomes to look at the various techniques, methods and features used for ABSA. After giving brief introduction of SA in order to establish a clear relationship between SA and ABSA, we focussed on approaches, applications, challenges and trends in ABSA research.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"50 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Complementary incomplete weighted concept factorization methods for multi-view clustering
Pub Date: 2024-08-14 | DOI: 10.1007/s10115-024-02197-1
Ghufran Ahmad Khan, Jalaluddin Khan, Taushif Anwar, Zaid Al-Huda, Bassoma Diallo, Naved Ahmad
The main aim of traditional multi-view clustering is to categorize data into separate clusters under the assumption that all views are fully available. However, practical scenarios often arise in which not all aspects of the data are accessible, which hampers the efficacy of conventional multi-view clustering techniques. Recent advancements have made significant progress in addressing incompleteness in multi-view data clustering. Still, current incomplete multi-view clustering methods overlook a number of important factors, such as providing a consensus representation across the kernel space, dealing with the over-fitting issue across different views, and simultaneously modeling how the multiple views relate to each other. To address these challenges, we introduce an innovative multi-view clustering algorithm that manages incomplete data from multiple perspectives. Additionally, we introduce a novel objective function incorporating a weighted concept factorization technique to tackle the absence of data instances within each incomplete view. We use a co-regularization constraint to learn a common shared structure from the different views and a smooth regularization term to prevent view over-fitting. It is noteworthy that the proposed objective function is inherently non-convex, presenting optimization challenges. To obtain a solution, we implement an iterative optimization approach that converges to a local minimum. To underscore the effectiveness and validity of our approach, we conducted experiments on real-world datasets against state-of-the-art methods for comparative evaluation.
{"title":"Complementary incomplete weighted concept factorization methods for multi-view clustering","authors":"Ghufran Ahmad Khan, Jalaluddin Khan, Taushif Anwar, Zaid Al-Huda, Bassoma Diallo, Naved Ahmad","doi":"10.1007/s10115-024-02197-1","DOIUrl":"https://doi.org/10.1007/s10115-024-02197-1","url":null,"abstract":"<p>The main aim of traditional multi-view clustering is to categorize data into separate clusters under the assumption that all views are fully available. However, practical scenarios often arise where not all aspects of the data are accessible, which hampers the efficacy of conventional multi-view clustering techniques. Recent advancements have made significant progress in addressing the incompleteness in multi-view data clustering. Still, current incomplete multi-view clustering methods overlooked a number of important factors, such as providing a consensus representation across the kernel space, dealing with over-fitting issue from different views, and looking at how these multiple views relate to each other at the same time. To deal these challenges, we introduced an innovative multi-view clustering algorithm to manage incomplete data from multiple perspectives. Additionally, we have introduced a novel objective function incorporating a weighted concept factorization technique to tackle the absence of data instances within each incomplete viewpoint. We used a co-regularization constraint to learn a common shared structure from different points of view and a smooth regularization term to prevent view over-fitting. It is noteworthy that the proposed objective function is inherently non-convex, presenting optimization challenges. To obtain the optimal solution, we have implemented an iterative optimization approach to converge the local minima for our method. To underscore the effectiveness and validation of our approach, we conducted experiments using real-world datasets against state-of-the-art methods for comparative evaluation.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"57 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperparameter elegance: fine-tuning text analysis with enhanced genetic algorithm hyperparameter landscape
Pub Date: 2024-08-13 | DOI: 10.1007/s10115-024-02202-7
Gyananjaya Tripathy, Aakanksha Sharaff
Due to the massive participation of users, handling the resulting enormous datasets with machine learning algorithms is highly challenging. Deep learning methods are therefore designed with efficient hyperparameter sets to enhance the processing of vast corpora. Different hyperparameter tuning models have been used in previous studies; still, tuning deep learning models over the greatest possible number of hyperparameters has not yet been achieved. This study develops a modified optimization methodology for effective hyperparameter identification, addressing the shortcomings of previous studies. To obtain the optimum outcome, an enhanced genetic algorithm with modified crossover and mutation is used. The method is able to tune several hyperparameters simultaneously. Benchmark datasets of online reviews show outstanding results for the proposed methodology. The outcome demonstrates that the presented enhanced genetic algorithm-based hyperparameter tuning model performs better than other standard approaches, with 88.73% classification accuracy, 87.31% sensitivity, 90.15% specificity, and an 88.58% F-score on the IMDB dataset and 92.17% classification accuracy, 91.89% sensitivity, 92.47% specificity, and a 92.50% F-score on the Yelp dataset, while requiring less processing effort. To further enhance performance, an attention mechanism is applied to the designed model, achieving 89.62% accuracy, 88.59% sensitivity, 91.89% specificity, and an 89.35% F-score on the IMDB dataset and 93.29% accuracy, 92.04% sensitivity, 93.22% specificity, and a 92.98% F-score on the Yelp dataset.
{"title":"Hyperparameter elegance: fine-tuning text analysis with enhanced genetic algorithm hyperparameter landscape","authors":"Gyananjaya Tripathy, Aakanksha Sharaff","doi":"10.1007/s10115-024-02202-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02202-7","url":null,"abstract":"<p>Due to the significant participation of the users, it is highly challenging to handle enormous datasets using machine learning algorithms. Deep learning methods are therefore designed with efficient hyperparameter sets to enhance the processing of the vast corpus. Different hyperparameter tuning models have been used previously in various studies. Still, tuning the deep learning models with the greatest possible number of hyperparameters has not yet been possible. This study developed a modified optimization methodology for effective hyperparameter identification, addressing the shortcomings of the previous studies. To get the optimum outcome, an enhanced genetic algorithm is used with modified crossover and mutation. The method has the ability to tune several hyperparameters simultaneously. The benchmark datasets for online reviews show outstanding results from the proposed methodology. The outcome demonstrates that the presented enhanced genetic algorithm-based hyperparameter tuning model performs better than other standard approaches with 88.73% classification accuracy, 87.31% sensitivity, 90.15% specificity, and 88.58% F-score value for the IMDB dataset and 92.17% classification accuracy, 91.89% sensitivity, 92.47% specificity, and 92.50% F-score value for the Yelp dataset while requiring less processing effort. To further enhance the performance, attention mechanism is applied to the designed model, achieving 89.62% accuracy, 88.59% sensitivity, 91.89% specificity, and 89.35% F-score with the IMDB dataset and 93.29% accuracy, 92.04% sensitivity, 93.22% specificity, and 92.98% F-score with the Yelp dataset.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"18 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive moving average Q-learning
Pub Date: 2024-08-12 | DOI: 10.1007/s10115-024-02190-8
Tao Tan, Hong Xie, Yunni Xia, Xiaoyu Shi, Mingsheng Shang
A variety of algorithms have been proposed to address the long-standing overestimation bias problem of Q-learning. Reducing this overestimation bias may introduce an underestimation bias instead, as in double Q-learning. However, it is still unclear how to strike a good balance between overestimation and underestimation. We present a simple yet effective algorithm to fill this gap, which we call Moving Average Q-learning. Specifically, we maintain two dependent Q-estimators. The first is used to estimate the maximum expected Q-value. The second is used to select the optimal action; it is the moving average of the historical Q-values generated by the first estimator. The second estimator has only one hyperparameter, the moving average parameter, which controls the dependence between the two estimators, ranging from independent to identical. Based on Moving Average Q-learning, we design an adaptive strategy to select the moving average parameter, resulting in AdaMA (Adaptive Moving Average) Q-learning. This adaptive strategy is a simple function in which the moving average parameter increases monotonically with the number of state–action pairs visited. Moreover, we extend AdaMA Q-learning to AdaMA DQN in high-dimensional environments. Extensive experimental results reveal why Moving Average Q-learning and AdaMA Q-learning can mitigate the overestimation bias, and also show that AdaMA Q-learning and AdaMA DQN drastically outperform SOTA baselines. In particular, compared with the overestimated value of 1.66 in Q-learning, AdaMA Q-learning underestimates by only 0.196, an improvement of 88.19%.
{"title":"Adaptive moving average Q-learning","authors":"Tao Tan, Hong Xie, Yunni Xia, Xiaoyu Shi, Mingsheng Shang","doi":"10.1007/s10115-024-02190-8","DOIUrl":"https://doi.org/10.1007/s10115-024-02190-8","url":null,"abstract":"<p>A variety of algorithms have been proposed to address the long-standing overestimation bias problem of Q-learning. Reducing this overestimation bias may lead to an underestimation bias, such as double Q-learning. However, it is still unclear how to make a good balance between overestimation and underestimation. We present a simple yet effective algorithm to fill in this gap and call Moving Average Q-learning. Specifically, we maintain two dependent Q-estimators. The first one is used to estimate the maximum expected Q-value. The second one is used to select the optimal action. In particular, the second estimator is the moving average of historical Q-values generated by the first estimator. The second estimator has only one hyperparameter, namely the moving average parameter. This parameter controls the dependence between the second estimator and the first estimator, ranging from independent to identical. Based on Moving Average Q-learning, we design an adaptive strategy to select the moving average parameter, resulting in AdaMA (<u>Ada</u>ptive <u>M</u>oving <u>A</u>verage) Q-learning. This adaptive strategy is a simple function, where the moving average parameter increases monotonically with the number of state–action pairs visited. Moreover, we extend AdaMA Q-learning to AdaMA DQN in high-dimensional environments. Extensive experiment results reveal why Moving Average Q-learning and AdaMA Q-learning can mitigate the overestimation bias, and also show that AdaMA Q-learning and AdaMA DQN outperform SOTA baselines drastically. In particular, when compared with the overestimated value of 1.66 in Q-learning, AdaMA Q-learning underestimates by 0.196, resulting in an improvement of 88.19%.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"372 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Community detection in social networks using machine learning: a systematic mapping study
Pub Date: 2024-08-12 | DOI: 10.1007/s10115-024-02201-8
Mahsa Nooribakhsh, Marta Fernández-Diego, Fernando González-Ladrón-De-Guevara, Mahdi Mollamotalebi
One of the important issues in social networks is the social communities formed by interactions among their members. Three types of community, overlapping, non-overlapping, and hidden, are detected by different approaches. Given the importance of community detection in social networks, this paper provides a systematic mapping of machine learning-based community detection approaches. The study aims to show the types of communities in social networks along with the machine learning algorithms that have been used for community detection. After carrying out the mapping steps and removing unusable references, 246 papers were selected to answer the research questions. The results indicate that unsupervised machine learning-based algorithms (such as k-means), at 41.46%, are the most used category for detecting communities in social networks due to their low processing overheads. On the other hand, there has been a significant increase in the use of deep learning since 2020, which offers sufficient performance for community detection in large-volume data. Owing to its ability to measure the correlation or similarity between communities, NMI is, at 53.25%, the most frequently used metric for evaluating the performance of community identification. Furthermore, considering its availability, small size, and lack of multi-edges and loops, the Zachary’s Karate Club dataset, at 26.42%, is the most used dataset for community detection research in social networks.
{"title":"Community detection in social networks using machine learning: a systematic mapping study","authors":"Mahsa Nooribakhsh, Marta Fernández-Diego, Fernando González-Ladrón-De-Guevara, Mahdi Mollamotalebi","doi":"10.1007/s10115-024-02201-8","DOIUrl":"https://doi.org/10.1007/s10115-024-02201-8","url":null,"abstract":"<p>One of the important issues in social networks is the social communities which are formed by interactions between its members. Three types of community including overlapping, non-overlapping, and hidden are detected by different approaches. Regarding the importance of community detection in social networks, this paper provides a systematic mapping of machine learning-based community detection approaches. The study aimed to show the type of communities in social networks along with the algorithms of machine learning that have been used for community detection. After carrying out the steps of mapping and removing useless references, 246 papers were selected to answer the questions of this research. The results of the research indicated that unsupervised machine learning-based algorithms with 41.46% (such as <i>k</i> means) are the most used categories to detect communities in social networks due to their low processing overheads. On the other hand, there has been a significant increase in the use of deep learning since 2020 which has sufficient performance for community detection in large-volume data. With regard to the ability of NMI to measure the correlation or similarity between communities, with 53.25%, it is the most frequently used metric to evaluate the performance of community identifications. Furthermore, considering availability, low in size, and lack of multiple edge and loops, dataset Zachary’s Karate Club with 26.42% is the most used dataset for community detection research in social networks.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"53 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141939780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}