Dynamic evolution of causal relationships among cryptocurrencies: an analysis via Bayesian networks
Pub Date : 2024-09-19  DOI: 10.1007/s10115-024-02222-3
Rasoul Amirzadeh, Dhananjay Thiruvady, Asef Nazari, Mong Shan Ee
Understanding the relationships between cryptocurrencies is important for making informed investment decisions in this financial market. Our study utilises Bayesian networks to examine the causal interrelationships among six major cryptocurrencies: Bitcoin, Binance Coin, Ethereum, Litecoin, Ripple, and Tether. Beyond understanding the connectedness, we also investigate whether these relationships evolve over time. This understanding is crucial for developing profitable investment strategies and forecasting methods. Therefore, we introduce an approach to investigate the dynamic nature of these relationships. Our observations reveal that Tether, a stablecoin, behaves distinctly compared to mining-based cryptocurrencies and stands isolated from the others. Furthermore, our findings indicate that Bitcoin and Ethereum significantly influence the price fluctuations of the other coins, except for Tether, highlighting their key roles in the cryptocurrency ecosystem. Additionally, we conduct diagnostic analyses on the constructed Bayesian networks, showing that cryptocurrencies generally follow the same market direction, which provides further evidence of their interconnectedness. Moreover, our approach reveals the dynamic and evolving nature of these relationships over time, offering insights into the ever-changing dynamics of the cryptocurrency market.
{"title":"Dynamic evolution of causal relationships among cryptocurrencies: an analysis via Bayesian networks","authors":"Rasoul Amirzadeh, Dhananjay Thiruvady, Asef Nazari, Mong Shan Ee","doi":"10.1007/s10115-024-02222-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02222-3","url":null,"abstract":"<p>Understanding the relationships between cryptocurrencies is important for making informed investment decisions in this financial market. Our study utilises Bayesian networks to examine the causal interrelationships among six major cryptocurrencies: Bitcoin, Binance Coin, Ethereum, Litecoin, Ripple, and Tether. Beyond understanding the connectedness, we also investigate whether these relationships evolve over time. This understanding is crucial for developing profitable investment strategies and forecasting methods. Therefore, we introduce an approach to investigate the dynamic nature of these relationships. Our observations reveal that Tether, a stablecoin, behaves distinctly compared to mining-based cryptocurrencies and stands isolated from the others. Furthermore, our findings indicate that Bitcoin and Ethereum significantly influence the price fluctuations of the other coins, except for Tether. This highlights their key roles in the cryptocurrency ecosystem. Additionally, we conduct diagnostic analyses on constructed Bayesian networks, emphasising that cryptocurrencies generally follow the same market direction as extra evidence for interconnectedness. Moreover, our approach reveals the dynamic and evolving nature of these relationships over time, offering insights into the ever-changing dynamics of the cryptocurrency market.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"39 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep multi-semantic fuzzy K-means with adaptive weight adjustment
Pub Date : 2024-09-18  DOI: 10.1007/s10115-024-02221-4
Xiaodong Wang, Longfu Hong, Fei Yan, Jiayu Wang, Zhiqiang Zeng
Existing deep fuzzy clustering methods employ deep neural networks to extract high-level feature embeddings from data, thereby enhancing subsequent clustering and achieving superior performance compared to traditional methods. However, relying solely on feature embeddings may cause clustering models to ignore detailed information within the data. To address this issue, this paper designs a deep multi-semantic fuzzy K-means (DMFKM) model. Our method harnesses the semantic complementarity of various kinds of features within an autoencoder to improve clustering performance. Additionally, to fully exploit the contribution of different types of features to each cluster, we propose an adaptive weight adjustment mechanism that dynamically calculates the importance of different features during clustering. To validate the effectiveness of the proposed method, we applied it to six benchmark datasets. DMFKM significantly outperforms prevailing fuzzy clustering techniques across different evaluation metrics. Specifically, on the six benchmark datasets, our method achieves notable gains over the second-best comparison method, with an ACC improvement of approximately 2.42%, a Purity boost of around 1.94%, and an NMI enhancement of roughly 0.65%.
{"title":"Deep multi-semantic fuzzy K-means with adaptive weight adjustment","authors":"Xiaodong Wang, Longfu Hong, Fei Yan, Jiayu Wang, Zhiqiang Zeng","doi":"10.1007/s10115-024-02221-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02221-4","url":null,"abstract":"<p>Existing deep fuzzy clustering methods employ deep neural networks to extract high-level feature embeddings from data, thereby enhancing subsequent clustering and achieving superior performance compared to traditional methods. However, solely relying on feature embeddings may cause clustering models to ignore detailed information within data. To address this issue, this paper designs a deep multi-semantic fuzzy K-means (DMFKM) model. Our method harnesses the semantic complementarity of various kinds of features within autoencoder to improve clustering performance. Additionally, to fully exploit the contribution of different types of features to each cluster, we propose an adaptive weight adjustment mechanism to dynamically calculate the importance of different features during clustering. To validate the effectiveness of the proposed method, we applied it to six benchmark datasets. DMFKM significantly outperforms the prevailing fuzzy clustering techniques across different evaluation metrics. Specifically, on the six benchmark datasets, our method achieves notable gains over the second-best comparison method, with an ACC improvement of approximately 2.42%, a Purity boost of around 1.94%, and an NMI enhancement of roughly 0.65%.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"92 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Class incremental named entity recognition without forgetting
Pub Date : 2024-09-16  DOI: 10.1007/s10115-024-02220-5
Ye Liu, Shaobin Huang, Chi Wei, Sicheng Tian, Rongsheng Li, Naiyu Yan, Zhijuan Du
Class Incremental Named Entity Recognition (CINER) needs to learn new entity classes without forgetting old ones in a setting where the data contain annotations only for the new entity classes. As is well known, the forgetting problem is the biggest challenge in Class Incremental Learning (CIL). In the CINER scenario, unlabeled old-class entities further aggravate the forgetting problem. Current CINER methods based on a single model cannot completely avoid the forgetting problem and are sensitive to the learning order of entity classes. To this end, we propose a Multi-Model (MM) framework that trains a new model for each incremental step and uses all the models for inference. In MM, each model only needs to learn the entity classes included in the corresponding step, so MM has no forgetting problem and is robust to different entity class learning orders. Furthermore, we design an error-correction training strategy and conflict-handling rules for MM to further improve performance. We evaluate MM on CoNLL-03 and OntoNotes-V5, and the experimental results show that our framework outperforms the current state-of-the-art (SOTA) methods by a large margin.
{"title":"Class incremental named entity recognition without forgetting","authors":"Ye Liu, Shaobin Huang, Chi Wei, Sicheng Tian, Rongsheng Li, Naiyu Yan, Zhijuan Du","doi":"10.1007/s10115-024-02220-5","DOIUrl":"https://doi.org/10.1007/s10115-024-02220-5","url":null,"abstract":"<p>Class Incremental Named Entity Recognition (CINER) needs to learn new entity classes without forgetting old entity classes under the setting where the data only contain annotations for new entity classes. As is well known, the forgetting problem is the biggest challenge in Class Incremental Learning (CIL). In the CINER scenario, the unlabeled old class entities will further aggravate the forgetting problem. The current CINER method based on a single model cannot completely avoid the forgetting problem and is sensitive to the learning order of entity classes. To this end, we propose a Multi-Model (MM) framework that trains a new model for each incremental step and uses all the models for inference. In MM, each model only needs to learn the entity classes included in corresponding step, so MM has no forgetting problem and is robust to the different entity class learning orders. Furthermore, we design an error-correction training strategy and conflict-handling rules for MM to further improve performance. We evaluate MM on CoNLL-03 and OntoNotes-V5, and the experimental results show that our framework outperforms the current state-of-the-art (SOTA) methods by a large margin.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"110 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spectral clustering with scale fairness constraints
Pub Date : 2024-09-13  DOI: 10.1007/s10115-024-02183-7
Zhijing Yang, Hui Zhang, Chunming Yang, Bo Li, Xujian Zhao, Yin Long
Spectral clustering is one of the most common unsupervised learning algorithms in machine learning and plays an important role in data science. Fair spectral clustering has also become a hot topic with the extensive research on fair machine learning in recent years. Current fair spectral clustering methods are based on the concepts of group and individual fairness, which act as mechanisms to mitigate decision bias, particularly for individuals with analogous characteristics and for groups considered sensitive. Existing algorithms in fair spectral clustering have made progress in redistributing resources during clustering to mitigate inequities for certain individuals or subgroups. However, these algorithms still suffer from an unresolved problem at the global level: the resulting clusters tend to be severely oversized or undersized. To this end, we present the first study of scale fairness, aiming to explore how to enhance it in spectral clustering. We formulate scale fairness as a cluster attribution problem for uncertain data points and introduce entropy to enhance it. We measure the scale fairness of clustering by designing two statistical metrics. In addition, two scale-fair spectral clustering algorithms are proposed: entropy weighted spectral clustering (EWSC) and scale fair spectral clustering (SFSC). We experimentally verify on several publicly available real datasets of different sizes that EWSC and SFSC achieve excellent scale fairness, along with comparable clustering quality.
{"title":"Spectral clustering with scale fairness constraints","authors":"Zhijing Yang, Hui Zhang, Chunming Yang, Bo Li, Xujian Zhao, Yin Long","doi":"10.1007/s10115-024-02183-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02183-7","url":null,"abstract":"<p>Spectral clustering is one of the most common unsupervised learning algorithms in machine learning and plays an important role in data science. Fair spectral clustering has also become a hot topic with the extensive research on fair machine learning in recent years. Current iterations of fair spectral clustering methods are based on the concepts of group and individual fairness. These concepts act as mechanisms to mitigate decision bias, particularly for individuals with analogous characteristics and groups that are considered to be sensitive. Existing algorithms in fair spectral clustering have made progress in redistributing resources during clustering to mitigate inequities for certain individuals or subgroups. However, these algorithms still suffer from an unresolved problem at the global level: the resulting clusters tend to be oversized and undersized. To this end, the first original research on scale fairness is presented, aiming to explore how to enhance scale fairness in spectral clustering. We define it as a cluster attribution problem for uncertain data points and introduce entropy to enhance scale fairness. We measure the scale fairness of clustering by designing two statistical metrics. In addition, two scale fair spectral clustering algorithms are proposed, the <i>entropy weighted spectral clustering</i> (EWSC) and the <i>scale fair spectral clustering</i> (SFSC). We have experimentally verified on several publicly available real datasets of different sizes that EWSC and SFSC have excellent scale fairness performance, along with comparable clustering effects.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"36 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supervised kernel-based multi-modal Bhattacharya distance learning for imbalanced data classification
Pub Date : 2024-09-12  DOI: 10.1007/s10115-024-02223-2
Atena Jalali Mojahed, Mohammad Hossein Moattar, Hamidreza Ghaffari
Learned distance metrics measure the difference between data points according to the intrinsic properties of the data and its classes. Distance metric learning approaches are typically used to linearly distinguish the samples of different classes and do not perform well on real-world nonlinear data. This article proposes a kernel-based nonlinear distance metric learning approach that exploits the density of multimodal classes to properly differentiate the classes while reducing within-class separation. Here, multimodality refers to the disjoint distribution of a class, resulting in each class having multiple density components. In the proposed kernel density-based distance metric learning approach, the kernel trick is applied to map the original data to a higher-dimensional space. Then, given the possibility of multimodal classes, a mixture of multivariate Gaussian densities is considered for the distribution of each class. The number of components is calculated using a density-based clustering approach, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. An iterative method is then used to maximize the Bhattacharya distance among the classes' Gaussian mixtures: the distance among components of different classes is increased, while the distance among samples of each component is decreased, providing a wide between-class margin. The results of the experiments show that the proposed approach significantly improves the efficiency of the simple K nearest neighbor algorithm on imbalanced data sets, but when the imbalance ratio is very high, the kernel function does not have a significant effect on the efficiency of the distance metric.
{"title":"Supervised kernel-based multi-modal Bhattacharya distance learning for imbalanced data classification","authors":"Atena Jalali Mojahed, Mohammad Hossein Moattar, Hamidreza Ghaffari","doi":"10.1007/s10115-024-02223-2","DOIUrl":"https://doi.org/10.1007/s10115-024-02223-2","url":null,"abstract":"<p>Learned distance metrics measure the difference of the data according to the intrinsic properties of the data points and classes. Distance metric learning approaches are typically used to linearly distinguish the samples of different classes and do not perform well on real-world nonlinear data classes. A kernel-based nonlinear distance metric learning approach is proposed in this article which exploits the density of multimodal classes to properly differentiate the classes while reducing the within-class separation. Here, multimodality refers to the disjoint distribution of a class, resulting in each class having multiple density components. In the proposed kernel density-based distance metric learning approach, kernel trick is applied on the original data and maps the data to a higher-dimensional space. Then, given the possibility of multimodal classes, a mixture of multivariate Gaussian densities is considered for the distribution of each class. The number of components is calculated using a density-based clustering approach, and then the parameters of the Gaussian components are estimated using maximum a posteriori density estimation. Then, an iterative method is used to maximize the Bhattacharya distance among the classes' Gaussian mixtures. The distance among the external components is increased, while the distance among samples of each component is decreased to provide a wide between-class margin. The results of the experiments show that using the proposed approach significantly improves the efficiency of the simple K nearest neighbor algorithm on the imbalanced data set, but when the imbalance ratio is very high, the kernel function does not have a significant effect on the efficiency of the distance metric.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Long short-term search session-based document re-ranking model
Pub Date : 2024-09-09  DOI: 10.1007/s10115-024-02205-4
Jianping Liu, Meng Wang, Jian Wang, Yingfei Wang, Xintao Chu
Document re-ranking is a core task in session search. However, most existing methods focus only on the short-term session and ignore the long-term history sessions. This leads to an inadequate understanding of the user's search intent, which hurts re-ranking performance. At the same time, these methods are weak at understanding user queries. In this paper, we propose a long short-term search session-based re-ranking model (LSSRM). First, we utilize the BERT model to predict the topic relevance between the query and candidate documents, in order to improve the model's understanding of user queries. Second, we initialize the reading vector with topic relevance and use a personalized memory encoder module to model the user's long-term search intent. Third, we feed the user's current session interaction sequence into a Transformer to obtain a vector representation of the user's short-term search intent. Finally, the user's search intent and topical relevance information are hierarchically fused to obtain the final document ranking scores, and the documents are re-ranked according to these scores. We conduct extensive experiments on two real-world session datasets. The experimental results show that our method outperforms the baseline models on the document re-ranking task.
{"title":"Long short-term search session-based document re-ranking model","authors":"Jianping Liu, Meng Wang, Jian Wang, Yingfei Wang, Xintao Chu","doi":"10.1007/s10115-024-02205-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02205-4","url":null,"abstract":"<p>Document re-ranking is a core task in session search. However, most existing methods only focus on the short-term session and ignore the long-term history sessions. This leads to inadequate understanding of the user’s search intent, which affects the performance of model re-ranking. At the same time, these methods have weaker capability in understanding user queries. In this paper, we propose a long short-term search session-based re-ranking model (LSSRM). Firstly, we utilize the BERT model to predict the topic relevance between the query and candidate documents, in order to improve the model’s understanding of user queries. Secondly, we initialize the reading vector with topic relevance and use the personalized memory encoder module to model the user’s long-term search intent. Thirdly, we input the user’s current session interaction sequence into Transformer to obtain the vector representation of the user’s short-term search intent. Finally, the user’s search intent and topical relevance information are hierarchically fused to obtain the final document ranking scores. Then re-rank the documents according to this score. We conduct extensive experiments on two real-world session datasets. The experimental results show that our method outperforms the baseline models for the document re-ranking task.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"17 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kernel-based iVAT with adaptive cluster extraction
Pub Date : 2024-09-06  DOI: 10.1007/s10115-024-02189-1
Baojie Zhang, Ye Zhu, Yang Cao, Sutharshan Rajasegarar, Gang Li, Gang Liu
Visual Assessment of cluster Tendency (VAT) is a popular method that visually represents the possible clusters found in a dataset as dark blocks along the diagonal of a reordered dissimilarity image (RDI). Although many variants of the VAT algorithm have been proposed to improve the visualisation quality on different types of datasets, they still suffer from the challenge of extracting clusters with varied densities. In this paper, we focus on overcoming this drawback of VAT algorithms by incorporating kernel methods and also propose a novel adaptive cluster extraction strategy, named CER, to effectively identify the local clusters from the RDI. We examine their effects on an improved VAT method (iVAT) and systematically evaluate the clustering performance on 18 synthetic and real-world datasets. The experimental results reveal that the recently proposed data-dependent dissimilarity measure, namely the Isolation kernel, helps to significantly improve the RDI image for easy cluster identification. Furthermore, the proposed cluster extraction method, CER, outperforms other existing methods on most of the datasets in terms of a series of dissimilarity measures.
{"title":"Kernel-based iVAT with adaptive cluster extraction","authors":"Baojie Zhang, Ye Zhu, Yang Cao, Sutharshan Rajasegarar, Gang Li, Gang Liu","doi":"10.1007/s10115-024-02189-1","DOIUrl":"https://doi.org/10.1007/s10115-024-02189-1","url":null,"abstract":"<p>Visual Assessment of cluster Tendency (VAT) is a popular method that visually represents the possible clusters found in a dataset as dark blocks along the diagonal of a <i>reordered dissimilarity image</i> (RDI). Although many variants of the VAT algorithm have been proposed to improve the visualisation quality on different types of datasets, they still suffer from the challenge of extracting clusters with varied densities. In this paper, we focus on overcoming this drawback of VAT algorithms by incorporating kernel methods and also propose a novel adaptive cluster extraction strategy, named CER, to effectively identify the local clusters from the RDI. We examine their effects on an improved VAT method (iVAT) and systematically evaluate the clustering performance on 18 synthetic and real-world datasets. The experimental results reveal that the recently proposed data-dependent dissimilarity measure, namely the Isolation kernel, helps to significantly improve the RDI image for easy cluster identification. Furthermore, the proposed cluster extraction method, CER, outperforms other existing methods on most of the datasets in terms of a series of dissimilarity measures.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"9 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comprehensive review and comparative analysis of transformer models in sentiment analysis
Pub Date : 2024-09-06  DOI: 10.1007/s10115-024-02214-3
Hadis Bashiri, Hassan Naderi
Sentiment analysis has become an important task in natural language processing because it is used in many different areas. This paper gives a detailed review of sentiment analysis, including its definition, challenges, and uses. Different approaches to sentiment analysis are discussed, focusing on how they have evolved and on their limitations. Special attention is given to recent improvements from transformer models and transfer learning. Detailed reviews of well-known transformer models, including BERT, RoBERTa, XLNet, ELECTRA, DistilBERT, ALBERT, T5, and GPT, are provided, examining their structures and roles in sentiment analysis. In the experimental section, the performance of these eight transformer models is compared across 22 different datasets. The results show that the T5 model consistently performs best on multiple datasets, demonstrating its flexibility and ability to generalize. XLNet performs very well in understanding irony and product-related sentiments, while ELECTRA and RoBERTa perform best on certain datasets, showing their strengths in specific areas. BERT and DistilBERT often perform worst, indicating that they may struggle with complex sentiment tasks despite being computationally efficient.
{"title":"Comprehensive review and comparative analysis of transformer models in sentiment analysis","authors":"Hadis Bashiri, Hassan Naderi","doi":"10.1007/s10115-024-02214-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02214-3","url":null,"abstract":"<p>Sentiment analysis has become an important task in natural language processing because it is used in many different areas. This paper gives a detailed review of sentiment analysis, including its definition, challenges, and uses. Different approaches to sentiment analysis are discussed, focusing on how they have changed and their limitations. Special attention is given to recent improvements with transformer models and transfer learning. Detailed reviews of well-known transformer models like BERT, RoBERTa, XLNet, ELECTRA, DistilBERT, ALBERT, T5, and GPT are provided, looking at their structures and roles in sentiment analysis. In the experimental section, the performance of these eight transformer models is compared across 22 different datasets. The results show that the T5 model consistently performs the best on multiple datasets, demonstrating its flexibility and ability to generalize. XLNet performs very well in understanding irony and sentiments related to products, while ELECTRA and RoBERTa perform best on certain datasets, showing their strengths in specific areas. BERT and DistilBERT often perform the lowest, indicating that they may struggle with complex sentiment tasks despite being computationally efficient.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"11 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarcasm detection using optimized bi-directional long short-term memory
Pub Date : 2024-09-06  DOI: 10.1007/s10115-024-02210-7
Vidyullatha Sukhavasi, Venkatrama Phani kumar Sistla, Venkatesulu Dondeti
In the current era, the number of social network users continues to increase day by day due to the vast usage of interactive social networking sites like Twitter, Facebook, and Instagram. On these sites, users generate posts, and the attitudes of followers towards factors such as situation, tone, and feeling can be analysed. However, analysing such feelings accurately is difficult, and it remains one of the hardest problems in natural language processing. Some people express opinions whose intended meaning differs from their literal one; this sophisticated form of expressing sentiments through irony or mockery is termed sarcasm. Sarcastic comments, tweets, or feedback can mislead data mining activities and may result in inaccurate predictions. Several existing models are used for sarcasm detection, but they suffer from inaccuracy, long runtimes, limited training ability, and severe overfitting. To overcome these limitations, an effective sarcasm detection model is introduced in this research. Initially, the data are collected from the publicly available sarcasmania and Generic sarcasm-Not sarcasm (Gen-Sarc-Notsarc) datasets. The collected data are pre-processed using stemming and stop word removal. Features are extracted using the inverse filtering (IF) model through hash index creation, keyword matching, and ranking, and the optimal features are selected using the adaptive search and rescue (ASAR) optimization algorithm. To enhance the accuracy of sarcasm detection, an optimized Bi-LSTM-based deep learning model is proposed by integrating bi-directional long short-term memory (Bi-LSTM) with group teaching optimization (GTO). An LSTM + GTO model is also proposed to compare its performance with the Bi-LSTM + GTO model. The proposed models, implemented in Python, are compared with existing classifier approaches to demonstrate their superiority. Accuracies of 98.24% and 98.36% are attained on the sarcasmania and Gen-Sarc-Notsarc datasets, respectively.
{"title":"Sarcasm detection using optimized bi-directional long short-term memory","authors":"Vidyullatha Sukhavasi, Venkatrama Phani kumar Sistla, Venkatesulu Dondeti","doi":"10.1007/s10115-024-02210-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02210-7","url":null,"abstract":"<p>In the current era, the number of social network users continues to increase day by day due to the vast usage of interactive social networking sites like Twitter, Facebook, Instagram, etc. On these sites, users generate posts, whereas the attitude of followers towards factor utilization like situation, sound, feeling, and so on can be analysed. But most people feel difficult to analyse feelings accurately, which is one of the most difficult problems in natural language processing. Some people expose their opinions with different sole meanings, and this sophisticated form of expressing sentiments through irony or mockery is termed sarcasm. The sarcastic comments, tweets or feedback can mislead data mining activities and may result in inaccurate predictions. Several existing models are used for sarcasm detection, but they have resulted in inaccuracy issues, huge time consumption, less training ability, high overfitting issues, etc. To overcome these limitations, an effective model is introduced in this research to detect sarcasm. Initially, the data are collected from publicly available sarcasmania and Generic sarcasm-Not sarcasm (Gen-Sarc-Notsarc) datasets. The collected data are pre-processed using stemming and stop word removal procedures. The features are extracted using the inverse filtering (IF) model through hash index creation, keyword matching and ranking. The optimal features are selected using adaptive search and rescue (ASAR) optimization algorithm. To enhance the accuracy of sarcasm detection, an optimized Bi-LSTM-based deep learning model is proposed by integrating Bi-directional long short-term memory (Bi-LSTM) with group teaching optimization (GTO). Also, the LSTM + GTO model is proposed to compare its performance with the Bi-LSTM + GTO model. The proposed models are compared with existing classifier approaches to prove the model’s superiority using PYTHON. The accuracy of 98.24% and 98.36% are attained for sarcasmania and Gen-Sarc-Notsarc datasets.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CAERS-CF: enhancing convolutional autoencoder recommendations through collaborative filtering
Pub Date : 2024-09-06  DOI: 10.1007/s10115-024-02204-5
Amirhossein Ghadami, Thomas Tran
Recommendation systems are crucial in boosting companies' revenues by implementing various strategies to engage customers and encourage them to invest in products or services, and businesses constantly seek to enhance these systems. One effective method is hybrid recommendation systems, known for their ability to create high-performance models. We introduce a hybrid recommendation system that leverages two types of recommendation systems: first, a novel deep learning-based recommendation system that utilizes users' and items' content data, and second, a traditional recommendation system that employs users' past behaviour data. The deep learning component, called the convolutional autoencoder recommendation system (CAERS), uses a convolutional autoencoder (CAE) to capture high-order meaningful relationships between users' and items' content information and decodes them to predict ratings. We then design a traditional model-based collaborative filtering recommendation system (CF) that leverages users' past behaviour data, utilizing singular value decomposition (SVD). Finally, we combine the two methods' predictions with linear regression, determining the optimal weight for each prediction generated by the collaborative filtering and deep learning-based systems. Our main objective is to introduce a hybrid model, CAERS-CF, that leverages the strengths of the two approaches. For experimental purposes, we utilize two movie datasets to showcase the performance of CAERS-CF. Our model outperforms each constituent model individually as well as other state-of-the-art deep learning or hybrid models. Across both datasets, the hybrid CAERS-CF model demonstrates an average RMSE improvement of approximately 3.70% and an average MAE improvement of approximately 5.96% compared to the next best models.
{"title":"CAERS-CF: enhancing convolutional autoencoder recommendations through collaborative filtering","authors":"Amirhossein Ghadami, Thomas Tran","doi":"10.1007/s10115-024-02204-5","DOIUrl":"https://doi.org/10.1007/s10115-024-02204-5","url":null,"abstract":"<p>Recommendation systems are crucial in boosting companies’ revenues by implementing various strategies to engage customers and encourage them to invest in products or services. Businesses constantly desire to enhance these systems through different approaches. One effective method involves using hybrid recommendation systems, known for their ability to create high-performance models. We introduce a hybrid recommendation system that leverages two types of recommendation systems: first, a novel deep learning-based recommendation system that utilizes users’ and items’ content data, and second, a traditional recommendation system that employs users’ past behaviour data. We introduce a novel deep learning-based recommendation system called convolutional autoencoder recommendation system (CAERS). It uses a convolutional autoencoder (CAE) to capture high-order meaningful relationships between users’ and items’ content information and decode them to predict ratings. Subsequently, we design a traditional model-based collaborative filtering recommendation system (CF) that leverages users’ past behaviour data, utilizing singular value decomposition (SVD). Finally, in the last step, we combine the two method’s predictions with linear regression. We determine the optimal weight for each prediction generated by the collaborative filtering and the deep learning-based recommendation system. Our main objective is to introduce a hybrid model called CAERS-CF that leverages the strengths of the two mentioned approaches. For experimental purposes, we utilize two movie datasets to showcase the performance of CAERS-CF. Our model outperforms each constituent model individually and other state-of-the-art deep learning or hybrid models. Across both datasets, the hybrid CAERS-CF model demonstrates an average RMSE improvement of approximately 3.70% and an average MAE improvement of approximately 5.96% compared to the next best models.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}