Pub Date: 2023-04-29; DOI: 10.1108/dta-09-2022-0378
Bin Wang, Fan Gao, Le Tong, Qian Zhang, Sulei Zhu
Purpose: Traffic flow prediction has always been a top priority of intelligent transportation systems. There are many mature methods for short-term traffic flow prediction. However, existing methods are often insufficient in capturing long-term spatial-temporal dependencies. To capture long-term dependencies more accurately, this paper proposes a new traffic flow prediction model.
Design/methodology/approach: The proposed model is named channel attention-based spatial-temporal graph neural networks. A graph convolutional network is used to extract local spatial-temporal correlations, a channel attention mechanism is used to enhance the influence of nearby spatial-temporal dependencies on decision-making, and a transformer mechanism is used to capture long-term dependencies.
Findings: The proposed model is applied to two common highway datasets: METR-LA, collected in Los Angeles, and PEMS-BAY, collected in the California Bay Area. It outperforms five other popular models on three performance metrics.
Originality/value: (1) Based on the spatial-temporal synchronization graph convolution module, a spatial-temporal channel attention module is designed to increase the influence of proximity dependence on decision-making by enhancing or suppressing different channels. (2) To better capture long-term dependencies, the transformer module is introduced.
Title: Channel attention-based spatial-temporal graph neural networks for traffic prediction. Journal: Data Technologies and Applications.
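The channel attention idea in the abstract above (enhancing or suppressing different channels) can be illustrated with a minimal sketch. It is a simplified squeeze-and-excitation-style gate, not the authors' module: the function name `channel_attention` and the one-scalar-per-channel `weights` and `biases` are hypothetical stand-ins for learned parameters, and a real model would operate on tensors inside a deep learning framework.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel by a sigmoid gate computed from its mean activation.

    features: list of channels, each a flat list of floats.
    weights, biases: one hypothetical learned scalar per channel (assumption).
    """
    gated = []
    for channel, w, b in zip(features, weights, biases):
        squeeze = sum(channel) / len(channel)               # global average pooling
        gate = 1.0 / (1.0 + math.exp(-(w * squeeze + b)))   # gate value in (0, 1)
        gated.append([gate * x for x in channel])           # enhance or suppress
    return gated


# With zero weights and biases every gate is 0.5, so each channel is halved.
features = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
halved = channel_attention(features, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

A gate near 1 leaves a channel almost unchanged, while a gate near 0 suppresses it; in training, the gate parameters would be learned jointly with the rest of the network.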
Pub Date: 2023-04-28; DOI: 10.1108/dta-03-2022-0096
Rucha Wadapurkar, S. Bapat, Rupali A. Mahajan, R. Vyas
Purpose: Ovarian cancer (OC) is the most common type of gynecologic cancer in the world, with a high rate of mortality. Due to the manifestation of generic symptoms and the absence of specific biomarkers, OC is usually diagnosed at a late stage. Machine learning models can be employed to predict driver genes implicated in causative mutations.
Design/methodology/approach: In the present study, a comprehensive next generation sequencing (NGS) analysis of whole exome sequences of 47 OC patients was carried out to identify clinically significant mutations. Nine functional features of the 708 mutations identified were input into a machine learning classification model, using the eXtreme Gradient Boosting (XGBoost) classifier to predict OC driver genes.
Findings: The XGBoost classifier model yielded a classification accuracy of 0.946, which was superior to that obtained by other classifiers such as decision tree, Naive Bayes, random forest and support vector machine. Further, an interaction network was generated to identify and establish correlations with cancer-associated pathways and gene ontology data.
Originality/value: The final results revealed 12 putative candidate cancer driver genes, namely LAMA3, LAMC3, COL6A1, COL5A1, COL2A1, UGT1A1, BDNF, ANK1, WNT10A, FZD4, PLEKHG5 and CYP2C9, that may have implications in clinical diagnosis.
Title: Machine learning approaches for prediction of ovarian cancer driver genes from mutational and network analysis. Journal: Data Technologies and Applications.
Pub Date: 2023-03-30; DOI: 10.1108/dta-08-2022-0315
Duen-Ren Liu, Yang Huang, Jhen-Jie Jhao, Shin-Jye Lee
Purpose: Online news websites provide huge amounts of timely news, which makes recommending personalized news articles a challenge. Collaborative filtering with generative adversarial networks (CFGAN) can achieve effective recommendation quality. However, CFGAN ignores item contents, which contain more latent preference features than user ratings alone. It is important to consider both ratings and item contents in making preference predictions. This study aims to improve news recommendation by proposing a GAN-based news recommendation model that considers both ratings (implicit feedback) and the latent features of news content.
Design/methodology/approach: Collaborative topic modeling (CTM) can improve user preference prediction by combining matrix factorization (MF) with latent topics of item content derived from latent topic modeling. This study proposes a novel hybrid news recommendation model, Hybrid-CFGAN, which modifies the architecture of the CFGAN model with enhanced preference learning from the CTM. The proposed Hybrid-CFGAN model contains two parallel neural networks, one for the original rating-based preference learning and one for CTM-based preference learning, which together consider both ratings and news content with user preferences derived from the CTM model. A tunable parameter adjusts the weights of the two preference learnings when the preference outputs of the two parallel neural networks are concatenated.
Findings: This study uses a dataset collected from an online news website, NiusNews, to conduct an experimental evaluation. The results show that the proposed Hybrid-CFGAN model can achieve better performance than state-of-the-art GAN-based recommendation methods. The model can enhance existing GAN-based recommendation and improve preference prediction on textual content such as news articles.
Originality/value: As the existing CFGAN model does not consider content information and relies solely on history logs, it may not be effective in recommending news articles. The proposed Hybrid-CFGAN model modifies the architecture of the CFGAN generator by adding a parallel neural network that draws on news content and user preferences derived from the CTM model. Adjusting the preference learning from the two parallel neural networks (original rating-based and CTM-based) contributes to improving the recommendation quality of the proposed model by considering both ratings and latent preferences derived from item contents. The proposed model can improve news recommendation, thereby increasing the commercial value of news media platforms.
Title: News recommendations based on collaborative topic modeling and collaborative filtering with generative adversarial networks. Journal: Data Technologies and Applications.
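The tunable weighting of the two preference branches described in the abstract above might look like the following sketch. The abstract does not give the exact formula, so the convex weighting (`alpha` and `1 - alpha`), the function name `combine_preferences` and the plain-list outputs are all assumptions made for illustration.

```python
def combine_preferences(rating_out, ctm_out, alpha):
    """Weight two branch outputs with a tunable alpha, then concatenate them.

    rating_out: output of the rating-based branch (list of floats).
    ctm_out: output of the CTM-based branch (list of floats).
    alpha: hypothetical weight in [0, 1] given to the rating-based branch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weighted_rating = [alpha * r for r in rating_out]
    weighted_ctm = [(1.0 - alpha) * c for c in ctm_out]
    return weighted_rating + weighted_ctm  # concatenation, not element-wise sum


combined = combine_preferences([1.0, 2.0], [4.0], alpha=0.25)
```

Setting `alpha` to 1 reduces the sketch to the rating-only branch, which corresponds to ignoring content, while smaller values shift weight toward the content-derived preferences.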
Pub Date: 2023-03-20; DOI: 10.1108/dta-04-2022-0142
Jasleen Kaur, Khushdeep Dharni
Purpose: The stock market generates massive databases on various financial companies that are highly volatile and complex. To forecast the daily stock values of these companies, investors frequently use technical analysis or fundamental analysis. Data mining techniques coupled with fundamental and technical analysis have the potential to give satisfactory results for stock market prediction. This paper investigates the accuracy of stock market predictions when variables from technical and fundamental analysis are combined to create a data mining predictive model.
Design/methodology/approach: We chose 381 companies from the National Stock Exchange of India's CNX 500 index and conducted a two-stage data analysis. The first stage identifies key fundamental variables and constructs a portfolio based on that study. Artificial neural network (ANN), support vector machines (SVM) and decision tree J48 were used to build the models. The second stage applies technical analysis to forecast price movements in the companies included in the portfolios. ANN and SVM techniques were used to create predictive models for all companies in the portfolios. We also estimated returns using trading decisions based on the model's output and then compared them to buy-and-hold returns and the return of the NIFTY 50 index, which served as a benchmark.
Findings: The results show that the returns of both portfolios are higher than the benchmark buy-and-hold strategy return. It can be concluded that data mining techniques give better results, irrespective of the type of stock, and have the ability to make up for poor stocks. The comparison with NIFTY as a benchmark also indicates that both portfolios generate higher returns than NIFTY.
Originality/value: As stock prices are influenced by both technical and fundamental indicators, the paper explores the combined effect of technical analysis and fundamental analysis variables for Indian stock market prediction. The results obtained by each analysis individually are also compared. The proposed method can also be used to determine whether to hold stocks for the long or short term using trend-based research.
Title: Data mining–based stock price prediction using hybridization of technical and fundamental analysis. Journal: Data Technologies and Applications.
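The comparison between model-driven trading returns and buy-and-hold returns mentioned in the abstract above can be sketched as follows. This is an illustrative calculation only: the signal encoding (1 means hold the stock over the next period, 0 means stay out), the function names and the sample prices are assumptions, and transaction costs are ignored.

```python
def buy_and_hold_return(prices):
    """Return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0


def strategy_return(prices, signals):
    """Compound the period returns only for periods where the signal is 1.

    signals[t] is the hypothetical model decision taken at time t and
    applied to the price move from prices[t] to prices[t + 1].
    """
    if len(signals) != len(prices) - 1:
        raise ValueError("need exactly one signal per price interval")
    growth = 1.0
    for t, signal in enumerate(signals):
        if signal == 1:
            growth *= prices[t + 1] / prices[t]
    return growth - 1.0


# Made-up prices: the strategy sits out the one falling period.
prices = [100.0, 110.0, 99.0, 108.9]
signals = [1, 0, 1]
hold = buy_and_hold_return(prices)
traded = strategy_return(prices, signals)
```

In this toy series buy-and-hold gains about 8.9 per cent, while skipping the falling period yields about 21 per cent; real comparisons would also account for costs and for the benchmark index return.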
Pub Date: 2023-03-17; DOI: 10.1108/dta-07-2022-0267
Rui Tian, Ruheng Yin, Feng Gan
Purpose: Music sentiment analysis helps to promote the diversification of music information retrieval methods. Traditional music emotion classification tasks suffer from high manual workload and low classification accuracy, caused by difficulty in feature extraction and inaccurate manual determination of hyperparameters. In this paper, the authors propose an optimized convolutional neural network-random forest (CNN-RF) model for music sentiment classification, which optimizes the manually selected hyperparameters to improve classification accuracy and reduce labor costs and human classification errors.
Design/methodology/approach: A CNN-RF music sentiment classification model is designed based on quantum particle swarm optimization (QPSO). First, the audio data are transformed into a Mel spectrogram, and feature extraction is conducted by a CNN. Second, the extracted music features are processed by the RF algorithm to complete a preliminary emotion classification. Finally, the QPSO algorithm is adopted to select the best hyperparameters for the CNN and obtain the final classification results.
Findings: In experimental validation, the model achieved a classification accuracy of 97 per cent across different sentiment categories with shortened training time. The proposed method with QPSO achieved 1.2 and 1.6 per cent higher accuracy than that with particle swarm optimization and genetic algorithm, respectively. The proposed model has great potential for music sentiment classification.
Originality/value: The dual contribution of this work comprises the proposed model, which integrates two deep learning models, and the introduction of QPSO into model optimization. With these two innovations, the efficiency and accuracy of music emotion recognition and classification have been significantly improved.
Title: Music sentiment classification based on an optimized CNN-RF-QPSO model. Journal: Data Technologies and Applications.
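To make the hyperparameter search step in the abstract above more concrete, here is a minimal one-dimensional QPSO sketch. It follows the commonly described QPSO update (a local attractor between personal and global best, plus a step scaled by the distance to the mean best position), but it is not the authors' implementation: the toy objective stands in for a validation error, and `qpso_minimize`, `beta`, the bounds and the particle counts are illustrative assumptions.

```python
import math
import random


def qpso_minimize(objective, bounds, n_particles=10, n_iters=30, beta=0.75, seed=0):
    """Minimize a 1-D objective with a basic quantum-behaved PSO loop."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [rng.uniform(low, high) for _ in range(n_particles)]
    pbest = positions[:]                                  # personal best positions
    pbest_val = [objective(x) for x in pbest]
    best_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_idx], pbest_val[best_idx]
    history = [gbest_val]                                 # best value per iteration
    for _ in range(n_iters):
        mbest = sum(pbest) / n_particles                  # mean of personal bests
        for i in range(n_particles):
            phi = rng.random()
            u = 1.0 - rng.random()                        # in (0, 1], keeps log finite
            attractor = phi * pbest[i] + (1.0 - phi) * gbest
            step = beta * abs(mbest - positions[i]) * math.log(1.0 / u)
            x = attractor + step if rng.random() < 0.5 else attractor - step
            x = min(max(x, low), high)                    # clamp to the search range
            positions[i] = x
            val = objective(x)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x, val
                if val < gbest_val:
                    gbest, gbest_val = x, val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_error(x):
    """Toy stand-in for validation error as a function of one hyperparameter."""
    return (x - 0.3) ** 2


best_x, best_val, history = qpso_minimize(toy_error, bounds=(0.0, 1.0))
```

In a setting like the one described, the objective would train and evaluate the CNN for a given hyperparameter setting, which is far more expensive than this toy function, so the number of particles and iterations would need to be chosen with that cost in mind.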
Pub Date: 2023-03-16; DOI: 10.1108/dta-07-2022-0290
Yishan Liu, Wenming Cao, Guitao Cao
Purpose: Session-based recommendation aims to predict the user's next preference based on the user's recent activities. Although most existing studies consider the global characteristics of items, they learn these characteristics from a single connection relationship, which cannot fully capture the complex transition relationships between items. We believe that learning multiple relationships between items across sessions can improve the performance of session recommendation tasks and the scalability of recommendation models. At the same time, high-quality global item features help to explore the potential common preferences of users.
Design/methodology/approach: This work proposes a session-based recommendation method with a multi-relation global context-enhanced network to capture this global transition relationship. Specifically, we construct a multi-relation global item graph based on a group of sessions, use a graded attention mechanism to learn different types of connection relations independently and obtain the global feature of each item according to the multi-relation weight.
Findings: We conducted experiments on three benchmark datasets. The experimental results show that the proposed model is superior to existing state-of-the-art methods, which verifies its effectiveness.
Originality/value: First, we construct a multi-relation global item graph to learn the complex transition relations of the global context of each item and effectively mine the potential association of items between different sessions. Second, by obtaining high-quality global item features, our model improves scalability and enables some previously unconsidered items to make it onto the candidate list.
Title: Multi-relation global context learning for session-based recommendation. Journal: Data Technologies and Applications.
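Constructing a multi-relation global item graph from a group of sessions, as the abstract above describes, could be sketched like this. The abstract does not list the relation types, so the two relations used here ("out" for an item followed by another, "in" for the reverse direction) and the function name `build_multi_relation_graph` are assumptions chosen for illustration.

```python
from collections import Counter


def build_multi_relation_graph(sessions):
    """Count typed transitions between consecutive items across all sessions.

    Returns a Counter keyed by (relation, source_item, target_item).
    The relation names "out" and "in" are hypothetical.
    """
    edges = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src == dst:
                continue                      # skip repeated clicks on one item
            edges[("out", src, dst)] += 1     # src was followed by dst
            edges[("in", dst, src)] += 1      # dst was preceded by src
    return edges


sessions = [["a", "b", "c"], ["b", "c"], ["c", "a"]]
graph = build_multi_relation_graph(sessions)
```

The edge counts aggregate evidence across sessions, so the transition from "b" to "c" is weighted more heavily than one seen only once; an attention mechanism would then learn how much each relation type contributes to an item's global feature.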
Pub Date: 2023-03-15; DOI: 10.1108/dta-08-2022-0300
A. Ghorbanian, H. Razavi
Purpose: Common methods for clustering time series rely either on specific distance criteria or on standard clustering algorithms. Ensemble clustering is one of the common techniques used in data mining to increase the accuracy of clustering. In this study, a multistep approach for whole clustering of time series data is developed, based on segmentation, selection of the best segments and ensemble clustering of the selected segments.
Design/methodology/approach: First, the approach divides the time series dataset into equal segments. In the next step, the best segments are selected using one or more internal clustering criteria, and the selected segments are then combined for final clustering. Depending on the loop used and on how the best segments are selected for the final clustering (using one criterion or several criteria simultaneously), two algorithms have been developed in different settings. A logarithmic relationship limits the number of segments created in the loop.
Findings: First, the best setting of the two developed algorithms is selected according to Rand's external criteria and statistical tests. This setting is then compared with different algorithms in the literature on clustering accuracy and execution time. The results indicate higher accuracy and lower execution time for the proposed approach.
Originality/value: This paper proposes a fast and accurate approach for time series clustering in three main steps. This is the first work that uses a combination of segmentation and ensemble clustering. Higher accuracy and lower execution time are the notable achievements of this study.
Title: A new method based on ensemble time series for fast and accurate clustering. Journal: Data Technologies and Applications.
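The segment, select, and combine steps described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: k-means as the base clusterer, the mean silhouette width as the internal criterion, and a majority-vote co-association rule as the ensemble step are all assumptions chosen for illustration, and the loop with its logarithmic limit on the number of segments is omitted. All function names are hypothetical.

```python
def split_segments(series, n_segments):
    """Cut one series into n_segments equal pieces (a tail remainder is dropped)."""
    size = len(series) // n_segments
    return [series[i * size:(i + 1) * size] for i in range(n_segments)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette width: an internal criterion, higher is better."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0

    def mean_dist(i, c):
        ds = [sq_dist(points[i], q) ** 0.5
              for j, q in enumerate(points) if labels[j] == c and j != i]
        return sum(ds) / len(ds) if ds else None

    total = 0.0
    for i in range(len(points)):
        a = mean_dist(i, labels[i])
        if a is None:  # singleton cluster contributes 0 by convention
            continue
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def ensemble_cluster(dataset, n_segments, k, n_best):
    """Cluster each segment, keep the n_best by silhouette, combine by majority vote."""
    per_series = [split_segments(s, n_segments) for s in dataset]
    scored = []
    for seg in range(n_segments):
        points = [parts[seg] for parts in per_series]
        part = kmeans(points, k)
        scored.append((silhouette(points, part), seg, part))
    best = sorted(scored, reverse=True)[:n_best]
    n = len(dataset)
    final = list(range(n))  # each series starts in its own group
    for i in range(n):
        for j in range(i + 1, n):
            votes = sum(part[i] == part[j] for _, _, part in best)
            if votes * 2 > len(best):  # co-clustered in most selected segments
                old, new = final[j], final[i]
                final = [new if f == old else f for f in final]
    return final

# Toy data: the first half of each series separates two groups, the second half is noise.
dataset = [
    [0.0, 0.1, 0.0, 0.1, 5.0, 5.2, 4.9, 5.1],
    [0.1, 0.0, 0.2, 0.0, 5.3, 4.8, 5.1, 5.0],
    [0.0, 0.2, 0.1, 0.1, 4.9, 5.1, 5.2, 4.8],
    [9.9, 10.0, 10.1, 10.0, 5.1, 5.0, 4.8, 5.2],
    [10.0, 10.2, 9.9, 10.1, 5.0, 4.9, 5.3, 5.0],
    [10.1, 9.9, 10.0, 10.2, 5.2, 5.1, 5.0, 4.9],
]
print(ensemble_cluster(dataset, n_segments=2, k=2, n_best=1))
```

Selecting only the highest-scoring segment keeps the noisy second half from influencing the final grouping, which is the intuition behind choosing segments with an internal criterion before combining them.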
Pub Date: 2023-03-09 DOI: 10.1108/dta-08-2022-0302
Jingyi Li, S. Chao
Purpose: Binary classification on imbalanced data is challenging because the minority class is easily masked by the majority class. Most existing classifiers are better at identifying the majority class and therefore neglect the minority class, which degrades classifier performance. To address this, the paper proposes a twin-support vector machine for binary classification on imbalanced data.
Design/methodology/approach: In the proposed method, the authors construct two support vector machines that focus on the majority class and the minority class, respectively. To improve the learning ability of the two support vector machines, a new kernel is derived for them.
Findings: (1) A novel twin-support vector machine is proposed for binary classification on imbalanced data, and new kernels are derived. (2) For imbalanced data, the complexity of the data distribution has a negative effect on classification results; however, with optimized kernels, improved classification results are obtained and the desired boundaries are learned. (3) For binary classification on imbalanced data, classifiers based on twin architectures have advantages over those based on a single architecture.
Originality/value: For imbalanced data, the complexity of the data distribution has a negative effect on classification results; however, with optimized kernels, improved classification results are obtained and the desired boundaries are learned.
Title: A novel twin-support vector machine for binary classification to imbalanced data. Journal: Data Technologies and Applications, pp. 385-396.
Pub Date: 2023-02-27 DOI: 10.1108/dta-06-2021-0156
Vasileios Stamatis, M. Salampasis, K. Diamantaras
Purpose: In federated search, a query is sent simultaneously to multiple resources, each of which returns a list of results. These lists are combined into a single list by the results merging process. In this work, the authors apply machine learning methods to results merging in federated patent search. Although several results merging methods have been developed, none has been tested on patent data or has considered several machine learning models. The authors therefore experiment with state-of-the-art methods on patent data and propose two new results merging methods that use machine learning models.
Design/methodology/approach: The methods are based on a centralized index containing samples of documents from all the remote resources, and they use machine learning models to estimate comparable scores for the documents retrieved from different resources. The authors examine the new methods in cooperative and uncooperative settings, in which document scores from the remote search engines are and are not available, respectively. For uncooperative environments, they propose two methods for assigning document scores.
Findings: The new results merging methods were measured against state-of-the-art models and found to be superior in many cases, with significant improvements. The random forest model achieves the best results of all the models and offers new insights into the results merging problem.
Originality/value: In this article, the authors show that machine learning models can replace the standard methods and models that have been used for results merging for many years. The proposed methods outperformed state-of-the-art estimation methods for results merging and proved more effective for federated patent search.
Title: Machine learning methods for results merging in patent retrieval. Journal: Data Technologies and Applications.
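The idea of estimating comparable scores from a centralized sample index, as described in the abstract above, can be sketched for the cooperative setting. This is a simplified illustration, not the authors' method: a per-resource least-squares line stands in for the machine learning models they evaluate (such as random forest), and the function names and toy scores are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def merge_results(resource_results, central_scores):
    """Merge per-resource result lists into one ranking.

    resource_results: {resource: [(doc_id, local_score), ...]}
    central_scores:   {doc_id: score from the centralized sample index}
    """
    merged = []
    for resource, results in resource_results.items():
        # Documents that also appear in the sample index are the training pairs.
        overlap = [(s, central_scores[d]) for d, s in results if d in central_scores]
        xs, ys = zip(*overlap)  # assumes at least two overlap docs with distinct scores
        a, b = fit_line(xs, ys)
        # Map every local score onto the comparable, centralized scale.
        merged += [(a * s + b, d, resource) for d, s in results]
    return [(d, r) for _, d, r in sorted(merged, reverse=True)]

# Toy example: resource A scores on a 0-100 scale, resource B on a 0-1 scale.
resource_results = {
    "A": [("a1", 90.0), ("a2", 60.0), ("a3", 30.0)],
    "B": [("b1", 0.8), ("b2", 0.5), ("b3", 0.2)],
}
central_scores = {"a1": 0.9, "a3": 0.3, "b1": 0.7, "b3": 0.1}
print(merge_results(resource_results, central_scores))
```

Because each resource gets its own mapping, scores on incompatible scales become directly comparable before the final sort. In an uncooperative setting the local scores would first have to be assigned by some other means, which this sketch does not cover.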
Pub Date: 2023-02-25 DOI: 10.1108/dta-03-2022-0093
Bin Wang, Huifeng Li, Le Tong, Qian Zhang, Sulei Zhu, Tao Yang
Purpose: This paper aims to address the following issues: (1) most existing methods are based on recurrent networks, which are time-consuming to train on long sequences because they do not allow full parallelism; (2) personalized preferences are generally not considered adequately; (3) existing methods have rarely studied systematically how to efficiently utilize the various kinds of auxiliary information (e.g. user ID and timestamp) in trajectory data and the spatiotemporal relations among nonconsecutive locations.
Design/methodology/approach: The authors propose a novel self-attention network-based model named SanMove to predict the next location by capturing the long- and short-term mobility patterns of users. Specifically, SanMove uses a self-attention module to capture each user's long-term preference, which represents the user's personalized location preference. Meanwhile, the authors use a spatial-temporal guided noninvasive self-attention (STNOVA) module to exploit auxiliary information in the trajectory data to learn the user's short-term preference.
Findings: The authors evaluate SanMove on two real-world datasets. The experimental results demonstrate that SanMove is not only faster than the state-of-the-art recurrent neural network (RNN)-based prediction model but also outperforms the baselines for next location prediction.
Originality/value: The authors propose a self-attention-based sequential model named SanMove to predict the user's trajectory; it comprises long-term and short-term preference learning modules. SanMove allows fully parallel processing of trajectories, which improves processing efficiency. The authors propose an STNOVA module to capture the sequential transitions of current trajectories. Moreover, the self-attention module is used to process historical trajectory sequences in order to capture the personalized location preference of each user. The authors conduct extensive experiments on two check-in datasets. The experimental results demonstrate that the model trains quickly and performs well compared with existing RNN-based methods for next location prediction.
Title: SanMove: next location recommendation via self-attention network. Journal: Data Technologies and Applications, pp. 330-343.
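The contrast the abstract above draws between recurrent models and self-attention can be made concrete with a generic scaled dot-product self-attention sketch. It is not SanMove or its STNOVA module: there are no learned projections or auxiliary inputs, the queries, keys, and values are all the raw sequence, and the toy location embeddings are made up for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq.

    Each output position depends only on the input sequence, not on the
    previous output position, so all rows could be computed in parallel.
    An RNN, by contrast, must process the positions one after another.
    """
    d = len(seq[0])
    weights = [
        softmax([sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in seq])
        for qv in seq
    ]
    output = [
        [sum(w * v[c] for w, v in zip(row, seq)) for c in range(d)]
        for row in weights
    ]
    return output, weights

# Toy trajectory: three visits, the first and third to the same (hypothetical) location.
trajectory = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
output, weights = self_attention(trajectory)
for row in weights:
    print([round(w, 3) for w in row])
```

In the toy trajectory, the first visit attends equally to itself and to the third visit, even though they are not adjacent, which illustrates how attention can relate nonconsecutive locations directly.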