International Journal of Data Warehousing and Mining: Latest Articles

Improvement of Data Stream Decision Trees
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.290889
Sarah Nait Bahloul, Oussama Abderrahim, Aya Ichrak Benhadj Amar, Mohammed Yacine Bouhedadja
The classification of data streams has become a significant and active research area. The principal characteristics of data streams are the large volume of arriving data, the high speed and rate of its arrival, and changes in its nature and distribution over time. The Hoeffding Tree is a method for building decision trees incrementally. Since it was proposed, it has become one of the most popular tools for data stream classification, and several improvements have emerged. The Hoeffding Anytime Tree was recently introduced and is considered one of the most promising algorithms. It offers higher accuracy than the Hoeffding Tree in most scenarios, at a small additional computational cost. In this work, the authors propose three improvements to the Hoeffding Anytime Tree and test them on known benchmark datasets. The experimental results show that two of the proposed variants make better use of the Hoeffding Anytime Tree's properties: they learn faster while providing the same desired accuracy.
Citations: 0
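The incremental construction mentioned in the abstract above rests on the Hoeffding bound, which tells a tree learner how many stream examples are enough to trust the currently best split. The following is a minimal sketch of the classic Hoeffding Tree split test, not of the authors' three improvements to the Hoeffding Anytime Tree. The function names and the sample numbers are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the observed gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative numbers only (hypothetical, not taken from the paper).
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
print(round(eps, 4))
print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))  # clear winner
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))  # gap too small, keep observing
```

Because epsilon shrinks as n grows, a leaf that cannot decide yet simply waits for more examples, which is what makes one-pass learning on a stream possible.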
Improving Rumor Detection by Image Captioning and Multi-Cell Bi-RNN With Self-Attention in Social Networks
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.313189
Jenq-Haur Wang, Chin-Wei Huang, M. Norouzi
User-generated content on social media is not verified before being posted, and it can cause many problems if misused. Among the various types of rumors, the authors focus on the type in which there is a mismatch between images and their surrounding texts. Such rumors can be detected by multimodal feature fusion in RNNs with an attention mechanism, but the relations between images and texts are not well addressed. In this paper, the authors propose to improve rumor detection by image captioning and RNNs with self-attention. First, they use image captioning to translate images into corresponding text descriptions. Second, these caption words are represented by word embedding models and aggregated with the surrounding texts using early fusion. Finally, multi-cell bi-directional RNNs with self-attention are used to learn important features for identifying rumors. In the experiments, the best F-measure obtained is 0.882, which shows the potential of the proposed approach to rumor detection. Further investigation is needed for larger-scale data.
Citations: 1
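The early-fusion step described in the abstract above can be pictured as merging the caption words with the post text before any model sees them. The sketch below is only an assumption about what such a step might look like: the separator token, the toy embedding table, and the function names are hypothetical, and the captioning model and the Bi-RNN are left out entirely.

```python
def early_fusion(post_tokens, caption_tokens, sep="[SEP]"):
    """Concatenate the post text and the generated caption into one token sequence."""
    return list(post_tokens) + [sep] + list(caption_tokens)


def embed(tokens, table, dim):
    """Look up each token in a toy embedding table; unknown tokens map to zeros."""
    unknown = [0.0] * dim
    return [table.get(token, unknown) for token in tokens]


# Hypothetical example: the post claims a flood, the caption describes a dry street.
post = ["flood", "in", "city"]
caption = ["a", "dry", "street"]
fused = early_fusion(post, caption)

toy_table = {"flood": [1.0, 0.0], "dry": [0.0, 1.0]}  # assumed 2-d embeddings
vectors = embed(fused, toy_table, dim=2)
print(fused)
print(len(vectors))
```

Fusing at the input level lets a single sequence model attend across both modalities, which is where a text/image mismatch would have to be detected.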
Initial Optimization Techniques for the Cube Algebra Query Language: The Relational Model as a Target
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.299016
Thomas Mercieca, J. Vella, K. Vella
A common model used in addressing today's overwhelming amounts of data is the OLAP cube. The OLAP community has proposed several cube algebras, although a standard has still not been nominated. This study focuses on a recent addition to the cube algebras: the user-centric Cube Algebra Query Language (CAQL). The study aims to explore the optimization potential of this algebra by applying logical rewriting, inspired by classic relational algebra, and parallelism. The lack of a standard algebra is often cited as a problem in such discussions. Thus, the significance of this work lies in strengthening the position of this algebra among the OLAP algebras by addressing implementation details. The modern open-source PostgreSQL relational engine is used to encode the CAQL abstraction. A query workload based on a well-known dataset is adopted, and CAQL and SQL implementations are compared. Finally, the quality of each generated query is evaluated through its observed performance characteristics. Results show strong improvements over the baseline case of the unoptimized query.
Citations: 0
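Logical rewriting in the relational-algebra tradition, as mentioned in the abstract above, means replacing a query with an equivalent one that is cheaper to run. A classic instance is pushing a selection below an aggregation. The sketch below shows that equivalence on a toy table. It uses Python's built-in `sqlite3` module rather than PostgreSQL so that it is self-contained, it is plain SQL rather than CAQL, and the `sales` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", 2021, 10.0), ("EU", 2022, 20.0), ("US", 2021, 5.0), ("US", 2022, 7.5)],
)

# Unoptimized form: aggregate every region, then keep only one of them.
unoptimized = """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    WHERE region = 'EU'
"""

# Rewritten form: filter first, so only the matching rows are aggregated.
# This is valid because the predicate refers only to the grouping column.
optimized = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE region = 'EU'
    GROUP BY region
"""

rows_unoptimized = conn.execute(unoptimized).fetchall()
rows_optimized = conn.execute(optimized).fetchall()
print(rows_unoptimized == rows_optimized)
```

A relational engine may apply this particular rewrite on its own; the point of the sketch is only that the two forms return the same rows, which is the property any algebra-level rewrite has to preserve.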
Emotion-Drive Interpretable Fake News Detection
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.314585
Xiaoyin Ge, Mingshu Zhang, Xu An Wang, Jia Liu, Bin Wei
Fake news has brought significant challenges to the healthy development of social media. Although current fake news detection methods are advanced, many models directly use unselected user comments and do not consider the emotional connection between news content and user comments. The authors propose an emotion-driven explainable fake news detection model (EDI) to solve this problem. The model selects valuable user comments by using sentiment values, obtains an emotional correlation representation between news content and user comments by using collaborative annotation, and obtains a weighted representation of user comments by using an attention mechanism. Experimental results on Twitter and Weibo show that the detection model significantly outperforms state-of-the-art models and provides reasonable interpretations.
Citations: 0
Machine Learning Based Admission Data Processing for Early Forecasting Students' Learning Outcomes
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.313585
Nguyen Thi Kim Son, Nguyen Van Bien, Nguyen Huu Quynh, C. Thơ
In this paper, the authors explore factors that improve the accuracy of predicting student learning outcomes. The method removes redundant and irrelevant factors to obtain a “clean” data set without having to solve an NP-hard problem. It improves the accuracy of graduation outcome prediction by applying logistic regression to the “clean” data set. The authors empirically evaluate the training and university admission data of Hanoi Metropolitan University from 2016 to 2020. Using the data processing results and a machine learning application program, they analyze, evaluate, and forecast students' learning outcomes based on admission data and first-year and second-year academic performance data. They then propose training and admission policies, as well as methods for solving problems in university admissions radically and quantitatively.
Citations: 0
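The abstract above names logistic regression as the prediction method. As a rough sketch of what that involves, the code below fits a one-feature logistic model by batch gradient descent. The feature (a scaled admission score), the labels, the learning rate, and the function names are all hypothetical; the authors' feature-cleaning step and their actual data are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(x: float, w: float, b: float) -> float:
    return sigmoid(w * x + b)


# Hypothetical toy data: scaled admission score -> graduated on time (1) or not (0).
scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
graduated = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(scores, graduated)
print(predict_proba(0.0, w, b) < 0.5)
print(predict_proba(5.0, w, b) > 0.5)
```

With real admission data there would be many features, so removing redundant or irrelevant ones before fitting, as the abstract describes, reduces noise in the learned weights.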
A Survey of COVID-19 Detection From Chest X-Rays Using Deep Learning Methods
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.314155
Bhargavinath Dornadula, S. Geetha, L. Anbarasi, Seifedine Kadry
The coronavirus (COVID-19) outbreak has created an alarming situation for the whole world and has been marked as one of the most severe and acute medical conditions of the last hundred years. Various medical imaging modalities, including computed tomography (CT) and chest X-rays, are employed for diagnosis. This paper presents an overview of recently developed systems that detect COVID-19 from chest X-ray images using deep learning approaches. The review explores and analyzes the data sets, feature engineering techniques, image pre-processing methods, and experimental results of various works in the literature. It also highlights the transfer learning techniques and the different performance metrics used by researchers in this field. This information helps point out future research directions in the automatic diagnosis of COVID-19 using deep learning techniques.
Citations: 0
Data Warehouse and Interactive Map for Promoting Cultural Heritage in Saudi Arabia Using GIS
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.314236
Nasser Allheeib, Marine Alraqdi, Mohammed Almukaynizi
With the urbanization of various regions, many historical sites may be misrepresented or totally neglected. As more people move to urban areas over time, heritage areas are being abandoned or ignored. The roads leading to such areas are poorly maintained, and the areas are not being adequately promoted. Over the years, the emergence and evolution of digital maps have played a significant role in tourist and cultural exploration, and such maps are important sources of information for tourists who are considering specific destinations. In this paper, the authors discuss the development and implementation of a geographic information system (GIS) in the tourism industry. They create an interactive map for tourist sites and suggest a means of retrieving tourist data. They select the Aseer region as a case study since it has a rich and deep cultural heritage, comprising almost 4,000 heritage villages, and is considered one of the most important tourist destinations in the country. The authors also propose an initiative for the development and implementation of GIS in the tourism industry.
Citations: 0
Association Rule Mining Based on Hybrid Whale Optimization Algorithm
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.308817
Z. Ye, Wenhui Cai, Mingwei Wang, Aixin Zhang, Wen-hua Zhou, Na Deng, Zimei Wei, Daxin Zhu
Association Rule Mining (ARM) is one of the most significant and active research areas in data mining. Recently, the Whale Optimization Algorithm (WOA) has been successfully applied in the field of data mining; however, it easily falls into local optima. Therefore, an improved WOA based on an adaptive parameter strategy and a Levy Flight mechanism (LWOA) is applied to mine association rules. In addition, a hybrid strategy that blends two algorithms to balance the exploration and exploitation phases is put forward: the grey wolf optimization algorithm (GWO), the artificial bee colony algorithm (ABC), and the cuckoo search algorithm (CS) are used to improve the convergence of LWOA. The approach performs a global search and finds association rule sets by modeling the rule mining task as a multi-objective problem that simultaneously considers support, confidence, lift, and certainty factor, and it is evaluated on multiple data sets. Experimental results verify that the proposed method has better mining performance than the other algorithms considered in the paper.
Citations: 1
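The four objectives listed in the abstract above are standard rule-quality measures, and a small worked example may make them concrete. The sketch below computes them for a single rule over a toy transaction list, assuming the last measure named in the abstract is the usual certainty factor. The transactions and function names are invented for illustration, and the whale-optimization search itself is not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and certainty factor of antecedent -> consequent."""
    supp_a = support(transactions, antecedent)
    supp_b = support(transactions, consequent)
    supp_ab = support(transactions, set(antecedent) | set(consequent))
    confidence = supp_ab / supp_a if supp_a > 0 else 0.0
    lift = confidence / supp_b if supp_b > 0 else 0.0
    # Certainty factor: relative change in belief in the consequent, in [-1, 1].
    if confidence > supp_b:
        cf = (confidence - supp_b) / (1.0 - supp_b)
    elif confidence < supp_b:
        cf = (confidence - supp_b) / supp_b
    else:
        cf = 0.0
    return {"support": supp_ab, "confidence": confidence,
            "lift": lift, "certainty_factor": cf}


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
metrics = rule_metrics(baskets, {"bread"}, {"milk"})
print(metrics)
```

Here the rule bread -> milk has a lift below 1 and a negative certainty factor, so buying bread slightly lowers the chance of buying milk in this toy data even though the confidence looks high. That is why a multi-objective search weighs several measures rather than confidence alone.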
Hierarchical Hybrid Neural Networks With Multi-Head Attention for Document Classification
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2022-01-01 | DOI: 10.4018/ijdwm.303673
Weihao Huang, Jiaojiao Chen, Qianhua Cai, Xuejie Liu, Yu-dong Zhang, Xiaohui Hu
Document classification is a research topic that, with the advent of deep neural networks, aims to predict the overall sentiment polarity of a text. Various deep learning algorithms have been employed in current studies to improve classification performance. To this end, this paper proposes a hierarchical hybrid neural network with multi-head attention (HHNN-MHA) model for the task of document classification. The proposed model contains two layers that handle word-sentence level and sentence-document level classification, respectively. In the first layer, a CNN is integrated into a Bi-GRU and a multi-head attention mechanism is employed in order to exploit local and global features. Then, both a Bi-GRU and an attention mechanism are applied to document processing and classification in the second layer. Experiments on four datasets demonstrate the effectiveness of the proposed method. Compared to state-of-the-art methods, the model achieves competitive results in document classification.
Citations: 2
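The multi-head attention mechanism named in the abstract above can be illustrated with a stripped-down version of scaled dot-product self-attention. The sketch below is a simplification under stated assumptions: it omits the learned query, key, value, and output projections that a real layer would have, it is written in plain Python rather than a deep learning framework, and it is not the HHNN-MHA architecture. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def multi_head_self_attention(tokens, num_heads):
    """Split each token vector into num_heads slices, attend within each slice,
    then concatenate the per-head outputs (no learned projections in this sketch)."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        heads.append(attention(part, part, part))
    return [[x for head in heads for x in head[i]] for i in range(len(tokens))]


# Two toy token embeddings of size 4, processed with 2 heads.
sentence = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
mixed = multi_head_self_attention(sentence, num_heads=2)
print(len(mixed), len(mixed[0]))
```

Each head mixes information across positions independently, which is how the mechanism can pick up several kinds of relationships between words at once.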
Schema Evolution in Multiversion Data Warehouses
IF 1.2 | Zone 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100101
Waqas Ahmed, E. Zimányi, A. Vaisman, R. Wrembel
Data warehouses (DWs) evolve in both their content and schema due to changes in user requirements, business processes, or external sources, to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution to this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, they previously proposed a temporal multidimensional (MD) model. In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. The paper gives the semantics of the schema modification operators (SMOs) used to derive various schema versions. It also shows how online analytical processing (OLAP) operations such as roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given, along with OLAP operations in standard SQL.
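To give a feel for how a roll-up can be expressed in standard SQL over a relational mapping, as the abstract above mentions, the sketch below aggregates a small fact table from day level to month level. It deliberately ignores schema versions and temporal validity, which are the actual subject of the paper. The table names, columns, and rows are hypothetical, and Python's built-in `sqlite3` module is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        day     TEXT,
        month   TEXT
    );
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        amount  REAL
    );
    INSERT INTO dim_date VALUES
        (1, '2021-01-15', '2021-01'),
        (2, '2021-01-20', '2021-01'),
        (3, '2021-02-03', '2021-02');
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0);
""")


def roll_up_to_month(connection):
    """Roll up from the day level to the month level of the date hierarchy."""
    query = """
        SELECT d.month, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY d.month
        ORDER BY d.month
    """
    return connection.execute(query).fetchall()


monthly = roll_up_to_month(conn)
print(monthly)
```

In a multiversion setting the same query would additionally have to state which schema version it is evaluated against, since a level such as month may exist in one version and not in another.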
Citations: 0