International Journal of Data Warehousing and Mining: Latest Articles

A New Approach for Fairness Increment of Consensus-Driven Group Recommender Systems Based on Choquet Integral

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.290891

Cu Nguyen Giap, Nguyen Nhu Son, Long Giang Nguyen, Hoang Thi Minh Chau, Tran Manh Tuan, Le Hoang Son

Recent years have seen the rise of group recommender systems (GRSs) in e-commerce and tourism applications such as Booking.com, Traveloka.com, and Amazon. One of the main concerns in GRSs is guaranteeing fairness among the users in a group, which is the aim of the so-called consensus-driven group recommender system. This paper proposes a new, flexible alternative that embeds a fuzzy measure in the aggregation operators of the consensus process to improve the fairness of group recommendations and to account for interactions among group members. The Choquet integral is used to build a fuzzy measure based on group member interactions and to seek fairer recommendations. Empirical results on benchmark datasets show incremental gains for the proposal in handling group member interactions and fairness in consensus-driven GRSs.

Citations: 2
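The Choquet integral named in the abstract above can be illustrated with a minimal sketch. This is the textbook discrete Choquet integral over non-negative scores, not the authors' consensus model; the member names, ratings, and the helper `additive_measure` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Mapping


def additive_measure(weights: Mapping[str, float]) -> Dict[FrozenSet[str], float]:
    """Hypothetical helper: build an additive measure, mu(A) = sum of weights in A."""
    members = list(weights)
    measure = {}
    for size in range(len(members) + 1):
        for subset in combinations(members, size):
            measure[frozenset(subset)] = sum(weights[m] for m in subset)
    return measure


def choquet_integral(values: Mapping[str, float],
                     measure: Mapping[FrozenSet[str], float]) -> float:
    """Discrete Choquet integral of non-negative values w.r.t. a fuzzy measure.

    Members are sorted by ascending value; each increment over the previous
    value is weighted by the measure of the coalition of members whose value
    is at least the current one.
    """
    order = sorted(values, key=lambda m: values[m])
    total, previous = 0.0, 0.0
    for i, member in enumerate(order):
        coalition = frozenset(order[i:])
        total += (values[member] - previous) * measure[coalition]
        previous = values[member]
    return total


# Illustrative ratings that three group members give to one item (assumed data).
ratings = {"ann": 2.0, "bob": 4.0, "cat": 5.0}

# An additive measure with equal weights reduces to the arithmetic mean.
equal = additive_measure({"ann": 1 / 3, "bob": 1 / 3, "cat": 1 / 3})

# A measure that is 1 only for the whole group reduces to the minimum rating.
strict = {subset: 0.0 for subset in equal}
strict[frozenset(ratings)] = 1.0

mean_like = choquet_integral(ratings, equal)
least = choquet_integral(ratings, strict)
```

Measures that are not additive sit between these two extremes, which is how the aggregation can reflect interactions between group members rather than treating each member independently.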

Semi-Supervised Sentiment Classification on E-Commerce Reviews Using Tripartite Graph and Clustering

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.307904

Xin Lu, Donghong Gu, Haolan Zhang, Zhengxin Song, Qianhua Cai, Hongya Zhao, Haiming Wu

Sentiment classification is an important topic in natural language processing; its main purpose is to extract sentiment polarity from unstructured text. The label propagation algorithm, a semi-supervised learning method, has been widely used in sentiment classification because it describes the relations between samples as a graph. However, current graph construction strategies fail to use the global distribution and cannot properly handle polysemy and synonymy. This paper proposes a semi-supervised learning methodology for graph construction that integrates a tripartite graph with clustering. Experiments on e-commerce reviews show that the proposed method outperforms baseline methods overall and enables precise sentiment classification with few labeled samples.

Citations: 2

A Stock Trading Expert System Established by the CNN-GA-Based Collaborative System

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.309957

J. Wu, Lingyun Sun, Gautam Srivastava, Vicente García Díaz, Jerry Chun‐wei Lin

This article uses a new convolutional neural network framework that performs well on time series feature extraction and stock price prediction. The method is called the stock sequence array convolutional neural network, or SSACNN for short. SSACNN collects data on leading indicators, including historical prices and their futures and options, and uses arrays as the input map of the CNN framework. In the financial market, every number has its logic behind it. Leading indicators such as futures and options can reflect changes in many markets, such as an industry's prosperity. Adding leading-indicator data helps predict the trend of stock prices. This study takes the stock markets of the United States and Taiwan as its research objects, uses historical data, futures, and options as data sets to predict stock prices in these two markets, and then uses genetic algorithms to find trading signals, yielding a stock trading system. The experimental results show that the proposed stock trading system can help investors obtain certain returns.

Citations: 2

Density-Based Spatial Anomalous Window Discovery

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.299015

Prerna Mohod, V. Janeja

The focus of this paper is to identify anomalous spatial windows using clustering-based methods. Spatial anomalous windows are contiguous groupings of spatial nodes that are unusual with respect to the rest of the data. Many scan-statistic-based approaches have been proposed for identifying spatial anomalous windows, and clustering techniques have been proposed for identifying groups of points that behave similarly. There are parallels between the two types of approach, but they have not been used interchangeably. The focus of our work is therefore to bridge this gap and identify anomalous spatial windows using clustering-based methods. Specifically, we use the circular scan statistic based approach and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to bridge the gap between clustering and scan-statistic-based approaches. We present experimental results on US crime data. Our results show that our approach is effective in identifying spatial anomalous windows, performs as well as or better than existing techniques, and does better than pure clustering.

Citations: 0
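For readers unfamiliar with DBSCAN, which the abstract above builds on, the following is a minimal sketch of the plain clustering algorithm only. It does not include the circular scan statistic or the authors' combination of the two; the function name `dbscan`, the sample points, and the parameter values are assumptions made for illustration.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # label for points that belong to no cluster


def dbscan(points: Sequence[Tuple[float, float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN with a brute-force neighbor search.

    Returns one label per point: 0, 1, ... for clusters and NOISE for outliers.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None means unvisited

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return [label if label is not None else NOISE for label in labels]


# Two tight groups and one isolated point (assumed sample data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these sample points, the two groups become clusters 0 and 1 and the isolated point is labeled as noise. The brute-force neighbor search keeps the sketch short; a spatial index would be the usual choice for larger data.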

Improvement of Data Stream Decision Trees

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.290889

Sarah Nait Bahloul, Oussama Abderrahim, Aya Ichrak Benhadj Amar, Mohammed Yacine Bouhedadja

The classification of data streams has become a significant and active research area. The principal characteristics of data streams are the large volume of arriving data, its high arrival rate, and changes in its nature and distribution over time. The Hoeffding Tree is a method for building decision trees incrementally. Since it was proposed, it has become one of the most popular tools for data stream classification, and several improvements have emerged. The Hoeffding Anytime Tree was recently introduced and is considered one of the most promising algorithms: it offers higher accuracy than the Hoeffding Tree in most scenarios at a small additional computational cost. In this work, the authors propose three improvements to the Hoeffding Anytime Tree and test them on known benchmark datasets. The experimental results show that two of the proposed variants make better use of the Hoeffding Anytime Tree's properties: they learn faster while providing the same desired accuracy.

Citations: 0
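The incremental construction mentioned in the abstract above rests on the Hoeffding bound, which a short sketch can make concrete. The code shows the bound and the split test as commonly described for the basic Hoeffding Tree; it is not the Hoeffding Anytime Tree nor any of the three variants proposed in the paper, and the function names and numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    With probability at least 1 - delta, the true mean of a variable with
    range R lies within epsilon of the mean observed over n samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split a leaf once the best attribute leads the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# Assumed example: gain measured in [0, 1], 1000 examples seen at the leaf.
epsilon = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
clear_winner = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
too_close = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=1000)
```

Because epsilon shrinks as more examples reach a leaf, a split that is too close to call early on can be decided later without storing the stream.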

Crime Analyses Using Data Analytics

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.299014

Thanu Dayara, F. Thabtah, Hussein Abdel-jaber, S. Zeidan

One potential approach for crime analysis that has shown promising results is data analytics, particularly descriptive and predictive techniques. Data analytics can explore former criminal incidents seeking hidden correlations and patterns, which potentially could be used in crime prevention and resource management. The purpose of this research is to build a crime analysis model using supervised techniques to predict the arrest status of serious crimes in Chicago. This is based on specific indicators, such as timeframe, location in terms of district, community, and beat, and crime type among others. We used time series and clustering techniques to help us identify influential features. Supervised machine learning algorithms then modelled the subset of features against incidents related to battery and assaults in specific timeframes and locations to predict the arrest status response variable. The models derived from Naïve Bayes, Decision Tree, and Support Vector Machine (SVM) algorithms reveal a high predictive accuracy rate at certain times in some communities within Chicago.

Citations: 0

Improving Rumor Detection by Image Captioning and Multi-Cell Bi-RNN With Self-Attention in Social Networks

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.313189

Jenq-Haur Wang, Chin-Wei Huang, M. Norouzi

User-generated content on social media is not verified before being posted, and it can cause many problems if misused. Among the various types of rumors, the authors focus on those in which images do not match their surrounding text. Such rumors can be detected by multimodal feature fusion in RNNs with an attention mechanism, but the relations between images and text are not well addressed. In this paper, the authors propose to improve rumor detection using image captioning and RNNs with self-attention. First, they use image captioning to translate images into corresponding text descriptions. Second, the caption words are represented by word embedding models and aggregated with the surrounding text using early fusion. Finally, multi-cell bi-directional RNNs with self-attention are used to learn important features for identifying rumors. In the experiments, the best F-measure obtained is 0.882, which shows the potential of the proposed approach to rumor detection. Further investigation is needed on larger-scale data.

Citations: 1

Initial Optimization Techniques for the Cube Algebra Query Language: The Relational Model as a Target

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.299016

Thomas Mercieca, J. Vella, K. Vella

A common model for handling today's overwhelming amounts of data is the OLAP cube. The OLAP community has proposed several cube algebras, although no standard has yet been nominated. This study focuses on a recent addition to the cube algebras: the user-centric Cube Algebra Query Language (CAQL). The study explores the optimization potential of this algebra by applying logical rewriting inspired by classic relational algebra, together with parallelism. The lack of a standard algebra is often cited as a problem in such discussions. The significance of this work is therefore that it strengthens the position of this algebra among the OLAP algebras by addressing implementation details. The modern open-source PostgreSQL relational engine is used to encode the CAQL abstraction. A query workload based on a well-known dataset is adopted, and CAQL and SQL implementations are compared. Finally, the quality of each generated query is evaluated through its observed performance characteristics. Results show strong improvements over the baseline case of the unoptimized query.

Citations: 0

Emotion-Drive Interpretable Fake News Detection

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.314585

Xiaoyin Ge, Mingshu Zhang, Xu An Wang, Jia Liu, Bin Wei

Fake news has brought significant challenges to the healthy development of social media. Although current fake news detection methods are advanced, many models directly utilize unselected user comments and do not consider the emotional connection between news content and user comments. The authors propose an emotion-driven explainable fake news detection model (EDI) to solve this problem. The model can select valuable user comments by using sentiment value, obtain the emotional correlation representation between news content and user comments by using collaborative annotation, and obtain the weighted representation of user comments by using the attention mechanism. Experimental results on Twitter and Weibo show that the detection model significantly outperforms the state-of-the-art models and provides reasonable interpretation.

Citations: 0

Machine Learning Based Admission Data Processing for Early Forecasting Students' Learning Outcomes

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2022-01-01 DOI: 10.4018/ijdwm.313585

Nguyen Thi Kim Son, Nguyen Van Bien, Nguyen Huu Quynh, C. Thơ

In this paper, the authors explore factors that improve the accuracy of predicting student learning outcomes. The proposed method removes redundant and irrelevant factors to obtain a “clean” data set without having to solve an NP-hard problem, and it improves the accuracy of graduation outcome prediction by applying logistic regression to the “clean” data set. The authors empirically evaluate the method on training and university admission data from Hanoi Metropolitan University from 2016 to 2020. Using the data processing results and a machine learning application program, they analyze, evaluate, and forecast students' learning outcomes based on admission data and first-year and second-year academic performance data. They then propose training and admission policies, along with methods for solving university admission problems thoroughly and quantitatively.

Citations: 0
