International Journal of Data Warehousing and Mining: Latest Articles

Efficient Algorithms for Dynamic Incomplete Decision Systems

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining

Pub Date : 2021-01-01 DOI: 10.4018/IJDWM.2021070103

N. Thang, Long Giang Nguyen, Hoang Viet Long, N. Tuan, T. Tuan, Ngo Duy Tan

Attribute reduction is a crucial problem in data mining and knowledge discovery in big data. In incomplete decision systems, the tolerance rough set model is fundamental to solving this problem: it computes a reduct in order to cut execution time. However, earlier proposals used the traditional filter approach, so the resulting reduct was not optimal in either the number of attributes or the classification accuracy. The problem is especially critical in dynamic incomplete decision systems, which are more appropriate for real-world applications. This paper therefore proposes two novel incremental algorithms that combine the filter and wrapper approaches, IFWA_ADO and IFWA_DEO, for dynamic incomplete decision systems. IFWA_ADO computes the reduct incrementally when multiple objects are added, while IFWA_DEO updates the reduct when multiple objects are removed. The algorithms are verified on six data sets. Experimental results show that the filter-wrapper algorithms achieve higher performance than the other incremental filter algorithms.
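The abstract relies on tolerance rough sets without defining them. As a minimal sketch, assuming the standard tolerance relation for incomplete data (two objects are tolerant on an attribute set if, on every attribute, their values agree or at least one value is missing), tolerance classes could be computed as follows. The names `MISSING`, `tolerant`, and `tolerance_classes` are hypothetical, and this is not the IFWA_ADO or IFWA_DEO algorithm, whose details the abstract does not give.

```python
# Marker for a missing attribute value (an assumption of this sketch).
MISSING = None


def tolerant(x, y, attrs):
    """True if x and y agree on every attribute in attrs, treating a
    missing value as compatible with anything."""
    return all(
        x[a] == y[a] or x[a] is MISSING or y[a] is MISSING
        for a in attrs
    )


def tolerance_classes(objects, attrs):
    """For each object, the set of indices of objects tolerant with it."""
    return [
        {j for j, y in enumerate(objects) if tolerant(x, y, attrs)}
        for x in objects
    ]


if __name__ == "__main__":
    data = [("high", MISSING), ("high", "yes"), ("low", "yes")]
    print(tolerance_classes(data, attrs=(0, 1)))
```

Unlike an equivalence relation, a tolerance relation is not transitive, so the resulting classes may overlap rather than partition the objects.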

Citations: 2

A Temporal Multidimensional Model and OLAP Operators

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100107

Waqas Ahmed, E. Zimányi, A. Vaisman, R. Wrembel

Data in data warehouses (DWs) are usually stored using the multidimensional (MD) model. DWs often change in content and structure for several reasons, such as changes in a business scenario or in technology. For accurate decision-making, a DW model must allow storing and analyzing time-varying data. This paper addresses the problem of keeping track of the history of the data in a DW. To this end, a formalization of the traditional MD model is first proposed and then extended into a generalized temporal MD model. The model comes equipped with a collection of typical online analytical processing (OLAP) operations with temporal semantics, formalized for four classic operations: roll-up, dice, project, and drill-across. Finally, the mapping from the generalized temporal model into a relational schema is presented, together with an implementation of the temporal OLAP operations in standard SQL.

Citations: 3

Recommender Systems Based on Resonance Relationship of Criteria With Choquet Operation

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100103

H. Huynh, Le Hoang Son, Cu Nguyen Giap, T. Huynh, H. H. Luong

Recommender systems are becoming increasingly important in every aspect of life because of the diverse needs of users. One of the main goals of a recommender system is to make decisions based on criteria. It is thus important to have a reasonable solution that is consistent with user requirements and with the characteristics of the stored data. This paper proposes a novel recommendation method, based on the resonance relationship of user criteria with the Choquet operation, for building a decision-making model. The method was evaluated with the multirecsys tool, which is based on the R language. The experiments indicate that the outputs of the proposed model are effective and reliable. The method can be applied in appropriate contexts to improve efficiency and to reduce the limitations of current recommender systems.

Citations: 1

The Model-Driven Architecture for the Trajectory Data Warehouse Modeling

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100102

Noura Azaiez, J. Akaichi

Business Intelligence includes the concept of data warehousing to support decision making. The ETL process, which is the core of warehousing technology, is responsible for pulling data out of the source systems and placing it into a data warehouse. Given the technological development in geographical information systems, pervasive systems, and positioning systems, traditional warehouse features can no longer handle the mobility aspect integrated into the warehousing chain. Therefore, the trajectory or mobility data gathered from the movements of mobile objects have to be managed through what is called the trajectory ELT. For this purpose, the authors emphasize the power of the model-driven architecture approach to achieve the whole transformation task, in this case transforming the trajectory data source model, which describes the resulting trajectories, into trajectory data mart models. The authors illustrate the proposed approach with an epilepsy patient state case study.

Citations: 1

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100104

V. Janeja, J. Namayanja, Y. Yesha, A. Kench, V. Misal

Analyzing continuous and categorical attributes together yields a heterogeneous mix of attributes, which poses challenges for data clustering. Traditional clustering techniques such as k-means work well when applied to small homogeneous datasets. However, as the data size grows, it becomes increasingly difficult to find meaningful and well-formed clusters. In this paper, the authors propose an approach that uses a combined similarity function, which measures similarity across numeric and categorical features, and employ this function in a clustering algorithm to identify similarity between data objects. The findings indicate that the proposed approach handles heterogeneous data better by forming well-separated clusters.
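The abstract does not specify the combined similarity function. One common way to combine numeric and categorical features, shown here purely as an illustrative assumption, is a Gower-style average: numeric features contribute one minus their range-normalized absolute difference, and categorical features contribute 1 on an exact match and 0 otherwise. The function name `combined_similarity` and the `numeric_ranges` parameter are hypothetical.

```python
def combined_similarity(a, b, numeric_ranges):
    """Gower-style similarity in [0, 1] between two mixed-type records.

    a, b           -- equal-length sequences of feature values
    numeric_ranges -- dict mapping the index of each numeric feature to
                      its value range (max - min) over the dataset;
                      features not listed are treated as categorical
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        rng = numeric_ranges.get(i)
        if rng is not None:
            # Numeric: 1 minus the normalized absolute difference.
            total += 1.0 - (abs(x - y) / rng if rng > 0 else 0.0)
        else:
            # Categorical: simple matching.
            total += 1.0 if x == y else 0.0
    return total / len(a)


if __name__ == "__main__":
    p = (1.0, "red", 10.0)
    q = (3.0, "red", 10.0)
    print(combined_similarity(p, q, {0: 4.0, 2: 20.0}))
```

A function of this shape can be passed to any clustering algorithm that only needs pairwise similarities or distances (using one minus the similarity as the distance).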

Citations: 2

Enhancing the Diamond Document Warehouse Model

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100101

M. Azabou, Ameen Banjar, J. Feki

The data warehouse community has paid particular attention to the document warehouse (DocW) paradigm during the last two decades. However, some important issues related to semantics are still pending and therefore need in-depth investigation. Indeed, the semantic exploitation of the DocW is not yet mature, even though it is a main concern for decision-makers. This paper aims to enhance the multidimensional model called the Diamond Document Warehouse Model with semantic aspects; in particular, it suggests semantic OLAP (on-line analytical processing) operators for querying the DocW.

Citations: 0

An Improvement of K-Medoids Clustering Algorithm Based on Fixed Point Iteration

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100105

Xiaodi Huang, Minglun Ren, Zhongfeng Hu

The K-medoids algorithm first selects data points at random as initial centers to form initial clusters. Then, following the PAM (partitioning around medoids) algorithm, the centers are sequentially replaced by each of the remaining data points to find the result with the best inherent convergence. Since PAM is an iterative ergodic strategy, its expensive computational overhead hinders its feasibility when the data size or the number of clusters is huge. The authors use fixed-point iteration to search for the optimal clustering centers and build an FPK-medoids (fixed point-based K-medoids) algorithm. By constructing fixed-point equations for each cluster, the problem of searching for optimal centers is converted into solving a set of equations in parallel. Experiments were carried out on six standard datasets, and the results show that the clustering efficiency of the proposed algorithm is significantly improved compared with the conventional algorithm. In addition, the clustering quality is markedly enhanced when handling large-scale datasets or a large number of clusters.
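For orientation, the conventional baseline that the abstract describes (random initial medoids, then trying every remaining point as a replacement for each medoid) can be sketched as below. This is a simplified PAM-style swap search written for illustration, not the authors' FPK-medoids algorithm, whose fixed-point equations the abstract does not give. The names `total_cost` and `pam` are hypothetical.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist, seed=0):
    """Simplified PAM: accept any single medoid swap that lowers the cost.

    Returns (sorted medoid indices, final cost).
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial centers
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the medoid at `pos` with candidate `cand`.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    print(pam(data, k=2, dist=lambda a, b: abs(a - b)))
```

Each pass over the swaps evaluates roughly k(n - k) candidate configurations, and each evaluation scans all n points against k medoids, which is the overhead the abstract points to for large n or k.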

Citations: 3

Data Discovery Over Time Series From Star Schemas Based on Association, Correlation, and Causality

Pub Date : 2020-10-01 DOI: 10.4018/ijdwm.2020100106

Wallace A. Pinheiro, G. Xexéo, J. Souza, A. B. Pinheiro

This work proposes a methodology, applied to repositories modeled using star schemas such as data marts, for discovering relevant time series relations. The paper applies a set of measures related to association, correlation, and causality to create connections among data. In this context, the research proposes a new causality function, based on peaks and values, that coherently relates time series. To evaluate the approach, the authors run a set of experiments on time series about American Tegumentary Leishmaniasis, a neglected disease that affects several Brazilian cities, and on time series about the climate of some cities in Brazil. The authors populate data marts with these data, and the proposed methodology generates a set of relations linking the notifications of this disease to variations in temperature and pluviometry.

Citations: 0

Extending LINE for Network Embedding With Completely Imbalanced Labels

Pub Date : 2020-07-01 DOI: 10.4018/ijdwm.2020070102

Zheng Wang, Qiao Wang, Tanjie Zhu, Xiaojun Ye

Network embedding is a fundamental problem in network research. Semi-supervised network embedding, which benefits from labeled data, has recently attracted considerable interest. However, existing semi-supervised methods produce biased results in the completely-imbalanced label setting, where the labeled data do not cover all classes. This article proposes a novel network embedding method that can benefit from completely-imbalanced labels by approximately guaranteeing both intra-class similarity and inter-class dissimilarity. In addition, the authors prove and adopt the matrix factorization form of LINE (a well-known network embedding method) as the network-structure-preserving model. Extensive experiments demonstrate the superiority and robustness of this method.

Citations: 1

Discovering Specific Sales Patterns Among Different Market Segments

Pub Date : 2020-07-01 DOI: 10.4018/ijdwm.2020070103

Cheng-Hsiung Weng, Cheng-Kui Huang

Formulating different marketing strategies for different market segments is an important task for marketing managers, who should therefore identify sales patterns among those segments. This study first applies recency–frequency–monetary (RFM) scores to split transaction datasets into several sub-datasets (market segments) and then discovers RFM itemsets from these segments. Three sales features (unique, common, and particular sales patterns) are defined to identify the various sales patterns, and a new criterion, contrast support, is proposed to discover notable sales patterns among different market segments. The study develops an algorithm, called sales pattern mining (SPMING), that discovers RFM itemsets from several RFM-based market segments and then identifies unique, common, and particular sales patterns. Experimental results on two real datasets show that the SPMING algorithm can discover specific sales patterns in various market segments.
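As a rough illustration of the RFM segmentation step only (not the SPMING algorithm or the contrast support criterion, which the abstract does not detail), the sketch below computes recency, frequency, and monetary values per customer and assigns a coarse three-letter segment label. The function names, the tuple layout of a transaction, and the threshold parameters are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Map each customer to (recency_days, frequency, monetary).

    transactions -- iterable of (customer_id, purchase_date, amount)
    today        -- reference date used to measure recency
    """
    last = {}
    freq = defaultdict(int)
    money = defaultdict(float)
    for cust, day, amount in transactions:
        last[cust] = max(last.get(cust, day), day)  # most recent purchase
        freq[cust] += 1
        money[cust] += amount
    return {c: ((today - last[c]).days, freq[c], money[c]) for c in last}


def segment(rfm, max_recency_days, min_frequency, min_monetary):
    """Label each customer 'H' or 'L' on R, F, and M (thresholds are
    hypothetical and would be tuned per dataset)."""
    labels = {}
    for cust, (r, f, m) in rfm.items():
        labels[cust] = (
            ("H" if r <= max_recency_days else "L")
            + ("H" if f >= min_frequency else "L")
            + ("H" if m >= min_monetary else "L")
        )
    return labels


if __name__ == "__main__":
    tx = [
        ("a", date(2020, 6, 1), 50.0),
        ("a", date(2020, 6, 20), 30.0),
        ("b", date(2020, 1, 5), 500.0),
    ]
    table = rfm_table(tx, today=date(2020, 7, 1))
    print(segment(table, max_recency_days=30, min_frequency=2, min_monetary=100.0))
```

Grouping transactions by the resulting label gives the sub-datasets on which itemsets would then be mined separately.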

Citations: 1
