
International Journal of Data Warehousing and Mining: Latest Articles

Enhancing the Diamond Document Warehouse Model
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-10-01 | DOI: 10.4018/ijdwm.2020100101
M. Azabou, Ameen Banjar, J. Feki
The data warehouse community has paid particular attention to the document warehouse (DocW) paradigm during the last two decades. However, some important issues related to semantics are still pending and need in-depth investigation. Indeed, the semantic exploitation of the DocW is not yet mature, even though it is a main concern for decision-makers. This paper aims to enhance the multidimensional model called the Diamond Document Warehouse Model with semantic aspects; in particular, it suggests semantic OLAP (on-line analytical processing) operators for querying the DocW.
Citations: 0
An Improvement of K-Medoids Clustering Algorithm Based on Fixed Point Iteration
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-10-01 | DOI: 10.4018/ijdwm.2020100105
Xiaodi Huang, Minglun Ren, Zhongfeng Hu
The K-medoids algorithm first selects data points at random as initial centers to form initial clusters. Then, following the PAM (partitioning around medoids) algorithm, the centers are sequentially replaced by each of the remaining data points to find the result with the best inherent convergence. Since PAM is an iterative, exhaustive (ergodic) strategy, its computational overhead hinders its feasibility when the data size or the number of clusters is large. The authors use fixed-point iteration to search for the optimal cluster centers and build an FPK-medoids (fixed point-based K-medoids) algorithm. By constructing fixed-point equations for each cluster, the problem of searching for optimal centers is converted into solving a set of equations in parallel. Experiments on six standard datasets show that the clustering efficiency of the proposed algorithm is significantly improved compared with the conventional algorithm. In addition, clustering quality is markedly enhanced on large-scale datasets or with a large number of clusters.
Citations: 3
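To make the K-medoids entry more concrete, here is a minimal sketch of the common alternating variant of K-medoids: assign every point to its nearest medoid, then re-pick each cluster's medoid, and stop when the medoids no longer change. This is not the authors' FPK-medoids and not full PAM swapping; the function names `assign` and `k_medoids`, the `init` parameter, and the sample points are illustrative assumptions. It also assumes the points are distinct.

```python
import math
import random

def assign(points, medoids, dist):
    """Group point indices by their nearest medoid (ties go to the earlier medoid)."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, dist=math.dist, init=None, max_iter=100, seed=0):
    """Alternating K-medoids; medoids are indices into `points`."""
    rng = random.Random(seed)
    # Random initial centers unless the caller supplies explicit indices.
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            cost = lambda c: sum(dist(points[c], points[j]) for j in members)
            new_medoids.append(min(members, key=cost))
        if set(new_medoids) == set(medoids):  # medoids unchanged: stop iterating
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

# Hypothetical data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = k_medoids(pts, 2, init=[0, 1])
print(medoids, clusters)
```

Each pass costs on the order of the sum of squared cluster sizes, which is why this update is usually cheaper than evaluating every medoid/non-medoid swap as PAM does. Like any local search, it can stop at a poor configuration for an unlucky initialization.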
Data Discovery Over Time Series From Star Schemas Based on Association, Correlation, and Causality
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-10-01 | DOI: 10.4018/ijdwm.2020100106
Wallace A. Pinheiro, G. Xexéo, J. Souza, A. B. Pinheiro
This work proposes a methodology, applied to repositories modeled using star schemas such as data marts, to discover relevant time series relations. It applies a set of measures related to association, correlation, and causality to create connections among data. In this context, the research proposes a new causality function, based on peaks and values, that coherently relates time series. To evaluate the approach, the authors run a set of experiments on time series about American Tegumentary Leishmaniasis, a neglected disease that affects several Brazilian cities, and on time series about the climate of some cities in Brazil. The authors populate data marts with these data, and the proposed methodology generates a set of relations linking the notifications of this disease to variations in temperature and pluviometry.
Citations: 0
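As a rough illustration of the correlation side of the time-series entry, the sketch below computes the Pearson correlation between a "driver" series and a "response" series at several lags. It is a generic textbook measure, not the causality function proposed in the paper, and the series names and values are invented for the example. Both series are assumed to have equal length and non-zero variance.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def lagged_correlation(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    n = len(driver)
    return {
        lag: pearson(driver[: n - lag], response[lag:])
        for lag in range(max_lag + 1)
    }

# Hypothetical monthly values: cases follow rainfall with a two-step delay.
rainfall = [1, 5, 2, 8, 3, 9, 4, 7]
cases = [0, 0, 2, 10, 4, 16, 6, 18]
by_lag = lagged_correlation(rainfall, cases, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, by_lag)
```

A high correlation at some lag only suggests a candidate relation; it does not by itself establish causality.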
Discovering Specific Sales Patterns Among Different Market Segments
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070103
Cheng-Hsiung Weng, Cheng-Kui Huang
Formulating different marketing strategies to apply to various market segments is a noteworthy undertaking for marketing managers. Accordingly, marketing managers should identify sales patterns among different market segments. The study initially applies the concept of recency–frequency–monetary (RFM) scores to segment transaction datasets into several sub-datasets (market segments) and discovers RFM itemsets from these market segments. In addition, three sales features (unique, common, and particular sales patterns) are defined to identify various sales patterns in this study. In particular, a new criterion (contrast support) is also proposed to discover notable sales patterns among different market segments. This study develops an algorithm, called sales pattern mining (SPMING), for discovering RFM itemsets from several RFM-based market segments and then identifying unique, common, and particular sales patterns. The experimental results from two real datasets show that the SPMING algorithm can discover specific sales patterns in various market segments.
Citations: 1
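The RFM scoring mentioned in the sales-pattern entry can be sketched as follows: derive recency, frequency, and monetary values per customer, then turn each into a rank-based score. This is one common way to compute RFM scores, not necessarily the scheme used in the study; the SPMING algorithm and the contrast-support criterion are not reproduced here. The function names, the three-bin scoring, and the transactions are assumptions for illustration.

```python
from datetime import date

def rfm_table(transactions, today):
    """Per customer: (days since last purchase, purchase count, total spend)."""
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table

def rank_scores(values, bins=3, reverse=False):
    """Map each key to a 1..bins score by rank; reverse=True rewards small values."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    n = len(ordered)
    return {key: 1 + (i * bins) // n for i, key in enumerate(ordered)}

def rfm_scores(table, bins=3):
    """Combine the three rank scores into an (R, F, M) tuple per customer."""
    r = rank_scores({c: v[0] for c, v in table.items()}, bins, reverse=True)
    f = rank_scores({c: v[1] for c, v in table.items()}, bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, bins)
    return {c: (r[c], f[c], m[c]) for c in table}

# Hypothetical transactions: (customer, purchase date, amount).
transactions = [
    ("ann", date(2020, 6, 28), 120.0),
    ("ann", date(2020, 6, 1), 80.0),
    ("ann", date(2020, 5, 3), 50.0),
    ("bob", date(2020, 3, 15), 40.0),
    ("cy", date(2020, 6, 10), 300.0),
    ("cy", date(2020, 4, 2), 20.0),
]
scores = rfm_scores(rfm_table(transactions, today=date(2020, 7, 1)))
print(scores)
```

Customers with the same (R, F, M) tuple could then be grouped into one segment, and itemsets mined separately per segment.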
Extending LINE for Network Embedding With Completely Imbalanced Labels
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070102
Zheng Wang, Qiao Wang, Tanjie Zhu, Xiaojun Ye
Network embedding is a fundamental problem in network research. Semi-supervised network embedding, which benefits from labeled data, has recently attracted considerable interest. However, existing semi-supervised methods produce biased results in the completely-imbalanced label setting, where labeled data cannot cover all classes. This article proposes a novel network embedding method that can benefit from completely-imbalanced labels by approximately guaranteeing both intra-class similarity and inter-class dissimilarity. In addition, the authors prove and adopt the matrix factorization form of LINE (a well-known network embedding method) as the network structure preserving model. Extensive experiments demonstrate the superiority and robustness of this method.
Citations: 1
A Novel Method for Classifying Function of Spatial Regions Based on Two Sets of Characteristics Indicated by Trajectories
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070101
Haitao Zhang, Che Yu, Yan Jin
Trajectory is a significant factor for classifying functions of spatial regions. Many spatial classification methods use trajectories to detect buildings and districts in urban settings. However, methods that only take into consideration the local spatiotemporal characteristics indicated by trajectories may generate inaccurate results. In this article, a novel method for classifying function of spatial regions based on two sets of characteristics indicated by trajectories is proposed, in which the local spatiotemporal characteristics as well as the global connection characteristics are obtained through two sets of calculations. The method was evaluated in two experiments: one that measured changes in the classification metric through a splits ratio factor, and one that compared the classification performance between the proposed method and methods based on a single set of characteristics. The results showed that the proposed method is more accurate than the two traditional methods, with a precision value of 0.93, a recall value of 0.77, and an F-Measure value of 0.84.
Keywords: Function of Spatial Regions, Global Connection Characteristics, Local Spatiotemporal Characteristics, Spatial Classification, Trajectory
Citations: 0
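The three figures reported in the spatial-region entry can be cross-checked with a short calculation. The sketch assumes the F-Measure is the balanced F1 score, the harmonic mean of precision and recall; the abstract does not name the variant. The function name `f_measure` and its `beta` parameter are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as stated in the abstract.
print(round(f_measure(0.93, 0.77), 2))  # 0.84
```

Under that assumption, the stated precision of 0.93 and recall of 0.77 are consistent with the reported F-Measure of 0.84.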
A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070104
D. Devi, S. Namasudra, Seifedine Kadry
Class imbalance is a well-investigated topic that addresses the performance degradation of standard learning models caused by an uneven distribution of classes in a dataspace. Cluster-based undersampling is a popular solution in this domain: it eliminates majority class instances from a definite number of clusters to balance the training data. However, distance-based elimination of instances is often affected by the underlying data distribution. Recently, ensemble learning techniques have emerged as an effective solution because of their weighted learning principle for rare instances. In this article, a boosting-aided adaptive cluster-based undersampling technique is proposed to facilitate the elimination of learning-insignificant majority class instances from the clusters, detected through the AdaBoost ensemble learning model. The proposed work is validated against seven existing cluster-based undersampling techniques on six binary datasets and three classification models. The experimental results establish the effectiveness of the proposed technique over the existing methods.
Citations: 25
Recommender Systems Using Collaborative Tagging
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070110
Latha Banda, Karan Singh, Le Hoang Son, Mohamed Abdel-Basset, Pham Huy Thong, H. Huynh, D. Taniar
Collaborative tagging is a useful and effective way of classifying items for search and information sharing, so that users can be tagged via online social networking. This article proposes a novel recommender system for collaborative tagging in which the genre interestingness measure and gradual decay are used together with diffusion similarity. The approach is compared on benchmark recommender system datasets, namely MovieLens and Amazon, against existing approaches such as collaborative filtering based on tagging using E-FCM and E-GK clustering algorithms, hybrid recommender systems based on tagging using GA, and collaborative tagging using incremental clustering with trust. The experimental results indicate that the proposed approach achieves a maximum prediction accuracy ratio of 9.25%, averaged over various data splits of 100 users, which is higher than the prediction accuracy of 5.76% obtained by the existing approaches.
Citations: 4
Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070105
Na Deng, Caiquan Xiong
In the retrieval and mining of traditional Chinese medicine (TCM) patents, a key step is Chinese word segmentation and named entity recognition. However, the alias phenomenon of traditional Chinese medicines causes great challenges to Chinese word segmentation and named entity recognition in TCM patents, which directly affects the effect of patent mining. Because of the lack of a comprehensive Chinese herbal medicine name thesaurus, traditional thesaurus-based Chinese word segmentation and named entity recognition are not suitable for medicine identification in TCM patents. In view of the present situation, using the language characteristics and structural characteristics of TCM patent texts, a modified and serialized co-training method to recognize medicine names from TCM patent abstract texts is proposed. Experiments show that this method can maintain high accuracy under relatively low time complexity. In addition, this method can also be expanded to the recognition of other named entities in TCM patents, such as disease names, preparation methods, and so on.
Keywords: Annotation, Co-Training, Machine Learning, Medicine Name, Patent Mining, Patent Retrieval, Traditional Chinese Medicine
Citations: 2
Integrating Feature and Instance Selection Techniques in Opinion Mining
IF 1.2 | Zone 4, Computer Science | Q3 Computer Science | Pub Date: 2020-07-01 | DOI: 10.4018/ijdwm.2020070109
Zi-Hung You, Ya-Han Hu, Chih-Fong Tsai, Yen-Ming Kuo
Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g. term frequency (TF) or term frequency–inverse document frequency (TF–IDF), can yield diverse numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade the classification performance. To solve this problem, instance selection can be adopted to filter out unrepresentative training documents. Therefore, this article investigates the opinion mining performance associated with feature and instance selection steps simultaneously. Two combination processes based on performing feature selection and instance selection in different orders were compared. Specifically, two feature selection methods, namely TF and TF–IDF, and two instance selection methods, namely DROP3 and IB3, were employed for comparison. The experimental results by using three Twitter datasets to develop sentiment classifiers showed that TF–IDF followed by DROP3 performs the best.
Keywords: Feature Selection, Instance Selection, Opinion Mining, Text Classification
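For readers unfamiliar with the TF and TF–IDF weightings named in this opinion-mining entry, here is a minimal sketch of both. It uses relative term frequency and the plain log(N/df) inverse document frequency, which is only one of several common variants and may differ from the weighting used in the article; the instance selection methods DROP3 and IB3 are not covered. The function name `tf_idf` and the toy documents are assumptions, and every document is assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical tokenized tweets.
docs = [
    ["good", "phone", "good"],
    ["bad", "phone"],
    ["good", "service"],
]
for weights in tf_idf(docs):
    print(weights)
```

Terms that occur in few documents (here "bad" and "service") receive a larger weight than terms spread across many documents, which is what lets TF–IDF down-weight uninformative vocabulary before a classifier is trained.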
Citations: 4