
International Journal of Data Warehousing and Mining: Latest Articles

A Novel Method for Classifying Function of Spatial Regions Based on Two Sets of Characteristics Indicated by Trajectories
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070101
Haitao Zhang, Che Yu, Yan Jin
Trajectory is a significant factor for classifying functions of spatial regions. Many spatial classification methods use trajectories to detect buildings and districts in urban settings. However, methods that only take into consideration the local spatiotemporal characteristics indicated by trajectories may generate inaccurate results. In this article, a novel method for classifying function of spatial regions based on two sets of characteristics indicated by trajectories is proposed, in which the local spatiotemporal characteristics as well as the global connection characteristics are obtained through two sets of calculations. The method was evaluated in two experiments: one that measured changes in the classification metric through a splits ratio factor, and one that compared the classification performance between the proposed method and methods based on a single set of characteristics. The results showed that the proposed method is more accurate than the two traditional methods, with a precision value of 0.93, a recall value of 0.77, and an F-Measure value of 0.84.
Keywords: Function of Spatial Regions, Global Connection Characteristics, Local Spatiotemporal Characteristics, Spatial Classification, Trajectory
Pages: 1-19
Citations: 0
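The precision, recall, and F-Measure figures reported in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported F-Measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, so the `beta` parameter and the function name are illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract; beta = 1 is an assumption.
f1 = f_measure(0.93, 0.77)
print(round(f1, 2))  # 0.84
```

Under that assumption the three reported numbers are mutually consistent: 2 x 0.93 x 0.77 / (0.93 + 0.77) is roughly 0.8425.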
A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070104
D. Devi, S. Namasudra, Seifedine Kadry
Class imbalance is a well-investigated topic that concerns the performance degradation of standard learning models due to uneven distribution of classes in a dataspace. Cluster-based undersampling is a popular solution in the domain, which eliminates majority class instances from a definite number of clusters to balance the training data. However, distance-based elimination of instances is often affected by the underlying data distribution. Recently, ensemble learning techniques have emerged as an effective solution due to their weighted learning principle for rare instances. In this article, a boosting-aided adaptive cluster-based undersampling technique is proposed to facilitate elimination of learning-insignificant majority class instances from the clusters, detected through the AdaBoost ensemble learning model. The proposed work is validated against seven existing cluster-based undersampling techniques on six binary datasets and three classification models. The experimental results establish that the proposed technique is more effective than the existing methods.
Pages: 60-86
Citations: 25
Recommender Systems Using Collaborative Tagging
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070110
Latha Banda, Karan Singh, Le Hoang Son, Mohamed Abdel-Basset, Pham Huy Thong, H. Huynh, D. Taniar
Collaborative tagging is a useful and effective way of classifying items with respect to search and sharing information, so that users can be tagged via online social networking. This article proposes a novel recommender system for collaborative tagging in which the genre interestingness measure and gradual decay are utilized with diffusion similarity. The approach is compared on benchmark recommender system datasets, namely the MovieLens and Amazon datasets, against existing approaches such as collaborative filtering based on tagging using the E-FCM and E-GK clustering algorithms, hybrid recommender systems based on tagging using GA, and collaborative tagging using incremental clustering with trust. The experimental results show that the proposed approach achieves a maximum prediction accuracy ratio of 9.25%, averaged over various data splits for 100 users, which is higher than the existing approaches, which obtained a prediction accuracy of only 5.76%.
Pages: 183-200
Citations: 4
Skeleton Network Extraction and Analysis on Bicycle Sharing Networks
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070108
Kanokwan Malang, Shuliang Wang, Yuanyuan Lv, Aniwat Phaphuangwittayakul
Skeleton network extraction has been adopted unevenly in transportation networks whose nodes are always represented as spatial units. In this article, the TPks skeleton network extraction method is proposed and applied to bicycle sharing networks. The method aims to reduce the network size while preserving key topologies and spatial features. The authors quantified the importance of nodes by an improved topology potential algorithm. The spatial clustering allows to detect high traffic concentrations and allocate the nodes of each cluster according to their spatial distribution. Then, the skeleton network is constructed by aggregating the most important indicated skeleton nodes. The authors examine the skeleton network characteristics and different spatial information using the original networks as a benchmark. The results show that the skeleton networks can preserve the topological and spatial information similar to the original networks while reducing their size and complexity.
Keywords: Backbone Extraction, Complex Network, Geographical Information, Network Summarization, Public Bicycle, Spatial Information, Topology Potential, Transportation
Pages: 146-167
Citations: 3
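To make the idea of ranking nodes by topology potential more concrete, here is a minimal sketch. It assumes the commonly cited Gaussian form, in which the potential of node i is the sum over other nodes j of m_j * exp(-(d_ij / sigma)^2), with d_ij the hop distance. The abstract above refers to an "improved" algorithm whose details are not given, so this is not the authors' method; the function names, the unit node masses, and the toy station network are all hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topology_potential(adj, sigma=1.0, mass=None):
    """Gaussian-form topology potential of each node (assumed standard form)."""
    mass = mass or {v: 1.0 for v in adj}  # unit mass per node is an assumption
    potential = {}
    for i in adj:
        dist = hop_distances(adj, i)
        potential[i] = sum(
            mass[j] * math.exp(-((d / sigma) ** 2))
            for j, d in dist.items()
            if j != i
        )
    return potential


# Hypothetical toy network: station "A" is a hub linked to three others.
stations = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}
scores = topology_potential(stations, sigma=1.0)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # A
```

In a skeleton-extraction setting, the highest-ranked nodes would be candidates for skeleton nodes; how the potential is combined with spatial clustering is specific to the paper and not reproduced here.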
Conceptual Model and Design of Semantic Trajectory Data Warehouse
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070106
M. Kwakye
The trajectory patterns of a moving object in a spatio-temporal domain offers varied information in terms of the management of the data generated from the movement. The query results of trajectory objects from the data warehouse are usually not enough to answer certain trend behaviours and meaningful inferences without the associated semantic information of the trajectory object or the geospatial environment within a specified purpose or context. This article formulates and designs a generic ontology modelling framework that serves as the background model platform for the design of a semantic data warehouse for trajectories. The methodology underpins on higher granularity of data as a result of pre-processed and extract-transformed-load (ETL) data so as to offer efficient semantic inference to the underlying trajectory data. Moreover, the modelling approach outlines the thematic dimensions that offer a design platform for predictive trend analysis and knowledge discovery in the trajectory dynamics and data processing for moving objects.
Keywords: Generic Trajectory Ontology, Multidimensional Entity Relationship, Semantic Annotations, Semantic Trajectory Data Warehouse, Spatio-Temporal Data Modelling
Pages: 108-131
Citations: 5
Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070105
Na Deng, Caiquan Xiong
In the retrieval and mining of traditional Chinese medicine (TCM) patents, a key step is Chinese word segmentation and named entity recognition. However, the alias phenomenon of traditional Chinese medicines causes great challenges to Chinese word segmentation and named entity recognition in TCM patents, which directly affects the effect of patent mining. Because of the lack of a comprehensive Chinese herbal medicine name thesaurus, traditional thesaurus-based Chinese word segmentation and named entity recognition are not suitable for medicine identification in TCM patents. In view of the present situation, using the language characteristics and structural characteristics of TCM patent texts, a modified and serialized co-training method to recognize medicine names from TCM patent abstract texts is proposed. Experiments show that this method can maintain high accuracy under relatively low time complexity. In addition, this method can also be expanded to the recognition of other named entities in TCM patents, such as disease names, preparation methods, and so on.
Keywords: Annotation, Co-Training, Machine Learning, Medicine Name, Patent Mining, Patent Retrieval, Traditional Chinese Medicine
Pages: 87-107
Citations: 2
Integrating Feature and Instance Selection Techniques in Opinion Mining
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070109
Zi-Hung You, Ya-Han Hu, Chih-Fong Tsai, Yen-Ming Kuo
Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g. term frequency (TF) or term frequency-inverse document frequency (TF-IDF), can yield diverse numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade the classification performance. To solve this problem, instance selection can be adopted to filter out unrepresentative training documents. Therefore, this article investigates the opinion mining performance associated with feature and instance selection steps simultaneously. Two combination processes based on performing feature selection and instance selection in different orders were compared. Specifically, two feature selection methods, namely TF and TF-IDF, and two instance selection methods, namely DROP3 and IB3, were employed for comparison. The experimental results by using three Twitter datasets to develop sentiment classifiers showed that TF-IDF followed by DROP3 performs the best.
Keywords: Feature Selection, Instance Selection, Opinion Mining, Text Classification
Pages: 168-182
Citations: 4
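The TF and TF-IDF weightings named in the abstract above can be illustrated with a small sketch. TF-IDF has many variants (smoothing, log base, normalization), and the abstract does not say which one the authors used, so the version below (relative term frequency times the natural log of N / df) is an assumption; the function name and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of tokenized tweets.
tweets = [
    "great phone great battery".split(),
    "terrible battery".split(),
    "great service".split(),
]
w = tf_idf(tweets)
# "terrible" occurs in one tweet only, so it outweighs the more common "battery".
print(w[1]["terrible"] > w[1]["battery"])  # True
```

With this unsmoothed variant, a term that occurs in every document gets weight zero, which is one reason TF and TF-IDF can yield different numbers of useful features.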
A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2020-07-01 DOI: 10.4018/ijdwm.2020070107
Gang Liu, Chuyi Wang
Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. The common region proposal methods hunt for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) method improves the accuracy and detection speed with several hundred candidate boxes. However, since the feature maps contain insufficient information, the ability of RPN to detect and locate small-sized objects is poor. A novel multi-scale feature fusion method for the region proposal network is proposed in this article to solve the above problems. The proposed method is called the multi-scale region proposal network (MS-RPN), which can generate suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are fine-tuned respectively and compressed into a uniform space. The generated fusion feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail information and context information, and are sent to RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.
Keywords: Fusion Feature, Multi-Scale, Object Detecting, Region Proposal Network
Neuralnetworkmodelshavebeenwidelyusedinthefieldofobjectdetecting。Theregionproposal methodsarewidelyusedinthecurrentobjectdetectionnetworksandhaveachievedwellperformance。Thecommonregionproposalmethodshunttheobjectsbygeneratingthousandsofthecandidate盒子。Compared toother regionproposalmethods, regionproposalnetwork (RPN)method improvestheaccuracyanddetectionspeedwithseveralhundredcandidateboxes。However,sincethe featuremapscontainsinsufficientinformation,theabilityofRPNtodetectandlocatesmall-sized objectsispoor。Anovelmulti-scalefeaturefusionmethodforregionproposalnetworktosolvethe aboveproblemsisproposedinthisarticle。Theproposedmethodiscalledmulti-scaleregionproposal network_ (MS-RPN)whichcangeneratesuitablefeaturemapsfortheregionproposalnetwork。In MS-RPN,theselectedfeaturemapsatmultiplescalesarefineturnedrespectivelyandcompressed intoauniformspace.Thegeneratedfusionfeaturemapsarecalledrefinedfusionfeatures(RFFs)。RFFsincorporateabundantdetailinformationandcontextinformation。AndRFFsaresenttoRPN togeneratebetterregionproposals。TheproposedapproachisevaluatedonPASCALVOC2007 andMSCOCObenchmarktasks。MS-RPNobtainssignificantimprovementsoverthecomparable state-of-the-artdetectionmodels。关键词融合特征,多尺度,目标检测,区域建议网络
Pages: 132-145
Citations: 4
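The MS-RPN abstract above describes bringing feature maps from several scales into one uniform space before they reach the RPN. The sketch below illustrates only the generic resize-and-concatenate idea with NumPy. It is not the authors' method: the abstract gives no implementation details, so the nearest-neighbour resize, the channel-wise concatenation, and the names `resize_nearest` and `fuse_feature_maps` are all assumptions made for illustration.

```python
import numpy as np


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map (assumed layout)."""
    _, h, w = fmap.shape
    # Map every output row/column back to a source row/column index.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]


def fuse_feature_maps(fmaps, out_h, out_w):
    """Hypothetical fusion: resize each map to a common size, then stack channels."""
    resized = [resize_nearest(f, out_h, out_w) for f in fmaps]
    return np.concatenate(resized, axis=0)


if __name__ == "__main__":
    # Three made-up feature maps at different spatial scales.
    maps = [np.ones((2, 8, 8)), np.ones((3, 4, 4)), np.ones((4, 2, 2))]
    fused = fuse_feature_maps(maps, 4, 4)
    print(fused.shape)
```

A real detector would normally use learned layers (for example convolutions) rather than plain resampling to refine and compress the maps; the sketch only shows how maps of different sizes can share one spatial grid.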
Data Mining in Programs: Clustering Programs Based on Structure Metrics and Execution Values
IF 1.2 · Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2020-04-01 · DOI: 10.4018/ijdwm.2020040104
Tiantian Wang, Kechao Wang, Xiaohong Su, Lin Liu
Software is embedded in many control systems, including security-critical ones. Existing program clustering methods are limited in their ability to identify functionally equivalent programs that have different syntactic representations. To address this, a clustering method based on structured metric vectors is first proposed to quickly identify structurally similar programs among a large number of existing programs. Next, a clustering method based on similar execution value sequences is proposed to accurately identify functionally equivalent programs with code variations. The approach has been applied in automatic program repair to identify sample programs from a large pool of template programs. The average purity is 0.95576 and the average entropy is 0.15497, which indicates that the clustering partition is consistent with the expected partition.
Pages: 48-63
Citations: 1
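The program clustering abstract above reports an average purity and an average entropy. As a rough illustration of what such scores measure, the sketch below computes both for a clustering compared against reference labels. The abstract does not say which formulas the authors used, so the definitions here are assumptions: purity is the fraction of items that belong to the majority class of their cluster, and entropy is the size-weighted class entropy per cluster, normalized by the logarithm of the number of classes. The function name `purity_and_entropy` is hypothetical.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference class labels."""
    n = len(cluster_ids)
    # Group the reference labels by the cluster each item was assigned to.
    clusters = {}
    for cid, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cid, []).append(label)

    n_classes = len(set(class_labels))
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Purity: share of all items that match their cluster's majority class.
        purity += max(counts.values()) / n
        # Entropy: class entropy within the cluster, weighted by cluster size
        # and normalized by log(number of classes) (an assumed normalization).
        if n_classes > 1:
            h = -sum((c / size) * math.log(c / size) for c in counts.values())
            entropy += (size / n) * h / math.log(n_classes)
    return purity, entropy


if __name__ == "__main__":
    # Made-up example: six programs, two clusters, two reference classes.
    p, e = purity_and_entropy([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
    print(f"purity={p:.3f} entropy={e:.3f}")
```

Under these definitions a purity close to 1 and an entropy close to 0 both mean that each cluster is dominated by a single reference class, which matches the abstract's reading of its scores.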
Collective Entity Disambiguation Based on Hierarchical Semantic Similarity
IF 1.2 · Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2020-04-01 · DOI: 10.4018/ijdwm.2020040101
Bingjing Jia, Hu Yang, Bin Wu, Ying Xing
Entity disambiguation involves mapping mentions in texts to the corresponding entities in a given knowledge base. Most previous approaches were based on handcrafted features and failed to capture semantic information at multiple granularities. To disambiguate entities accurately, various aspects of the information about mentions and entities should be used. This article proposes a hierarchical semantic similarity model that finds important clues related to mentions and entities from multiple sources of information, such as the contexts of the mentions, entity descriptions, and categories. The model can effectively measure the semantic match between mentions and target entities. Global features, including prior popularity and global coherence, are also added to improve performance. To verify the effect of the hierarchical semantic similarity model combined with global features, named HSSMGF, experiments were carried out on five publicly available benchmark datasets. The results demonstrate that the proposed method is very effective when documents contain more mentions.
Pages: 1-17
Citations: 2