International journal of database theory and application: Latest Articles

Research on the Network Data Mining Application in the College Ideological and Political Education
Pub Date : 2017-01-31 DOI: 10.14257/ijdta.2017.10.1.16
Li Pingquan
With the development of information technology and online education, educational data mining has attracted wide attention as a new teaching method. This paper analyzes the application of data mining in college ideological and political education. Using big data mining, the author examines the current state of ideological and political education and identifies the key points of its reform: theoretical reform, practical reform, and examination reform. The paper also analyzes how ideological and political education is developing in the context of new media. The results show that new media play an important role in cultivating college students' ideological guidance, learning, and aesthetic appreciation. Teachers should make full use of new media to strengthen ideological and political education.
Citations: 0
Construction and Simulation on Environmental Quality Evaluation Model Based on Data Mining and Correlation Analysis
Pub Date : 2017-01-31 DOI: 10.14257/ijdta.2017.10.1.12
Meimei Wang, Duoyong Zhang, Huimei Xu
This paper studies an environmental quality evaluation model based on data mining and correlation analysis. With the spread of multivariate statistical analysis, big data analysis has been applied to environmental quality evaluation. Starting from the interrelations among many indicators, this approach transforms them into a few mutually uncorrelated composite indicators; its merit is that it accounts for the correlation among indicators, retains as much of the original information as possible, and performs comprehensive dimensionality reduction on high-dimensional data. Building on this feature, the paper proposes a model based on data mining and correlation analysis. The basic task of grey incidence analysis is to judge, from how close the geometric shapes of the behavior factor sequences are at the micro or macro level, the degree to which each factor influences or contributes to the main behavior; the grey incidence space is the foundation of this analysis. The model is applied to air and water quality evaluation with the help of a neural network and grey analysis. The experimental results indicate that the model can evaluate environmental quality effectively.
Citations: 0
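The grey incidence analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Deng's common formulation of the grey relational grade with a distinguishing coefficient of 0.5; the abstract does not say which variant the authors use, and the function name and sample series are hypothetical.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Return one grey relational grade per factor sequence.

    Assumes Deng's formulation; rho is the distinguishing coefficient.
    Sequences are expected to be normalized to a common scale already.
    """
    # Absolute differences between the reference and each factor sequence.
    deltas = [[abs(r - x) for r, x in zip(reference, seq)] for seq in factors]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every factor is identical to the reference
        return [1.0 for _ in factors]
    grades = []
    for row in deltas:
        # Relational coefficient at each point, then averaged into a grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical, already-normalized series (not data from the paper).
air_quality_index = [0.2, 0.4, 0.6, 0.8]
factors = {
    "pm25": [0.25, 0.45, 0.55, 0.85],
    "traffic": [0.9, 0.1, 0.7, 0.2],
}
grades = dict(
    zip(factors, grey_relational_grades(air_quality_index, list(factors.values())))
)
```

A factor whose curve tracks the reference closely (here `pm25`) receives a grade near 1, while a factor with a dissimilar shape receives a lower grade.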
Evaluation Method of College Students’ English Proficiency Based on Computer Aided Cluster Analysis
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.17
Yanjiao Xiao
Knowledge discovery in databases means obtaining valid, implicit, and potentially useful knowledge from the large volume of data in a database. As China's higher education has shifted to mass education, the scale of schools and the number of students keep increasing. Using data mining techniques, the author analyzes scores on the national English test (CET-4), mines useful information hidden in the performance data, and provides a theoretical basis for teaching design and management in English teaching. After K-value clustering, students can be classified effectively so that differentiated teaching can be carried out, and this classified teaching will improve the quality of English teaching.
Citations: 0
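As a rough illustration of the clustering step described in the abstract above, the sketch below groups scalar test scores with Lloyd's k-means. It assumes that "K value clustering" refers to k-means, which the abstract does not confirm; the function name, the deterministic initialization, and the scores are all hypothetical.

```python
def kmeans_1d(scores, k, max_iter=100):
    """Cluster scalar scores with Lloyd's k-means; returns (centers, labels)."""
    ordered = sorted(scores)
    # Deterministic start (an assumption): spread centers over the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(s - centers[c])) for s in scores]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical CET-4 total scores (illustrative only, not from the paper).
scores = [310, 325, 340, 420, 430, 445, 455, 560, 575, 590]
centers, labels = kmeans_1d(scores, k=3)
```

Each resulting label could then be mapped to a teaching group, which is the kind of differentiated teaching the abstract has in mind.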
Using Decision Tree Data Mining Algorithm to Predict Causes of Road Traffic Accidents, its Prone Locations and Time along Kano –Wudil Highway
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.18
L. J. Muhammad, S. Salisu, A. Yakubu, Y. M. Malgwi, E. Abdullahi, I. .. Mohammed, N. Muhammad
Road traffic accidents are inadvertent crashes involving at least one motor vehicle on a road open to public circulation in which at least one person is injured or killed, excluding intentional acts (murder, suicide) and natural disasters. They are indisputably among the most frequent and most damaging calamities bedeviling human societies today, Nigeria in particular. It is therefore of paramount importance to identify the root causes of road traffic accidents in order to proffer mitigating solutions. This research, which aims to predict the likely causes of road accidents and their prone locations and times along the Kano–Wudil highway so that the necessary countermeasures can be taken, is a step in this direction. In this study, a decision tree data mining algorithm was used to predict the causes of accidents and their prone locations and times along the Kano–Wudil Highway, which links Kano State to Wudil Local Government Area, Kano State, to support effective decision making. [...] performance were analyzed using road accidents data set. The location is between the first 40 kilometers along the Ibadan-Lagos Express road. The work used Multilayer Perceptron as well as Radial Basis Function (RBF) Neural Networks, Id3 and Function Tree algorithms. [...] that the tree algorithm performed with accuracy performed [...]
Citations: 25
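The abstract above names Id3 among the algorithms used. ID3 chooses each split by information gain, and the sketch below shows that criterion on a toy table. The attribute names and records are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical accident records (illustrative only).
records = [
    {"time": "night", "location": "bend", "cause": "speeding"},
    {"time": "night", "location": "straight", "cause": "speeding"},
    {"time": "day", "location": "bend", "cause": "tyre_burst"},
    {"time": "day", "location": "straight", "cause": "tyre_burst"},
]
# ID3 would split first on the attribute with the highest gain.
best = max(["time", "location"], key=lambda a: information_gain(records, a, "cause"))
```

In this toy table the time of day separates the causes perfectly, so it would become the root of the tree.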
Kernel Credal Classification Rule – Application on Road Safety
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.10
Khawla El Bendadi, Y. Lakhdar, E. Sbai
A credal partition based on belief functions has been proposed in the literature for data clustering. It allows objects to belong, with different masses of belief, not only to specific classes but also to sets of classes called meta-classes, which correspond to the disjunction of several specific classes. In this paper, a kernel version of the credal classification rule (CCR) is proposed to perform classification in a higher-dimensional feature space. Each singleton class or meta-class is characterized by a center that can be obtained in several ways. Kernel-based approaches have been popular for several years for solving supervised and unsupervised learning problems, and here the approach is extended to the CCR. This is done by replacing the inner product with an appropriate positive definite function, which implicitly performs a nonlinear mapping of the input data into a high-dimensional feature space; the corresponding algorithm is called the kernel Credal Classification Rule (KCCR). Using Mercer kernels, the KCCR algorithm gives the CCR a powerful nonlinear form. The approach is applied to the classification of experimental data collected from a system called Vehicle-Infrastructure-Driver (VID), based on observations of several representative trajectories made in a bend, and obtains adequate results on data produced experimentally according to instructions given to drivers. The test on real experimental data shows the value of this exploratory data analysis method. Further experiments using generated data and real data from benchmark databases are presented to evaluate and compare the performance of the KCCR method with other classification approaches.
Citations: 1
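The kernel substitution described in the abstract above can be sketched for one ingredient: the distance from a point to a class center in feature space. This is not the KCCR algorithm itself, which also assigns belief masses to meta-classes. The RBF kernel, the mean-based center, and all names and data below are assumptions made for illustration.

```python
from math import exp


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel, a standard Mercer kernel."""
    return exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_distance_to_center(x, members, kernel=rbf_kernel):
    """Squared feature-space distance from x to the mean of the mapped members.

    Expands ||phi(x) - mean(phi(m))||^2 so that only kernel evaluations
    are needed and the mapping phi is never computed explicitly.
    """
    n = len(members)
    cross = sum(kernel(x, m) for m in members) / n
    within = sum(kernel(a, b) for a in members for b in members) / (n * n)
    return kernel(x, x) - 2.0 * cross + within


# Hypothetical two-class data (illustrative only).
class_a = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
class_b = [(3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
query = (0.15, 0.1)
nearest = min(
    {"a": class_a, "b": class_b}.items(),
    key=lambda kv: kernel_distance_to_center(query, kv[1]),
)[0]
```

With a plain dot product passed as `kernel`, the same function reduces to the ordinary squared Euclidean distance to the class mean, which is a convenient sanity check.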
Branch-combined PLSA for Topic Extraction
Pub Date : 2017-01-31 DOI: 10.14257/ijdta.2017.10.1.14
Jiali Lin, Zhiqiang Wei, Z. Li
With the development of Internet technology, the information on the network is expanding at the speed of a geometric progression. Faced with such vast network information, quickly extracting the important information has become an urgent need. Topic extraction models are a good solution to this problem. In this paper, a new model based on Probabilistic Latent Semantic Analysis (PLSA) is proposed, called Branch-combined PLSA (BPLSA). BPLSA divides the training data into two subsets, first trains on the subsets separately, and then performs global training. At the same time, the Message Passing Interface (MPI) is used for parallel computing to speed up the proposed method. Through the parallelization of the BPLSA, the efficiency is [...]
Citations: 0
Data Clustering Analysis on Grassmann Manifold Metric
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.20
Yinghong Xie, Yuqing He, Xiaosheng Yu, Xindong You, Q. Guo
In the standard spectral clustering algorithm, a metric based on Euclidean space cannot represent the complicated spatial distribution of some data sets, which may make the clustering result inaccurate, whereas the geometric relationships between data can be described more precisely in a manifold space. The Grassmann manifold is a quotient space of a Lie group; it has a smooth curved surface and properties better suited to measuring the distance between data, both of which can make the clustering result more accurate. This paper proposes an improved spectral clustering algorithm based on a distance metric on the Grassmann manifold, in which the similarity between data is analyzed in the manifold space. Experimental results show that the proposed algorithm clusters data sets more accurately whether they belong to the same subspace or to different subspaces; furthermore, it efficiently clusters data sets with more complicated geometric structure in the manifold space.
Citations: 0
Multi-source Heterogeneous Data Fusion Method Considering Information Entropy in Large Data Environment
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.04
Shujuan Zhang, Zijing Wang
Network security defense equipment generates massive, trivial, and redundant alarm information with a high false alarm rate, which makes alarm analysis and understanding very difficult. To address this problem, this paper proposes an improved multi-source heterogeneous data fusion scheme that comprehensively analyzes attributes such as alarm type, source IP, destination IP, destination port, and time interval and summarizes four rules, so that the time interval threshold is updated dynamically during fusion and fusion accuracy improves. The experimental results show that the method efficiently reduces the quantity of heterogeneous alarm information, obtains accurate super-alarm data, and enables timely processing of alarm information.
Citations: 0
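The attribute-based fusion described in the abstract above can be sketched in a simplified form. The abstract does not give the paper's four rules or its dynamic threshold update, so this sketch assumes a single rule (identical type, source IP, destination IP, and destination port) and a fixed time threshold. All names, field keys, and sample alarms are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuperAlarm:
    key: tuple         # (type, src_ip, dst_ip, dst_port)
    first_seen: float  # time of the first merged alarm, in seconds
    last_seen: float   # time of the most recent merged alarm
    count: int = 1     # number of raw alarms merged


def fuse_alarms(alarms, threshold=60.0):
    """Merge alarms that share a key and arrive within `threshold` seconds
    of the previous alarm in the same group (fixed threshold: an assumption)."""
    open_groups = {}
    fused = []
    for alarm in sorted(alarms, key=lambda a: a["time"]):
        key = (alarm["type"], alarm["src_ip"], alarm["dst_ip"], alarm["dst_port"])
        current = open_groups.get(key)
        if current is not None and alarm["time"] - current.last_seen <= threshold:
            # Close enough in time: extend the existing super-alarm.
            current.last_seen = alarm["time"]
            current.count += 1
        else:
            # New key, or the gap is too large: start a new super-alarm.
            current = SuperAlarm(key, alarm["time"], alarm["time"])
            open_groups[key] = current
            fused.append(current)
    return fused


# Hypothetical raw alarms (illustrative only).
scan = {"type": "port_scan", "src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 22}
alarms = [
    {**scan, "time": 0.0},
    {**scan, "time": 20.0},
    {**scan, "time": 50.0},
    {**scan, "time": 500.0},
    {"type": "sql_injection", "src_ip": "10.0.0.7", "dst_ip": "10.0.0.9",
     "dst_port": 80, "time": 30.0},
]
super_alarms = fuse_alarms(alarms)
```

Five raw alarms collapse into three super-alarms here; a dynamic threshold, as the paper proposes, would replace the fixed `threshold` argument.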
A Survey on Text Mining Techniques and Methods: A Review Approach
Pub Date : 2017-01-31 DOI: 10.14257/ijdta.2017.10.1.02
Shivaprasad Km, T. H. Reddy
Over the last few decades, we have witnessed an enormous accumulation and usage of data. The major issues with this data are mismatch and overload. Mismatch means that some useful or interesting data has been overlooked; overload means that the gathered data is not what the user needs. To overcome these issues, text mining techniques have been developed. Text mining extracts useful and interesting data from large unstructured data, which helps to cope with these issues. A complex task in text mining is the analysis and categorization of the extracted data. For efficient and effective extraction and analysis of patterns in data, various techniques and methods such as categorization, clustering, summarization, and stemming have recently been developed. Some of these techniques and methods are discussed in this paper.
Citations: 0
A Novel Five-Step Data Mining Algorithm
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.11
Wang Yiwen
Based on the traditional data mining algorithm, a novel data mining algorithm is proposed. The algorithm consists of five steps: first, set the tree set; second, set the window; third, subtree contribution and decision tree construction; fourth, test with the positive and negative example sets; fifth, expand the achievements window. An experimental study was carried out on open-source data sets. The results show that the proposed five-step data mining method not only builds a more concise decision tree but also achieves higher data mining accuracy than the traditional decision tree method.
Citations: 1