
International Journal of Database Theory and Application: Latest Articles

Internet Traffic Classification Using Machine Learning
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.05
M. Singh, Gargi Srivastava, Prabhat Kumar
Internet traffic classification is a popular research area because it benefits many applications, such as intrusion detection systems, congestion avoidance, and traffic prediction. Because port-based and payload-based techniques have limitations, Internet traffic is classified on the basis of statistical features, and machine learning is used for these statistics-based techniques. The statistical feature set is large, so reducing it to an optimal feature set is a challenge; doing so lowers the time complexity of the machine learning algorithm. This paper attempts to obtain an optimal feature set with a hybrid approach that combines an unsupervised clustering algorithm (K-Means) with a supervised feature selection algorithm (Best Feature Selection).
Citations: 3
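The abstract does not say how K-Means and Best Feature Selection are combined. The sketch below shows one plausible wrapper-style reading, which is an assumption: greedily add the flow feature that most improves the purity of a K-Means clustering measured against known traffic labels, and stop when no remaining feature helps. The purity criterion, the `select_features` helper, and the toy flow data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(cluster_ids, classes):
    """Fraction of flows whose class matches the majority class of their cluster."""
    total = 0
    for c in set(cluster_ids):
        members = [y for cid, y in zip(cluster_ids, classes) if cid == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(classes)


def select_features(rows, classes, k):
    """Greedy forward selection scored by K-Means purity (hypothetical criterion)."""
    n_features = len(rows[0])
    chosen, best_score = [], 0.0
    while True:
        scored = []
        for f in range(n_features):
            if f in chosen:
                continue
            subset = chosen + [f]
            pts = [[r[i] for i in subset] for r in rows]
            scored.append((purity(kmeans(pts, k), classes), f))
        if not scored:
            break
        score, f = max(scored)
        if score <= best_score:
            break  # no remaining feature improves purity
        chosen.append(f)
        best_score = score
    return chosen, best_score


# Toy flows: [mean packet size (bytes), two hypothetical noise features].
rows = [
    [1000, 0.3, 5], [1050, 0.7, 2], [1100, 0.5, 8], [980, 0.1, 4],  # "web"
    [60, 0.4, 6], [70, 0.8, 3], [80, 0.2, 7], [65, 0.6, 1],         # "dns"
]
classes = ["web"] * 4 + ["dns"] * 4
chosen, score = select_features(rows, classes, k=2)
print(chosen, score)
```

A wrapper criterion like this is expensive (one clustering run per candidate feature per round), which is exactly the cost a smaller feature set is meant to reduce at classification time.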
Implementation of Basketball Training Management System Based on Big Data Technology
Pub Date : 2016-12-31 DOI: 10.14257/ijdta.2016.9.12.27
Jia-Hong Su
Big data analysis technology has important practical significance for player scouting, tactics, and training monitoring. To improve the effectiveness of basketball training, big data technology is applied in a training management system. Basketball training is being reformed, and the paper discusses a combined selection mode for basketball training. This requires moving from traditional technique training to combined training within physical education, and also developing students' tactical thinking. The approach can improve performance and the interactive quality of basketball training.
Citations: 0
Research on an Improved Decision Tree Classification Algorithm
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.19
Wenyi Xu
This paper first introduces data mining classification algorithms in detail, then combines classification with incremental learning to propose an incremental decision tree algorithm that addresses the incremental learning problem, and analyzes experimental data for the algorithm. The ID3 and C4.5 algorithms are studied in detail. Building on these two algorithms and on the incremental learning property of Bayesian classification, the paper proposes an incremental decision tree algorithm and evaluates it through experimental data. According to the paper, the algorithm solves the incremental learning problem of decision trees well.
Citations: 0
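ID3 chooses the split attribute by information gain, and C4.5 by gain ratio (information gain divided by split information). Both can be computed from class counts alone, which is what makes a count-based, Naive-Bayes-style incremental update possible: a new example only increments counters. The class below is a minimal sketch of that idea under this assumption. It is not the paper's algorithm, and the `SplitStats` name and the toy weather-style examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (bits) of a mapping from outcome to count."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


class SplitStats:
    """Per-attribute class counts that can be updated one example at a time."""

    def __init__(self, attributes):
        self.class_counts = Counter()
        # value_counts[attr][value] -> Counter of class labels
        self.value_counts = {a: defaultdict(Counter) for a in attributes}

    def update(self, example, label):
        """Incremental step: only counters change; no tree is rebuilt here."""
        self.class_counts[label] += 1
        for attr, counts in self.value_counts.items():
            counts[example[attr]][label] += 1

    def info_gain(self, attr):
        """ID3 criterion: H(class) - sum_v P(v) * H(class | v)."""
        n = sum(self.class_counts.values())
        remainder = sum(
            sum(c.values()) / n * entropy(c) for c in self.value_counts[attr].values()
        )
        return entropy(self.class_counts) - remainder

    def gain_ratio(self, attr):
        """C4.5 criterion: information gain divided by split information."""
        split = Counter({v: sum(c.values()) for v, c in self.value_counts[attr].items()})
        split_info = entropy(split)
        return self.info_gain(attr) / split_info if split_info > 0 else 0.0

    def best_attribute(self, criterion="gain_ratio"):
        score = self.gain_ratio if criterion == "gain_ratio" else self.info_gain
        return max(self.value_counts, key=score)


stats = SplitStats(["outlook", "windy"])
stream = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rain", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": True}, "yes"),
]
for x, y in stream:
    stats.update(x, y)  # examples arrive one by one
print(stats.best_attribute(), stats.info_gain("outlook"), stats.info_gain("windy"))
```

A full incremental tree would keep one such statistics object per node and re-check the best split only when counts change enough; that restructuring policy is where published incremental tree algorithms differ.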
Trust Evaluation on Social Media based on Different Similarity Metrics
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.10
A. Maurya, M. Singh
With the advancement of the Internet era, social media grows more important every day. It lets users share their profile data, ideas, videos, and other content. Along with its benefits, it also raises several issues. One of them is how to protect users from the after-effects of befriending people on social media. This paper proposes a trust model to address this issue. The model computes trust to help end users decide whether to accept a friend request on social media. Trust evaluation is based on profile similarity analysis, and trust computation uses a preferred attribute among the profile attributes to evaluate a user's trustworthiness. The paper analyzes different trust evaluation methods based on the proposed model.
Citations: 1
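The abstract says trust is derived from profile similarity, that a user-preferred attribute plays a special role, and that several similarity metrics are compared. As a minimal sketch under those assumptions, the function below scores each shared profile attribute with either Jaccard similarity or the overlap coefficient on attribute value sets, gives the preferred attribute a larger weight, and returns a weighted average in [0, 1]. The weight of 3.0, the attribute names, and the profiles are hypothetical, not from the paper.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of attribute values."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def trust_score(requester, receiver, preferred, metric=jaccard, preferred_weight=3.0):
    """Weighted mean of per-attribute similarity; the preferred attribute counts more.

    preferred_weight is a hypothetical tuning knob, not a value from the paper.
    """
    total, weight_sum = 0.0, 0.0
    for attr in receiver.keys() & requester.keys():
        w = preferred_weight if attr == preferred else 1.0
        total += w * metric(set(requester[attr]), set(receiver[attr]))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


me = {"school": ["Univ X"], "city": ["City A"], "interests": ["cricket", "music", "chess"]}
stranger = {"school": ["Univ X"], "city": ["City B"], "interests": ["music", "chess"]}

for metric in (jaccard, overlap):
    t = trust_score(requester=stranger, receiver=me, preferred="school", metric=metric)
    print(metric.__name__, round(t, 3), "accept" if t >= 0.5 else "review")
```

Swapping the `metric` argument is one simple way to compare similarity metrics on the same profiles; the 0.5 acceptance threshold in the usage loop is likewise only illustrative.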
Big Data Acquisition and Analysis Platform for Intermodal Transport
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.07
Kai Xu, Hong Zhen, Yan Li, L. Yue
This paper aims to make international intermodal cargo transportation transparent and visible throughout the whole process, enabling comprehensive monitoring of multiple transport modes such as ocean, air, road, and rail. Building on Internet-of-Things-based distributed data acquisition and cloud-computing-based big data analysis, the paper presents a multimodal monitoring technology that manages multiple transport vehicles in a unified way; it comprises a service functionality model, a network hierarchy model, and a technology system model. A Generic Target Monitoring System built with this technology shows that the multimodal monitoring solution can effectively monitor multiple means of transport and provides a good database platform for the later analysis, allocation, and optimization of those vehicles.
Citations: 4
Research on Spatial Clustering Algorithm based on Data Mining
Pub Date : 2016-12-31 DOI: 10.14257/ijdta.2016.9.12.20
Runtao Lv, Jin Zhao, Yu Li
We extend the online learning strategy and scalable clustering techniques to soft subspace clustering and propose two online soft subspace clustering methods, OFWSC and OEWSC. The proposed evolving soft subspace clustering algorithms not only reveal important local subspace characteristics of high-dimensional data but also leverage the effectiveness of online learning and the ability of scalable clustering methods to handle large or streaming data. Furthermore, we apply the proposed algorithms to text clustering for information retrieval, gene expression data clustering, face image classification, and the prediction of disulfide connectivity.
Citations: 0
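OFWSC and OEWSC are not defined in the abstract. Assuming OEWSC builds on entropy-weighted soft subspace clustering in the style of entropy weighting k-means (EWKM), each cluster l keeps a soft weight w_lj for every dimension j, set from the within-cluster dispersion D_lj as w_lj = exp(-D_lj / γ) / Σ_t exp(-D_lt / γ). Dimensions along which a cluster is tight get large weights, which is how a local subspace is revealed. The sketch below applies that update in an online fashion over data chunks: centers are running means and dispersions accumulate across chunks. γ, the chunking scheme, and all names are assumptions, not the paper's method.

```python
import math


def entropy_weights(dispersion, gamma):
    """w_j = exp(-D_j / gamma) / sum_t exp(-D_t / gamma) for one cluster."""
    m = min(dispersion)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(-(d - m) / gamma) for d in dispersion]
    s = sum(exps)
    return [e / s for e in exps]


class OnlineEntropyWeightedKMeans:
    """Hypothetical online soft subspace clustering, one pass over data chunks."""

    def __init__(self, init_centers, gamma=1.0):
        dim = len(init_centers[0])
        self.centers = [list(c) for c in init_centers]
        self.weights = [[1.0 / dim] * dim for _ in init_centers]
        self.dispersion = [[0.0] * dim for _ in init_centers]
        self.counts = [0] * len(init_centers)
        self.gamma = gamma

    def _distance(self, x, l):
        # Squared distance weighted by cluster l's soft dimension weights.
        return sum(w * (a - b) ** 2 for w, a, b in zip(self.weights[l], x, self.centers[l]))

    def predict(self, x):
        return min(range(len(self.centers)), key=lambda l: self._distance(x, l))

    def partial_fit(self, chunk):
        for x in chunk:
            l = self.predict(x)
            self.counts[l] += 1
            eta = 1.0 / self.counts[l]
            # Running-mean center update (sequential k-means step).
            self.centers[l] = [c + eta * (a - c) for c, a in zip(self.centers[l], x)]
            # Accumulate per-dimension dispersion around the updated center.
            for j, (a, c) in enumerate(zip(x, self.centers[l])):
                self.dispersion[l][j] += (a - c) ** 2
        # After each chunk, refresh every cluster's dimension weights.
        self.weights = [entropy_weights(d, self.gamma) for d in self.dispersion]
        return self


# Both clusters are tight along dimension 0 and spread out along dimension 1.
cluster_a = [(0.0, -3.0), (0.1, 2.0), (-0.1, 4.0), (0.05, -1.0)]
cluster_b = [(10.0, 3.0), (9.9, -2.0), (10.1, -4.0), (10.05, 1.0)]
model = OnlineEntropyWeightedKMeans(init_centers=[(0.0, 0.0), (10.0, 0.0)], gamma=1.0)
model.partial_fit(cluster_a[:2] + cluster_b[:2]).partial_fit(cluster_a[2:] + cluster_b[2:])
print(model.weights)
```

A larger γ spreads weight more evenly across dimensions; a smaller γ concentrates it on the tightest dimensions.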
Improved PSO Research for Solving the Inverse Problem of Parabolic Equation
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.16
Peng Ya-mian, J. Nan, Zhang Huancheng
Parameter identification has an important research background and research value, and in recent years it has become a top priority among inverse heat conduction problems. This paper studies parameter identification as an inverse problem for parabolic equations and applies particle swarm optimization (PSO) to solve it. First, the paper establishes a model of the inverse problem for partial differential equations and explains the content and classification of such inverse problems. Next, it studies the construction and solution of finite difference methods for parabolic equations, gives two stable schemes for the one-dimensional parabolic equation, and presents two numerical simulations. The partial differential equation is discretized by replacing partial derivatives with difference quotients, which turns the initial-boundary value problem into algebraic equations that are then solved. The paper then studies and compares the basic principles of PSO and its improved variants and implements the PSO algorithm. Finally, three simulations apply PSO to the inverse problem of the parabolic equation. A set of basis functions is used to approach the true solution gradually, and initial values are selected. The inverse problem is converted into a direct problem, which is solved with the difference method, and the resulting solution is compared with the additional conditions. The inverse problem, posed as an optimization problem, is finally solved with PSO.
The simulations verify the correctness and applicability of PSO for the inverse problem of parabolic equations.
Citations: 0
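The abstract combines two steps: a finite-difference solver for the direct problem, and PSO searching for the unknown parameter so that the simulated solution matches additional observed data. The sketch below assumes the simplest such inverse problem: identifying a constant diffusion coefficient a in u_t = a u_xx on [0, 1], with u(x, 0) = sin(πx) and zero boundary values, from the solution profile at t = 0.1. The "improvement" shown is a linearly decreasing inertia weight (0.9 to 0.4), a common PSO variant. The paper's own improvements, basis functions, grids, and test problems are not given in the abstract, so every parameter here is hypothetical.

```python
import math
import random


def solve_heat(a, nx=11, dt=0.002, t_end=0.1):
    """Explicit (FTCS) finite differences for u_t = a * u_xx on [0, 1].

    Initial condition u(x, 0) = sin(pi x); boundary values u(0, t) = u(1, t) = 0.
    The scheme is stable only when r = a * dt / dx**2 <= 0.5; with dx = 0.1 and
    dt = 0.002 that holds for a <= 2.5, which covers the search interval below.
    """
    dx = 1.0 / (nx - 1)
    r = a * dt / dx ** 2
    u = [math.sin(math.pi * i * dx) for i in range(nx)]
    u[0] = u[-1] = 0.0
    for _ in range(round(t_end / dt)):
        # Replace u_xx by the central difference quotient at each interior node.
        interior = [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1]) for i in range(1, nx - 1)]
        u = [0.0] + interior + [0.0]
    return u


def misfit(a, observed):
    """Sum of squared differences between simulated and observed final profiles."""
    return sum((s - o) ** 2 for s, o in zip(solve_heat(a), observed))


def pso_1d(f, lo, hi, n_particles=15, iters=80, seed=1):
    """PSO with a linearly decreasing inertia weight (0.9 -> 0.4) for one parameter."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]
    c1 = c2 = 2.0
    for t in range(iters):
        w = 0.9 - 0.5 * t / (iters - 1)
        for i in range(n_particles):
            vel[i] = (w * vel[i]
                      + c1 * rng.random() * (best_pos[i] - pos[i])
                      + c2 * rng.random() * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # clamp to the search interval
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


a_true = 0.8
observed = solve_heat(a_true)  # synthetic, noise-free "measurements" at t = 0.1
a_hat, err = pso_1d(lambda a: misfit(a, observed), lo=0.1, hi=2.0)
print(f"identified a = {a_hat:.4f}, misfit = {err:.2e}")
```

With real, noisy measurements the misfit would not reach zero, and a regularized objective is usually needed; that is outside this sketch.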
Research on Online Sports Metadata Extraction System based on Video Processing Technology
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.25
Haixin Yao, Jinmei Shao
The basic goal of a content-based sports video metadata extraction system is to use automated or semi-automated interactive means to obtain the features and attributes of video data as completely as possible, as the basis for an efficient retrieval mechanism. This enables fast access to the video information users need and makes sports video easier to enjoy. First, based on a layered video metadata description model, the paper discusses the structure of video processing technology and, on this basis, adds motion information about video objects in the temporal and spatial domains. Using low-level visual features and high-level semantic features, it presents an implicit hierarchical division method for video in a specific domain. Visual features are extracted automatically, while semantic features are annotated through human-computer interaction. The paper focuses on sports information descriptors, visual content descriptors, and video structure descriptors, and builds a standard video content description model based on a hierarchical structure model of the video data and its features.
Citations: 0
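The layered metadata model in the abstract can be pictured as a small hierarchy in which each level carries both low-level visual features and high-level semantic annotations. The dataclasses below are a minimal sketch under that reading; the Video, Scene, and Shot levels, the `motion` feature, the `auto_tag` threshold, and the labels are all hypothetical and are not the paper's descriptor schema.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    start_s: float
    end_s: float
    # Low-level visual features extracted automatically (hypothetical keys).
    visual: dict = field(default_factory=dict)
    # High-level semantic labels, e.g. events marked by a human annotator.
    semantic: set = field(default_factory=set)


@dataclass
class Scene:
    title: str
    shots: list = field(default_factory=list)  # list of Shot


@dataclass
class Video:
    video_id: str
    sport: str
    scenes: list = field(default_factory=list)  # list of Scene

    def find(self, label):
        """Return (scene title, shot) pairs whose semantic labels include `label`."""
        return [(sc.title, sh) for sc in self.scenes for sh in sc.shots if label in sh.semantic]


def auto_tag(video, motion_threshold=0.6):
    """Semi-automatic step: derive a coarse semantic tag from a low-level feature."""
    for scene in video.scenes:
        for shot in scene.shots:
            if shot.visual.get("motion", 0.0) > motion_threshold:
                shot.semantic.add("high_motion")


match = Video("match01", "football", [
    Scene("first half", [
        Shot(0.0, 12.5, {"motion": 0.8}, {"goal"}),  # "goal" added manually
        Shot(12.5, 30.0, {"motion": 0.2}),
    ]),
])
auto_tag(match)
for title, shot in match.find("goal"):
    print(title, shot.start_s, shot.end_s, sorted(shot.semantic))
```

Keeping automatic features and manual annotations in separate fields mirrors the abstract's split between automated visual feature extraction and interactive semantic annotation.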
On Uncertain Probabilistic Data Modeling
Pub Date : 2016-12-31 DOI: 10.14257/ijdta.2016.9.12.17
Teng Lv, Ping Yan, Weimin He
Uncertainty in data has various causes, including the data itself, data mapping, and data policy. The data itself can be uncertain: for example, data from a sensor network, the Internet of Things, or Radio Frequency Identification (RFID) is often inaccurate and uncertain because of device or environmental factors. In data mapping, data integrated from various heterogeneous sources is commonly uncertain because of uncertain mappings, data inconsistency, missing data, and dirty data. Under data policies, data is modified or hidden to comply with an organization's data privacy and data confidentiality policies. Traditional data management, however, mainly deals with deterministic data that is precise and certain, and cannot process uncertain data. Modeling uncertain data is the foundation for other technologies that process data further, such as indexing, querying, searching, mapping, integrating, and mining. Probabilistic data models for relational databases, XML data, and graph data are widely used today in many applications and areas, such as the World Wide Web, the Semantic Web, sensor networks, the Internet of Things, mobile ad hoc networks, social networks, traffic networks, biological networks, genome databases, and medical records. This paper surveys the probabilistic models of uncertain data in relational databases, XML data, and graph data, and analyzes and compares the advantages and disadvantages of each kind of probabilistic model. It also discusses open topics in modeling uncertain probabilistic data, such as semantic and computational aspects, and proposes criteria for modeling uncertain data, including expressive power, complexity, efficiency, and extensibility.
Citations: 0
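The survey compares several probabilistic models without fixing one. For illustration only, the sketch below uses one of the simplest probabilistic relational models, a tuple-independent table under possible-worlds semantics: each tuple is present with its own probability, independently of the others. It answers "does any tuple report room A?" twice, once by enumerating all 2^n possible worlds and once in closed form, to show why exact enumeration is the semantic definition while closed forms are what make querying tractable. The sensor readings are hypothetical.

```python
from itertools import product
from math import prod

# Tuple-independent table: (sensor, reported location, probability the tuple is true).
readings = [
    ("sensor1", "room A", 0.9),
    ("sensor2", "room A", 0.6),
    ("sensor3", "room B", 0.7),
]


def possible_worlds(table):
    """Yield every subset of tuples together with its probability (2**n worlds)."""
    for mask in product([False, True], repeat=len(table)):
        p = prod(t[2] if keep else 1 - t[2] for t, keep in zip(table, mask))
        yield [t[:2] for t, keep in zip(table, mask) if keep], p


def prob_exists_bruteforce(table, room):
    """P(some tuple reports `room`), summed over all possible worlds."""
    return sum(p for world, p in possible_worlds(table) if any(r == room for _, r in world))


def prob_exists_closed_form(table, room):
    """Same query in closed form: 1 - prod(1 - p_i) over the matching tuples."""
    return 1 - prod(1 - p for _, r, p in table if r == room)


print(prob_exists_bruteforce(readings, "room A"), prob_exists_closed_form(readings, "room A"))
```

The closed form relies entirely on the independence assumption; correlated tuples, as in XML or graph models with shared structure, need richer representations.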
A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.04
Lei-lei Kong, Zicheng Zhao, Zhimao Lu, Haoliang Qi, Feng Zhao
Text plagiarism has increased because of the digital resources available on the World Wide Web. Source retrieval and text alignment are the two core tasks of plagiarism detection. This paper describes a plagiarism source retrieval and text alignment system based on a relevance ranking model. Both the source retrieval task and the text alignment task are treated as information retrieval processes, and relevance ranking is used to search for plagiarism sources and to obtain candidate plagiarism seeds. The BM25 model is used for source retrieval, and the Vector Space Model for text alignment. Furthermore, a plagiarism detection system named HawkEyes was developed based on the proposed methods, and several demonstrations of HawkEyes are given.
Citations: 3
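The two stages named in the abstract can be sketched as follows: Okapi BM25 ranks candidate source documents for a suspicious document, and cosine similarity in a vector space model compares sentence pairs to produce candidate seeds. This is a minimal sketch, not the HawkEyes implementation. It assumes whitespace tokens, the common defaults k1 = 1.2 and b = 0.75, a Lucene-style idf that stays positive, raw term-frequency vectors instead of tf-idf, and a hypothetical seed threshold of 0.5.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (vector space model)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def candidate_seeds(susp_sents, src_sents, threshold=0.5):
    """Sentence index pairs whose cosine similarity reaches the threshold."""
    return [(i, j) for i, s in enumerate(susp_sents)
            for j, t in enumerate(src_sents) if cosine(s, t) >= threshold]


# Stage 1, source retrieval: rank candidate sources for the suspicious text.
suspicious = "the quick brown fox jumps over the lazy dog".split()
sources = [
    "a quick brown fox jumped over a lazy dog".split(),
    "stock prices fell sharply today".split(),
]
scores = bm25_scores(suspicious, sources)
best = max(range(len(sources)), key=scores.__getitem__)

# Stage 2, text alignment: compare sentences of the suspicious text and the top source.
susp_sents = [["the", "quick", "brown", "fox"], ["it", "was", "raining"]]
src_sents = [["a", "quick", "brown", "fox"], ["markets", "closed", "early"]]
seeds = candidate_seeds(susp_sents, src_sents)
print(best, scores, seeds)
```

In a full pipeline, seeds would then be merged into longer aligned passages; that extension step is not described in the abstract and is omitted here.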