
International journal of database theory and application: Latest Articles

SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce
Pub Date : 2017-06-30 DOI: 10.14257/IJDTA.2017.10.6.05
Hyeon Gyu Kim
Substantial research has shown that the frequent I/O required for scalability and fault tolerance sacrifices the efficiency of MapReduce. To address this, our previous work discussed a method to reduce I/O cost when processing OLAP queries with MapReduce. The method can be implemented simply by providing an SQL-to-MapReduce translator on top of the MapReduce framework, and it does not require modifying the underlying framework. In this paper, we present techniques to translate SQL queries into corresponding MapReduce programs that support the I/O cost reduction method discussed in our previous work.
Citations: 2
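To illustrate what an SQL-to-MapReduce translation produces, here is a minimal sketch that handles exactly one query shape, `SELECT <key>, SUM(<col>) FROM <table> GROUP BY <key>`, and turns it into a map function and a reduce function. The names `translate` and `run_job` are hypothetical, `run_job` is a single-process stand-in for a real MapReduce job, and nothing here reproduces the paper's I/O cost reduction technique.

```python
import re
from collections import defaultdict

# Hypothetical single-template translator; only this query shape is accepted:
# SELECT <key>, SUM(<col>) FROM <table> GROUP BY <key>
_PATTERN = re.compile(
    r"SELECT\s+(\w+)\s*,\s*SUM\((\w+)\)\s+FROM\s+(\w+)\s+GROUP\s+BY\s+(\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def translate(sql):
    """Return (table, mapper, reducer) equivalent to the GROUP BY/SUM query."""
    m = _PATTERN.match(sql.strip())
    if m is None:
        raise ValueError("unsupported query shape")
    key, col, table, group_key = m.groups()
    if key != group_key:
        raise ValueError("SELECT key must match GROUP BY key")

    def mapper(row):
        # Map: emit (grouping key, value to aggregate) for each input row.
        yield row[key], row[col]

    def reducer(k, values):
        # Reduce: apply the SQL aggregate to all values sharing a key.
        yield k, sum(values)

    return table, mapper, reducer


def run_job(rows, mapper, reducer):
    """Single-process stand-in for a MapReduce job: map, shuffle, reduce."""
    groups = defaultdict(list)
    for row in rows:
        for k, v in mapper(row):
            groups[k].append(v)  # shuffle: collect intermediate values by key
    result = []
    for k in sorted(groups):  # Hadoop sorts keys within each reduce partition
        result.extend(reducer(k, groups[k]))
    return result


rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]
table, mapper, reducer = translate("SELECT region, SUM(sales) FROM orders GROUP BY region")
print(table, run_job(rows, mapper, reducer))  # orders [('east', 17), ('west', 5)]
```

A real translator would parse a full SQL grammar and emit job code for a cluster; the point here is only that the GROUP BY key becomes the intermediate map key and the aggregate becomes the reduce function.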
Capital Markets Prediction: Multi-Faceted Sentiment Analysis using Supervised Machine Learning
Pub Date : 2017-06-30 DOI: 10.14257/IJDTA.2017.10.6.07
Kushatha Kelebeng, H. Hlomani
Over the years, the stock market has proved very difficult to predict because of its volatile behavior. Data mining techniques such as clustering, decision trees, genetic algorithms, and artificial neural networks have been used to predict the stock market. Although a significant amount of research has been done in this area, many issues have not yet been explored. The role of fundamental analysis in stock market prediction has been ignored, even though it can be vital. This research studies how sentiment in social data correlates with stock prices. A stock price prediction model was built using social data sentiment to predict the stock market. Sentiment analysis principles were applied to machine learning techniques in order to find the correlation between the stock market and public sentiment. In particular, this study aimed to assess the predictability of prices on the Botswana Stock Exchange by classifying Facebook sentiment. Three classification models were created to classify news polarity as happy, calm, alert, or vital. Results show that Naïve Bayes and Support Vector Machine performed better than Random Forest in both types of testing. Naïve Bayes gave good results in terms of error margins, with an accuracy of 83.3%, making it the best classifier for our data set. Comparing a time-series plot of sentiment scores with the actual stock price graph leads to the conclusion that sentiment and stock prices are related, and thus that stock prices can be predicted using sentiment.
Citations: 1
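The supervised sentiment classification described above can be illustrated with a from-scratch multinomial Naïve Bayes classifier using Laplace (add-alpha) smoothing. This is a sketch only: the toy posts are invented, the mood labels are merely borrowed from the abstract, and the authors' actual features, preprocessing, and Facebook data are not reproduced.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes over whitespace-tokenized text."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class (priors)
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Work in log space to avoid underflow on long documents.
            score = math.log(self.class_counts[c] / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[c][t] + self.alpha) / (self.total_words[c] + self.alpha * v)
                score += math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


docs = ["great profit growth", "strong results great", "heavy losses decline", "weak outlook losses"]
labels = ["happy", "happy", "alert", "alert"]
model = MultinomialNB().fit(docs, labels)
print(model.predict("great growth"))    # happy
print(model.predict("losses decline"))  # alert
```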
A Topology-concerned Spatial Vector Data Model for Column-oriented Databases
Pub Date : 2017-05-31 DOI: 10.14257/IJDTA.2017.10.5.04
Kun Zheng, M. Kwan, Falin Fang, Junjun Yin, D. Gu, Yanli Fu
In today’s “Big Data” era, the volume of spatial data is growing rapidly, making it urgent to address the challenges of efficient spatial Big Data storage and management. However, conventional row-based spatial databases have many limitations, such as low data I/O efficiency, low data retrieval performance, poor scalability, and high maintenance costs, and they are no longer suitable for today’s spatial Big Data. Column-oriented databases, on the other hand, have several superior features, such as high reliability, scalability, and fault tolerance; more importantly, they offer better I/O efficiency for query processing. This paper presents a topology-concerned spatial vector data model for column-oriented databases and designs its physical storage model, a unified model for storing and managing the geometry, attribute, and topology information of spatial objects. To suit the storage characteristics of column-oriented databases, the model uses a new Rowkey encoding scheme based on the Z-order space-filling curve. Because this Rowkey encoding accounts for spatial proximity, it optimizes the organization of spatial data: spatially nearby objects are also stored close to each other physically, which further improves the efficiency of spatial data storage and enables spatial queries in column-oriented databases. Three experiments, covering data storage, range queries, and k-NN queries, were conducted to analyze the efficiency and spatial query capability of the data model. The results show that the data model has good scalability and efficiency for vector data storage and spatial queries, and that it is suitable for large-scale spatial vector data storage and management in column-oriented databases.
Citations: 0
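The Z-order Rowkey idea described above can be sketched as follows. The specific layout is an assumption, not the paper's schema: longitude/latitude are normalized onto a 2^16 × 2^16 integer grid, the cell coordinates are bit-interleaved into a Morton (Z-order) code, the code is written as zero-padded hex so that lexicographic key order (as in sorted key-value stores such as HBase) matches numeric Z-order, and a hypothetical object ID is appended after an underscore to keep keys unique.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def to_cell(lon: float, lat: float, bits: int = 16):
    """Map lon in [-180, 180] and lat in [-90, 90] onto integer grid cells."""
    n = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * n)
    y = int((lat + 90.0) / 180.0 * n)
    # Clamp so out-of-range input still yields a valid cell.
    return min(max(x, 0), n), min(max(y, 0), n)


def make_rowkey(lon: float, lat: float, object_id: str, bits: int = 16) -> str:
    """Hypothetical rowkey: fixed-width hex Z-order value, then the object ID."""
    x, y = to_cell(lon, lat, bits)
    z = interleave_bits(x, y, bits)
    width = (2 * bits + 3) // 4  # hex digits needed for 2*bits bits
    return f"{z:0{width}x}_{object_id}"


print(make_rowkey(-180.0, -90.0, "a"))  # 00000000_a
print(make_rowkey(180.0, 90.0, "b"))    # ffffffff_b
```

Z-order keeps most, but not all, neighboring cells close in key order: cells on either side of a high-order bit boundary can be far apart, so range queries over such keys are typically decomposed into several key ranges.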
Research on Mobile Database Model of Ad Hoc Network Based on Multi-parameter Weighted Clustering
Pub Date : 2017-05-31 DOI: 10.14257/IJDTA.2017.10.5.02
Tao Zhan, Lei Wang
Citations: 1
A Comparative Study of Different Dimensionality Reduction Methods with Naïve Bayes Classifier for Mapping Customer Requirements to Product Configurations
Pub Date : 2017-05-31 DOI: 10.14257/ijdta.2017.10.5.05
Yao Jiao, Yu Yang
Mapping customer requirements to product configurations is difficult because of the uncertainty and ambiguity of customers’ expressions. The Naïve Bayes Classifier (NBC) is suitable for quantifying customers’ expressions and mapping their requirements to configurations with good performance. However, NBC's prerequisite that product attributes be independent requires preprocessing. Dimensionality reduction methods are effective for simplifying data complexity while separating the correlations within the data. Against this background, this paper conducts a comparative study of seven dimensionality reduction methods as preprocessing procedures integrated with NBC to map customer requirements to product configurations. Two realistic design cases are used for the comparison, and the outcomes are measured by accuracy and F-measure. The results suggest several findings: the loss of information has a great impact on all methods; linear methods are more sensitive to information loss; several nonlinear methods handle information loss better than other methods; and local linear methods are recommended over global nonlinear methods.
Citations: 0
Dynamic QoS evaluation for Web Services Using Data Envelopment Analysis on Real-time Status
Pub Date : 2017-05-31 DOI: 10.14257/ijdta.2017.10.5.06
Luda Wang, Peng Zhang
Service run-status monitoring can provide real-time status, as service properties, for service QoS evaluation. In this work, dynamic QoS evaluation for Web services is based on Data Envelopment Analysis (DEA). The proposed methods can be used to analyze the real-time status of Web services. DEA-based service performance evaluation is implemented with a multi-objective model, and DEA-based service QoS evaluation is implemented with a multi-objective model that includes critical real-time status performance. Both models are shown to be effective through specific arguments and validation. DEA-based dynamic QoS evaluation for Web services can provide performance and QoS information for service composition.
Citations: 0
Internet of Things in Cloud Environment: Services and Challenges
Pub Date : 2017-05-31 DOI: 10.14257/IJDTA.2017.10.5.03
Karuna Lochab, D. Yadav, Mayank Singh, A. Sharmab
This paper describes the evolution of IoT along with its close relationship with the cloud. Applications of IoT in various domains, and the services it provides, are major research areas that can be further enhanced for greater effectiveness and efficiency. Limitations associated with this technique are also addressed, and appropriate solutions are suggested. The future lies in the ‘Internet of Everything’. We propose a model that can be used in a wide range of domains, such as by the Indian Army, the military, the Air Force, and the Navy. The main focus of the proposed work is to evolve a combined approach (IoT + Cloud) into a single technology, IoTC, that has the benefits of both technologies and may overcome their shortcomings. It is beneficial in terms of space and time complexity, security, and the precision and accuracy of results. IoTC can further be applied to critically important services.
Citations: 2
Public Cloud Storage for the Seismic Big Data Based on Amazon EC2 Cluster and Hadoop
Pub Date : 2017-05-31 DOI: 10.14257/ijdta.2017.10.5.01
Jie Xiong, Song Zhang
Seismic data have expanded rapidly in recent years, reaching sizes of hundreds of TBs, as modern seismic acquisition technologies have been employed. How to store and access seismic big data efficiently is an urgent problem for the oil industry and scientific research. A public cloud storage scheme for seismic big data is proposed based on Amazon EC2 and Hadoop. I/O performance evaluation results show that the proposed public cloud storage scheme offers high I/O performance and good scalability. It is suitable for seismic big data storage and access.
Citations: 0
Grid-based k-Nearest Neighbor Queries over Moving Object Trajectories with MapReduce
Pub Date : 2017-04-30 DOI: 10.14257/IJDTA.2017.10.4.01
Ying Xia, Ruidi Wang, Xu Zhang, Hae-Young Bae
The k-Nearest Neighbor Trajectory (k-NNT) query is a basic and important spatial query operation widely used in many fields, such as intelligent transportation and urban planning. However, with the rapid increase in trajectory data volume, traditional k-NNT query algorithms for centralized environments are not effective or scalable enough, because their computational complexity increases dramatically when the spatial continuity of trajectories is considered. To address this problem, we propose a distributed grid index for trajectory data that partitions trajectories into grid cells under the MapReduce framework. Furthermore, a parallel query approach, MR-GB-KNNT, is proposed based on this grid index to improve the efficiency and scalability of k-NNT queries. Experiments demonstrate that MR-GB-KNNT can perform well in a cloud computing environment and improves k-NNT query performance.
Citations: 2
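The grid-partitioned k-NNT search described above can be sketched in a single process. This is not MR-GB-KNNT itself: `build_grid` stands in for a map phase that keys each trajectory point by its grid cell, keeping the minimum distance per trajectory stands in for a reduce phase, and the ring-by-ring expansion with its stopping rule is an assumption of this sketch. Trajectory distance is taken here as the minimum point-to-query distance, one common definition among several.

```python
import math
from collections import defaultdict


def build_grid(trajectories, cell):
    """'Map' step: assign every trajectory point to a square grid cell."""
    grid = defaultdict(list)
    for tid, points in trajectories.items():
        for x, y in points:
            grid[(math.floor(x / cell), math.floor(y / cell))].append((tid, x, y))
    return grid


def ring_cells(cx, cy, r):
    """Yield the cells at Chebyshev distance exactly r from (cx, cy)."""
    if r == 0:
        yield (cx, cy)
        return
    for dx in range(-r, r + 1):
        yield (cx + dx, cy - r)
        yield (cx + dx, cy + r)
    for dy in range(-r + 1, r):
        yield (cx - r, cy + dy)
        yield (cx + r, cy + dy)


def knn_trajectories(grid, cell, q, k):
    """Return IDs of the k trajectories closest to point q."""
    qx, qy = q
    cx, cy = math.floor(qx / cell), math.floor(qy / cell)
    total = sum(len(pts) for pts in grid.values())
    best = {}  # 'reduce' step: trajectory id -> minimum distance seen so far
    seen, r = 0, 0
    while True:
        for c in ring_cells(cx, cy, r):
            for tid, x, y in grid.get(c, ()):
                d = math.hypot(x - qx, y - qy)
                if d < best.get(tid, math.inf):
                    best[tid] = d
                seen += 1
        ranked = sorted(best.items(), key=lambda kv: kv[1])
        # Unvisited points lie more than r * cell away from q, so once the
        # k-th distance is within that bound the answer cannot change.
        if seen == total or (len(ranked) >= k and ranked[k - 1][1] <= r * cell):
            return [tid for tid, _ in ranked[:k]]
        r += 1


trajs = {"A": [(0.5, 0.5), (5.0, 5.0)], "B": [(3.0, 3.0)], "C": [(9.0, 9.0)]}
grid = build_grid(trajs, 1.0)
print(knn_trajectories(grid, 1.0, (0.0, 0.0), 2))  # ['A', 'B']
```

In a distributed setting, the grid cells would be spread across workers and each ring of cells could be scanned in parallel; the stopping rule is what keeps the search from touching every partition.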
Current Situation and Application of Graph Data Mining Technology
Pub Date : 2017-03-31 DOI: 10.14257/ijdta.2017.10.3.01
Meng Zhang, Pingping Wei, Suzhi Zhang, Jiaxing Xu
As an important data structure, a graph can describe complex relationships among entities. With the emergence of social networks, web networks, and other networks represented as graph data, graph data mining has gradually become a hot research topic. Traditional data mining techniques have been continually applied to graph data mining, which has accelerated the development of graph data mining technology. This paper presents the definition of graph data; current graph data mining algorithms, including graph classification, graph clustering, graph querying, graph matching, and frequent subgraph mining; and the development status of graph databases. Finally, it discusses the challenges that graph mining technology faces.
Citations: 0
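Frequent subgraph mining, one of the topics surveyed above, is usually defined over a database of labeled graphs, where a pattern's support is the number of graphs that contain it. Apriori-style miners typically start by counting single-edge patterns and then grow the frequent ones. The sketch below shows only that seed step for undirected labeled graphs; the graph representation (a node-label dict plus an edge list) and the function name are assumptions for illustration, and candidate growth and isomorphism checking are omitted.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Count, per labeled edge pattern (node label, edge label, node label),
    how many graphs contain it at least once, and keep the frequent ones."""
    support = Counter()
    for nodes, edges in graphs:
        patterns = set()  # a set, so each graph counts a pattern at most once
        for u, v, elabel in edges:
            a, b = sorted((nodes[u], nodes[v]))  # undirected: canonical order
            patterns.add((a, elabel, b))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


graphs = [
    ({1: "C", 2: "O", 3: "C"}, [(1, 2, "single"), (1, 3, "double")]),
    ({1: "C", 2: "O"}, [(1, 2, "single")]),
    ({1: "N", 2: "C"}, [(1, 2, "single")]),
]
print(frequent_edges(graphs, 2))  # {('C', 'single', 'O'): 2}
```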