
International Journal of Database Management Systems: Latest Articles

A Comparative Analysis of Data Mining Methods and Hierarchical Linear Modeling Using PISA 2018 Data
Pub Date : 2023-06-27 DOI: 10.5121/ijdms.2023.15301
Wenting Weng, Wen Luo
Educational research often encounters clustered data sets, in which observations are organized into multilevel units: lower-level units (individuals) nested within higher-level units (clusters). However, many studies in education apply tree-based methods such as Random Forest without considering the hierarchical structure of the data. Neglecting the clustered structure can produce biased or inaccurate results. To address this issue, this study conducted a comprehensive comparison of three tree-based data mining algorithms and hierarchical linear modeling (HLM). Using data from the Programme for International Student Assessment (PISA) 2018, it compared non-mixed-effects tree models (e.g., Random Forest), mixed-effects tree models (e.g., the random effects expectation-maximization recursive partitioning method and mixed-effects Random Forest), and the HLM approach. Mixed-effects Random Forest achieved the highest prediction accuracy, while the random effects expectation-maximization recursive partitioning method achieved the lowest. However, tree-based methods limit in-depth interpretation of the results, so further analysis is needed for a more comprehensive understanding. By comparison, the HLM approach retains its value in terms of interpretability. Overall, this study offers insights for selecting and applying suitable methods when analyzing clustered educational data sets.
Citations: 0
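Mixed-effects tree methods such as RE-EM trees and mixed-effects Random Forest combine a fixed-effect predictor f(x), learned by the tree or forest, with cluster-level random effects. The sketch below shows only the random-intercept step, assuming a random-intercept model y_ij = f(x_ij) + b_j + e_ij with known variance components sigma2_b and sigma2_e. The function names are hypothetical and this is not the authors' code. Each cluster's intercept is its mean residual shrunk toward zero, and small clusters are shrunk more.

```python
from collections import defaultdict
from statistics import mean


def random_intercepts(residuals, clusters, sigma2_b, sigma2_e):
    """Estimate random intercepts b_j from residuals r_ij = y_ij - f(x_ij).

    Assumes a random-intercept model with known variance components:
    b_j = n_j * sigma2_b / (n_j * sigma2_b + sigma2_e) * mean(r_ij in cluster j)
    """
    by_cluster = defaultdict(list)
    for r, c in zip(residuals, clusters):
        by_cluster[c].append(r)
    intercepts = {}
    for c, rs in by_cluster.items():
        n = len(rs)
        # Shrinkage factor lies in (0, 1) and approaches 1 as the cluster grows
        shrinkage = n * sigma2_b / (n * sigma2_b + sigma2_e)
        intercepts[c] = shrinkage * mean(rs)
    return intercepts


def predict(fixed_pred, cluster, intercepts):
    # Clusters not seen in training fall back to the population-level prediction (b_j = 0)
    return fixed_pred + intercepts.get(cluster, 0.0)


# Usage: residuals from some fixed-effect model, e.g. a tree or forest
b = random_intercepts([1.0, 1.0, 1.0, 1.0, -2.0], ["A", "A", "A", "A", "B"],
                      sigma2_b=1.0, sigma2_e=1.0)
print(b)  # {'A': 0.8, 'B': -1.0}
```

In RE-EM trees and mixed-effects Random Forest this step alternates with refitting the tree or forest on y minus the estimated random effects, and the variance components are re-estimated on each iteration. That outer loop is omitted here to keep the shrinkage idea visible.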
A Review of the Use of R Programming for data Science Research in Botswana
Pub Date : 2023-02-27 DOI: 10.5121/ijdms.2023.15101
Simisani Ndaba
R is widely used by researchers in statistics and in academia. In Botswana, it has been used in a small number of studies for data analysis. This paper aims to synthesize research conducted in Botswana that has used R programming for data analysis and to show data scientists, and the R community in Botswana and internationally, the gaps and practical applications of R in research in the Botswana context. The paper followed the PRISMA methodology, drawing articles from information technology databases. The findings show that research in Botswana using R spans health care, climatology, conservation, and physical geography, with rpart being the most used R package across these research areas. Many R packages are used in health care for genomics, plotting, and networking, and classification was the most common model across research areas.
Citations: 0
Scaling Distributed Database Joins by Decoupling Computation and Communication
Pub Date : 2023-02-27 DOI: 10.5121/ijdms.2023.15102
Abhirup Chakraborty
To process large volumes of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins, a compute- and communication-intensive workload in modern data-intensive systems. Exploiting the multiple processing cores within individual machines, we implement a system for database joins that parallelizes computation within each node, pipelines computation with communication, and parallelizes communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, for the join processing workload, cluster-wide performance (and speedup) is observed to be dictated by intra-node computational load; this property yields near-linear speedup as nodes are added to the system, a feature much desired in modern large-scale data processing systems.
Citations: 0
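The idea of overlapping communication with computation inside a node can be illustrated with a producer/consumer pipeline. The sketch below is a single-process simulation, not the paper's framework: a hypothetical `receiver` thread stands in for network receives and feeds incoming chunks of the remote relation into a bounded queue, while several worker threads probe a hash table built on the local relation. In CPython the global interpreter lock limits CPU parallelism for pure-Python work, so this shows the structure of the pipeline rather than its speedup.

```python
import queue
import threading
from collections import defaultdict

_DONE = object()  # sentinel telling a worker that no more chunks will arrive


def pipelined_hash_join(build_rows, probe_chunks, num_workers=4, buffer_size=8):
    """Join (key, payload) rows of a local relation with chunks of a remote relation."""
    # Build phase: hash table on the join key of the local (build) relation
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    inbox = queue.Queue(maxsize=buffer_size)  # bounded buffer between communication and computation
    results = []
    results_lock = threading.Lock()

    def receiver():
        # Stand-in for network receives; a real node would read chunks from sockets here
        for chunk in probe_chunks:
            inbox.put(chunk)  # blocks when the buffer is full, applying back-pressure
        for _ in range(num_workers):
            inbox.put(_DONE)

    def worker():
        local = []  # per-thread output avoids taking the lock for every match
        while True:
            chunk = inbox.get()
            if chunk is _DONE:
                break
            for key, probe_payload in chunk:
                for build_payload in table.get(key, ()):
                    local.append((key, build_payload, probe_payload))
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=receiver)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


local_rows = [(1, "a"), (2, "b"), (2, "c")]
remote_chunks = [[(2, "x"), (3, "y")], [(1, "z")]]
print(pipelined_hash_join(local_rows, remote_chunks))
# [(1, 'a', 'z'), (2, 'b', 'x'), (2, 'c', 'x')]
```

The bounded queue is the design choice that decouples the two sides: the receiver can keep transferring data while workers compute, but it cannot run arbitrarily far ahead and exhaust memory.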
Performance Comparison between Pytorch and Mindspore
Pub Date : 2022-04-30 DOI: 10.5121/ijdms.2022.14201
Xiangyu Xia, Shaoxiang Zhou
Deep learning has been applied successfully in many fields. Training neural networks involves large amounts of data, which has led to many deep learning frameworks that aim to give practitioners tools that are more convenient to use and perform better. MindSpore and PyTorch are both deep learning frameworks; MindSpore is owned by Huawei, while PyTorch is owned by Facebook. Some believe that Huawei's MindSpore performs better than Facebook's PyTorch, which leaves deep learning practitioners unsure how to choose between the two. In this paper, we perform analytical and experimental comparisons of the training speed of MindSpore and PyTorch on a single GPU. To make the study as comprehensive as possible, we carefully selected neural networks from two main domains: computer vision and natural language processing (NLP). The contribution of this work is twofold. First, we conduct detailed benchmarking experiments on MindSpore and PyTorch and analyze the reasons for their performance differences. Second, the work provides guidance for end users choosing between these two frameworks.
Citations: 1
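Comparing training speed across frameworks depends heavily on measurement hygiene: warm-up iterations, repeated samples, and synchronization with the GPU. A minimal framework-agnostic timing harness might look like the following. It assumes each framework's training step is wrapped in a zero-argument callable; the function name and parameters are hypothetical and not taken from the paper. For PyTorch on a GPU, `torch.cuda.synchronize` could be passed as `sync`; the equivalent call for MindSpore is an assumption to be checked against its documentation.

```python
import statistics
import time


def time_training_steps(train_step, warmup=3, iters=10, sync=None):
    """Time a zero-argument training-step callable.

    warmup: untimed iterations (graph compilation, caching, autotuning)
    sync:   optional callable that blocks until queued device work finishes;
            GPU kernels run asynchronously, so timing without it can under-report.
    """
    for _ in range(warmup):
        train_step()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        train_step()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return {"median_s": statistics.median(samples), "min_s": min(samples), "iters": iters}


# Usage with a dummy CPU-bound step standing in for one training iteration
stats = time_training_steps(lambda: sum(i * i for i in range(10_000)), warmup=2, iters=5)
print(stats)
```

Reporting the median rather than the mean keeps a single slow iteration (for example, a garbage-collection pause) from dominating the comparison.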
Healthbot for Polycystic Ovarian Syndrome
Pub Date : 2021-04-30 DOI: 10.5121/IJDMS.2021.13201
Jeyshree Krishnaswamy Sundararajan, Yanyan Li, A. Hadaegh
Polycystic ovarian syndrome (PCOS) is one of the most common hormonal imbalances in women of reproductive age. It needs to be diagnosed and treated early because it is associated with diabetes, high cholesterol, and obesity. This paper presents an application designed specifically for women to help them keep track of their Body Mass Index, blood sugar, and blood pressure based on their age. People diagnosed with PCOS (an endocrine disorder) can use the application to make daily life easier, since it helps them follow exercise routines and diets and provides timely reminders to drink water and take medicines. Its features include a period tracker for the user's menstrual cycle, a search for nearby dieticians, links to various PCOS supplements, and mood tracking across menstrual phases to help users manage mood swings. Finally, the application includes games to make it more interactive.
Citations: 0
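The abstract does not say how the app computes Body Mass Index. The standard definition is weight in kilograms divided by the square of height in metres. A minimal sketch follows, assuming the WHO adult cut-offs; the function names are hypothetical. Those cut-offs do not apply to children and adolescents, for whom age- and sex-specific BMI-for-age percentiles are used instead, which is one reason an age-aware tracker would need more than this.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def adult_bmi_category(value: float) -> str:
    # WHO adult cut-offs; children and adolescents use BMI-for-age percentiles instead
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


value = bmi(60.0, 1.65)
print(round(value, 1), adult_bmi_category(value))  # 22.0 normal weight
```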
Mapping Common Errors in Entity Relationship Diagram Design of Novice Designers
Pub Date : 2021-02-28 DOI: 10.5121/IJDMS.2021.13101
Rami Rashkovits, I. Lavy
Data modeling in the context of database design is a challenging task for any database designer, and even more so for novices. A proper database schema is a key factor in the success of any information system; hence conceptual data modeling, which yields the database schema, is an essential process in system development. However, novice designers encounter difficulties in understanding and implementing such models. This study aims to identify these difficulties and explore their origins. It examines the data models produced by students and maps the errors they made, classifying the errors using the SOLO taxonomy. Based on interviews with a representative group of participants, the study also sheds light on the underlying reasons for errors made during data model design. We also suggest ways to improve novice designers' performance more effectively, so that they can draw more accurate models and make use of advanced design constituents such as entity hierarchies, ternary relationships, aggregated entities, and the like. The findings may enrich the body of research on data model design from the students' perspective.
Citations: 5
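Ternary relationships are one of the advanced constituents named above. As a hypothetical illustration (the entity names are assumptions, not taken from the study), a many-to-many-to-many relationship Supplies among Supplier, Part, and Project is usually mapped to its own table whose primary key combines the three foreign keys. Splitting it into three binary relationships instead can lose the information of which supplier supplied which part to which project.

```python
import sqlite3

SCHEMA = """
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE part     (part_id     INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (project_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Ternary M:N:N relationship: one row per (supplier, part, project) combination
CREATE TABLE supplies (
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
    part_id     INTEGER NOT NULL REFERENCES part(part_id),
    project_id  INTEGER NOT NULL REFERENCES project(project_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (supplier_id, part_id, project_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO supplier VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO part VALUES (?, ?)", [(10, "Bolt")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(100, "Bridge"), (200, "Tower")])
conn.executemany("INSERT INTO supplies VALUES (?, ?, ?, ?)",
                 [(1, 10, 100, 500), (2, 10, 200, 300)])

# The ternary table answers "who supplied which part to which project" directly
rows = conn.execute("""
    SELECT s.name, p.name, j.name, x.quantity
    FROM supplies x
    JOIN supplier s USING (supplier_id)
    JOIN part p USING (part_id)
    JOIN project j USING (project_id)
    ORDER BY s.name
""").fetchall()
print(rows)  # [('Acme', 'Bolt', 'Bridge', 500), ('Globex', 'Bolt', 'Tower', 300)]
```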
Re-optimization for Multi-objective Cloud Database Query Processing using Machine Learning
Pub Date : 2021-02-28 DOI: 10.5121/IJDMS.2021.13102
Chenxiao Wang, Z. Arani, L. Gruenwald, Laurent d'Orazio, Eleazar Leal
In cloud environments, hardware configurations, data usage, and workload allocations change continuously. These changes make it difficult for the query optimizer of a cloud database management system (DBMS) to select an optimal query execution plan (QEP). To optimize a query with more accurate cost estimates, the literature has proposed performing query re-optimization during query execution. However, some of these re-optimizations may yield no gain in query response time or monetary cost, the two optimization objectives for cloud databases, and their overhead may even degrade performance. This raises the question of how to determine when a re-optimization is beneficial. In this paper, we present ReOptML, a technique that uses machine learning to enable effective re-optimization. ReOptML executes a query in stages, uses a machine learning model after each stage to predict whether re-optimizing the query would be beneficial, and, if so, invokes the query optimizer to perform the re-optimization automatically.
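Experiments comparing ReOptML with existing query re-optimization algorithms show that ReOptML improves query response time by 13% to 35% on skewed data and by 13% to 21% on uniform data, and reduces the monetary cost paid to cloud service providers by 17% to 35% on skewed data.
Citations: 0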
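The control loop described above (run a stage, ask a model whether re-optimizing pays off, and call the optimizer if so) can be sketched as follows. Everything here is a hypothetical stand-in rather than ReOptML's actual design: `StageStats`, the q-error and remaining-stages features, and the hand-set logistic weights are assumptions, and a real model would be trained on past executions. For simplicity the stage outcomes are precomputed, whereas in a real system a re-optimization would change the stages that follow.

```python
import math
from dataclasses import dataclass


@dataclass
class StageStats:
    estimated_rows: int    # optimizer's cardinality estimate for the stage output
    actual_rows: int       # cardinality observed after the stage ran
    remaining_stages: int  # stages still to execute under the current plan


def features(s: StageStats) -> list:
    # q-error: the factor by which the estimate was off, in either direction (always >= 1)
    ratio = max(s.actual_rows, 1) / max(s.estimated_rows, 1)
    q_error = max(ratio, 1 / ratio)
    return [math.log(q_error), float(s.remaining_stages)]


def predict_benefit(x, weights=(1.5, 0.4), bias=-2.0) -> float:
    # Placeholder logistic model; the weights here are hand-set, not learned
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def run_staged(stage_outcomes, reoptimize, threshold=0.5):
    """Walk through executed stages; after each, ask the model whether re-optimizing pays off."""
    reoptimized_after = []
    for i, stats in enumerate(stage_outcomes):
        if stats.remaining_stages == 0:
            break  # nothing left to re-plan
        if predict_benefit(features(stats)) >= threshold:
            reoptimize(i, stats)  # hand the observed statistics back to the optimizer
            reoptimized_after.append(i)
    return reoptimized_after


outcomes = [StageStats(100, 100, 3), StageStats(100, 10_000, 2), StageStats(100, 120, 1)]
print(run_staged(outcomes, reoptimize=lambda i, s: None))  # [1]
```

Only the second stage, where the estimate was off by a factor of 100 with work still remaining, crosses the threshold; small estimation errors do not trigger the overhead of re-planning.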
Big Data Storage System Based on a Distributed Hash Tables System
Pub Date : 2020-10-30 DOI: 10.5121/ijdms.2020.12501
Telesphore Tiendrebeogo, Mamadou Diarra
Big Data is unavoidable now that digital technology is the predominant form of communication in consumers' daily lives. Controlling what is at stake and the quality of the data must be a priority, so as not to distort the strategies derived from processing the data for profit. To this end, companies have carried out a great deal of research and created several platforms. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields. Despite its importance, however, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. This document therefore not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also describes a DHT model and offers some guidelines for planning future research.
Citations: 4
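The abstract does not detail its DHT model. As a generic illustration under that caveat, the core placement idea of a DHT (consistent hashing on an identifier ring, as in Chord, where a key is stored on the first node at or after the key's position) can be sketched as follows. The class and method names are hypothetical; real DHTs add routing tables, replication, and node join/leave protocols.

```python
import bisect
import hashlib


def ring_position(value: str, bits: int = 32) -> int:
    """Map a string onto a 2**bits identifier ring using SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Successor-based key placement (no routing tables or replication)."""

    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]
        self._store = {n: {} for n in nodes}

    def node_for(self, key: str) -> str:
        # Responsible node: first node at or after the key's position, wrapping around the ring
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[i % len(self._ring)][1]

    def put(self, key: str, value) -> None:
        self._store[self.node_for(key)][key] = value

    def get(self, key: str):
        return self._store[self.node_for(key)].get(key)


ring = HashRing(["node-a", "node-b", "node-c"])
ring.put("user:42", {"name": "Ada"})
print(ring.node_for("user:42"), ring.get("user:42"))
```

The property that matters for a storage system is that adding a node only moves the keys lying between the new node and its predecessor on the ring; every other key keeps its placement, so growth does not trigger a full reshuffle.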
Design, Implementation, and Assessment of Innovative Data Warehousing; Extract, Transformation, and Load (ETL); and Online Analytical Processing (OLAP) on BI
Pub Date : 2020-06-30 DOI: 10.5121/ijdms.2020.12301
R. Venkatakrishnan
The effectiveness of a Business Intelligence (BI) system depends heavily on three fundamental components: 1) data acquisition (ETL), 2) data storage (data warehouse, DW), and 3) data analytics (OLAP). The predominant challenges for these components are data volume, data variety, data integration, complex analytics, constant business changes, lack of skill sets, compliance, security, data quality, and computing requirements. No comprehensive documentation provides guidelines for ETL, data warehousing, and OLAP that account for recent trends such as data latency (providing real-time data), BI flexibility (accommodating change amid the explosion of data), and self-service BI. This paper attempts to fill this gap by analyzing scholarly articles from the last three to five years to compile guidelines for the effective design, implementation, and assessment of DW, ETL, and OLAP in BI.
Citations: 0
Sarcasm Detection Beyond using Lexical Features
Pub Date : 2020-06-30 DOI: 10.5121/ijdms.2020.12304
Adewuyi, Joseph Oluwaseyi; Oladeji, Ifeoluwa David
Social media websites such as Facebook and Twitter have grown substantially in importance. They have become huge environments in which users openly express their thoughts, perspectives, and reviews, and organizations use them to tap into people's opinions of their services and to gather quick feedback. This research avoids relying on grammatical words alone as features for sarcasm detection and also uses contextual features, drawn from theories that explain when, how, and why sarcasm is expressed. The task was carried out with a deep neural network architecture, a bidirectional long short-term memory network with conditional random fields (Bi-LSTM-CRF), using a two-stage process to classify whether a reply or comment to a tweet is sarcastic or non-sarcastic. Model performance was evaluated using accuracy, precision, recall, and F-measure.
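The abstract names the evaluation metrics but not how they were computed. As a minimal sketch, assuming the standard binary definitions with "sarcastic" (label 1) as the positive class and F-measure taken as F1 (both assumptions; the paper may, for example, use macro-averaging), the four metrics follow from confusion-matrix counts. The helper name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumes 1 = sarcastic (positive class) and 0 = non-sarcastic.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts for the positive (sarcastic) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for six replies (illustrative only, not data from the paper).
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting precision and recall alongside accuracy matters for sarcasm detection because sarcastic replies are often the minority class, where accuracy alone can look high even when few sarcastic replies are found.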
Citations: 1