A Comparative Analysis of Data Mining Methods and Hierarchical Linear Modeling Using PISA 2018 Data
Pub Date : 2023-06-27  DOI: 10.5121/ijdms.2023.15301
Wenting Weng, Wen Luo
Educational research often encounters clustered data sets, in which observations are organized into multilevel units: lower-level units (individuals) nested within higher-level units (clusters). However, many studies in education apply tree-based methods such as Random Forest without considering the hierarchical structure of the data. Neglecting the clustered data structure can result in biased or inaccurate results. To address this issue, this study conducted a comprehensive comparison of three tree-based data mining algorithms and hierarchical linear modeling (HLM). Using the Programme for International Student Assessment (PISA) 2018 data, the study compared non-mixed-effects tree models (e.g., Random Forest) and mixed-effects tree models (e.g., the random effects expectation maximization recursive partitioning method and mixed-effects Random Forest), as well as the HLM approach. Based on the findings of this study, mixed-effects Random Forest demonstrated the highest prediction accuracy, while the random effects expectation maximization recursive partitioning method had the lowest. It is important to note, however, that tree-based methods limit deep interpretation of the results, so further analysis is needed to gain a more comprehensive understanding. In comparison, the HLM approach retains its value in terms of interpretability. Overall, this study offers valuable insights for selecting and using suitable methods when analyzing clustered educational data sets.
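As a rough illustration of the distinction the abstract draws, the sketch below fits a plain Random Forest that ignores clustering next to a two-level random-intercept model (the simplest HLM). It uses simulated data with hypothetical PISA-like column names (math_score, escs, school_id); it is not the paper's actual analysis pipeline.

```python
# Minimal sketch: non-mixed-effects tree model vs. a two-level HLM on clustered data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_schools, n_students = 50, 30
school_id = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0, 5, n_schools)[school_id]  # cluster-level random intercepts
escs = rng.normal(0, 1, n_schools * n_students)         # student-level predictor
math_score = 500 + 20 * escs + school_effect + rng.normal(0, 10, n_schools * n_students)
df = pd.DataFrame({"school_id": school_id, "escs": escs, "math_score": math_score})

# Random Forest ignores the school-level clustering entirely.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(df[["escs"]], df["math_score"])

# A two-level HLM: random intercept per school.
hlm = smf.mixedlm("math_score ~ escs", df, groups=df["school_id"]).fit()
print(hlm.summary())
```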
{"title":"A Comparative Analysis of Data Mining Methods and Hierarchical Linear Modeling Using PISA 2018 Data","authors":"Wenting Weng, Wen Luo","doi":"10.5121/ijdms.2023.15301","DOIUrl":"https://doi.org/10.5121/ijdms.2023.15301","url":null,"abstract":"Educational research often encounters clustered data sets, where observations are organized into multilevel units, consisting of lower-level units (individuals) nested within higher-level units (clusters). However, many studies in education utilize tree-based methods like Random Forest without considering the hierarchical structure of the data sets. Neglecting the clustered data structure can result in biased or inaccurate results. To address this issue, this study aimed to conduct a comprehensive survey of three tree- based data mining algorithms and hierarchical linear modeling (HLM). The study utilized the Programme for International Student Assessment (PISA) 2018 data to compare different methods, including non-mixed- effects tree models (e.g., Random Forest) and mixed-effects tree models (e.g., random effects expectation minimization recursive partitioning method, mixed-effects Random Forest), as well as the HLM approach. Based on the findings of this study, mixed-effects Random Forest demonstrated the highest prediction accuracy, while the random effects expectation minimization recursive partitioning method had the lowest prediction accuracy. However, it is important to note that tree-based methods limit deep interpretation of the results. Therefore, further analysis is needed to gain a more comprehensive understanding. In comparison, the HLM approach retains its value in terms of interpretability. Overall, this study offers valuable insights for selecting and utilizing suitable methods when analyzing clustered educational datasets.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115306045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Review of the Use of R Programming for Data Science Research in Botswana
Pub Date : 2023-02-27  DOI: 10.5121/ijdms.2023.15101
Simisani Ndaba
R is widely used by researchers in statistics and academia. In Botswana, it has been used in a small number of research studies for data analysis. This paper aims to synthesize research conducted in Botswana that has used R programming for data analysis, and to show data scientists and the R community, both in Botswana and internationally, the gaps and practical applications in research work using R in the context of Botswana. The paper followed the PRISMA methodology, and the articles were taken from information technology databases. The findings show that research conducted in Botswana using R programming spanned Health Care, Climatology, Conservation, and Physical Geography, with rpart as the most used R package across the research areas. It was also found that many R packages are used in Health Care for genomics, plotting, and networking, and that classification was the most common type of model used across research areas.
{"title":"A Review of the Use of R Programming for data Science Research in Botswana","authors":"Simisani Ndaba","doi":"10.5121/ijdms.2023.15101","DOIUrl":"https://doi.org/10.5121/ijdms.2023.15101","url":null,"abstract":"R is widely used by researchers in the statistics field and academia. In Botswana, it is used in a few research for data analysis. The paper aims to synthesis research conducted in Botswana that has used R programming for data analysis and to demonstrate to data scientists, the R community in Botswana and internationally the gaps and applications in practice in research work using R in the context of Botswana. The paper followed the PRISMA methodology and the articles were taken from information technology databases. The findings show that research conducted in Botswana that use R programming were used in Health Care, Climatology, Conservation and Physical Geography, with R part as the most used R package across the research areas. It was also found that a lot of R packages are used in Health care for genomics, plotting, networking and classification was the common model used across research areas.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123995597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scaling Distributed Database Joins by Decoupling Computation and Communication
Pub Date : 2023-02-27  DOI: 10.5121/ijdms.2023.15102
Abhirup Chakraborty
To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins, a compute- and communication-intensive workload in modern data-intensive systems. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, and parallelizes the communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, with the join processing workload, the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational loads; this property brings a near-linear speedup with increasing nodes in the system, a feature much desired in modern large-scale data processing systems.
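To make the pipelining idea concrete, here is a minimal single-node sketch, assuming batches of probe-side tuples arrive over the network while a join thread probes a prebuilt hash table. A queue stands in for the network channel, and all names and sizes are illustrative rather than the paper's implementation.

```python
# Overlapping join computation with (simulated) communication via a queue.
import threading
import queue

def receiver(channel: queue.Queue, incoming_batches):
    """Communication thread: push batches into the channel as they 'arrive'."""
    for batch in incoming_batches:
        channel.put(batch)
    channel.put(None)  # sentinel: no more data

def hash_join(channel: queue.Queue, build_side, key=lambda t: t[0]):
    """Join thread: build a hash table once, then probe each batch as it
    arrives, overlapping probe work with ongoing data transfer."""
    table = {}
    for row in build_side:
        table.setdefault(key(row), []).append(row)
    results = []
    while (batch := channel.get()) is not None:
        for row in batch:
            for match in table.get(key(row), []):
                results.append((match, row))
    return results

build = [(1, "a"), (2, "b")]
probe_batches = [[(1, "x")], [(2, "y"), (3, "z")]]
chan = queue.Queue(maxsize=4)
t = threading.Thread(target=receiver, args=(chan, probe_batches))
t.start()
print(hash_join(chan, build))
t.join()
```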
{"title":"Scaling Distributed Database Joins by Decoupling Computation and Communication","authors":"Abhirup Chakraborty","doi":"10.5121/ijdms.2023.15102","DOIUrl":"https://doi.org/10.5121/ijdms.2023.15102","url":null,"abstract":"To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins—a compute- and communication-intensive workload in modern data-intensive systems. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, parallelizes the communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that using only four threads per node the framework achieves a 3.5x gains in intra-node performance while compared with a single-threaded counterpart. Moreover, with the join processing workload the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational loads; this property brings a near-linear speedup with increasing nodes in the system, a feature much desired in modern large-scale data processing system.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129265756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Comparison between PyTorch and MindSpore
Pub Date : 2022-04-30  DOI: 10.5121/ijdms.2022.14201
Xiangyu Xia, Shaoxiang Zhou
Deep learning has been widely used in many fields. However, training neural networks involves large amounts of data, which has led to many deep learning frameworks that serve deep learning practitioners by providing services that are more convenient to use and perform better. MindSpore and PyTorch are both deep learning frameworks: MindSpore is owned by HUAWEI, while PyTorch is owned by Facebook. Some people believe that HUAWEI's MindSpore performs better than Facebook's PyTorch, which leaves deep learning practitioners confused about the choice between the two. In this paper, we perform an analytical and experimental analysis to compare the training speed of MindSpore and PyTorch on a single GPU. To ensure that our survey is as comprehensive as possible, we carefully selected neural networks in two main domains: computer vision and natural language processing (NLP). The contribution of this work is twofold. First, we conduct detailed benchmarking experiments on MindSpore and PyTorch to analyze the reasons for their performance differences. Second, this work provides guidance for end users choosing between these two frameworks.
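The sketch below shows the shape of such a single-GPU training-speed benchmark for the PyTorch side only; a MindSpore counterpart would mirror it with MindSpore's APIs. The model, batch size, and step counts are arbitrary assumptions, not the paper's benchmark suite.

```python
# Timing a fixed number of training steps on one GPU (falls back to CPU).
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(128, 784, device=device)
y = torch.randint(0, 10, (128,), device=device)

# Warm up, then time; synchronize so GPU kernels are included in the timing.
for _ in range(10):
    opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
if device == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
if device == "cuda":
    torch.cuda.synchronize()
print(f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms/step")
```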
{"title":"Performance Comparison between Pytorch and Mindspore","authors":"Xiangyu Xia, Shaoxiang Zhou","doi":"10.5121/ijdms.2022.14201","DOIUrl":"https://doi.org/10.5121/ijdms.2022.14201","url":null,"abstract":"Deep learning has been well used in many fields. However, there is a large amount of data when training neural networks, which makes many deep learning frameworks appear to serve deep learning practitioners, providing services that are more convenient to use and perform better. MindSpore and PyTorch are both deep learning frameworks. MindSpore is owned by HUAWEI, while PyTorch is owned by Facebook. Some people think that HUAWEI's MindSpore has better performance than FaceBook's PyTorch, which makes deep learning practitioners confused about the choice between the two. In this paper, we perform analytical and experimental analysis to reveal the comparison of training speed of MIndSpore and PyTorch on a single GPU. To ensure that our survey is as comprehensive as possible, we carefully selected neural networks in 2 main domains, which cover computer vision and natural language processing (NLP). The contribution of this work is twofold. First, we conduct detailed benchmarking experiments on MindSpore and PyTorch to analyze the reasons for their performance differences. This work provides guidance for end users to choose between these two frameworks.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131697807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Healthbot for Polycystic Ovarian Syndrome
Pub Date : 2021-04-30  DOI: 10.5121/IJDMS.2021.13201
Jeyshree Krishnaswamy Sundararajan, Yanyan Li, A. Hadaegh
Polycystic ovarian syndrome (PCOS) is one of the predominant hormonal imbalances in women of reproductive age. It needs to be diagnosed and treated at an early stage, as it is interrelated with diabetes, high cholesterol levels, and obesity. This paper presents an application specially designed for women to help them keep track of their Body Mass Index, blood sugar, and blood pressure based on their age. People diagnosed with PCOS (an endocrine disorder) can use this application to make their lives easier, since it helps them follow certain exercises and diets and provides timely reminders for water and medicines. Its features include a period tracker to follow the user's menstrual cycle, a way to find dieticians nearby, links to various PCOS supplements, and mood tracking across different menstrual phases to help users manage mood swings. Finally, the application includes games to add an interactive touch.
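As a toy illustration of the BMI-tracking feature, the sketch below computes and categorizes BMI; the WHO adult cutoffs used here are an assumption, since the paper does not say which thresholds the app applies.

```python
# BMI = weight (kg) / height (m)^2, bucketed with standard WHO adult cutoffs.
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> str:
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"

v = bmi(70, 1.65)
print(f"BMI {v:.1f} -> {bmi_category(v)}")
```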
{"title":"Healthbot for Polycystic Ovarian Syndrome","authors":"Jeyshree Krishnaswamy Sundararajan, Yanyan Li, A. Hadaegh","doi":"10.5121/IJDMS.2021.13201","DOIUrl":"https://doi.org/10.5121/IJDMS.2021.13201","url":null,"abstract":"Polycystic ovarian syndrome(PCOS) is one of the predominant hormonal imbalances present in women of reproductive age. It needs to be diagnosed and treated at an earlier stage as it's inter-related to diabetes, high cholesterol levels, and obesity. This paper presents an application specially designed for women to help them keep track of their Body Mass Index, Blood Sugar, and Blood Pressure based on their age. The people diagnosed with PCOS(an endocrine disorder) can use this application to make their life easy since it helps follow certain exercises, diets, and timely reminders for water and medicines. It has features like the period tracker to track the user’s menstrual cycle, find dieticians nearby, links to various PCOS supplements, users can track their moods during different menstrual phases and control their mood swings. Finally, the application has games to add that interactive touch.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124188472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mapping Common Errors in Entity Relationship Diagram Design of Novice Designers
Pub Date : 2021-02-28  DOI: 10.5121/IJDMS.2021.13101
Rami Rashkovits, I. Lavy
Data modeling in the context of database design is a challenging task for any database designer, and even more so for novice designers. A proper database schema is a key factor in the success of any information system; hence, the conceptual data modeling that yields the database schema is an essential part of system development. However, novice designers encounter difficulties in understanding and implementing such models. This study aims to identify the difficulties in understanding and implementing data models and to explore the origins of these difficulties. The research examines the data models produced by students and maps the errors the students made. The errors were classified using the SOLO taxonomy. The study also sheds light on the underlying reasons for the errors made during data model design, based on interviews conducted with a representative group of the study participants. We also suggest ways to improve novice designers' performance more effectively, so that they can draw more accurate models and make use of advanced design constructs such as entity hierarchies, ternary relationships, aggregated entities, and the like. The research findings may enrich the body of research on data model design from the students' perspective.
{"title":"Mapping Common Errors in Entity Relationship Diagram Design of Novice Designers","authors":"Rami Rashkovits, I. Lavy","doi":"10.5121/IJDMS.2021.13101","DOIUrl":"https://doi.org/10.5121/IJDMS.2021.13101","url":null,"abstract":"Data modeling in the context of database design is a challenging task for any database designer, even more so for novice designers. A proper database schema is a key factor for the success of any information systems, hence conceptual data modeling that yields the database schema is an essential process of the system development. However, novice designers encounter difficulties in understanding and implementing such models. This study aims to identify the difficulties in understanding and implementing data models and explore the origins of these difficulties. This research examines the data model produced by students and maps the errors done by the students. The errors were classified using the SOLO taxonomy. The study also sheds light on the underlying reasons for the errors done during the design of the data model based on interviews conducted with a representative group of the study participants. We also suggest ways to improve novice designer's performances more effectively, so they can draw more accurate models and make use of advanced design constituents such as entity hierarchies, ternary relationships, aggregated entities, and alike. The research findings might enrich the data body research on data model design from the students' perspectives.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126344314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Re-optimization for Multi-objective Cloud Database Query Processing using Machine Learning
Pub Date : 2021-02-28  DOI: 10.5121/IJDMS.2021.13102
Chenxiao Wang, Z. Arani, L. Gruenwald, Laurent d'Orazio, Eleazar Leal
In cloud environments, hardware configurations, data usage, and workload allocations are continuously changing. These changes make it difficult for the query optimizer of a cloud database management system (DBMS) to select an optimal query execution plan (QEP). In order to optimize a query with a more accurate cost estimation, performing query re-optimizations during query execution has been proposed in the literature. However, some of these re-optimizations may not provide any performance gain in terms of query response time or monetary cost, the two optimization objectives for cloud databases, and may even hurt performance due to their overheads. This raises the question of how to determine when a re-optimization is beneficial. In this paper, we present a technique called ReOptML that uses machine learning to enable effective re-optimizations. This technique executes a query in stages, employs a machine learning model to predict whether a query re-optimization is beneficial after a stage is executed, and invokes the query optimizer to perform the re-optimization automatically. Experiments comparing ReOptML with existing query re-optimization algorithms show that ReOptML improves query response time by 13% to 35% on skewed data and by 13% to 21% on uniform data, and improves the monetary cost paid to cloud service providers by 17% to 35% on skewed data.
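A minimal sketch of the idea as the abstract describes it, assuming a classifier trained offline on past executions decides, after each stage, whether invoking the optimizer again is worth the overhead. The features, training rows, and stage statistics below are hypothetical illustrations, not the paper's design.

```python
# ML-gated re-optimization: predict after each stage whether re-opt pays off.
from sklearn.ensemble import GradientBoostingClassifier

# Model trained offline on past executions; each row is
# (estimated_rows, actual_rows, stage_runtime_s), label = "re-optimization helped".
model = GradientBoostingClassifier().fit(
    [[1000, 900, 0.4], [1000, 50000, 9.1], [500, 480, 0.2], [200, 8000, 4.5]],
    [0, 1, 0, 1],
)

# Observed statistics after each executed stage of the current query.
stage_stats = [
    {"estimated_rows": 1200, "actual_rows": 1100, "runtime_s": 0.5},
    {"estimated_rows": 800, "actual_rows": 41000, "runtime_s": 7.8},
]

for i, s in enumerate(stage_stats):
    x = [[s["estimated_rows"], s["actual_rows"], s["runtime_s"]]]
    if model.predict(x)[0]:
        print(f"stage {i}: cardinality misestimate detected -> re-optimize remaining plan")
    else:
        print(f"stage {i}: keep current plan")
```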
{"title":"Re-optimization for Multi-objective Cloud Database Query Processing using Machine Learning","authors":"Chenxiao Wang, Z. Arani, L. Gruenwald, Laurent d'Orazio, Eleazar Leal","doi":"10.5121/IJDMS.2021.13102","DOIUrl":"https://doi.org/10.5121/IJDMS.2021.13102","url":null,"abstract":"In cloud environments, hardware configurations, data usage, and workload allocations are continuously changing. These changes make it difficult for the query optimizer of a cloud database management system (DBMS) to select an optimal query execution plan (QEP). In order to optimize a query with a more accurate cost estimation, performing query re-optimizations during the query execution has been proposed in the literature. However, some of there-optimizations may not provide any performance gain in terms of query response time or monetary costs, which are the two optimization objectives for cloud databases, and may also have negative impacts on the performance due to their overheads. This raises the question of how to determine when are-optimization is beneficial. In this paper, we present a technique called ReOptML that uses machine learning to enable effective re-optimizations. This technique executes a query in stages, employs a machine learning model to predict whether a query re-optimization is beneficial after a stage is executed, and invokes the query optimizer to perform the re-optimization automatically. The experiments comparing ReOptML with existing query re-optimization algorithms show that ReOptML improves query response time from 13% to 35% for skew data and from 13% to 21% for uniform data, and improves monetary cost paid to cloud service providers from 17% to 35% on skewdata.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114192554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Storage System Based on a Distributed Hash Tables System
Pub Date : 2020-10-30  DOI: 10.5121/ijdms.2020.12501
Telesphore Tiendrebeogo, Mamadou Diarra
Big Data is unavoidable, considering that digital media have become the predominant form of communication in consumers' daily lives. Controlling what is at stake, and the quality of the data, must be a priority so as not to distort the strategies derived from processing the data for profit. To this end, a great deal of research work has been carried out by companies, and several platforms have been created. MapReduce, one of the enabling technologies, has proven to be applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. Thus, this document not only analyses DHT and MapReduce implementations and Top-Level Domains (TLDs) in general, but also provides a description of a DHT model as well as some guidelines for planning future research.
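As a minimal illustration of the DHT building block, the sketch below implements consistent hashing: nodes and keys are hashed onto the same ring, and each key is stored on the first node at or after its position. The node names and the choice of MD5 are illustrative, not the paper's model.

```python
# Consistent hashing: map keys and nodes onto one ring, walk clockwise.
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class SimpleDHT:
    def __init__(self, nodes):
        # Ring positions sorted so lookup is a binary search.
        self._ring = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        h = ring_hash(key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

dht = SimpleDHT(["node-a", "node-b", "node-c"])
for key in ["user:42", "log:2020-10-30", "blob:7f3a"]:
    print(key, "->", dht.node_for(key))
```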
{"title":"Big Data Storage System Based on a Distributed Hash Tables System","authors":"Telesphore Tiendrebeogo, Mamadou Diarra","doi":"10.5121/ijdms.2020.12501","DOIUrl":"https://doi.org/10.5121/ijdms.2020.12501","url":null,"abstract":"The Big Data is unavoidable considering the place of the digital is the predominant form of communication in the daily life of the consumer. The control of its stakes and the quality its data must be a priority in order not to distort the strategies arising from their treatment in the aim to derive profit. In order to achieve this, a lot of research work has been carried out companies and several platforms created. MapReduce, is one of the enabling technologies, has proven to be applicable to a wide range of fields. However, despite its importance recent work has shown its limitations. And to remedy this, the Distributed Hash Tables (DHT) has been used. Thus, this document not only analyses the and MapReduce implementations and Top-Level Domain (TLD)s in general, but it also provides a description of a model of DHT as well as some guidelines for the planification of the future research.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124970188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design, Implementation, and Assessment of Innovative Data Warehousing; Extract, Transformation, and Load (ETL); and Online Analytical Processing (OLAP) on BI
Pub Date : 2020-06-30  DOI: 10.5121/ijdms.2020.12301
R. Venkatakrishnan
The effectiveness of a Business Intelligence (BI) system depends heavily on three fundamental components: 1) data acquisition (ETL), 2) data storage (the data warehouse), and 3) data analytics (OLAP). The predominant challenges with these fundamental components are data volume, data variety, data integration, complex analytics, constant business changes, lack of skill sets, compliance, security, data quality, and computing requirements. There is no comprehensive documentation that provides guidelines for ETL, data warehousing, and OLAP while covering recent trends such as data latency (to provide real-time data), BI flexibility (to accommodate changes amid the explosion of data), and self-service BI. This research paper attempts to fill this gap by analyzing scholarly articles from the last three to five years to compile guidelines for the effective design, implementation, and assessment of DW, ETL, and OLAP in BI.
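To ground the data-acquisition component, here is a minimal ETL sketch: extract rows from a source, transform them, and load them into a warehouse table. The schema, field names, and in-memory SQLite warehouse are hypothetical stand-ins, not guidelines from the paper.

```python
# Extract -> Transform -> Load into a (here in-memory) warehouse table.
import csv
import io
import sqlite3

source = io.StringIO("order_id,amount,country\n1,19.99,bw\n2,5.50,za\n")

# Extract
rows = list(csv.DictReader(source))

# Transform: normalize country codes, cast amounts to integer cents.
for r in rows:
    r["country"] = r["country"].upper()
    r["amount_cents"] = int(round(float(r["amount"]) * 100))

# Load
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
dw.executemany(
    "INSERT INTO fact_orders VALUES (:order_id, :amount_cents, :country)", rows
)
print(dw.execute("SELECT country, SUM(amount_cents) FROM fact_orders GROUP BY country").fetchall())
```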
{"title":"Design, Implementation, and Assessment of Innovative Data Warehousing; Extract, Transformation, and Load(ETL); and Online Analytical Processing(OLAP) on BI","authors":"R. Venkatakrishnan","doi":"10.5121/ijdms.2020.12301","DOIUrl":"https://doi.org/10.5121/ijdms.2020.12301","url":null,"abstract":"The effectiveness of a Business Intelligence System is hugely dependent on these three fundamental components, 1) Data Acquisition (ETL), 2) Data Storage (Data Warehouse), and 3) Data Analytics (OLAP). The predominant challenges with these fundamental components are Data Volume, Data Variety, Data Integration, Complex Analytics, Constant Business changes, Lack of skill sets, Compliance, Security, Data Quality, and Computing requirements. There is no comprehensive documentation that talks about guidelines for ETL, Data Warehouse and OLAP to include the recent trends such as Data Latency (to provide real-time data), BI flexibility (to accommodate changes with the explosion of data) and SelfService BI. This research paper attempts to fill this gap by analyzing existing scholarly articles in the last three to five years to compile guidelines for effective design, implementation, and assessment of DW, ETL, and OLAP in BI.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129770091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarcasm Detection Beyond using Lexical Features
Pub Date : 2020-06-30  DOI: 10.5121/ijdms.2020.12304
Adewuyi, Joseph Oluwaseyi, Oladeji, Ifeoluwa David
In recent times, social media websites such as Facebook, Twitter, and so forth have grown and gained substantial importance. These websites have become huge environments in which users express their thoughts, perspectives, and reviews freely. Organizations leverage these environments to tap into people's opinions of their services and to provide quick feedback. This research seeks to move beyond using lexical features alone for sarcasm detection by also using contextual features, drawing on theories that explain when, how, and why sarcasm is expressed. A deep neural network architecture, a bidirectional long short-term memory with conditional random fields (Bi-LSTM-CRF), was employed to carry out this task, and two stages were used to classify whether a reply or comment to a tweet is sarcastic or non-sarcastic. The performance of the models was evaluated using the following metrics: accuracy, precision, recall, and F-measure.
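The sketch below shows a minimal PyTorch version of the bidirectional-LSTM backbone, reduced to a binary sarcastic/non-sarcastic classifier; the CRF layer the paper places on top is omitted here, and all vocabulary and layer sizes are arbitrary assumptions.

```python
# Bi-LSTM over token ids, mean-pooled into a two-class prediction.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, 2)  # sarcastic vs. non-sarcastic

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq, 2*hidden)
        return self.out(h.mean(dim=1))           # pool over the sequence

model = BiLSTMClassifier()
batch = torch.randint(1, 5000, (4, 20))  # 4 tweets, 20 tokens each
logits = model(batch)
print(logits.argmax(dim=1))  # predicted class per tweet
```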
{"title":"Sarcasm Detection Beyond using Lexical Features","authors":"Adewuyi, Joseph Oluwaseyi, Oladeji, Ifeoluwa David","doi":"10.5121/ijdms.2020.12304","DOIUrl":"https://doi.org/10.5121/ijdms.2020.12304","url":null,"abstract":"In current time, social media websites such as facebook, twitter, and so forth have improved and received substantial importance. These websites have grown into huge environments wherein users explicit their thoughts, perspectives and reviews evidently. Organizations leverage this environment to tap into people’s opinion on their services and to make a quick feedback. This research seeks to keep away from using grammatical words as the only features for sarcasm detection however also the contextual features, which are theories explaining when, how and why sarcasm is expressed. A deep neural network architecture model was employed to carry out this task, which is a bidirectional long short-term memory with conditional random fields (Bi-LSTM-CRF), two stages were employed to classify if a reply or comment to a tweet is sarcastic or non-sarcastic. The performance of the models was evaluated using the following metrics: Accuracy, Precision, Recall, F-measure.","PeriodicalId":247652,"journal":{"name":"International Journal of Database Management Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123944059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}