Pub Date: 2018-12-01 | DOI: 10.1109/ICOSST.2018.8632179
Tahira Mahboob, A. Ijaz, Amber Shahzad, Muqadas Kalsoom
Handling missing values in large datasets has become a difficult task for researchers and industry practitioners. In the field of medicine in particular, datasets contain missing values due to human error or the non-availability of data. If these datasets are to be utilized for inference or predictive studies, the results are not reliable. Discarding such instances is an option, but it affects overall accuracy, so it is preferable to apply a replacement or imputation technique. Imputation techniques estimate the missing values in a dataset by applying various algorithms. In this paper, we therefore present a framework that assists in imputing missing values in a large Chronic Kidney Disease (CKD) dataset. We use three machine learning algorithms, i.e., K-Nearest Neighbors (KNN), K-Means, and K-Medoids clustering, to impute the missing values. The performance of the proposed technique is evaluated by applying the Decision Tree and Random Forest algorithms. Experimental results demonstrate that the KNN algorithm provides the most accurate results compared with the K-Means and K-Medoids clustering algorithms: KNN achieves an accuracy of 86.67% with the Decision Tree algorithm and 75.25% with the Random Forest algorithm. Additionally, it yields lower relative, absolute, and root mean squared errors. Consequently, the KNN-imputed datasets are used in our research for future predictions.
Title: Handling Missing Values in Chronic Kidney Disease Datasets Using KNN, K-Means and K-Medoids Algorithms
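The paper does not reproduce its imputation code, but the KNN-based scheme it describes can be sketched as follows: for each incomplete record, find the k complete records closest in Euclidean distance over the observed features, then fill each gap with the mean of the neighbours' values. This is a minimal, hypothetical illustration (the data and function name are invented, and the authors' actual distance metric and k may differ):

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries: for each incomplete row, find the k complete rows
    closest in Euclidean distance over the observed features, then replace
    each missing feature with the mean of the neighbours' values there."""
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        obs = [i for i, v in enumerate(row) if v is not None]
        dist = lambda c: math.sqrt(sum((row[i] - c[i]) ** 2 for i in obs))
        neighbours = sorted(complete, key=dist)[:k]
        imputed.append([
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return imputed

data = [[1.0, 2.0], [1.2, 2.2], [10.0, 20.0], [1.1, None]]
print(knn_impute(data, k=2)[3])  # gap filled from the two nearest complete rows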
Pub Date: 2018-12-01 | DOI: 10.1109/ICOSST.2018.8632190
Natalia Chaudhry, M. Yousaf
Blockchain is a distributed ledger that has gained widespread attention in many areas, and many industries have started to implement blockchain solutions for their applications and services. It is important to know the key components, functional characteristics, and architecture of blockchain to understand its impact on, and applicability to, various applications. The best-known use case of blockchain is Bitcoin, a cryptocurrency. Because a blockchain is a distributed ledger, a consensus mechanism is needed among the peer nodes of the network to ensure its proper working. Many consensus algorithms have been proposed in the literature, each with its own performance and security characteristics; no single consensus algorithm can serve the requirements of every application. It is therefore vital to compare the available consensus algorithms technically to highlight their strengths, weaknesses, and use cases. We identify and discuss parameters related to the performance and security of consensus in blockchain, and analyze and compare the consensus algorithms with respect to these parameters. We also present the research gap regarding the design of an efficient consensus algorithm and the evaluation of existing algorithms. This paper will act as a guide for developers and researchers to evaluate and design a consensus algorithm.
Title: Consensus Algorithms in Blockchain: Comparative Analysis, Challenges and Opportunities
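The parameter-driven comparison the paper describes can be sketched as a small lookup table plus a filter. The table below is a hypothetical illustration, not the paper's actual comparison matrix; it uses only widely known properties of three consensus families (e.g., PBFT tolerates f < n/3 Byzantine nodes and gives immediate finality, while PoW finality is probabilistic):

```python
# Hypothetical parameter table for three well-known consensus families.
ALGORITHMS = {
    "PoW":  {"fault_model": "< 50% of hash power adversarial",
             "finality": "probabilistic", "permissioned": False},
    "PoS":  {"fault_model": "< majority of stake adversarial",
             "finality": "probabilistic", "permissioned": False},
    "PBFT": {"fault_model": "f < n/3 Byzantine nodes",
             "finality": "immediate", "permissioned": True},
}

def shortlist(permissioned, need_immediate_finality):
    """Filter candidate algorithms by two deployment constraints."""
    return [name for name, p in ALGORITHMS.items()
            if p["permissioned"] == permissioned
            and (p["finality"] == "immediate" or not need_immediate_finality)]

print(shortlist(permissioned=True, need_immediate_finality=True))    # ['PBFT']
print(shortlist(permissioned=False, need_immediate_finality=False))  # ['PoW', 'PoS']
```

A real evaluation would add the throughput, latency, and scalability parameters the paper compares; the point of the sketch is that no single row satisfies every column, which is why one algorithm cannot serve every application.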
Pub Date: 2018-12-01 | DOI: 10.1109/ICOSST.2018.8632205
Asad Ali, C. Gravino
A Systematic Literature Review (SLR) allows us to combine and analyze data from multiple (published and unpublished) studies. Although it provides comprehensive empirical evidence for an area of interest, the results we usually get from the data-synthesis phase of an SLR comprise huge tables and graphs, so extracting the required results is a tedious and time-consuming job for users. In this work, we propose to semi-automate some of the steps for fetching information from an SLR, beyond the traditional tables, graphs, and plots. The automation is performed using Semantic Web technologies such as ontologies, the Jena API, and SPARQL queries. The Semantic Web, also called Web 3.0, provides a common framework that allows data to be shared and reused across applications and enterprises; it can be used to integrate, extract, and infer the most relevant data required by users, which is hidden behind the huge amount of information on the Web. We also provide an easy-to-use interface that allows users to perform different searches and find the SLR results they need easily and quickly. Finally, we present the results of a preliminary user study that analyzes how much time users need to extract their required information, both via the SLR tables and via our proposal. The results reveal that with our system users obtain the required information in less time than with the manual approach.
Title: An Ontology-Based Approach to Semi-Automate Systematic Literature Reviews
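The core idea of the pipeline (SLR results stored as subject-predicate-object triples and retrieved by pattern matching) can be illustrated without Jena. The sketch below is hypothetical: the triples and predicate names are invented, and the hand-rolled matcher only mimics what a single SPARQL basic graph pattern does in Jena or rdflib:

```python
# Toy triple store: SLR data as (subject, predicate, object) triples,
# in the spirit of the ontology + SPARQL pipeline the paper describes.
TRIPLES = [
    ("study:S1", "usesMetric", "MMRE"),
    ("study:S1", "estimationMethod", "CaseBasedReasoning"),
    ("study:S2", "usesMetric", "Pred(25)"),
    ("study:S2", "estimationMethod", "Regression"),
]

def match(pattern):
    """Match one (s, p, o) pattern; None plays the role of a SPARQL variable."""
    return [t for t in TRIPLES
            if all(q is None or q == v for q, v in zip(pattern, t))]

# Roughly equivalent to: SELECT ?s WHERE { ?s :usesMetric "MMRE" }
studies = [s for s, _, _ in match((None, "usesMetric", "MMRE"))]
print(studies)  # ['study:S1']
```

In the actual system, queries like this run over an OWL ontology via the Jena API, so a user asking "which studies use metric MMRE" gets a direct answer instead of scanning the synthesis tables by hand.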
Pub Date: 2018-11-16 | DOI: 10.1109/ICOSST.2018.8632183
A. Qayyum, Z. Gilani, S. Latif, Junaid Qadir, J. Singh
Media outlets and political campaigners recognise social media as a means for widely disseminating news and opinions. In particular, Twitter is used by political groups all over the world to spread political messages, engage their supporters, drive election campaigns, and challenge their critics. Further, news agencies, many of which aim to give an impression of balance, are often of a particular political persuasion which is reflected in the content they produce. Driven by the potential for political and media organisations to influence public opinion, our aim is to quantify the nature of political discourse by these organisations through their use of social media. In this study, we analyse the sentiments, toxicity, and bias exhibited by the most prominent Pakistani and Indian political parties and media houses, and the pattern by which these political parties utilise Twitter. We found that media bias and toxicity exist in the political discourse of these two developing nations.
Title: Exploring Media Bias and Toxicity in South Asian Political Discourse
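One common building block for tweet-level sentiment analysis of the kind this study performs is a lexicon-based polarity score. The sketch below is purely illustrative: the word lists are invented, and the authors' actual tooling is not specified in the abstract:

```python
# Minimal lexicon-based polarity scorer (hypothetical word lists).
POSITIVE = {"progress", "support", "welcome", "win"}
NEGATIVE = {"corrupt", "failure", "attack", "shame"}

def polarity(tweet):
    """Score in [-1, 1]: (positive - negative) over matched sentiment words;
    0.0 when no sentiment-bearing word is found."""
    words = tweet.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

print(polarity("we welcome this progress"))       # 1.0
print(polarity("a corrupt failure and a shame"))  # -1.0
```

Aggregating such scores per account makes it possible to compare sentiment and toxicity patterns across parties and media houses, as the study does at scale.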