Pub Date: 2020-12-29 DOI: 10.1504/ijbdm.2020.10034546
Title: A systematic approach to 'cleaning' of drug name records data in the FAERS database: a case report
Michael A. Veronin, Robert P. Schumaker, R. Dixit, Pooja Dhake, Morgan Ogwo
Journal: International Journal of Big Data Management
Data 'cleaning', also known as data 'cleansing' or data 'curation', is the process of identifying and correcting errors in data. The objective of this report is to present a data cleaning and standardisation process for the drug name files in the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS) database. Drug name data were cleaned and standardised using a combination of data cleaning tools and manual correction techniques. Data files were organised into frequency intervals, and an iterative cleaning strategy using programming scripts in MySQL Workbench was employed. Downloading the FAERS quarterly reports for Q1 2004 through Q3 2016 yielded 32,736,657 DRUG file records. The records contained a variety of errors, such as misspellings, abbreviations, and nondescript or ambiguous names. On completion of the process, more than 95% of the drug name data in the FAERS database had been standardised. For large datasets such as FAERS, a cleaning process is necessary to rectify data that may be incomplete or inaccurate owing to input errors, thereby improving the quality and validity of the information.
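The abstract describes grouping drug names into frequency intervals and correcting them iteratively with scripts. The following minimal Python sketch shows that idea under stated assumptions. It is not the authors' MySQL Workbench scripts. The helper names `normalise`, `frequency_intervals` and `standardise`, the interval bounds, and every entry in `CORRECTIONS` are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical correction table. In a real project such mappings would be
# built and reviewed by hand, one frequency interval at a time.
CORRECTIONS = {
    "ASPRIN": "ASPIRIN",        # misspelling
    "ASA": "ASPIRIN",           # abbreviation
    "TYLENOL": "ACETAMINOPHEN", # brand name mapped to generic name
}


def normalise(name: str) -> str:
    """Uppercase, replace anything but A-Z, 0-9 and space, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(cleaned.split())


def frequency_intervals(names, bounds=(1000, 100, 10, 1)):
    """Group distinct normalised names by occurrence count.

    Each name goes into the first (largest) bound its count reaches, so the
    most frequent names, which cover the most records, can be reviewed first.
    """
    counts = Counter(normalise(n) for n in names)
    intervals = {b: [] for b in bounds}
    for name, count in counts.most_common():
        for b in bounds:
            if count >= b:
                intervals[b].append(name)
                break
    return intervals


def standardise(names):
    """Normalise each raw name, then apply the correction table if it matches."""
    return [CORRECTIONS.get(normalise(n), normalise(n)) for n in names]


records = ["Asprin", "aspirin", "ASA", "Tylenol.", "  tylenol ", "Ibuprofen"]
print(standardise(records))
print(frequency_intervals(records, bounds=(3, 2, 1)))
```

Handling the most frequent interval first is one reasonable reading of "organised into frequency intervals": each correction there fixes many records at once. Rare names left in the low intervals are the ones most likely to need manual review.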
Pub Date: 2020-12-29 DOI: 10.1504/ijbdm.2020.10032871
Title: Big data and analytics: a data management perspective in public administration
Prabhat Mittal
In recent years, data analytics has enabled policymakers to improve the accuracy of their results when framing policies and strategies. This research field still has great untapped potential that could help mitigate the challenges facing the public administration system. The present article introduces the concept of big data and gives readers a comprehensive overview of the 'big data application framework' in public administration via data-driven e-governance (DDeG). The conceptual framework identifies the inherent possibilities of big data from the perspective of both the individual citizen and the administration. The study's overall findings broaden the scope of e-governance by exploring technological aspects such as the Internet of Things (IoT) and artificial intelligence (AI). The author concludes by pointing out the role of big data processes, and their corresponding improved characteristics, in public administration.
Pub Date: 2020-04-20 DOI: 10.1504/ijbdm.2020.10026932
Title: A review on ethical concerns in big data management
S. R. Nair
In the contemporary digitalised age, big data analytics has enabled organisations to automate and quickly analyse data and information from multiple sources, facilitating optimised decision-making that helps achieve organisational goals. Although analysing the data is strategically vital, the variety of data accessible from multiple media sources makes big data management highly challenging. Moreover, because big data analytics processes data very fast, it enables access to information that could compromise individual privacy (inadvertently or deliberately) or be misused, raising ethical issues concerning the sharing and use of data. To address these ethical concerns in big data management, this study proposes using a simple 'stakeholders-ethics-framework' to develop a 'stakeholder analysis approach framework', which is assumed to be linkable to sustainability guidelines that support a sustainable big data industry.
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2019.10025856
Title: Missing data imputation by the aid of features similarities
S. Mostafa
Missing data are common in statistical analyses, and the imputation method used affects the quality of the data. In this paper, a method is proposed to impute missing data in a variable of interest (the recipient) using observed values from other variables (the donors). Some existing methods rely only on the recipient (e.g., unconditional means); others rely on the recipient and one donor (e.g., interpolation). The proposed method exploits similarities among the values in the donor to impute the missing data in the recipient. If these similarities are not sufficient to impute all missing values, another method is combined with the proposed method to impute the residual missing data. The proposed approach is straightforward and can be combined with existing methods. The empirical study confirmed the superiority of the proposed approach and showed that it can significantly improve data quality. The improvement is more pronounced when the ratio of missing values is higher.
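The abstract outlines a two-stage scheme: impute the recipient from donor-value similarities, then hand any residual gaps to another method. The following Python sketch shows one plausible reading of that scheme and is not the paper's algorithm. It assumes that "similar" means an exact match on a single donor value, that the imputed value is the mean of observed recipient values sharing that donor value, and that the fallback method is the unconditional mean. The function name `impute_by_donor_similarity` is hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_by_donor_similarity(
    recipient: Sequence[Optional[float]],
    donor: Sequence[float],
) -> List[float]:
    """Fill None entries of `recipient` using a fully observed `donor` column.

    Stage 1 (assumed similarity rule): a missing recipient value takes the
    mean of observed recipient values in rows with the same donor value.
    Stage 2 (assumed fallback): rows with no matching donor value receive
    the unconditional mean of all observed recipient values.
    """
    if len(recipient) != len(donor):
        raise ValueError("recipient and donor must have the same length")

    # Pool observed recipient values by their donor value.
    pools: dict = {}
    for r, d in zip(recipient, donor):
        if r is not None:
            pools.setdefault(d, []).append(r)

    observed = [r for r in recipient if r is not None]
    if not observed:
        raise ValueError("recipient has no observed values to impute from")
    fallback = mean(observed)

    result: List[float] = []
    for r, d in zip(recipient, donor):
        if r is not None:
            result.append(r)               # keep observed value
        elif d in pools:
            result.append(mean(pools[d]))  # similarity-based imputation
        else:
            result.append(fallback)        # residual: unconditional mean
    return result


print(impute_by_donor_similarity(
    [10.0, None, 30.0, None, 50.0, None],
    [1, 1, 2, 3, 2, 2],
))
```

Exact matching keeps the sketch short. With continuous donors, a tolerance or nearest-neighbour rule would be the natural generalisation, but that is also an assumption and not something the abstract specifies.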
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2021.10043324
Title: How Social Media Data Can Influence Consumers' Attitudes towards Cosmetic Brands: The Case of Maybelline
M. Zaman, Rajibul Hasan, Eloise Princet
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2022.128453
Title: A hybrid neuro-fuzzy technique to overcome clustering approach issues in big data
C. Maithri, H. Chandramouli
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2023.10050228
Title: A Hybrid Neuro-Fuzzy Technique to Overcome Clustering Approach Issues in Big Data
C. H, M. C.
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2020.10032568
Title: A concentric framework for leveraging big data for business value
Ta-Tao Chuang, Kazuo Nakatani, V. Patil
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2020.112407
Title: A longitudinal assessment of Nigeria's research output for evidence based science policy development
Olufikayo Abodunde, O. Jegede, T. Oyebisi
Pub Date: 1900-01-01 DOI: 10.1504/ijbdm.2023.10048598
Title: Diastema: Data-driven Stack for Big Data Applications Management and Deployment
D. Kyriazis, Argyro Mavrogiorgou, Yannis Poulakis, Panagiotis Karamolegkos, Andreas Karabetian, K. Voulgaris, Athanasios Kiourtis