J. Aquino, J. Allison, R. Rilling, D. Stott, Kathryn Young, Michael Daniels
In an effort to lead our community in following modern data citation practices by formally citing data used in published research and implementing standards to facilitate reproducible research results and data, while also producing meaningful metrics that help assess the impact of our services, the National Center for Atmospheric Research (NCAR) Earth Observing Laboratory (EOL) has implemented the use of Digital Object Identifiers (DOIs) (DataCite 2017) for both physical objects (e.g., research platforms and instruments) and datasets. We discuss why this work is important and timely, and review the development of guidelines for the use of DOIs at EOL by focusing on how decisions were made. We discuss progress in assigning DOIs to physical objects and datasets, summarize plans to cite software, describe a current collaboration to develop community tools to display citations on websites, and touch on future plans to cite workflows that document dataset processing and quality control. Finally, we will review the status of efforts to engage our scientific community in the process of using DOIs in their research publications.
{"title":"Motivation and strategies for implementing Digital Object Identifiers (DOIs) at NCAR’s Earth Observing Laboratory -- Past progress and future collaborations","authors":"J. Aquino, J. Allison, R. Rilling, D. Stott, Kathryn Young, Michael Daniels","doi":"10.5334/DSJ-2017-007","DOIUrl":"https://doi.org/10.5334/DSJ-2017-007","url":null,"abstract":"In an effort to lead our community in following modern data citation practices by formally citing data used in published research and implementing standards to facilitate reproducible research results and data, while also producing meaningful metrics that help assess the impact of our services, the National Center for Atmospheric Research (NCAR) Earth Observing Laboratory (EOL) has implemented the use of Digital Object Identifiers (DOIs) (DataCite 2017) for both physical objects (e.g., research platforms and instruments) and datasets. We discuss why this work is important and timely, and review the development of guidelines for the use of DOIs at EOL by focusing on how decisions were made. We discuss progress in assigning DOIs to physical objects and datasets, summarize plans to cite software, describe a current collaboration to develop community tools to display citations on websites, and touch on future plans to cite workflows that document dataset processing and quality control. Finally, we will review the status of efforts to engage our scientific community in the process of using DOIs in their research publications.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47915850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wang Xuezhi, Zhao Jianghua, Zhou Yuanchun, Liao Jianhui
The rapid growth in the volume of remote sensing data and its increasing computational requirements bring huge challenges for researchers as traditional systems cannot adequately satisfy the huge demand for service. Cloud computing has the advantage of high scalability and reliability, which can provide firm technical support. This paper proposes a highly scalable geospatial cloud platform named the Geospatial Data Cloud, which is constructed based on cloud computing. The architecture of the platform is first introduced, and then two subsystems, the cloud-based data management platform and the cloud-based data processing platform, are described. ––– This paper was presented at the First Scientific Data Conference on Scientific Research, Big Data, and Data Science, organized by CODATA-China and held in Beijing on 24-25 February, 2014.
{"title":"The Geospatial Data Cloud: An Implementation of Applying Cloud Computing in Geosciences","authors":"Wang Xuezhi, Zhao Jianghua, Zhou Yuanchun, Liao Jianhui","doi":"10.2481/DSJ.14-042","DOIUrl":"https://doi.org/10.2481/DSJ.14-042","url":null,"abstract":"The rapid growth in the volume of remote sensing data and its increasing computational requirements bring huge challenges for researchers as traditional systems cannot adequately satisfy the huge demand for service. Cloud computing has the advantage of high scalability and reliability, which can provide firm technical support. This paper proposes a highly scalable geospatial cloud platform named the Geospatial Data Cloud, which is constructed based on cloud computing. The architecture of the platform is first introduced, and then two subsystems, the cloud-based data management platform and the cloud-based data processing platform, are described. ––– This paper was presented at the First Scientific Data Conference on Scientific Research, Big Data, and Data Science, organized by CODATA-China and held in Beijing on 24-25 February, 2014.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models in the Pima Indians Diabetes and the Bupa Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequential fitted parameters are then extracted, and their respective probability density estimations are used to track their variability using an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity, and minimizes over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.
{"title":"A Data-Driven Method for Selecting Optimal Models Based on Graphical Visualisation of Differences in Sequentially Fitted ROC Model Parameters","authors":"Kassim S. Mwitondi, R. Moustafa, A. Hadi","doi":"10.2481/dsj.WDS-045","DOIUrl":"https://doi.org/10.2481/dsj.WDS-045","url":null,"abstract":"Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models in the Pima Indians Diabetes and the Bupa Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequential fitted parameters are then extracted, and their respective probability density estimations are used to track their variability using an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity, and minimizes over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tackling the global challenges relating to health, poverty, business, and the environment is heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination, and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised, information society remains digitally divided. On the African continent in particular, this division has resulted in a gap between the knowledge generation and its transformation into tangible products and services. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving the people's quality of life. Its main strategy is based on a generic data sharing model providing access to data utilising and generating entities in a multi-disciplinary environment. It highlights the great potentials in using unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. The paper's conclusions include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies among the private sector, academic, and research institutions within and among countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement in the people's overall quality of life. To avoid running high implementation costs, selected open source tools are recommended for developing and sustaining the system.
{"title":"Harnessing Data Flow and Modelling Potentials for Sustainable Development","authors":"Kassim S. Mwitondi, J. Bugrien","doi":"10.2481/dsj.009-027","DOIUrl":"https://doi.org/10.2481/dsj.009-027","url":null,"abstract":"Tackling the global challenges relating to health, poverty, business, and the environment is heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination, and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised, information society remains digitally divided. On the African continent in particular, this division has resulted in a gap between the knowledge generation and its transformation into tangible products and services. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving the people's quality of life. Its main strategy is based on a generic data sharing model providing access to data utilising and generating entities in a multi-disciplinary environment. It highlights the great potentials in using unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. The paper's conclusions include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies among the private sector, academic, and research institutions within and among countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement in the people's overall quality of life. To avoid running high implementation costs, selected open source tools are recommended for developing and sustaining the system.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The vision of the Electronic Geophysical Year (eGY) is that we can achieve a major step forward in geoscience capability, knowledge, and usage throughout the world for the benefit of humanity by accelerating the adoption of modern and visionary practices such as virtual observatories for managing and sharing data and information. eGY has found that the biggest challenges to implementing the vision are educating program mangers and senior scientists on the need for modern data management techniques and providing incentives for practitioners of the new field of geoinformatics.
{"title":"Open Access to Digital Information: Opportunities and Challenges Identified During the Electronic Geophysical Year","authors":"W.K. (Bill) Peterson","doi":"10.2481/DSJ.IGY-002","DOIUrl":"https://doi.org/10.2481/DSJ.IGY-002","url":null,"abstract":"The vision of the Electronic Geophysical Year (eGY) is that we can achieve a major step forward in geoscience capability, knowledge, and usage throughout the world for the benefit of humanity by accelerating the adoption of modern and visionary practices such as virtual observatories for managing and sharing data and information. eGY has found that the biggest challenges to implementing the vision are educating program mangers and senior scientists on the need for modern data management techniques and providing incentives for practitioners of the new field of geoinformatics.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wrong:Authors, Authors' affiliations, and Equations (4) and (5) Right:See the corrected PDF.
错误:作者、作者所属机构和方程式(4)和(5)正确:见更正后的PDF。
{"title":"Corrections made by the Editors","authors":"Chen-Yuan Liu, Jhen-Cheng Wang, S. Luo","doi":"10.2481/539","DOIUrl":"https://doi.org/10.2481/539","url":null,"abstract":"Wrong:Authors, Authors' affiliations, and Equations (4) and (5) Right:See the corrected PDF.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hu Lianglin, Hou Yufang, Liao Jianhui, Yin Ling, Shi Wenwen
Many databases and platforms for human brain data have been established in China over the years, and metadata plays an important role in understanding and using them. The BrainBank Metadata Specification for the Human Brain Project and Neuroinformatics provides a structure for describing the context and content information of BrainBank databases and services. It includes six parts: identification, method, data schema, distribution of the database, metadata extension, and metadata reference The application of the BrainBank Metadata Specification will promote conservation and management of BrainBank databases and platforms. it will also greatly facilitate the retrieval, evaluation, acquisition, and application of the data.
{"title":"BrainBank Metadata Specification for the Human Brain Project and Neuroinformatics","authors":"Hu Lianglin, Hou Yufang, Liao Jianhui, Yin Ling, Shi Wenwen","doi":"10.2481/DSJ.6.S375","DOIUrl":"https://doi.org/10.2481/DSJ.6.S375","url":null,"abstract":"Many databases and platforms for human brain data have been established in China over the years, and metadata plays an important role in understanding and using them. The BrainBank Metadata Specification for the Human Brain Project and Neuroinformatics provides a structure for describing the context and content information of BrainBank databases and services. It includes six parts: identification, method, data schema, distribution of the database, metadata extension, and metadata reference The application of the BrainBank Metadata Specification will promote conservation and management of BrainBank databases and platforms. it will also greatly facilitate the retrieval, evaluation, acquisition, and application of the data.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diane Smith Rumble, J. Rumble, Huadong Guo, Yun Xiao
Wrong:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedigs and not original research contributions. Right:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedings.
{"title":"Correction made by the Authors","authors":"Diane Smith Rumble, J. Rumble, Huadong Guo, Yun Xiao","doi":"10.2481/538","DOIUrl":"https://doi.org/10.2481/538","url":null,"abstract":"Wrong:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedigs and not original research contributions. Right:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedings.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose an approach for understanding leadership behavior in dot-jp, a non-profit organization, by analyzing heterogeneous multi-data composed of questionnaires and mailing list archives. Attitudes toward leaders were obtained from the questionnaires, and human networks were extracted from the mailing list archives. By integrating the results, we discovered that leaders must receive messages from other people as well as send messages to construct reliable relationships.
{"title":"Multi-data Mining for Understanding Leadership Behavior","authors":"N. Matsumura, Yoshihiro Sasaki","doi":"10.2481/dsj.6.S61","DOIUrl":"https://doi.org/10.2481/dsj.6.S61","url":null,"abstract":"We propose an approach for understanding leadership behavior in dot-jp, a non-profit organization, by analyzing heterogeneous multi-data composed of questionnaires and mailing list archives. Attitudes toward leaders were obtained from the questionnaires, and human networks were extracted from the mailing list archives. By integrating the results, we discovered that leaders must receive messages from other people as well as send messages to construct reliable relationships.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74920114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is important for a collaborative community to decide its next action. The leader of a collaborative community must choose an action that increases rewards and reduces risks. When a leader cannot make this decision, action will be determined through community member discussion. However, this decision cannot be made in blind discussions, so systematic discussion is necessary to choose effective action in a limited time. In this paper, we propose a bulletin board system framework in which effective discussion is established through visualized discussion logs.
{"title":"Discussion Visualization on a Bulletin Board System","authors":"W. Sunayama","doi":"10.2481/dsj.6.S51","DOIUrl":"https://doi.org/10.2481/dsj.6.S51","url":null,"abstract":"It is important for a collaborative community to decide its next action. The leader of a collaborative community must choose an action that increases rewards and reduces risks. When a leader cannot make this decision, action will be determined through community member discussion. However, this decision cannot be made in blind discussions, so systematic discussion is necessary to choose effective action in a limited time. In this paper, we propose a bulletin board system framework in which effective discussion is established through visualized discussion logs.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89742435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}