An Empirical Analysis of Knowledge Overlapping from Big Vocabulary in Biodiversity Domain
Y. Kartika, Zaenal Akbar, D. R. Saleh, W. Fatriasari
Proceedings of the 2022 International Conference on Computer, Control, Informatics and Its Applications, 22 November 2022. DOI: 10.1145/3575882.3575921
In advancing scientific discovery, the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles have been widely adopted for managing digital assets such as data, algorithms, tools, and workflows. The principles also encourage the adoption of domain-specific community standards for data sharing. Unfortunately, this has also given rise to Big Vocabulary, in which many standardized vocabularies (i.e., ontologies) have been developed across domains; for example, more than a thousand ontologies are available for the biological and biomedical sciences. Moreover, because these ontologies were developed in a distributed manner, the knowledge they represent is likely to overlap. This work analyzed the overlapping knowledge represented by multiple ontologies related to the biodiversity domain. The analysis aligned the fields of a biodiversity database with the terms available across multiple ontologies, from which mapped, overlap, and coverage scores were computed. The results show that the overlap score reaches up to 27%, while a single ontology can represent at most 53% of the fields. Consequently, sharing data for a specific use case requires integrating multiple ontologies and extending them.
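The abstract does not define how the mapped, overlap, and coverage scores are calculated. The following is a minimal sketch under assumed, simple set-based definitions: mapped is the fraction of database fields matched by at least one ontology, overlap is the fraction matched by two or more ontologies, and coverage is the fraction matched by each single ontology. It also assumes the field-to-term alignment has already been done. The function name `alignment_scores`, the ontology names, and the field names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def alignment_scores(fields, alignments):
    """Illustrative scores under ASSUMED definitions (not the paper's formulas).

    fields: list of database field names.
    alignments: dict mapping a (hypothetical) ontology name to the set of
        fields that have at least one matching term in that ontology.
    Returns (mapped_score, overlap_score, coverage_per_ontology).
    """
    field_set = set(fields)
    total = len(field_set)
    if total == 0:
        raise ValueError("at least one field is required")

    # Count how many ontologies provide a term for each field.
    hits = Counter()
    for mapped in alignments.values():
        hits.update(mapped & field_set)

    # Assumed "mapped": share of fields matched by at least one ontology.
    mapped_score = len(hits) / total
    # Assumed "overlap": share of fields matched by two or more ontologies.
    overlap_score = sum(1 for n in hits.values() if n >= 2) / total
    # Assumed "coverage": share of fields each single ontology can represent.
    coverage = {
        name: len(mapped & field_set) / total
        for name, mapped in alignments.items()
    }
    return mapped_score, overlap_score, coverage


if __name__ == "__main__":
    # Hypothetical toy data, not the database or ontologies used in the study.
    fields = ["scientific_name", "family", "latitude",
              "longitude", "collector", "habitat"]
    alignments = {
        "onto_A": {"scientific_name", "family", "latitude"},
        "onto_B": {"latitude", "longitude", "habitat"},
        "onto_C": {"scientific_name"},
    }
    mapped, overlap, coverage = alignment_scores(fields, alignments)
    print(f"mapped={mapped:.2f} overlap={overlap:.2f}")
    for name, score in coverage.items():
        print(f"{name}: coverage={score:.2f}")
```

In this toy example no single ontology covers every field, which mirrors the kind of gap the abstract reports and why integrating several ontologies would be needed. Other reasonable definitions (for instance, pairwise overlap between two ontologies) would yield different numbers.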