German-speaking naturalists working in southeastern Australia in the mid-19th century relied heavily on the expertise of First Nations intermediaries who acted as guides, collectors, traders and translators (Clarke 2008, Olsen and Russell 2019). Many of these naturalists went to Australia because of the research opportunities offered by the British Empire at a time when the German nation states did not have colonies of their own. Others sought to escape political upheaval at home. Colonial government agencies welcomed them as employees because of their training in the emerging research-oriented natural sciences, which the reformed German universities offered at a time when British universities were still providing a broad general education (Home 1995, Kirchberger 2000). Wilhelm von Blandowski (1822–1878) and Gerard Krefft (1830–1881), who both worked in colonial Victoria and New South Wales, are among this group. Throughout their work, they corresponded extensively with naturalists in Berlin, exchanging specimens and ideas. But the preserved Australian animals, plants and rock samples, as well as the written and drawn records of animals and landscapes now held at the Museum für Naturkunde Berlin (MfN), are much more than objects of scientific interest. They also contain information about Australia's First Nations. The collections provide evidence of First Nations people's role in collecting as well as their knowledge of the natural world, which has long been overlooked and, at least in part deliberately, made invisible by Western knowledge systems (e.g., Das and Lowe 2018, Ashby 2020). People data have been recognised as crucial for linking such collection objects with expeditions, publications, archival material and correspondence (Groom et al. 2020, Groom et al. 2022). Such data can thus potentially help reconstruct invisibilized Indigenous histories and knowledge.
However, while the MfN keeps information about European collectors and other non-Indigenous agents associated with its specimens in internal catalogues, databases and wikis, Indigenous actors remain largely absent from these repositories, which reproduce the colonial archive 'along the archival grain' (Stoler 2009). With this in mind, our presentation discusses the complexities of using persistent identifiers and tools such as Wikidata to improve the integration and linkage of people data in the work currently being undertaken by the MfN and the Berlin's Australian Archive project to digitise the museum’s collections and make them accessible. Drawing upon the guidance provided by the FAIR*1 and CARE*2 principles for data (Wilkinson et al. 2016, Carroll et al. 2020), and learning from the 2012 ATSILIRN Protocols for Libraries, Archives and Information Services*3, the 2019 Tandanya Adelaide Declaration and the 2020 AIATSIS Code of Ethics*4, we address the potential of these efforts in terms of collection accessibility, and also highlight the challenges and limitations of this approach in the context of colonial collections.
{"title":"Collections from Colonial Australia in Berlin's Museum für Naturkunde and the Challenges of Data Accessibility","authors":"Anja Schwarz, Fiona Möhrle, Sabine von Mering","doi":"10.3897/biss.7.111980","DOIUrl":"https://doi.org/10.3897/biss.7.111980","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89306401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Australian National Species List (AuNSL) is a unified, nationally accepted taxonomy for the native and naturalised biota of Australia. It is derived from a set of taxon-focussed resources including the Australian Plant Name Index and Australian Plant Census, the Australian Faunal Directory, and similar lists of fungi, lichens and bryophytes. These resources share a common infrastructure and contribute to the single national taxonomy (AuNSL), but retain their independent curation practices and online presentation. The AuNSL is now the core national infrastructure providing names and taxonomy for significant biodiversity data infrastructures including the Atlas of Living Australia, the Terrestrial Ecosystem Research Network, the Biodiversity Data Repository, and the Species Profile and Threats Database. As the go-to resource for names and taxonomy for Australia’s unique biodiversity, the AuNSL must be constantly updated to reflect taxonomic and nomenclatural change. For some taxonomic groups, the AuNSL is substantially complete, and new taxa and other novelties are incorporated with little time lag. For other taxonomic groups, the data are patchy and updates sporadic. Like similar projects, the AuNSL would benefit from improvements to taxonomic data publishing and sharing. Such improvements have the potential to enable automated, real-time ingestion of new taxonomic and nomenclatural data, allowing curator time to be redirected to backfilling the historical data from a dispersed and complex literature. Ideally, the AuNSL will also be able to benefit from advances in automated approaches to processing the historical data, including via the sharing of standardised representations of such data. Here we outline the AuNSL data model and editor functionality, and describe our approach to sharing our data via existing and emerging standards such as Darwin Core and Taxon Concept Schema (TCS2).
We then describe what we, as consumers of taxonomic data from published works, really need from publishers of new and reprocessed historical data. In brief, we need structured taxonomic data conforming to an adequate standard.
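As a rough illustration of what "structured taxonomic data conforming to an adequate standard" might mean for a consumer, the sketch below represents one name record using a handful of Darwin Core Taxon terms and checks that the fields an ingestion step would need are present. The set of terms treated as required, the `missing_terms` helper and the example values are assumptions made for illustration; they do not describe the AuNSL data model or its editor.

```python
# Darwin Core Taxon terms treated as mandatory in this sketch.
# Which terms are mandatory is an assumption, not an AuNSL requirement.
REQUIRED_TERMS = ("taxonID", "scientificName", "taxonRank", "taxonomicStatus")


def missing_terms(record):
    """Return the required terms that are absent or empty in a record."""
    missing = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or not str(value).strip():
            missing.append(term)
    return missing


# Hypothetical record; the identifier is a placeholder, not a real AuNSL ID.
example = {
    "taxonID": "example:taxon/1",
    "scientificName": "Eucalyptus regnans F.Muell.",
    "taxonRank": "species",
    "taxonomicStatus": "accepted",
    "nameAccordingTo": "",  # optional in this sketch
}

if __name__ == "__main__":
    print(missing_terms(example))
    print(missing_terms({"scientificName": "Acacia"}))
```

A record that passes such a check could in principle be ingested without a curator re-keying it, which is the time saving the abstract is pointing at.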
{"title":"Building the Australian National Species List","authors":"Endymion Cooper, G. Whitbread, Anne Fuchs","doi":"10.3897/biss.7.111986","DOIUrl":"https://doi.org/10.3897/biss.7.111986","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91274161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From its birth as the ‘Biosystematic Database of World Diptera’ in 1984, the ‘Systema Dipterorum’ (Evenhuis and Pape 2023) has grown into one of the largest databases currently maintained for the taxonomy and nomenclature of a single order of insects. Systema Dipterorum covers all two-winged insects (Diptera), and with almost a quarter of a million names representing more than 170,000 valid species distributed in some 13,000 valid genera, we cover about 10% of the described and named Animalia. About 1,000 new nominal species are described annually within Diptera. Data are entered in FileMaker Pro (database) and served through an online portal*1 with an updated version currently provided every two months. Names are harvested and reviewed through a four-tier quality assurance hierarchy with entries eventually reaching taxonomic and nomenclatural standards equivalent to being published online. The nomenclatural status of each name is shown using 50 different codes, and at this moment a published authority source is linked to more than 70% of the entries. Universal Unique Identifiers (UUIDs) are automatically generated for every record of names and the more than 35,000 references. Names are made available for the Catalogue of Life, and we envision a web portal for seamless harvesting of new names and literature as well as for updating of both nomenclature and taxonomy by making changes and correcting errors with explicit reference to published authority sources. 
We envision the future for Systema Dipterorum to be a one-stop website, where clicking on a name resulting from a search may call up links to, e.g., its nomenclatural registry in ZooBank, the original description through the Biodiversity Heritage Library, taxonomic treatments from Plazi, images from Morphbank, occurrence data through the Global Biodiversity Information Facility (GBIF), molecular sequence data from GenBank, Barcode Index Numbers (BINs) from Barcode of Life, and additional data from many other sources.
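The record-level identifiers described above can be sketched as follows. This is only an illustration in Python: the database itself is maintained in FileMaker Pro, the abstract does not say which UUID version is used (version 4 is assumed here), and the class names, the `status_code` field and the `linked_share` helper are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


def new_record_id():
    """Generate a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


@dataclass
class ReferenceRecord:
    citation: str
    record_id: str = field(default_factory=new_record_id)


@dataclass
class NameRecord:
    name: str
    status_code: str        # hypothetical stand-in for a nomenclatural status code
    reference_id: str = ""  # UUID of the linked published authority source, if any
    record_id: str = field(default_factory=new_record_id)


def linked_share(names):
    """Fraction of name records linked to a published authority source."""
    if not names:
        return 0.0
    return sum(1 for n in names if n.reference_id) / len(names)


if __name__ == "__main__":
    ref = ReferenceRecord("Placeholder citation")
    names = [
        NameRecord("Musca domestica", "valid", ref.record_id),
        NameRecord("Example name", "unreviewed"),
    ]
    print(linked_share(names))
```

Because every name and reference carries its own identifier, a correction can point at the exact record and the exact source that justifies it.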
{"title":"Systema Dipterorum","authors":"Thomas Pape, Neal Evenhuis","doi":"10.3897/biss.7.111959","DOIUrl":"https://doi.org/10.3897/biss.7.111959","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81194395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sharing data is crucial in biodiversity research, as in all scientific domains. Biodiversity Information Standards (TDWG) validates and makes available a set of standards to facilitate the sharing of biodiversity data. Each of the 23 standards, listed in alphabetical order, has a status, a category, and a short description. But these standards are designed for very different purposes, which we will discuss by linking them to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The FAIR principles (Wilkinson et al. 2016) focus on the ability of machines to automatically find and use digital data. It is therefore crucial that software for editing, acquiring and using data share defined standards that are made available to all. TDWG has been working in this direction for over 30 years. Pioneers in biodiversity informatics, such as Richard Pankhurst (Pankhurst 1970), Mike Dallwitz (Dallwitz 1974, Dallwitz 1980) and Jacques Lebbe (Lebbe et al. 1987), worked specifically on taxon identification with computers and on how to represent morphological descriptions of taxa and specimens. Some TDWG standards, such as ABCD (Access to Biological Collection Data; Access to Biological Collections Data Task Group 2005), TCS (Taxonomic Concept Transfer Schema; Taxonomic Names Subgroup 2006) or SDD (Structured Descriptive Data; Structure of Descriptive Data (SDD) Subgroup 2006), are expressed by an XML schema covering a formal data model. Other standards, such as Floristic Regions of the World (Takhtajan 1986) or the Vocabulary Maintenance Standard (VMS; Vocabulary Maintenance Specification Task Group 2017), concern vocabularies or a collection of standardized terms. The Plant Occurrence and Status Scheme (POSS; World Conservation Monitoring Centre 1995) provides both a list of accepted terms and a data model (list of fields).
In the case of morpho-anatomical data describing taxa or specimens, TDWG offers two standards: DELTA (DEscription Language for TAxonomy, Dallwitz 2006) and SDD (Structured Descriptive Data, Hagedorn 2007). In order to further the discussion on sharing morphological description data, we would like to clarify what is meant by the term standard. We will look at the concepts of guidelines, rules, defined format, referential list of terms, data schema, model, metamodel and protocols, which are all terms linked to this notion of standard and to the FAIR principles. Perhaps this reflection will lead us to propose criteria for better classifying TDWG standards.
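One way to picture the kind of classification hinted at here is to tag each standard with what it actually provides and then group by tag. The sketch below uses only the standards and characterisations given in the text above; reducing them to two kinds (`DATA_MODEL` and `VOCABULARY`) is a simplifying assumption for illustration, not a proposed TDWG classification, and DELTA is left out because the text does not say which kind it is.

```python
from collections import defaultdict
from enum import Enum


class Kind(Enum):
    DATA_MODEL = "formal data model (e.g. XML schema or list of fields)"
    VOCABULARY = "vocabulary or list of standardized terms"


# Characterisations taken from the abstract; the two-way split is an assumption.
STANDARDS = {
    "ABCD": {Kind.DATA_MODEL},
    "TCS": {Kind.DATA_MODEL},
    "SDD": {Kind.DATA_MODEL},
    "Floristic Regions of the World": {Kind.VOCABULARY},
    "VMS": {Kind.VOCABULARY},
    "POSS": {Kind.DATA_MODEL, Kind.VOCABULARY},
}


def group_by_kind(standards):
    """Map each kind to the alphabetically sorted standards that provide it."""
    groups = defaultdict(list)
    for name in sorted(standards):
        for kind in standards[name]:
            groups[kind].append(name)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group_by_kind(STANDARDS).items():
        print(kind.name, names)
```

A standard such as POSS appears under both kinds, which is exactly the sort of case an alphabetical list with a single category cannot express.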
{"title":"FAIR Principles and TDWG Standards: The case of morphological description of taxa and specimens","authors":"Régine Vignes Lebbe","doi":"10.3897/biss.7.111859","DOIUrl":"https://doi.org/10.3897/biss.7.111859","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76053342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Showtaro Kakizoe, Aino T. Ota, T. Hosoya, U. Jinbo
The Japan Biodiversity Information Initiative (JBIF) was originally established in 2007 as the Global Biodiversity Information Facility (GBIF) Japan National Node to aggregate biodiversity data in Japan and publish them through GBIF. JBIF was later renamed after Japan became a GBIF observer, but its activities, including data publication through GBIF, have continued to the present. JBIF operates with the support of the National BioResource Project (NBRP) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), in collaboration with three institutions: the National Institute of Genetics (NIG), the National Institute for Environmental Studies (NIES), and the National Museum of Nature and Science (NMNS). The NBRP is a project that focuses on the collection, preservation, provision, and enhancement of bioresources. JBIF collects both observation and specimen data and publishes them through GBIF. For domestic data use, a search system for data published by JBIF is available on the JBIF website. Moreover, NMNS manages a museum network called the Science Museum Net (S-Net), and bilingual (Japanese and English) specimen data collected by S-Net are also available via the S-Net portal site. We are working to promote the biodiversity informatics field in Japan by translating GBIF resources, including the website and important documents such as the GBIF Science Review, and by organizing workshops and conferences, primarily targeting students, researchers, museum curators, and local government officials, to facilitate the sharing of information and exchange of opinions on biodiversity information. To date, Japan has published 564 datasets and over 12 million occurrences to GBIF, making it the third-largest contributor of data to GBIF in Asia, following India and Taiwan. Moreover, regarding specimen-based occurrence data, Japan is the largest contributor in Asia. In this presentation, we will introduce JBIF's initiatives and future activities.
{"title":"Japan Biodiversity Information Initiative (JBIF)'s Efforts to Collect and Publish Biodiversity Information from Japan","authors":"Showtaro Kakizoe, Aino T. Ota, T. Hosoya, U. Jinbo","doi":"10.3897/biss.7.111893","DOIUrl":"https://doi.org/10.3897/biss.7.111893","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"589 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78940555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific data is diverse and can be complex, potentially including biotic or abiotic measurements, material samples, DNA-derived data and more. Especially for researchers who are new to the Darwin Core standard (Darwin Core Task Group 2009), it is not always obvious what the best practice is for creating a Darwin Core Archive (GBIF 2021) for their data. Which core and extensions should they select? Which Darwin Core terms*1 should they include? We present the 'Learnings from Nansen Legacy template generator' (Marsden and Schneider 2023), a spreadsheet template generator to simplify the creation of Darwin Core Archives. Using a graphical user interface, users can create a single Microsoft Excel file that includes one sheet per core or extension. The user can select from a complete list of Darwin Core terms to use as column headers. There are requirements and recommendations for which terms are selected for each core and extension. Descriptions for all terms are displayed when one hovers over a Darwin Core term, and are stored as notes in the template when one selects a relevant cell. The generated template includes cell restrictions to prevent one from inputting data in an incorrect format. A separate configuration is also available to aid researchers in creating CF-NetCDF*2 files for physical data, which are also compliant with the FAIR (Findable, Accessible, Interoperable, Reusable)*3 data management principles. The Learnings from Nansen Legacy template generator is published (Marsden and Schneider 2023) and can be installed for use on your website or computer by following the instructions on the software's GitHub repository*4. The template generator can be tested on the instance currently hosted by SIOS*5 (Svalbard Integrated Arctic Earth Observing System). One can also refer to a YouTube tutorial*6 on how the template generator works.
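The core idea, one sheet per core or extension with the selected Darwin Core terms as column headers and a check on required terms, can be sketched in a few lines. This is not the published tool: it writes header-only CSV text with the Python standard library rather than an Excel workbook, and the `SHEETS` selection, the `REQUIRED` sets and the `build_templates` function are assumptions made for illustration (the term names themselves are Darwin Core terms).

```python
import csv
import io

# Illustrative selection of Darwin Core terms per core/extension.
SHEETS = {
    "Event": ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"],
    "Occurrence": ["occurrenceID", "eventID", "scientificName",
                   "basisOfRecord", "occurrenceStatus"],
}

# Which terms count as required is an assumption of this sketch.
REQUIRED = {
    "Event": {"eventID", "eventDate"},
    "Occurrence": {"occurrenceID", "eventID", "basisOfRecord"},
}


def build_templates(selection):
    """Return one header-only CSV text per selected core or extension."""
    templates = {}
    for sheet, terms in selection.items():
        missing = REQUIRED.get(sheet, set()) - set(terms)
        if missing:
            raise ValueError(f"{sheet}: missing required terms {sorted(missing)}")
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(terms)
        templates[sheet] = buffer.getvalue()
    return templates


if __name__ == "__main__":
    for sheet, text in build_templates(SHEETS).items():
        print(sheet, "->", text.strip())
```

Rejecting a selection that lacks a required term at template-creation time mirrors the requirements the generator enforces, so the problem surfaces before any data is entered.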
{"title":"An Excel Template Generator for Darwin Core","authors":"L. Marsden, O. Schneider","doi":"10.3897/biss.7.111907","DOIUrl":"https://doi.org/10.3897/biss.7.111907","url":null,"abstract":"Scientific data is diverse and can be complex, potentially including biotic or abiotic measurements, material samples, DNA derived data and more. Especially for researchers who are new to the Darwin Core standard (Darwin Core Task Group 2009), it is not always obvious what the best practice is for creating a Darwin Core Archive (GBIF 2021) for their data. Which core and extensions should they select? Which Darwin Core terms*1 should they include? We present the 'Learnings from Nansen Legacy template generator' (Marsden and Schneider 2023), a spreadsheet template generator to simplify the creation of Darwin Core Archives. It enables users to create a single Microsoft Excel file that includes one sheet per core or extension using a graphical user interface. The user can select from a complete list of Darwin Core terms to use as column headers. There are requirements and recommendations for which terms are selected for each core and extension. Descriptions for all terms are displayed when one hovers over a Darwin Core term, and are stored as notes in the template when one selects a relevant cell. The generated template includes cell restrictions to prevent one from inputting data in an incorrect format. A separate configuration is also available to aid researchers in creating CF-NetCDF*2 files for physical data, which are also compliant with the FAIR (Findable, Accessible, Interoperable, Reusable)*3 data management principles. The Learnings from Nansen Legacy template generator is published (Marsden and Schneider 2023) and can be installed for use on your website or computer by following the instructions on the software's GitHub repository*4. The template generator can be tested where it is currently hosted, by SIOS*5 (Svalbard Integrated Arctic Earth Observing System). 
One can also refer to a YouTube tutorial*6 on how the template generator works.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85584577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Tarasov, G. Montanaro, Federica Losacco, D. Porto
Taxonomic descriptions hold an immense amount of phenotypic data, but their natural language (NL) format poses challenges for computer analysis. In this talk, we will present Phenoscript, a user-friendly computer language enabling computer-readable species descriptions and automated phenotype comparisons, in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Phenoscript facilitates the creation of semantic species descriptions that represent a knowledge graph composed of terms from predefined biological ontologies. A Phenoscript description resembles an NL description, but follows a specific language grammar. We have developed the Phenospy package: a Python-based Phenoscript toolkit. Phenospy converts Phenoscript descriptions into both NL format, facilitating scientific publication, and the Web Ontology Language (OWL) format, enabling downstream analysis and computable phenotypic comparisons. OWL is a standard for sharing semantic data on the Web. While initially designed for phenotypes, Phenoscript can be extended to create semantic ecological data, encompassing environmental traits, functional traits, and species interactions. We will discuss the integration of species and ecological traits encoded in Phenoscript into downstream analysis, highlighting its potential for phenomic-level research in biology.
{"title":"Towards FAIR Principles in Biodiversity Research: Enabling computable taxonomic descriptions and ecological data with Phenoscript","authors":"S. Tarasov, G. Montanaro, Federica Losacco, D. Porto","doi":"10.3897/biss.7.111862","DOIUrl":"https://doi.org/10.3897/biss.7.111862","url":null,"abstract":"Taxonomic descriptions hold immense phenotypic data, but their natural language (NL) format poses challenges for computer analysis. In this talk, we will present Phenoscript, a user-friendly computer language enabling computer-readable species descriptions and automated phenotype comparisons, in accordance with FAIR (Findable, Accessible, Interoperable, Reusable) principles.\u0000 Phenoscript facilitates the creation of semantic species descriptions that represent a knowledge graph composed of terms from predefined biological ontologies. A Phenoscript description resembles a NL description, but follows a specific language grammar. We have developed the Phenospy package: a Python-based Phenoscript toolkit. Phenospy converts Phenoscript descriptions into both NL format, facilitating scientific publication, and the Web Ontology Language (OWL) format, enabling downstream analysis and computable phenotypic comparisons. OWL is a standard for sharing semantic data on the Web. While initially designed for phenotypes, Phenoscript can be extended to create semantic ecological data, encompassing environmental traits, functional traits, and species interactions. 
We will discuss the integration of species and ecological traits encoded in Phenoscript into downstream analysis, highlighting its potential for phenomic-level research in biology.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84910661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Paupério, Vikas Gupta, Josephine Burgin, Suran Jayathilaka, J. Lanfear, K. Abarenkov, U. Kõljalg, L. Penev, G. Cochrane
Advances in sequencing technologies have promoted the generation of molecular data for cataloguing and describing biodiversity. The analysis of environmental DNA (eDNA) through metabarcoding techniques enables comprehensive descriptions of communities and their function, which is fundamental for understanding and preserving biodiversity. Metabarcoding is becoming widely used, and standard methods are being developed for a growing range of applications with high scalability. The generated data can be made available in its unprocessed form, as raw data (the sequenced reads), or as interpreted data, including sets of sequences derived after bioinformatics processing (Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)) and occurrence tables (tables that describe the occurrences and abundances of species or OTUs/ASVs). However, for this data to be Findable, Accessible, Interoperable and Reusable (FAIR), and therefore fully available for meaningful interpretation, it needs to be deposited in public repositories together with enriched sample metadata, protocols and analysis workflows (ten Hoopen et al. 2017). Metabarcoding raw data and associated sample metadata are often stored and made available through the International Nucleotide Sequence Database Collaboration (INSDC) archives (Arita et al. 2020), of which the European Nucleotide Archive (ENA, Burgin et al. 2022) is the European database, but they are often deposited with minimal information, which hinders data reusability. Within the scope of the Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL), which is building a community of interconnected data for biodiversity research (Penev et al. 2022), we are working towards improving the standards for molecular ecology data sharing, developing tools to facilitate data deposition and retrieval, and linking between data types. 
Here we will present the ENA data model, showcasing how metabarcoding data can be shared together with enriched metadata, and how this data is linked with existing data in other research infrastructures in the biodiversity domain, such as the Global Biodiversity Information Facility (GBIF), where data is deposited following the guidelines published in Abarenkov et al. (2023). We will also present the results of our recent discussions on standards for this data type and discuss plans to further improve data sharing and interoperability for molecular ecology.
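As a rough illustration of the "occurrence tables" mentioned above, the sketch below turns per-sample ASV read counts into occurrence-style rows. It does not reproduce the ENA data model or any ENA tool. The function name, the input layout and the identifier scheme are hypothetical, and the use of read counts with the Darwin Core terms `organismQuantity` and `sampleSizeValue` is an assumption about common practice for DNA-derived occurrence data rather than something stated in the abstract.

```python
def asv_table_to_occurrences(sample_id, asv_counts, taxonomy):
    """Convert one sample's ASV read counts into occurrence-style rows.

    sample_id  -- identifier of the sequenced sample (hypothetical scheme)
    asv_counts -- dict mapping ASV identifier -> number of reads in this sample
    taxonomy   -- dict mapping ASV identifier -> assigned scientific name
    """
    total_reads = sum(asv_counts.values())
    rows = []
    for asv_id, reads in sorted(asv_counts.items()):
        if reads == 0:
            continue  # an ASV with no reads is not an occurrence in this sample
        rows.append({
            "occurrenceID": f"{sample_id}:{asv_id}",
            "eventID": sample_id,
            "scientificName": taxonomy.get(asv_id, "incertae sedis"),
            "organismQuantity": reads,
            "organismQuantityType": "DNA sequence reads",
            "sampleSizeValue": total_reads,
            "sampleSizeUnit": "DNA sequence reads",
        })
    return rows


counts = {"ASV_2": 150, "ASV_1": 850, "ASV_3": 0}
names = {"ASV_1": "Calanus finmarchicus", "ASV_2": "Oithona similis"}
for row in asv_table_to_occurrences("SAMPLE_A", counts, names):
    print(row["occurrenceID"], row["scientificName"], row["organismQuantity"])
```

Carrying the total read count alongside each row lets a reader compute relative abundance without going back to the raw reads.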
{"title":"Improving FAIRness of eDNA and Metabarcoding Data: Standards and tools for European Nucleotide Archive data deposition","authors":"J. Paupério, Vikas Gupta, Josephine Burgin, Suran Jayathilaka, J. Lanfear, K. Abarenkov, U. Kõljalg, L. Penev, G. Cochrane","doi":"10.3897/biss.7.111835","DOIUrl":"https://doi.org/10.3897/biss.7.111835","url":null,"abstract":"The advancements in sequencing technologies have promoted the generation of molecular data for cataloguing and describing biodiversity. The analysis of environmental DNA (eDNA) through the application of metabarcoding techniques enables comprehensive descriptions of communities and their function, being fundamental for understanding and preserving biodiversity. Metabarcoding is becoming widely used and standard methods are being generated for a growing range of applications with high scalability. The generated data can be made available in its unprocessed form, as raw data (the sequenced reads) or as interpreted data, including sets of sequences derived after bioinformatics processing (Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)) and occurrence tables (tables that describe the occurrences and abundances of species or OTUs/ASVs). However, for this data to be Findable, Accessible, Interoperable and Reusable (FAIR), and therefore fully available for meaningful interpretation, it needs to be deposited in public repositories together with enriched sample metadata, protocols and analysis workflows (ten Hoopen et al. 2017). \u0000 Metabarcoding raw data and associated sample metadata is often stored and made available through the International Nucleotide Sequence Database Collaboration (INSDC) archives (Arita et al. 2020), of which the European Nucleotide Archive (ENA, Burgin et al. 2022) is its European database, but it is often deposited with minimal information, which hinders data reusability. 
\u0000 Within the scope of the Horizon 2020 project, Biodiversity Community Integrated Knowledge Library (BiCIKL), which is building a community of interconnected data for biodiversity research (Penev et al. 2022), we are working towards improving the standards for molecular ecology data sharing, developing tools to facilitate data deposition and retrieval, and linking between data types. \u0000 Here we will present the ENA data model, showcasing how metabarcoding data can be shared, while providing enriched metadata, and how this data is linked with existing data in other research infrastructures in the biodiversity domain, such as the Global Biodiversity Information Facility (GBIF), where data is deposited following the guidelines published in Abarenkov et al. (2023). We will also present the results of our recent discussions on standards for this data type and discuss future plans towards continuing to improve data sharing and interoperability for molecular ecology.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76347338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In 2019, the Atlas of Living Australia (ALA) ran a national consultation, clarifying a long-held suspicion that while simple occurrence records provide invaluable discoverability and analysis for biodiversity data, the lack of contextual information on data collection methodology and protocols limits its usefulness for species abundance estimation and time-series analysis. The consultation recognised that the ALA has strong leadership in biodiversity standards and development, and that our 12-year history and investment in projects and engagement demonstrates a clear capacity to transition to a repository capable of capturing and aggregating the monitoring and survey data required for conservation efforts (Daly 2019). Around the same time, the larger data landscape was undergoing change in a similar direction, both internationally through the Global Biodiversity Information Facility’s (GBIF) Unified Model engagements, and nationally through the development of the Australian Biodiversity Information Standard (ABIS), an ontology for describing environmental data (Anonymous 2021). We embarked on a project to examine existing data standards and practices, extend our own occurrence model, and build software that could ingest event-based datasets and make them discoverable and interoperable. Initially we focused on well-structured surveys, both marine and terrestrial, to develop the system and user interface (UI). During the project, we restructured and modeled other exemplar datasets, collaborating with GBIF to develop event terms, vocabularies, and user interface components. Seeking interoperability with existing standards, we integrated concepts from both ABIS and the Ocean Biodiversity Information System’s (OBIS) ENV-DATA model (De Pooter et al. 2017) into a standardised yet flexible implementation of Event Core, navigable via a friendly user interface. 
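To make the notion of hierarchical, event-based data concrete, here is a minimal sketch of how nested events and their child occurrence records might be navigated. It is not the ALA software: the record layout and function names are hypothetical, and the only assumption borrowed from Darwin Core is that each event carries an `eventID` and, where nested, a `parentEventID`.

```python
from collections import defaultdict


def build_children_index(events):
    """Map each parentEventID (None for top-level events) to its child eventIDs."""
    children = defaultdict(list)
    for event in events:
        children[event.get("parentEventID")].append(event["eventID"])
    return children


def descendants(children, event_id):
    """Return all events nested below event_id, depth first."""
    found = []
    for child in children.get(event_id, []):
        found.append(child)
        found.extend(descendants(children, child))
    return found


def occurrences_under(event_id, events, occurrences):
    """Collect occurrences attached to an event or to any event nested in it."""
    children = build_children_index(events)
    wanted = {event_id, *descendants(children, event_id)}
    return [occ for occ in occurrences if occ["eventID"] in wanted]


events = [
    {"eventID": "survey-1"},
    {"eventID": "site-A", "parentEventID": "survey-1"},
    {"eventID": "site-B", "parentEventID": "survey-1"},
    {"eventID": "visit-A1", "parentEventID": "site-A"},
]
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "visit-A1"},
    {"occurrenceID": "occ-2", "eventID": "site-B"},
]
index = build_children_index(events)
print(descendants(index, "survey-1"))
print([o["occurrenceID"] for o in occurrences_under("site-A", events, occurrences)])
```

Keeping events in a flat list and deriving the hierarchy on demand mirrors how such data usually arrives in tabular form, with the nesting expressed only through the parent reference.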
The initial software release comprises an ingestion pipeline for events in parallel to occurrences, an index capable of handling nested data structures, and a user interface. The UI guides the user to explore and filter datasets; includes visualisations for data structures, taxonomic scope, repeat location surveys, and extended measurements or facts; and links out to child occurrence records. Users can download filtered original and interpreted datasets with Digital Object Identifiers (DOIs), in compressed files that comply simultaneously with the Darwin Core Archive and Frictionless Data Package specifications. On release, we will present a range of datasets covering different event-based scenarios. The model has serendipitously provided the flexibility to encapsulate complex seed bank data. During the project, we developed a draft extension, which we used to service a new data portal for the Australian Seed Bank Partnership, a testament to the model’s serviceability for novel use cases. The ALA has taken innovative steps beyond simple collection of complex data types and worked with our local biodiversity informatics community to provide a navigable interface to these data. We intend to continue working with our own data providers and the international community to realise the benefits of more complex data models.
{"title":"Building Software for Hierarchical Events in Biodiversity Informatics","authors":"P. Newman, David Martin, J. Molina","doi":"10.3897/biss.7.111770","DOIUrl":"https://doi.org/10.3897/biss.7.111770","url":null,"abstract":"In 2019, the Atlas of Living Australia (ALA) ran a national consultation, clarifying a long-held suspicion that while simple occurrence records provide invaluable discoverability and analysis for biodiversity data, the lack of contextual information on data collection methodology and protocols limits its usefulness for species abundance estimation and time-series analysis. The consultation recognised that the ALA has strong leadership in biodiversity standards and development, and that our 12-year history and investment in projects and engagement demonstrates a clear capacity to transition to a repository capable of capturing and aggregating the monitoring and survey data required for conservation efforts (Daly 2019). \u0000 Around the same time, the larger data landscape was undergoing change in a similar direction, both internationally through the Global Biodiversity Information Facility’s (GBIF) Unified Model engagements, and nationally through the development of the Australian Biodiversity Information Standard (ABIS), an ontology for describing environmental data (Anonymous 2021). We embarked on a project to examine existing data standards and practices, extend our own occurrence model, and build software that could ingest event-based datasets and make them discoverable and interoperable.\u0000 Initially we focused on well-structured surveys, both marine and terrestrial, to develop the system and user interface (UI). During the project, we restructured and modeled other exemplar datasets, collaborating with GBIF to develop event terms, vocabularies, and user interface components. 
Seeking interoperability with existing standards, we integrated concepts from both ABIS and the Ocean Biodiversity Information System’s (OBIS) ENV-DATA model (De Pooter et al. 2017) into a standardised yet flexible implementation of Event Core, navigable via a friendly user interface. \u0000 The initial software release is comprised of an ingestion pipeline for events in parallel to occurrences, an index capable of handling nested data structures, and a user interface. The UI guides the user to explore and filter datasets; includes visualisations for data structures, taxonomic scope, repeat location surveys, extended measurements or facts; and links out to child occurrence records. Users can download filtered original and interpreted datasets with Digital Object Identifiers (DOI), in compressed files that comply simultaneously with Darwin Core Archive and Frictionless Data Package specifications.\u0000 On release, we will present a range of datasets covering different event-based scenarios. The model has serendipitously provided the flexibility to encapsulate complex seed bank data. During the project, we developed a draft extension, which we used to service a new data portal for the Australian Seed Bank Partnership, a testament to the model’s serviceability for novel use cases. 
\u0000 The ALA has taken innovative steps beyond simple collection of complex data types and worked with our local","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78903435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Restricted access species data requires constrained shared access to meet conservation, legal and legislative requirements. Access to this data is essential for better evidence-based decision-making, reduced regulatory timeframes and improved environmental and research outcomes. The Restricted Access Species Data Service (RASDS)*1 was developed to accept, track and manage data use requests, passing enquiries to data custodians and providing a conduit for data and legal agreements between custodians and requesters. For those data custodians who wish to delegate queries to the RASDS, the service will ensure that datasets are transformed and attributed, and that Digital Object Identifiers and metadata are applied. The data service will also manage reporting on data use. In this presentation we will demonstrate the RASDS and discuss the future roadmap for the system. This abstract sets the scene for the rest of the workshop and provides a starting point for the further talks and discussions.
{"title":"Restricted Access Species Data Systems: A starting point","authors":"Piers Higgs, Cameron Slatyer","doi":"10.3897/biss.7.111746","DOIUrl":"https://doi.org/10.3897/biss.7.111746","url":null,"abstract":"Restricted access species data requires constrained shared access to meet conservation, legal and legislative requirements. Access to this data is essential for better evidence-based decision-making, reduced regulatory timeframes and improved environmental and research outcomes. The Restricted Access Species Data Service (RASDS)*1 was developed to accept, track and manage data use requests, passing enquiries to data custodians, and providing a conduit for data and legal agreements between custodians and requesters. For those data custodians who wish to delegate queries to the RASDS, the service will ensure that datasets are transformed, attributed, and Digital Object Identifiers and metadata are applied. The data service will also manage reporting on data use. In this presentation we will demonstrate the RASDS, and discuss the future roadmap for the system. This abstract sets the scene for the rest of the workshop and provides a starting point for the further talks and discussions.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81965050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}