Automated Metadata Enhancement for Physical Sample Record Aggregation in the iSamples Project

Proceedings of the Association for Information Science and Technology (JCR Q3, Social Sciences); published 2023-10-01; DOI: 10.1002/pra2.968

Hyunju Song, Hong Cui, Dave Vieglais, Danny Mandel, Andrea K. Thomer


Citations: 0

Abstract

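Large amounts of samples have been collected and stored by different institutions and collections across the world. However, even the most carefully curated collections can appear incomplete when aggregated. To solve this problem and support the increasing multidisciplinary science conducted on these samples, we propose a method to support the FAIRness of the aggregation by augmenting the metadata of source records. Using a pipeline that is a combination of rule-based and machine learning-based procedures, we predict the missing values of the metadata fields of 4,388,514 samples. We use these inferred fields in our user interface to improve the reusability.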
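The abstract describes the pipeline only at a high level: rule-based procedures combined with machine-learning procedures to predict missing metadata values. As a rough illustration of that general pattern (deterministic rules first, a learned classifier as fallback), the sketch below fills one missing categorical field. It is not the authors' implementation. The field names `description` and `material`, the keyword rules, the labels, and the choice of a hand-rolled multinomial Naive Bayes classifier are all hypothetical assumptions made to keep the example small and dependency-free.

```python
import math
import re
from collections import Counter, defaultdict

# HYPOTHETICAL keyword rules (not from the paper): if a description contains
# one of these terms, the label is assigned directly (the rule-based step).
RULES = {
    "basalt": "rock",
    "granite": "rock",
    "sand": "sediment",
    "mud": "sediment",
    "quartz": "mineral",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def apply_rules(text):
    """Return the label of the first rule keyword found in the text, else None."""
    for token in tokenize(text):
        if token in RULES:
            return RULES[token]
    return None


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior (None if untrained)."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def enhance(records, field="material", text_field="description"):
    """Fill missing `field` values: rules first, then a classifier trained on
    records whose `field` is already populated. Returns copies of the records
    with an extra `<field>_source` key noting where each value came from."""
    labelled = [r for r in records if r.get(field)]
    model = NaiveBayes().fit(
        [r[text_field] for r in labelled], [r[field] for r in labelled]
    )
    enhanced = []
    for record in records:
        rec = dict(record)  # never mutate the source record
        if rec.get(field):
            rec[field + "_source"] = "original"
        elif (label := apply_rules(rec[text_field])) is not None:
            rec[field], rec[field + "_source"] = label, "rule"
        else:
            rec[field] = model.predict(rec[text_field])
            rec[field + "_source"] = "model"
        enhanced.append(rec)
    return enhanced


# Toy records with HYPOTHETICAL field names and values.
records = [
    {"id": "a", "description": "Basalt core from ridge", "material": "rock"},
    {"id": "b", "description": "fine grained marine mud", "material": "sediment"},
    {"id": "c", "description": "granite hand specimen", "material": ""},
    {"id": "d", "description": "marine core fine grained", "material": None},
]
for rec in enhance(records):
    print(rec["id"], rec["material"], rec["material_source"])
```

Running the rules before the model keeps high-precision, human-readable decisions in front and leaves the classifier for the records the rules cannot resolve. Tagging each value as `original`, `rule`, or `model` is one way an interface could distinguish inferred metadata from what the source collection supplied. The abstract says only that the inferred fields are used in the user interface, not how they are flagged.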


