Framework for Efficient Indexing and Searching of Scientific Metadata

Chaitali Gupta, M. Govindaraju
Published in: 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Publication date: 2010-05-17
DOI: 10.1109/CCGRID.2010.120 (https://doi.org/10.1109/CCGRID.2010.120)
Citations: 2

Abstract

A seamless and intuitive data reduction capability for the vast amount of scientific metadata generated by experiments is critical to ensuring that domain-specific scientists can use the data effectively. The portal environments and scientific gateways that scientists currently use provide search capability limited to the pre-defined pull-down menus and conditions set in the portal interface. At present, data reduction can be achieved effectively only by scientists who have developed expertise in complex and disparate query languages. A common theme in our discussions with scientists is that a data reduction capability comparable to web search in ease of use, scalability, and freshness and accuracy of results is a critical need, one that can greatly enhance the productivity and quality of scientific research. Most existing search tools are designed for exact string matching, but exact matches are highly unlikely given the nature of instrument-produced metadata and users' inability to recall exact numbers when searching very large datasets. This paper presents research on locating metadata of interest within a range of values. To meet this goal, we leverage the use of XML in metadata descriptions of scientific datasets, specifically the NeXus datasets generated by SNS scientists. We have designed a scalable indexing structure for processing data reduction queries. Web semantics and ontology-based methodologies are also employed to provide end users with an elegant, intuitive, and powerful data reduction interface based on free-form queries.
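
The abstract does not describe the indexing structure itself, so the following is only a minimal sketch of the general idea of range search over XML metadata, not the authors' design. It assumes a simple sorted index per numeric field, queried by binary search. The file names, the element names (`temperature`, `wavelength`), and the helper functions `build_index` and `range_query` are all hypothetical and are not taken from the paper or from the NeXus schema.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata records; element names are illustrative only
# and are not taken from the NeXus schema.
SAMPLE_FILES = {
    "run_001.xml": "<entry><temperature>293.15</temperature><wavelength>1.80</wavelength></entry>",
    "run_002.xml": "<entry><temperature>310.40</temperature><wavelength>2.25</wavelength></entry>",
    "run_003.xml": "<entry><temperature>275.00</temperature><wavelength>1.95</wavelength></entry>",
}


def build_index(files):
    """Map each numeric field to (sorted values, file names in the same order)."""
    raw = defaultdict(list)
    for name, xml_text in files.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            text = (element.text or "").strip()
            try:
                value = float(text)
            except ValueError:
                continue  # skip container elements and non-numeric text
            raw[element.tag].append((value, name))

    index = {}
    for field, pairs in raw.items():
        pairs.sort()  # order by value so binary search applies
        index[field] = ([v for v, _ in pairs], [n for _, n in pairs])
    return index


def range_query(index, field, low, high):
    """Return file names whose value for `field` lies in the closed range [low, high]."""
    values, names = index.get(field, ([], []))
    start = bisect.bisect_left(values, low)   # first value >= low
    end = bisect.bisect_right(values, high)   # one past the last value <= high
    return names[start:end]


index = build_index(SAMPLE_FILES)
print(range_query(index, "temperature", 270, 300))  # ['run_003.xml', 'run_001.xml']
```

With this layout a user who remembers only that a run was "somewhere between 270 and 300" can still find it, which is the case exact string matching fails on. A sorted list is the simplest choice for a static collection; a structure that must absorb frequent inserts at scale would more likely use a tree-based index.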