The Challenges of Big Data in Expanding Geoscience: Embracing New Initiatives to Untangle our World

Geoscience Canada · IF 1.8 · Zone 4 (Earth Sciences) · JCR Q3 (Geosciences, Multidisciplinary) · Pub Date: 2019-12-18 · DOI: 10.12789/geocanj.2019.46.152
Dène Tarkyth

Abstract

It was my pleasure to serve as the president of this organization through 2018 and part of 2019. Such an experience cannot help but remind me of the effort that comes from GAC staff and our many volunteers, but it also brought home the challenges that all of us face in organizing our time and activities in this so-called Information Age. We live in a world where both space and time are increasingly compressed, and all of us at times struggle to manage the demands of our work and our lives beyond the office walls. So I will start this address by asking you all to imagine that you had one extra day a week given to you: some time that you could spend on fun science and investigating exciting questions, or just catching up on work and life. Would we not all welcome such a gift? But then look back over the last few weeks, months or even years and think about how much time you spent searching for information, skimming papers to find sample locations, compiling and cleaning up data, georeferencing maps... just some of the many basic things that need to get done before you can get to the fun part of your job as a geoscientist. There are estimates that geologists now spend 80% of their time searching for, formatting and organizing information and data, and I do not find these hard to believe. A recent article highlighted the approach taken by Cameco, one of Canada’s leading mining companies, to change how they manage data in order to save 20% of their geologists’ time (one day a week), so that they would not have to spend countless hours looking for data and could do geology instead (Heffernan 2015). There are many efforts to amalgamate and process data in ways that make this work easier and more amenable to automation.
A young student geologist at Princeton University, Julia Wilcots, undertook a summer project with a senior researcher at the University of Wisconsin to examine the distribution of stromatolites through geological time by searching descriptive literature. Anyone who has worked in the Precambrian, or indeed in sedimentary rocks of any Eon or Era, can well imagine the immensity of that search. However, through the use of computer search techniques and the ‘GeoDeepDive’ database, she was quickly able to identify over 10,000 papers that mentioned stromatolites (in the text, but not necessarily in the title) and extract the associated rock unit names from 10% of them. By linking these results to the ‘Macrostrat’ database, she was then able to come up with an estimate of the percentage of shallow marine rocks that contain stromatolites within different geological time periods. A more senior researcher at the university involved with the project estimated that doing this same search would have taken him sixteen months of tedium. The overall conclusions of the study, that the distribution of stromatolites is most closely linked to the abundance of dolomitic carbonate rocks (Peters et al. 2017), are important, but the methodology demonstrates the ability of new techniques to unravel seemingly infinite tangles of data. What other questions could we address and what other problems could we solve as Earth Scientists if we were routinely able to query efficiently organized data with such rapidity? As a science, geology continues to evolve towards a bigger view: from rocks alone, to facies, to entire sedimentary systems, to geodynamic environments, and to the Earth System as a whole. We increasingly recognize the interconnected nature of all geoscience data, and the need for a ‘Big Context’ to make sense of ‘Big Data’.
This address seeks to emphasize the great potential of the data explosion that confronts and sometimes confounds us, and also to highlight some of the new and exciting tools and techniques that can help us exploit it. I seek to provide but a glimpse of an ever-expanding branch of our science, which will feature more and more in our professional lives in the 21st century.