TidyGEO: preparing analysis-ready datasets from Gene Expression Omnibus.

Journal of Integrative Bioinformatics. Publication date: 2023-12-05; eCollection date: 2024-03-01. DOI: 10.1515/jib-2023-0021
Avery Mecham, Ashlie Stephenson, Badi I Quinteros, Grace S Brown, Stephen R Piccolo

Abstract

TidyGEO is a Web-based tool for downloading, tidying, and reformatting data series from Gene Expression Omnibus (GEO). As a freely accessible repository with data from over 6 million biological samples across more than 4000 organisms, GEO provides diverse opportunities for secondary research. Although scientists may find assay data relevant to a given research question, most analyses also require sample-level annotations. In GEO, such annotations are stored alongside assay data in delimited, text-based files. However, the structure and semantics of the annotations vary widely from one series to another, and many annotations are not useful for analysis purposes. Thus, every GEO series must be tidied before it is analyzed. Manual approaches may be used, but these are error-prone and take time away from other research tasks. Custom computer scripts can be written, but many scientists lack the computational expertise to create such scripts. To address these challenges, we created TidyGEO, which supports essential data-cleaning tasks for sample-level annotations, such as selecting informative columns, renaming columns, splitting or merging columns, standardizing data values, and filtering samples. Additionally, users can integrate annotations with assay data, restructure assay data, and generate code that enables others to reproduce these steps.
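
The annotation-cleaning and integration steps listed above can be illustrated with a small Python sketch. This is not TidyGEO's implementation (TidyGEO is a Web application); the toy annotations, the column names `ch1`, `ch1.1`, and `contact`, the assay values, and every helper function are hypothetical and chosen only for illustration. The cells mimic how GEO sample characteristics are often packed as "key: value" strings. The rule used to decide which columns are "informative" (drop any column that is constant across samples) is an assumed, simple heuristic, not one taken from the paper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical toy annotations: one dict per sample. Cells pack "key: value"
# strings, similar to how GEO often reports sample characteristics.
RAW: List[Row] = [
    {"ID": "S1", "ch1": "tissue: liver", "ch1.1": "sex: M", "contact": "Lab X"},
    {"ID": "S2", "ch1": "tissue: Liver", "ch1.1": "sex: female", "contact": "Lab X"},
    {"ID": "S3", "ch1": "tissue: kidney", "ch1.1": "sex: F", "contact": "Lab X"},
]

# Hypothetical assay matrix with genes as rows and samples as columns.
ASSAY: Dict[str, Dict[str, float]] = {
    "GENE_A": {"S1": 5.1, "S2": 4.8, "S3": 7.2},
    "GENE_B": {"S1": 2.0, "S2": 2.3, "S3": 1.1},
}


def select_informative(rows: List[Row], id_col: str = "ID") -> List[Row]:
    """Keep the ID column plus any column whose values differ between samples (assumed heuristic)."""
    keep = [c for c in rows[0] if c == id_col or len({r[c] for r in rows}) > 1]
    return [{c: r[c] for c in keep} for r in rows]


def split_key_value(rows: List[Row], sep: str = ": ") -> List[Row]:
    """Split 'key: value' cells into a column named after the key."""
    out = []
    for r in rows:
        new: Row = {}
        for col, val in r.items():
            if isinstance(val, str) and sep in val:
                key, _, value = val.partition(sep)
                new[key.strip()] = value.strip()
            else:
                new[col] = val
        out.append(new)
    return out


def rename(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename columns; names missing from the mapping are left unchanged."""
    return [{mapping.get(c, c): v for c, v in r.items()} for r in rows]


def standardize(rows: List[Row], col: str, mapping: Dict[str, str]) -> List[Row]:
    """Replace variant spellings in one column; unmapped values are kept as-is."""
    return [{**r, col: mapping.get(r[col], r[col])} for r in rows]


def join_assay(rows: List[Row], assay: Dict[str, Dict[str, float]], id_col: str) -> List[Row]:
    """Transpose the gene-by-sample matrix and attach each sample's values to its annotations."""
    return [{**r, **{gene: values[r[id_col]] for gene, values in assay.items()}} for r in rows]


def tidy(raw: List[Row], assay: Dict[str, Dict[str, float]]) -> List[Row]:
    rows = select_informative(raw)                      # drops the constant "contact" column
    rows = split_key_value(rows)                        # "tissue: liver" becomes tissue = "liver"
    rows = rename(rows, {"ID": "sample"})               # rename a column
    rows = standardize(rows, "tissue", {"Liver": "liver"})
    rows = standardize(rows, "sex", {"M": "male", "F": "female"})
    rows = [r for r in rows if r["tissue"] == "liver"]  # filter samples
    return join_assay(rows, assay, id_col="sample")     # integrate with assay data


if __name__ == "__main__":
    for row in tidy(RAW, ASSAY):
        print(row)
```

Running the script prints one merged row per retained liver sample (S1 and S2); S3 is removed by the filter, and the constant `contact` column never reaches the output. Chaining the steps inside a single `tidy` function loosely mirrors the idea of recording cleaning steps as code that others can rerun.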
