
GigaScience: Latest Articles

Genomic insights into endangerment and conservation of the garlic-fruit tree (Malania oleifera), a plant species with extremely small populations.
IF 11.8 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae070
Yuanting Shen, Lidan Tao, Rengang Zhang, Gang Yao, Minjie Zhou, Weibang Sun, Yongpeng Ma

Background: Advanced whole-genome sequencing techniques can capture nearly all nucleotide variation in a genome and thus can provide deep insights into protecting endangered species. However, genomic data are still rarely used to design conservation strategies, particularly for endangered plants. Here we performed a comprehensive conservation genomic analysis of Malania oleifera, an endangered tree species with a high nervonic acid content. We used whole-genome resequencing data from 165 samples, covering 16 populations across the entire distribution range, to investigate the causes of its extremely small population sizes and to evaluate possible genomic offsets and changes in ecological niche suitability under future climate change.

Results: Although M. oleifera maintains relatively high genetic diversity among endangered woody plants (θπ = 3.87 × 10⁻³), high levels of inbreeding have been observed, which have reduced genetic diversity in 3 populations (JM, NP, and BM2) and caused the accumulation of deleterious mutations. Repeated bottleneck events, recent inbreeding (∼490 years ago), and anthropogenic disturbance of wild habitats have aggravated the fragmentation of M. oleifera populations and made the species endangered. Because of the significant effect of higher average annual temperature, populations at low altitudes exhibit a greater genomic offset. Furthermore, ecological niche modeling shows that the suitable habitat for M. oleifera will decrease by 71.15% and 98.79% by 2100 under scenarios SSP126 and SSP585, respectively.

Conclusions: These basic insights into the threats facing M. oleifera provide a scientific foundation for defining management and adaptive units, as well as for prioritizing populations for genetic rescue. We also highlight the importance of integrating genomic offset and ecological niche modeling to target conservation actions under future climate change. Overall, our study provides a paradigm for genomics-directed conservation.
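
The θπ statistic quoted in the Results is nucleotide diversity: the average number of pairwise differences per site among sampled sequences. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the textbook definition on a toy alignment. The function name `nucleotide_diversity` and the three toy sequences are assumptions made for illustration, not data or code from the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions over every unordered pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical toy alignment (not real M. oleifera data).
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
print(f"theta_pi = {pi:.4f}")
```

On real resequencing data this quantity is usually estimated from variant calls over genomic windows rather than from full pairwise alignments; the sketch only shows what the number measures.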

Citations: 0
Whole-genome sequencing of the invasive golden apple snail Pomacea canaliculata from Asia reveals rapid expansion and adaptive evolution.
IF 11.8 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae064
Yan Lu, Fang Luo, An Zhou, Cun Yi, Hao Chen, Jian Li, Yunhai Guo, Yuxiang Xie, Wei Zhang, Datao Lin, Yaming Yang, Zhongdao Wu, Yi Zhang, Shuhua Xu, Wei Hu

Pomacea canaliculata, an invasive species native to South America, is recognized for its broad geographic distribution and adaptability to a variety of ecological conditions. The details of the evolution and adaptation of P. canaliculata remain unclear because of a lack of whole-genome resequencing data. We examined 173 P. canaliculata genomes representing 17 geographic populations in East and Southeast Asia. Interestingly, P. canaliculata showed a higher level of genetic diversity than other mollusks, and our analysis suggested that the dispersal of P. canaliculata could have been driven by climate change and human activities. Notably, we identified a set of genes associated with low-temperature adaptation, including Csde1, a gene encoding a cold shock protein. Further RNA sequencing analysis and reverse transcription quantitative polymerase chain reaction experiments demonstrated the gene's dynamic expression pattern and biological functions during cold exposure. Moreover, both positive selection and balancing selection are likely to have contributed to the rapid environmental adaptation of P. canaliculata populations. In particular, genes associated with energy metabolism and stress response were under positive selection, while a large number of immune-related genes showed strong signatures of balancing selection. Our study advances the understanding of the evolution of P. canaliculata and provides a valuable resource for research on this invasive species.

Citations: 0
Harnessing population diversity: in search of tools of the trade.
IF 11.8 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae068
Danilo Bzdok, Guy Wolf, Jakub Kopal

Big neuroscience datasets are not big small datasets when it comes to quantitative data analysis. Neuroscience has now witnessed the advent of many population cohort studies that deep-profile participants, yielding hundreds of measures that capture dimensions of each individual's position in the broader society. Indeed, there is a rebalancing from small, strictly selected, and thus homogenized cohorts toward ever larger, more representative, and thus more diverse cohorts. This shift in cohort composition is prompting the revision of incumbent modeling practices. Major sources of population stratification increasingly overshadow the subtle effects that neuroscientists typically study. In our opinion, as we sample individuals from ever more diverse backgrounds, we will require a new stack of quantitative tools to realize diversity-aware modeling. Here we take inventory of candidate analytical frameworks. Better incorporating the driving factors behind population structure will refine our understanding of how brain-behavior relationships depend on human subgroups.

Citations: 0
Protein-protein and protein-nucleic acid binding site prediction via interpretable hierarchical geometric deep learning.
IF 11.8 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae080
Shizhuo Zhang, Jiyun Han, Juntao Liu

Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease diagnosis and drug design. However, accurate prediction by computational approaches remains highly challenging because knowledge of residue binding patterns is limited. The binding pattern of a residue should be characterized by the spatial distribution of its neighboring residues combined with the interactions among their physicochemical information, which previous methods have not achieved. Here, we design GraphRBF, a hierarchical geometric deep learning model that learns residue binding patterns from big data. To this end, GraphRBF describes physicochemical information interactions with an enhanced graph neural network and characterizes residue spatial distributions with a prioritized radial basis function neural network. After training and testing, GraphRBF shows great improvements over existing state-of-the-art methods and strong interpretability of its learned representations. Applied to the SARS-CoV-2 omicron spike protein, GraphRBF successfully identifies known epitopes of the protein. Moreover, it predicts, with strong evidence, multiple potential binding regions for new nanobodies or even new drugs. A user-friendly online server for GraphRBF is freely available at http://liulab.top/GraphRBF/server.
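
To make the idea of characterizing a residue's spatial neighborhood with radial basis functions more concrete, here is a minimal sketch of a generic Gaussian RBF encoding of neighbor distances. It is not GraphRBF's architecture: the abstract does not specify the network, and the "prioritized" aspect is not modeled. The function `rbf_features`, its parameters `radial_centers` and `gamma`, and the coordinates are all assumptions chosen for illustration.

```python
import math


def rbf_features(center, neighbors, radial_centers, gamma=1.0):
    """Sum Gaussian RBF activations of neighbor distances around one residue.

    center: (x, y, z) coordinates of the residue of interest.
    neighbors: list of (x, y, z) coordinates of neighboring residues.
    radial_centers: distances at which each Gaussian basis function peaks.
    """
    features = [0.0] * len(radial_centers)
    for neighbor in neighbors:
        distance = math.dist(center, neighbor)
        for k, mu in enumerate(radial_centers):
            # Each basis function responds most strongly to neighbors near distance mu.
            features[k] += math.exp(-gamma * (distance - mu) ** 2)
    return features


# Hypothetical coordinates: one residue at the origin with two neighbors.
residue = (0.0, 0.0, 0.0)
nearby = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
encoding = rbf_features(residue, nearby, radial_centers=[2.0, 4.0, 6.0])
print(encoding)
```

The resulting vector is a fixed-length summary of how neighbors are distributed by distance, which is the kind of input a downstream network could combine with physicochemical features.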

Citations: 0
Computational reproducibility of Jupyter notebooks from biomedical publications.
IF 3.5 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giad113
Sheeba Samuel, Daniel Mietchen

Background: Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications.

Approach: We address computational reproducibility at 2 levels: (i) Using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining each article's full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central has grown in a highly dynamic fashion.

Results: Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions.

Conclusions: We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
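
The counts in the Results form a funnel, and the rates they imply are easier to compare once worked out. The short sketch below does only that arithmetic, using the figures stated in the abstract; the dictionary keys and the helper `share` are names introduced here for illustration and are not part of the authors' workflow.

```python
# Notebook counts as reported in the abstract's Results paragraph.
funnel = {
    "all_notebooks": 27_271,
    "python_notebooks": 22_578,
    "dependencies_declared": 15_817,
    "dependencies_installed": 10_388,
    "ran_without_errors": 1_203,
    "identical_results": 879,
    "different_results": 324,
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# The two outcome groups should account for every error-free run.
assert funnel["identical_results"] + funnel["different_results"] == funnel["ran_without_errors"]

error_free_rate = share(funnel["ran_without_errors"], funnel["dependencies_installed"])
identical_rate = share(funnel["identical_results"], funnel["dependencies_installed"])

print(f"Error-free among rerun notebooks: {error_free_rate:.2f}%")
print(f"Identical results among rerun notebooks: {identical_rate:.2f}%")
```

Which denominator to use is a design choice: dividing by the 10,388 notebooks that were actually rerun measures the success of the rerun itself, while dividing by all 27,271 notebooks would also count the notebooks lost at earlier stages of the funnel.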

Citations: 0
Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning-driven data analysis.
IF 3.5 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giad111
Akshay Akshay, Mitali Katoch, Navid Shekarchizadeh, Masoud Abedi, Ankush Sharma, Fiona C Burkhard, Rosalyn M Adam, Katia Monastyrskaya, Ali Hashemi Gheinani

Background: Machine learning (ML) has emerged as a vital asset for researchers to analyze and extract valuable information from complex datasets. However, developing an effective and robust ML pipeline can present a real challenge, demanding considerable time and effort, thereby impeding research progress. Existing tools in this landscape require a profound understanding of ML principles and programming skills. Furthermore, users are required to engage in the comprehensive configuration of their ML pipeline to obtain optimal performance.

Results: To address these challenges, we have developed a novel tool called Machine Learning Made Easy (MLme) that streamlines the use of ML in research and currently focuses on classification problems. By integrating 4 essential functionalities (Data Exploration, AutoML, CustomML, and Visualization), MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding. To demonstrate the applicability of MLme, we conducted rigorous testing on 6 distinct datasets, each presenting unique characteristics and challenges. Our results consistently showed promising performance across the datasets, reaffirming the versatility and effectiveness of the tool. Additionally, by utilizing MLme's feature selection functionality, we successfully identified significant markers for CD8+ naive (BACH2), CD16+ (CD16), and CD14+ (VCAN) cell populations.

Conclusion: MLme serves as a valuable resource for leveraging ML to facilitate insightful data analysis and enhance research outcomes, while alleviating concerns related to complex coding scripts. The source code and a detailed tutorial for MLme are available at https://github.com/FunctionalUrology/MLme.

Citations: 0
spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics.
IF 3.5 | Zone 2 (Biology) | Q1 Multidisciplinary Sciences | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae042
Chao Zhang, Lin Liu, Ying Zhang, Mei Li, Shuangsang Fang, Qiang Kang, Ao Chen, Xun Xu, Yong Zhang, Yuxiang Li

Background: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times.

Findings: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space.

Conclusions: In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.
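
The abstract above names contrastive learning without giving the objective. As a rough illustration only, the sketch below computes a generic InfoNCE-style contrastive loss for a single anchor embedding. It is an assumption that a loss of this family is relevant here; the function names, the temperature value, and the toy vectors are hypothetical and are not taken from spatiAlign.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Toy embeddings (hypothetical): the loss is small when the positive
# lies close to the anchor and large when it points away from it.
anchor = [1.0, 0.0]
negatives = [[-1.0, 0.2], [0.0, -1.0]]
loss_aligned = info_nce(anchor, [0.9, 0.1], negatives)
loss_misaligned = info_nce(anchor, [-0.9, 0.1], negatives)
print(loss_aligned < loss_misaligned)
```

Minimizing such a loss pulls matching embeddings together and pushes non-matching ones apart, which is the general idea behind learning a joint representation across sections.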

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11258913/pdf/
Citations: 0
Learning a generalized graph transformer for protein function prediction in dissimilar sequences.
IF 11.8 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae093
Yiwei Fu, Zhonghui Gu, Xiao Luo, Qirui Guo, Luhua Lai, Minghua Deng

Background: In the face of a growing disparity between high-throughput sequence data and low-throughput experimental studies, the emerging field of deep learning stands as a promising alternative. Generally, many data-driven approaches are capable of facilitating fast and accurate predictions of protein functions. Nevertheless, the inherent statistical nature of deep learning techniques may limit their generalization capabilities when applied to novel nonhomologous proteins that diverge significantly from existing ones.

Results: In this work, we propose a novel, generalized approach named Graph Adversarial Learning with Alignment (GALA) for protein function prediction. GALA integrates a graph transformer architecture with an attention pooling module to extract embeddings from both protein sequences and structures, facilitating unified learning of protein representations. Notably, GALA incorporates a domain discriminator conditioned on both learnable representations and predicted probabilities, which undergoes adversarial learning to ensure representation invariance across diverse environments. To optimize the model with abundant label information, we generate label embeddings in the hidden space and explicitly align them with protein representations. Benchmarked on datasets derived from the PDB and Swiss-Prot databases, GALA achieves performance comparable to several state-of-the-art methods. Moreover, GALA demonstrates biological interpretability by identifying significant functional residues associated with Gene Ontology terms through class activation mapping.

Conclusions: GALA, which leverages adversarial learning and label embedding alignment to acquire domain-invariant protein representations, exhibits outstanding generalizability in function prediction for proteins from previously unseen sequence space. By incorporating the structures predicted by AlphaFold2, GALA demonstrates significant potential for function annotation in newly discovered sequences. A detailed implementation of our GALA is available at https://github.com/fuyw-aisw/GALA.
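
The abstract mentions an attention pooling module that turns per-residue embeddings into one protein-level representation. The sketch below shows softmax attention pooling in its simplest form, as an illustration of the general technique rather than GALA's implementation. The `query` vector stands in for a learned parameter and, like the toy residue embeddings, is hypothetical.

```python
import math

def attention_pool(node_embeddings, query):
    """Pool per-node vectors into one vector using softmax attention weights."""
    # Score each node by its dot product with the query vector.
    scores = [sum(h * q for h, q in zip(node, query)) for node in node_embeddings]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_embeddings[0])
    # Weighted sum of node vectors, one output value per embedding dimension.
    pooled = [
        sum(w * node[d] for w, node in zip(weights, node_embeddings))
        for d in range(dim)
    ]
    return pooled, weights

# Three toy residue embeddings (hypothetical values).
residues = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]  # stands in for a learned attention vector
pooled, weights = attention_pool(residues, query)
print(weights)
```

Residues whose embeddings align with the query receive larger weights, so they contribute more to the pooled representation than a plain average would allow.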

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734293/pdf/
Citations: 0
Stratum corneum nanotexture feature detection using deep learning and spatial analysis: a noninvasive tool for skin barrier assessment.
IF 11.8 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae095
Jen-Hung Wang, Jorge Pereda, Ching-Wen Du, Chia-Yu Chu, Maria Oberländer Christensen, Sanja Kezic, Ivone Jakasa, Jacob P Thyssen, Sreeja Satheesh, Edwin En-Te Hwu

Background: Corneocyte surface nanoscale topography (nanotexture) has recently emerged as a potential biomarker for inflammatory skin diseases, such as atopic dermatitis (AD). This assessment method involves quantifying circular nano-size objects (CNOs) in corneocyte nanotexture images, enabling noninvasive analysis via stratum corneum (SC) tape stripping. Current approaches for identifying CNOs rely on computer vision techniques with specific geometric criteria, resulting in inaccuracies due to the susceptibility of nano-imaging techniques to environmental noise and structural occlusion on the corneocyte.

Results: This study recruited 45 AD patients and 15 healthy controls, evenly divided into 4 severity groups based on their Eczema Area and Severity Index scores. Subsequently, we collected a dataset of over 1,000 corneocyte nanotexture images using our in-house high-speed dermal atomic force microscope. This dataset was utilized to train state-of-the-art deep learning object detectors for identifying CNOs. Additionally, we implemented a kernel density estimator to analyze the spatial distribution of CNOs, excluding ineffective regions with minimal CNO occurrence, such as ridges and occlusions, thereby enhancing accuracy in density calculations. After fine-tuning, our detection model achieved an overall accuracy of 91.4% in detecting CNOs.

Conclusions: By integrating a deep learning object detector with spatial analysis algorithms, we developed a precise methodology for calculating CNO density, termed the Effective Corneocyte Topographical Index (ECTI). The ECTI demonstrated exceptional robustness to nano-imaging artifacts and shows substantial potential for advancing AD diagnostics by effectively distinguishing between SC samples of varying AD severity and healthy controls.
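
The abstract describes using a kernel density estimator to exclude regions with minimal CNO occurrence before computing density. The sketch below illustrates that idea under stated assumptions: a Gaussian kernel, a regular grid, and a fixed density threshold, all of which are hypothetical choices. It is not the published ECTI computation, and the function names and parameter values are invented for illustration.

```python
import math

def gaussian_kde_2d(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from a list of 2D points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in points
    )

def effective_density(points, width, height, cell=1.0, bandwidth=1.0, threshold=0.01):
    """Detections divided by the area of grid cells whose KDE value reaches the threshold."""
    effective_cells = 0
    for i in range(int(width / cell)):
        for j in range(int(height / cell)):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell  # grid cell center
            if gaussian_kde_2d(points, cx, cy, bandwidth) >= threshold:
                effective_cells += 1
    effective_area = effective_cells * cell * cell
    return len(points) / effective_area if effective_area > 0 else 0.0

# Toy image of 10 x 10 units: detections only in the left half,
# as if the right half were occluded (hypothetical data).
detections = [(x + 0.5, y + 0.5) for x in range(5) for y in range(0, 10, 2)]
naive = len(detections) / (10 * 10)
effective = effective_density(detections, width=10, height=10)
print(naive, effective)
```

Because the empty half of the image is left out of the denominator, the effective density is higher than the naive count per total area, which is why masking low-occurrence regions changes the result.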

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629979/pdf/
Citations: 0
StereoSiTE: a framework to spatially and quantitatively profile the cellular neighborhood organized iTME.
IF 11.8 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae078
Xing Liu, Chi Qu, Chuandong Liu, Na Zhu, Huaqiang Huang, Fei Teng, Caili Huang, Bingying Luo, Xuanzhu Liu, Min Xie, Feng Xi, Mei Li, Liang Wu, Yuxiang Li, Ao Chen, Xun Xu, Sha Liao, Jiajun Zhang

Background: Spatial transcriptome (ST) technologies are emerging as powerful tools for studying tumor biology. However, existing tools for analyzing ST data are limited, as they mainly rely on algorithms developed for single-cell RNA sequencing data and do not fully utilize the spatial information. While some algorithms have been developed for ST data, they are often designed for specific tasks, lacking a comprehensive analytical framework for leveraging spatial information.

Results: In this study, we present StereoSiTE, an analytical framework that combines open-source bioinformatics tools with custom algorithms to accurately infer the functional spatial cell interaction intensity (SCII) within the cellular neighborhood (CN) of interest. We applied StereoSiTE to decode ST datasets from xenograft models and found that the CN efficiently distinguished different cellular contexts, while the SCII analysis provided more precise insights into intercellular interactions by incorporating spatial information. By applying StereoSiTE to multiple samples, we successfully identified a CN region dominated by neutrophils, suggesting their potential role in remodeling the immune tumor microenvironment (iTME) after treatment. Moreover, the SCII analysis within the CN region revealed neutrophil-mediated communication, supported by pathway enrichment, transcription factor regulon activities, and protein-protein interactions.

Conclusions: StereoSiTE represents a promising framework for unraveling the mechanisms underlying treatment response within the iTME by leveraging CN-based tissue domain identification and SCII-inferred spatial intercellular interactions. The software is designed to be scalable, modular, and user-friendly, making it accessible to a wide range of researchers.
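
The abstract refers to cellular neighborhoods (CN) without describing how they are built. One common construction, assumed here purely for illustration, is to summarize each cell by the cell types found within a fixed radius. The sketch below does that with hypothetical coordinates, labels, and radius; it does not use the StereoSiTE API.

```python
import math
from collections import Counter

def neighborhood_composition(cells, radius):
    """For each cell, count the cell types found within `radius` (the cell itself included)."""
    compositions = []
    for x, y, _ in cells:
        counts = Counter(
            cell_type
            for nx, ny, cell_type in cells
            if math.hypot(nx - x, ny - y) <= radius
        )
        compositions.append(counts)
    return compositions

# Toy cells as (x, y, cell type); all values are hypothetical.
cells = [
    (0.0, 0.0, "neutrophil"),
    (1.0, 0.0, "neutrophil"),
    (0.0, 1.0, "tumor"),
    (10.0, 10.0, "tumor"),
    (11.0, 10.0, "T cell"),
]
compositions = neighborhood_composition(cells, radius=2.0)
dominant = [c.most_common(1)[0][0] for c in compositions]
print(compositions[0], dominant[0])
```

Composition vectors like these could then be clustered to label neighborhoods, for example to flag a region dominated by one cell type.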

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503478/pdf/
Citations: 0