Latest articles in Database: The Journal of Biological Databases and Curation

Dataset of miRNA-disease relations extracted from textual data using transformer-based neural networks.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-08-05 | DOI: 10.1093/database/baae066
Sumit Madan, Lisa Kühnel, Holger Fröhlich, Martin Hofmann-Apitius, Juliane Fluck

MicroRNAs (miRNAs) play important roles in post-transcriptional processes and regulate major cellular functions. Abnormal regulation of miRNA expression has been linked to numerous human diseases, such as respiratory diseases, cancer, and neurodegenerative diseases. The latest miRNA-disease associations are predominantly reported in unstructured biomedical literature, and retrieving them manually is cumbersome and time-consuming because the number of publications keeps growing. We propose a deep learning-based text mining approach that extracts normalized miRNA-disease associations from biomedical literature. To train the deep learning models, we build a new training corpus that is extended by distant supervision using multiple external databases. A quantitative evaluation shows that the workflow achieves an area under the receiver operating characteristic curve of 98% on a holdout test set for the detection of miRNA-disease associations. We demonstrate the applicability of the approach by extracting new miRNA-disease associations from biomedical literature (PubMed and PubMed Central). Through quantitative analysis and evaluation on three different neurodegenerative diseases, we show that our approach can effectively extract miRNA-disease associations not yet available in public databases. Database URL: https://zenodo.org/records/10523046.
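The distant-supervision step can be illustrated with a minimal sketch. It assumes that miRNA and disease mentions have already been recognized and normalized sentence by sentence, and that the external databases have been merged into a set of (miRNA, disease) pairs. The identifiers in `KNOWN_PAIRS` and the function `distant_labels` are hypothetical and are not taken from the authors' workflow.

```python
from itertools import product

# Hypothetical merged knowledge base of normalized (miRNA, disease) pairs,
# standing in for the external databases used for distant supervision.
KNOWN_PAIRS = {
    ("hsa-mir-21", "breast cancer"),
    ("hsa-mir-155", "multiple sclerosis"),
}


def distant_labels(sentence_mentions, known_pairs):
    """Label every miRNA-disease pair that co-occurs in a sentence.

    sentence_mentions: list of (mirnas, diseases) tuples, one per sentence,
    each holding already-normalized identifiers.
    Returns (sentence_index, mirna, disease, label) tuples, where label is 1
    if the pair is in the knowledge base and 0 otherwise.
    """
    examples = []
    for idx, (mirnas, diseases) in enumerate(sentence_mentions):
        for mirna, disease in product(mirnas, diseases):
            label = 1 if (mirna, disease) in known_pairs else 0
            examples.append((idx, mirna, disease, label))
    return examples


sentences = [
    (["hsa-mir-21"], ["breast cancer", "glioma"]),
    (["hsa-mir-155"], ["multiple sclerosis"]),
]
for example in distant_labels(sentences, KNOWN_PAIRS):
    print(example)
```

This sketch treats every co-occurring pair that is absent from the databases as a negative. That is the simplest heuristic, and it is known to produce noisy labels. A real corpus-construction pipeline may filter or weight such examples differently.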

Citations: 0
FatPlants: a comprehensive information system for lipid-related genes and metabolic pathways in plants.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-08-05 | DOI: 10.1093/database/baae074
Chunhui Xu, Trey Shaw, Sai Akhil Choppararu, Yiwei Lu, Shaik Naveed Farooq, Yongfang Qin, Matt Hudson, Brock Weekley, Michael Fisher, Fei He, Jose Roberto Da Silva Nascimento, Nicholas Wergeles, Trupti Joshi, Philip D Bates, Abraham J Koo, Doug K Allen, Edgar B Cahoon, Jay J Thelen, Dong Xu

FatPlants, an open-access, web-based database, consolidates data, annotations, analysis results, and visualizations of lipid-related genes, proteins, and metabolic pathways in plants. As a minable resource, FatPlants offers a user-friendly interface that facilitates studies of the regulation of plant lipid metabolism and supports breeding efforts aimed at increasing crop oil content. This web resource draws on data from our own research, curated public resources, and the academic literature. It comprises information on known fatty-acid-related proteins, genes, and pathways in multiple plants, with an emphasis on Glycine max, Arabidopsis thaliana, and Camelina sativa. The platform also includes machine-learning-based methods and navigation tools designed to aid in characterizing metabolic pathways and protein interactions. Additional features include comprehensive gene and protein information cards, a Basic Local Alignment Search Tool (BLAST) search function, similar-structure search using AlphaFold, and ChatGPT-based queries for protein information. Database URL: https://www.fatplants.net/.

Citations: 0
Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-30 | DOI: 10.1093/database/baae068
Richard A A Jonker, Tiago Almeida, Rui Antunes, João R Almeida, Sérgio Matos

The identification of medical concepts in clinical narratives is of great interest to the biomedical scientific community because of its importance for improving treatments and for drug development research. Biomedical named entity recognition (NER) in clinical texts is crucial for automated information extraction, facilitating patient record analysis, drug development, and medical research. Traditional approaches often focus on single-class NER tasks, yet recent advances emphasize the need to address multi-class scenarios, particularly in complex biomedical domains. This paper proposes a strategy that integrates a multi-head conditional random field (CRF) classifier for multi-class NER in Spanish clinical documents. By using a multi-head CRF model, our methodology handles overlapping entity instances of different types, a common challenge for traditional NER methods. This architecture improves computational efficiency and scales to multi-class NER tasks while maintaining high performance. By combining four diverse datasets (SympTEMIST, MedProcNER, DisTEMIST, and PharmaCoNER), we expand the scope of NER to five classes: symptoms, procedures, diseases, chemicals, and proteins. To the best of our knowledge, the combined datasets form the largest Spanish-language multi-class dataset for biomedical entity recognition and linking in clinical notes, which is important for training biomedical models in Spanish. We also provide entity linking to the multilingual Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) vocabulary, with the eventual goal of performing biomedical relation extraction. In experiments on Spanish clinical documents, our strategy achieves results competitive with single-class NER models. For NER, our system achieves a combined micro-averaged F1-score of 78.73, and clinical mentions normalized to SNOMED CT reach an end-to-end F1-score of 54.51. The code to run our system is publicly available at https://github.com/ieeta-pt/Multi-Head-CRF. Database URL: https://github.com/ieeta-pt/Multi-Head-CRF.

Citations: 0
Improving biomedical entity linking for complex entity mentions with LLM-based text simplification.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-26 | DOI: 10.1093/database/baae067
Florian Borchert, Ignacio Llorca, Matthieu-P Schapranow

Large amounts of important medical information are captured in free-text documents in biomedical research and within healthcare systems, and natural language processing (NLP) can make this information accessible. A key component of most biomedical NLP pipelines is entity linking, i.e. grounding textual mentions of named entities to a reference set of medical concepts, usually derived from a terminology system such as the Systematized Nomenclature of Medicine Clinical Terms. However, complex entity mentions spanning multiple tokens are notoriously hard to normalize, because appropriate candidate concepts are difficult to find. In this work, we propose an approach that preprocesses such mentions for candidate generation, building on recent advances in text simplification with generative large language models. We evaluate the feasibility of our method on the entity linking track of the BioCreative VIII SympTEMIST shared task. We find that instructing the latest Generative Pre-trained Transformer model with a few-shot prompt for text simplification yields mention spans that are easier to normalize. As a result, recall during candidate generation improves by 2.9 percentage points over our baseline system, which achieved the best score in the original shared task evaluation. Furthermore, we show that careful initialization of a subsequent reranking model can fully translate this improvement in recall into top-1 accuracy. Our best system achieves an accuracy of 63.6% on the SympTEMIST test set. The proposed approach has been integrated into the open-source xMEN toolkit, which is available online via https://github.com/hpi-dhc/xmen.
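The relation between candidate-generation recall and top-1 accuracy can be made concrete with a small sketch. Top-1 accuracy is simply recall@1. A reranker that only reorders a fixed candidate list cannot link a mention whose gold concept is missing from that list, so recall@k bounds what reranking can reach. The concept IDs and candidate lists below are invented, and `recall_at_k` is a hypothetical helper, not part of the xMEN API.

```python
def recall_at_k(gold, candidates, k):
    """Share of mentions whose gold concept is among the top-k ranked candidates.

    gold: one gold concept ID per mention.
    candidates: one ranked list of candidate concept IDs per mention.
    """
    if len(gold) != len(candidates):
        raise ValueError("gold and candidates must be aligned")
    hits = sum(1 for g, cands in zip(gold, candidates) if g in cands[:k])
    return hits / len(gold)


# Hypothetical concept IDs and ranked candidates for four mentions.
gold = ["C1", "C2", "C3", "C4"]
baseline = [["C9", "C1"], ["C2", "C7"], ["C8", "C6"], ["C5", "C4"]]
simplified = [["C1", "C9"], ["C2", "C7"], ["C3", "C8"], ["C5", "C4"]]

for name, cands in [("baseline", baseline), ("simplified", simplified)]:
    print(name,
          "recall@2 =", recall_at_k(gold, cands, 2),
          "top-1 accuracy =", recall_at_k(gold, cands, 1))
```

In this toy data, simplifying the mentions raises recall@2 from 0.75 to 1.0. A reranker could therefore, in principle, bring top-1 accuracy up to the same ceiling.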

Citations: 0
CO-19 PDB 2.0: A Comprehensive COVID-19 Database with Global Auto-Alerts, Statistical Analysis, and Cancer Correlations.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-26 | DOI: 10.1093/database/baae072
Shahid Ullah, Yingmei Li, Wajeeha Rahman, Farhan Ullah, Muhammad Ijaz, Anees Ullah, Gulzar Ahmad, Hameed Ullah, Tianshun Gao

Biological databases are critical foundations for modern research, and amid the dynamic landscape of biology, COVID-19 databases have emerged as an indispensable resource. The global outbreak of COVID-19, which began in December 2019, calls for comprehensive databases that unravel the intricate connections between this novel virus and cancer. Despite existing databases, the research community still needs a centralized and accessible way to obtain precise information. The main aim of this work is to develop a database that makes all COVID-19-related data available in one click, with automatic global notifications. The COVID-19 Pandemic Database (CO-19 PDB 2.0) addresses this gap as a comprehensive resource for researchers navigating the complexities of COVID-19 and cancer. Between December 2019 and June 2024, CO-19 PDB 2.0 systematically collected and organized 120 datasets into six distinct categories, each serving specific functionalities. The categories are a chemical structure database, a digital image database, a visualization tool database, a genomic database, a social science database, and a literature database. Functionalities range from image analysis and gene sequence information to data visualization and updates on environmental events. Users can choose either the database search page or the auto-notification page for seamless retrieval of information. The dedicated page presents six predefined charts covering key criteria such as 'number of cases and deaths', 'country-wise distribution', 'new cases and recovery', and 'rates of death and recovery'. The global impact of COVID-19 on cancer patients has led to extensive collaboration among research institutions, producing numerous articles and computational studies published in international journals. A key feature of this initiative is automatic daily notifications for standardized information updates. Users can easily navigate by category or use a direct search option. The study offers up-to-date COVID-19 datasets and global statistics on COVID-19 and cancer, highlighting the top 10 cancers diagnosed in the USA in 2022. Breast and prostate cancers are the most common, representing 30% and 26% of new cases, respectively. The initiative also removes or replaces dead links, providing a valuable resource for researchers, healthcare professionals, and individuals. The database is implemented in PHP, HTML, CSS and MySQL and is freely available at https://www.co-19pdb.habdsk.org/. Database URL: https://www.co-19pdb.habdsk.org/.

Citations: 0
Transverse aortic constriction multi-omics analysis uncovers pathophysiological cardiac molecular mechanisms.
IF 3.4 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-25 | DOI: 10.1093/database/baae060
Enio Gjerga, Matthias Dewenter, Thiago Britto-Borges, Johannes Grosso, Frank Stein, Jessica Eschenbach, Mandy Rettel, Johannes Backs, Christoph Dieterich

Time-course multi-omics data from a murine model of progressive heart failure (HF) induced by transverse aortic constriction (TAC) provide insights into the molecular mechanisms causally involved in contractile failure and structural cardiac remodelling. We apply Illumina-based transcriptomics, Nanopore sequencing and mass spectrometry-based proteomics to samples from the left ventricle (LV) and right ventricle (RV, RNA only) at 1, 7, 21 and 56 days after TAC or sham surgery. Here, we present Transverse Aortic COnstriction Multi-omics Analysis (TACOMA), an interactive web application that integrates and visualizes the transcriptomics and proteomics data collected in this TAC time-course experiment. TACOMA enables users to visualize the expression profiles of known and novel genes and their protein products. Importantly, we also capture alternative splicing events by assessing differential transcript and exon usage. Co-expression-based clustering algorithms and functional enrichment analysis revealed overrepresented annotations of biological processes and molecular functions at the protein and gene levels. To enhance data integration, TACOMA synchronizes transcriptomics and proteomics profiles, enabling cross-omics comparisons. With TACOMA (https://shiny.dieterichlab.org/app/tacoma), we offer a rich web-based resource to uncover molecular events and biological processes implicated in contractile failure and cardiac hypertrophy. For example, we highlight three findings:
(i) changes in metabolic genes and proteins over the time course of hypertrophic growth and contractile impairment;
(ii) RNA splicing changes in the expression of Tpm2 isoforms between RV and LV;
(iii) novel transcripts and genes likely contributing to the pathogenesis of HF.
We plan to extend these data with additional environmental and genetic models of HF to decipher common and distinct molecular changes in heart diseases of different aetiologies.
Database URL: https://shiny.dieterichlab.org/app/tacoma.

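TACOMA's abstract describes synchronizing transcriptomics and proteomics profiles for cross-omics comparison at days 1, 7, 21 and 56 after surgery. It does not say how the two layers are compared. The sketch below is a minimal illustration under assumed conditions. It computes a Pearson correlation between one left-ventricle gene's RNA and protein log2 fold changes (TAC vs Sham) at the time points both layers share; the abstract notes that RV samples have RNA data only. The fold-change values, the function names and the choice of Pearson correlation are hypothetical and do not describe TACOMA's implementation.

```python
import math

# Sampling days after TAC or Sham surgery, as stated in the TACOMA abstract.
TIME_POINTS = [1, 7, 21, 56]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a flat profile
    return cov / (sd_x * sd_y)


def cross_omics_correlation(rna_lfc, protein_lfc, time_points=TIME_POINTS):
    """Correlate RNA and protein profiles over time points present in both layers."""
    shared = [t for t in time_points if t in rna_lfc and t in protein_lfc]
    return pearson([rna_lfc[t] for t in shared], [protein_lfc[t] for t in shared])


# Hypothetical log2 fold changes (TAC vs Sham) for one LV gene, keyed by day.
rna_lfc = {1: 0.2, 7: 1.1, 21: 1.5, 56: 1.8}
protein_lfc = {1: 0.0, 7: 0.6, 21: 1.2, 56: 1.4}

r = cross_omics_correlation(rna_lfc, protein_lfc)
print(f"RNA-protein correlation across {len(TIME_POINTS)} time points: {r:.3f}")
```

A high value suggests that RNA and protein change concordantly. With only four time points the estimate is noisy, so it works better as a screening signal than as a formal test.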
Citations: 0
Data set of fraction unbound values in the in vitro incubations for metabolic studies for better prediction of human clearance.
IF 3.4 CAS Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-24 DOI: 10.1093/database/baae063
Laura Krumpholz, Aleksandra Klimczyk, Wiktoria Bieniek, Sebastian Polak, Barbara Wiśniowska

In vitro-in vivo extrapolation is a commonly applied technique for predicting liver clearance. Various in vitro models are available, such as hepatocytes, human liver microsomes, or recombinant cytochromes P450. According to the free drug theory, only the unbound fraction (fu) of a chemical can undergo metabolic changes. Therefore, to ensure reliable predictions, both specific and nonspecific binding in the model should be accounted for. However, the fraction unbound in the experiment is often not reported. The study aimed to provide a detailed repository of literature data on compounds' fu values in the various in vitro systems used for drug metabolism evaluation, together with the corresponding human plasma binding levels. Data on the free fraction in plasma and in the different in vitro models were supplemented with the following information: the experimental method used to assess the degree of drug binding; the protein or cell concentration in the incubation; other experimental conditions, if different from the standard ones; the species; the reference to the source publication; and the author's name and date of publication. In total, we collected 129 literature studies covering 1425 different compounds. The data set can serve as a reference for scientists involved in pharmacokinetic/physiologically based pharmacokinetic modelling, as well as for researchers interested in quantitative structure-activity relationship (QSAR) models that predict fraction unbound from compound structure. Database URL: https://data.mendeley.com/datasets/3bs5526htd/1.
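The sketch below shows why an unreported fraction unbound in the incubation matters. It follows one common IVIVE route for microsomal data: correct the apparent intrinsic clearance by fu_inc, scale it to the whole liver, and apply the well-stirred liver model. This is a standard textbook approach, not a method described in the abstract. All input values are illustrative assumptions, including MPPGL, liver weight and hepatic blood flow. The blood-to-plasma ratio is assumed to be 1, so the unbound fraction in blood equals that in plasma. Hepatocyte data would be scaled with hepatocellularity instead of MPPGL.

```python
def unbound_intrinsic_clearance(cl_int_app, fu_inc):
    """Correct apparent in vitro CLint (µL/min/mg protein) for binding in the incubation."""
    if not 0 < fu_inc <= 1:
        raise ValueError("fu_inc must be in (0, 1]")
    return cl_int_app / fu_inc


def scale_microsomal_clint(cl_int_u, mppgl, liver_weight_g):
    """Scale microsomal CLint (µL/min/mg) to whole-liver CLint (mL/min).

    µL/min/mg * mg protein/g liver * g liver = µL/min; divide by 1000 for mL/min.
    """
    return cl_int_u * mppgl * liver_weight_g / 1000.0


def well_stirred_hepatic_clearance(q_h, fu_b, cl_int):
    """Well-stirred liver model: CL_H = Q_H * fu_b * CLint / (Q_H + fu_b * CLint)."""
    return q_h * fu_b * cl_int / (q_h + fu_b * cl_int)


# Illustrative inputs only (assumed, not taken from the data set).
CL_INT_APP = 20.0        # apparent microsomal CLint, µL/min/mg protein
FU_INC = 0.5             # fraction unbound in the microsomal incubation
FU_P = 0.1               # fraction unbound in plasma; B:P assumed 1, so fu_b = fu_p
MPPGL = 40.0             # mg microsomal protein per g liver (assumed)
LIVER_WEIGHT_G = 1500.0  # liver weight in g (assumed)
Q_H = 1500.0             # hepatic blood flow, mL/min (assumed)

cl_int_liver = scale_microsomal_clint(
    unbound_intrinsic_clearance(CL_INT_APP, FU_INC), MPPGL, LIVER_WEIGHT_G)
cl_h_corrected = well_stirred_hepatic_clearance(Q_H, FU_P, cl_int_liver)

# Same calculation with fu_inc ignored (implicitly treated as 1).
cl_h_uncorrected = well_stirred_hepatic_clearance(
    Q_H, FU_P, scale_microsomal_clint(CL_INT_APP, MPPGL, LIVER_WEIGHT_G))

print(f"CL_H with fu_inc correction:    {cl_h_corrected:.1f} mL/min")
print(f"CL_H without fu_inc correction: {cl_h_uncorrected:.1f} mL/min")
```

With these illustrative inputs, ignoring fu_inc gives only about half the corrected hepatic clearance. That is the kind of bias the collected fu values are meant to help avoid.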

Citations: 0
PANGEN: an online platform for the comparison and creation of diagnostic gene panels.
IF 3.4 CAS Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-23 DOI: 10.1093/database/baae065
Ofer Isakov, Dina Marek-Yagel, Rotem Greenberg, Michal Naftali, Shay Ben-Shachar

Targeted gene panel sequencing limits the search for causative genetic variants to genes with an established association with the phenotype. Designing gene panels is challenging because there is no consensus on the phenotypic associations of some genes; as a result, the same panel offered by different laboratories varies widely in gene composition. We developed PANGEN, a platform that provides a centralized resource for gene panel information and can compare existing panels and generate new intelligent diagnostic panels. Gene-phenotype associations were collected from 12 public and commercial sources (Blueprint, Cegat, Centogene, ClinGen, Fulgent, GeneDx, Health in Code, Human Phenotype Ontology, Invitae, PanelApp, Prevention genetics, and Pronto diagnostics) and categorized into tiers derived from the categories of the original source panel. Pairwise panel similarity was calculated by dividing the number of common genes by the total number of genes in both panels. Regions with extreme guanine-cytosine (GC) content were collected from the Genome in a Bottle stratifications dataset, and putative genomic duplications were retrieved from the University of California, Santa Cruz (UCSC) database. Overall, 1533 panels, 9759 phenotypes, and 6979 genes were collected. The platform provides an interface to (i) explore and compare collected panels, (ii) find similar panels, (iii) identify genes with high GC content or duplication levels, (iv) generate gene panels by combining panels from various sources, and (v) stratify a generated panel into genes with a strong phenotype association ('core') and those with a weaker association ('extended'). The platform is a unique resource for exploring and comparing gene panels and, through a public online web server, facilitates the generation of tailored diagnostic panels. Database URL: https://c-gc.shinyapps.io/PANGEN/.
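The abstract defines pairwise panel similarity as the number of common genes divided by the total number of genes in both panels. If "total" means the number of distinct genes across the two panels (their union), the measure is the Jaccard index. That reading is an assumption. The rule used below to split a combined panel into 'core' and 'extended' genes is also hypothetical: it counts how many source panels list each gene. The gene names are placeholders.

```python
from collections import Counter


def panel_similarity(panel_a, panel_b):
    """Shared genes divided by distinct genes across both panels (Jaccard index).

    Assumes "total number of genes in both panels" means their union.
    """
    a, b = set(panel_a), set(panel_b)
    union = a | b
    if not union:
        return 0.0  # two empty panels: define similarity as 0
    return len(a & b) / len(union)


def combine_and_stratify(panels, min_support):
    """Merge panels, then split genes into 'core' and 'extended' lists.

    Hypothetical rule: a gene is 'core' if at least `min_support` source panels
    list it, and 'extended' otherwise.
    """
    support = Counter(gene for panel in panels for gene in set(panel))
    core = sorted(gene for gene, n in support.items() if n >= min_support)
    extended = sorted(gene for gene, n in support.items() if n < min_support)
    return core, extended


# Placeholder panels from three hypothetical laboratories.
lab_x = {"G1", "G2", "G3", "G4"}
lab_y = {"G2", "G3", "G4", "G5"}
lab_z = {"G3", "G4", "G6"}

print(panel_similarity(lab_x, lab_y))  # 3 shared genes / 5 distinct genes
core, extended = combine_and_stratify([lab_x, lab_y, lab_z], min_support=2)
print("core:", core)
print("extended:", extended)
```

If the denominator were instead the sum of the two panel sizes, the score for `lab_x` and `lab_y` would fall from 3/5 to 3/8. It is therefore worth confirming which definition PANGEN uses before comparing scores.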

Citations: 0
Aerial Wildlife Image Repository for animal monitoring with drones in the age of artificial intelligence.
IF 3.4 CAS Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-23 DOI: 10.1093/database/baae070
Sathishkumar Samiappan, B Santhana Krishnan, Damion Dehart, Landon R Jones, Jared A Elmore, Kristine O Evans, Raymond B Iglay

Drones (unoccupied aircraft systems) have become effective tools for wildlife monitoring and conservation. Automated animal detection and classification using artificial intelligence (AI) can substantially reduce the logistical and financial costs of drone surveys and improve them. However, compared with other fields, the lack of annotated animal imagery for training AI is a critical bottleneck to achieving accurate AI performance. To bridge this gap for drone imagery and help advance and standardize automated animal classification, we created the Aerial Wildlife Image Repository (AWIR), a dynamic, interactive database of annotated images captured from drone platforms with visible and thermal cameras. The AWIR is the first open-access repository where users can upload, annotate, and curate images of animals acquired from drones. It also provides annotated imagery and benchmark datasets that users can download to train AI algorithms to automatically detect and classify animals and to compare algorithm performance. The AWIR contains 6587 animal objects in 1325 visible and thermal drone images, predominantly of large birds and mammals of 13 species in open areas of North America. As contributors increase the taxonomic and geographic diversity of available images, the AWIR will open new avenues for AI research aimed at improving drone-based animal surveys for conservation. Database URL: https://projectportal.gri.msstate.edu/awir/.
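The AWIR benchmark datasets are intended for comparing detection algorithms. The abstract does not state the annotation format or the evaluation metric. The sketch below therefore assumes axis-aligned pixel boxes (x_min, y_min, x_max, y_max) and a common convention: a prediction counts as a true positive if its intersection over union (IoU) with a not-yet-matched annotation is at least 0.5. Species labels are ignored here. A per-class evaluation would run the same matching once per species.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, threshold=0.5):
    """Greedy one-to-one matching of scored predictions to annotated boxes.

    predictions: list of (box, score); ground_truth: list of boxes.
    Higher-scoring predictions are matched first; each annotation is used once.
    """
    matched = set()
    true_pos = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, threshold
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical annotations and detections for one image (pixel coordinates).
annotations = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]

p, r = precision_recall(detections, annotations)
print(f"precision={p:.2f} recall={r:.2f}")
```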

Citations: 0
PPCRKB: a risk factor knowledge base of postoperative pulmonary complications.
IF 3.4 CAS Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-19 DOI: 10.1093/database/baae054
Jianchao Duan, Peiyi Li, Aibin Shao, Xuechao Hao, Ruihao Zhou, Cheng Bi, Xingyun Liu, Weimin Li, Huadong Zhu, Guo Chen, Bairong Shen, Tao Zhu

Postoperative pulmonary complications (PPCs) are highly heterogeneous disorders that frequently occur after surgical interventions and have diverse risk factors; they result in significant financial burdens, prolonged hospitalization and elevated mortality rates. Despite multiple studies on PPCs, a comprehensive knowledge base that effectively integrates and visualizes the diverse risk factors associated with PPCs is currently lacking. This study aims to develop an online knowledge platform on risk factors for PPCs (the Postoperative Pulmonary Complications Risk Factor Knowledge Base, PPCRKB) that categorizes and presents the risk and protective factors associated with PPCs and facilitates the development of individualized prevention and management strategies for PPCs according to each investigator's needs. The PPCRKB is a novel knowledge base that encompasses all investigated potential risk factors linked to PPCs and offers users a web-based platform for accessing them. The PPCRKB contains 2673 entries covering 915 risk factors, categorized into 11 distinct groups: habit and behavior, surgical factors, anesthetic factors, auxiliary examination, environmental factors, clinical status, medicines and treatment, demographic characteristics, psychosocial factors, genetic factors and miscellaneous factors. The PPCRKB holds significant value for PPC research: including both quantitative and qualitative data enhances the ability to uncover new insights and solutions related to PPCs. In the future, it could provide clinicians with a more comprehensive perspective on PPC-related research. Database URL: http://sysbio.org.cn/PPCs.
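The abstract lists the 11 categories into which the PPCRKB's 915 risk factors are grouped. The sketch below models those categories as an enum and groups entries by category. The `Entry` fields and the example rows are invented for illustration. The sketch also lets several entries, such as reports from different studies, point to the same factor. That is only one plausible reading of the 2673-entries-versus-915-factors figure.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class FactorCategory(Enum):
    """The 11 factor groups listed in the PPCRKB abstract."""
    HABIT_AND_BEHAVIOR = "habit and behavior"
    SURGICAL = "surgical factors"
    ANESTHETIC = "anesthetic factors"
    AUXILIARY_EXAMINATION = "auxiliary examination"
    ENVIRONMENTAL = "environmental factors"
    CLINICAL_STATUS = "clinical status"
    MEDICINES_AND_TREATMENT = "medicines and treatment"
    DEMOGRAPHIC = "demographic characteristics"
    PSYCHOSOCIAL = "psychosocial factors"
    GENETIC = "genetic factors"
    MISCELLANEOUS = "miscellaneous factors"


@dataclass(frozen=True)
class Entry:
    """Hypothetical record: one factor as reported by one source."""
    factor: str
    category: FactorCategory
    effect: str   # hypothetical field: "risk" or "protective"
    source: str   # hypothetical source identifier


def factors_by_category(entries):
    """Map each category to the set of distinct factor names recorded under it."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry.category].add(entry.factor)
    return dict(grouped)


# Invented example rows: three entries that describe only two distinct factors.
entries = [
    Entry("smoking", FactorCategory.HABIT_AND_BEHAVIOR, "risk", "study-1"),
    Entry("smoking", FactorCategory.HABIT_AND_BEHAVIOR, "risk", "study-2"),
    Entry("long operation time", FactorCategory.SURGICAL, "risk", "study-3"),
]

grouped = factors_by_category(entries)
for category, factors in grouped.items():
    print(f"{category.value}: {sorted(factors)}")
```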

Citations: 0
Journal
Database: The Journal of Biological Databases and Curation
Book学术
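Contact us: info@booksci.cn
Book学术 provides a free academic resource search service to help scholars in China and abroad find Chinese- and English-language literature, and aims to offer the most convenient, high-quality service experience.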
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Network Security Filing No. 11010802042870; Beijing ICP Filing No. 2023020795-1