Pub Date: 2024-09-06 DOI: 10.1093/database/baae089
Morgan N Price, Adam P Arkin
Automated annotations of protein functions are error-prone because of our lack of knowledge of protein functions. For example, it is often impossible to predict the correct substrate for an enzyme or a transporter. Furthermore, much of the knowledge that we do have about the functions of proteins is missing from the underlying databases. We discuss how to use interactive tools to quickly find different kinds of information relevant to a protein's function. Many of these tools are available via PaperBLAST (http://papers.genomics.lbl.gov). Combining these tools often allows us to infer a protein's function. Ideally, accurate annotations would allow us to predict a bacterium's capabilities from its genome sequence, but in practice, this remains challenging. We describe interactive tools that infer potential capabilities from a genome sequence or that search a genome to find proteins that might perform a specific function of interest. Database URL: http://papers.genomics.lbl.gov.
Title: Interactive tools for functional annotation of bacterial genomes. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378808/pdf/
Pub Date: 2024-09-04 DOI: 10.1093/database/baae080
Hashim Halim-Fikri, Ninie Nadia Zulkipli, Hafiza Alauddin, Celeste Bento, Carsten W Lederer, Petros Kountouris, Marina Kleanthous, Yetti Hernaningsih, Meow-Keong Thong, Muhammad Hamdi Mahmood, Norafiza Mohd Yasin, Ezalia Esa, Jacques Elion, Domenico Coviello, Raja-Zahratul-Azma Raja-Sabudin, Ghada El-Kamah, John Burn, Narazah Mohd Yusoff, Raj Ramesar, Bin Alwi Zilfalil
Thalassemia is one of the most prevalent monogenic disorders in low- and middle-income countries (LMICs). There are an estimated 270 million carriers of hemoglobinopathies (abnormal hemoglobins and/or thalassemia) worldwide, necessitating global methods and solutions for effective and optimal therapy. LMICs are disproportionately impacted by thalassemia, and due to disparities in genomics awareness and diagnostic resources, certain LMICs lag behind high-income countries (HICs). This spurred the establishment of the Global Globin Network (GGN) in 2015 at UNESCO, Paris, as a project-wide endeavor within the Human Variome Project (HVP). Primarily aimed at enhancing thalassemia clinical services, research, and genomic diagnostic capabilities with a focus on LMIC needs, GGN aims to foster data collection in a shared database by all affected nations, thus improving data sharing and thalassemia management. In this paper, we propose a minimum requirement for establishing a genomic database in thalassemia based on the HVP database guidelines. We suggest using an existing platform recommended by HVP, the Leiden Open Variation Database (LOVD) (https://www.lovd.nl/). Adoption of our proposed criteria will assist in improving or supplementing the existing databases, allowing for better-quality services for individuals with thalassemia. Database URL: https://www.lovd.nl/.
Title: Global Globin Network and adopting genomic variant database requirements for thalassemia. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373567/pdf/
Pub Date: 2024-09-03 DOI: 10.1093/database/baae075
Yishu Xu, Zhenshu Ma, Yajie Wang, Long Zhang, Jiaming Ye, Yuan Chen, Zhengrong Yuan
Copper-induced cell death, termed cuproptosis, is a novel mechanism of cell death. Rising copper levels are toxic to cells and can trigger cell death. Several previous studies have shown that cuproptosis is closely associated with various cancers. Thus, discovering the relationships between cuproptosis-related genes (CRGs) and human cancers is of great importance. Pan-cancer analysis helps researchers identify relationships between multiple cancers and target genes and run prognostic analyses on cancers and cancer patients. Pan-cancer web servers provide researchers with direct results of pan-cancer prognostic analyses, which can greatly improve the efficiency of their work. However, to date, no web server provides pan-cancer analysis of CRGs. Therefore, we introduce the cuproptosis pan-cancer analysis database (CuPCA), the first database of analysis results for CRGs across 33 cancer types. CuPCA is a user-friendly resource that gives cancer researchers prognostic analyses relating cuproptosis to cancers. It provides single CRG pan-cancer analysis, multi-CRGs pan-cancer analysis, multi-CRlncRNA pan-cancer analysis, and mRNA-circRNA-lncRNA conjoint analysis. These analysis results not only indicate the relationship between cancers and cuproptosis at both the gene level and the protein level, but also predict the condition of different cancer patients, including their clinical, survival, and immunological condition. CuPCA delivers analyzed data directly to end users, which improves research efficiency and increases the value of existing data resources. Database URL: http://cupca.cn/.
Title: CuPCA: a web server for pan-cancer association analysis of large-scale cuproptosis-related genes. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373563/pdf/
Pub Date: 2024-08-30 DOI: 10.1093/database/baae077
Qingsong Du, Zhiyu Zhang, Wanyi Yang, Xunyu Zhou, Nan Zhou, Chuanfang Wu, Jinku Bao
Research on the association between genes and diseases is expanding rapidly, making it challenging for researchers to keep up with the influx of new publications and genetic datasets. Fortunately, there are now several regularly updated databases that focus on cataloging gene-disease relationships. The development of the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system has revolutionized the field of gene editing, providing a highly efficient, accurate, and reliable method for exploring gene-disease associations. However, there is currently no resource specifically dedicated to collecting and integrating the latest experimentally supported gene-disease association data derived from genome-wide CRISPR screening. To address this gap, we have developed the CRISPR-Based Gene-Disease Associations (CBGDA) database, which includes over 200 manually curated gene-disease associations derived from genome-wide CRISPR screening studies. Through CBGDA, users can explore these associations and gain insights into the expression patterns of genes in different diseases, associated chemical data, and variant information. This provides a novel perspective on understanding the associations between genes and diseases. In addition, CBGDA integrates data from several other databases and resources, enhancing its comprehensiveness and utility. In summary, CBGDA offers a fresh perspective and comprehensive insights into research on gene-disease associations. It fills the gap by providing a dedicated resource for accessing up-to-date, experimentally supported gene-disease association data derived from genome-wide CRISPR screening. Database URL: http://cbgda.zhounan.org/main.
Title: CBGDA: a manually curated resource for gene-disease associations based on genome-wide CRISPR. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363955/pdf/
Pub Date: 2024-08-30 DOI: 10.1093/database/baae082
Vincent Lam, Shivam Sharma, John L Spouge, I King Jordan, Leonardo Mariño-Ramírez
The All of Us Research Program ("All of Us") is an initiative led by the National Institutes of Health whose goal is to advance research on personalized medicine and health equity through the collection of genetic, environmental, demographic, and health data from volunteer participants who reside in the USA. The program's emphasis on recruiting a diverse participant cohort makes "All of Us" an effective platform for investigating health disparities. In this work, we analyzed participant electronic health record (EHR) data to identify the diseases and disease categories in the "All of Us" cohort for which racial and ethnic prevalence disparities can be observed. In conjunction with these analyses, we developed the US Health Disparities Browser as an interactive web application that enables users to visualize differences in race- and ethnic-group-specific prevalence estimates for 1755 different diseases: https://usdisparities.biosci.gatech.edu/. The web application features a catalog of all diseases represented in the browser, which can be sorted by overall prevalence as well as the variance in prevalence across racial and ethnic groups. The analyses outlined here provide details on the nature and extent of racial and ethnic health disparities in the "All of Us" participant cohort, and the accompanying browser can serve as a resource through which researchers can explore these disparities. Database URL: https://usdisparities.biosci.gatech.edu.
Title: Landscape of racial and ethnic health disparities in the All of Us Research Program. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363958/pdf/
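The All of Us abstract above says the browser's disease catalog can be sorted by overall prevalence and by the variance in prevalence across racial and ethnic groups. The following is a minimal Python sketch of that calculation, not the authors' implementation. The counts, the disease and group labels, and the function name `prevalence_table` are all hypothetical, and the use of an unweighted population variance is an assumption, since the abstract does not say which estimator the browser uses.

```python
from statistics import pvariance

# Hypothetical counts: disease -> {group: (cases, participants)}.
# None of these numbers come from the All of Us data.
counts = {
    "disease_a": {"group_1": (30, 1000), "group_2": (60, 1000), "group_3": (90, 1000)},
    "disease_b": {"group_1": (50, 1000), "group_2": (50, 1000), "group_3": (50, 1000)},
}


def prevalence_table(counts):
    """Return one row per disease with overall prevalence and between-group variance."""
    rows = []
    for disease, groups in counts.items():
        # Group-specific prevalence = cases / participants in that group.
        per_group = {g: cases / n for g, (cases, n) in groups.items()}
        total_cases = sum(cases for cases, _ in groups.values())
        total_n = sum(n for _, n in groups.values())
        rows.append({
            "disease": disease,
            "overall": total_cases / total_n,
            # Assumption: unweighted population variance of the group prevalences.
            "variance": pvariance(list(per_group.values())),
            "per_group": per_group,
        })
    return rows


rows = prevalence_table(counts)
# Diseases with the largest spread between groups come first.
for row in sorted(rows, key=lambda r: r["variance"], reverse=True):
    print(f'{row["disease"]}: overall={row["overall"]:.3f}, variance={row["variance"]:.6f}')
```

Sorting by `"overall"` instead of `"variance"` gives the other ordering the abstract mentions. A disease with equal prevalence in every group, such as the made-up `disease_b`, has zero variance and sorts last.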
Transcriptional regulatory networks (TRNs) give a global view of the regulatory mechanisms that bacteria use to respond to environmental signals. These networks are published in biological databases as a valuable resource for experimental and bioinformatics researchers. Despite efforts to publish TRNs of diverse bacteria, many bacteria still lack one and many of the existing TRNs are incomplete. In addition, the manual extraction of information from biomedical literature ("literature curation") has been the traditional way to extract these networks, even though it is demanding and time-consuming. Recently, language models based on pretrained transformers have been used to extract relevant knowledge from biomedical literature. Moreover, fine-tuning a large pretrained model with a limited amount of new data for a specific task ("transfer learning") opens the way to addressing new problems of biomedical information extraction. Here, to alleviate this lack of knowledge and assist literature curation, we present a new approach based on the Bidirectional Encoder Representations from Transformers (BERT) architecture to classify transcriptional regulatory interactions of bacteria as a first step toward extracting TRNs from literature. The approach performed well on a test dataset of sentences about Escherichia coli (F1-score: 0.8685, Matthews correlation coefficient: 0.8163). Examination of the model predictions revealed that the model learned different ways of expressing a regulatory interaction. The approach was then evaluated by extracting a TRN of Salmonella from 264 complete articles. The evaluation showed that the approach accurately extracted 82% of the network and also extracted interactions absent from the curation data. To the best of our knowledge, the present study is the first effort to obtain a BERT-based approach to extract this specific kind of interaction. This approach is a starting point to address the limitations of reconstructing TRNs of bacteria and diseases of biological interest. Database URL: https://github.com/laigen-unam/BERT-trn-extraction.
Title: Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach. Authors: Alfredo Varela-Vega, Ali-Berenice Posada-Reyes, Carlos-Francisco Méndez-Cruz. Pub Date: 2024-08-30 DOI: 10.1093/database/baae094. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363960/pdf/
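The abstract above on extracting transcriptional regulatory interactions reports an F1-score and a Matthews correlation coefficient (MCC). As a reminder of what those two numbers measure, here is a minimal Python sketch that computes both from confusion-matrix counts for the binary case. The counts in the example are made up for illustration, and the function names are hypothetical. The abstract does not state how the authors averaged scores across interaction classes, so this is not a reproduction of their evaluation.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for one positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 here when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 8 true positives, 9 true negatives,
# 1 false positive, 2 false negatives.
print(f"F1  = {f1_score(tp=8, fp=1, fn=2):.4f}")
print(f"MCC = {mcc(tp=8, tn=9, fp=1, fn=2):.4f}")
```

Unlike F1, MCC also uses the true negatives, which is why it is often reported alongside F1 when the classes are imbalanced.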
Pub Date: 2024-08-30 DOI: 10.1093/database/baae083
TianCheng Xu, Jing Wen, Lei Wang, YueYing Huang, ZiJing Zhu, Qian Zhu, Yi Fang, ChengBiao Yang, YouBing Xia
In acupuncture diagnosis and treatment, non-quantitative clinical descriptions have limited the development of standardized treatment methods. This study explores the effectiveness and the reasons for discrepancies in the entity recognition and classification of meridians in acupuncture indication using the Acupuncture Bidirectional Encoder Representations from Transformers (ACUBERT) model. During the research process, we selected 54 593 different entities from 82 acupuncture medical books as the pretraining corpus for medical literature, conducting classification research on Chinese medical literature using the BERT model. Additionally, we employed the support vector machine and Random Forest models as comparative benchmarks and optimized them through parameter tuning, ultimately leading to the development of the ACUBERT model. The results show that the ACUBERT model outperforms other baseline models in classification effectiveness, achieving the best performance at Epoch = 5. The model's "precision," "recall," and F1 scores reached above 0.8. Moreover, our study has a unique feature: it trains the meridian differentiation model based on the eight principles of differentiation and zang-fu differentiation as foundational labels. It establishes an acupuncture-indication knowledge base (ACU-IKD) and ACUBERT model with traditional Chinese medicine characteristics. In summary, the ACUBERT model significantly enhances the classification effectiveness of meridian attribution in the acupuncture indication database and also demonstrates the classification advantages of deep learning methods based on BERT in multi-category, large-scale training sets. Database URL: http://acuai.njucm.edu.cn:8081/#/user/login?tenantUrl=default.
Title: Acupuncture indication knowledge bases: meridian entity recognition and classification based on ACUBERT. Journal: Database: The Journal of Biological Databases and Curation, 2024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363959/pdf/
Pub Date: 2024-08-28 DOI: 10.1093/database/baae085
Daeahn Cho, Hyang-Mi Lee, Ji Ah Kim, Jae Gwang Song, Su-Hee Hwang, Bomi Lee, Jinsil Park, Kha Mong Tran, Jiwon Kim, Phuong Ngoc Lam Vo, Jooeun Bae, Teerapat Pimt, Kangseok Lee, Jörg Gsponer, Hyung Wook Kim, Dokyun Na
Autoinhibition, a crucial allosteric self-regulation mechanism in cell signaling, ensures signal propagation exclusively in the presence of specific molecular inputs. The heightened focus on autoinhibited proteins stems from their implication in human diseases, positioning them as potential causal factors or therapeutic targets. However, the absence of a comprehensive knowledgebase impedes a thorough understanding of their roles and applications in drug discovery. Addressing this gap, we introduce the Autoinhibited Protein Database (AiPD), a curated database standardizing information on autoinhibited proteins. AiPD encompasses details on autoinhibitory domains (AIDs), their targets, regulatory mechanisms, experimental validation methods, and implications in diseases, including associated mutations and post-translational modifications. AiPD comprises 698 AIDs from 532 experimentally characterized autoinhibited proteins and 2695 AIDs from their 2096 homologs, which were retrieved from 864 published articles. AiPD also includes 42 520 AIDs of computationally predicted autoinhibited proteins. In addition, AiPD lets users investigate potential AIDs within a query sequence through comparisons with documented autoinhibited proteins. As the first repository of autoinhibited proteins, AiPD aids researchers studying autoinhibition mechanisms and their alterations in human diseases. It is equally valuable for developing computational models, analyzing allosteric protein regulation, predicting new drug targets, and understanding intervention mechanisms. AiPD serves as a valuable resource for diverse researchers, contributing to the understanding and manipulation of autoinhibition in cellular processes. Database URL: http://ssbio.cau.ac.kr/databases/AiPD.
{"title":"Autoinhibited Protein Database: a curated database of autoinhibitory domains and their autoinhibition mechanisms.","doi":"10.1093/database/baae085","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2024"},"PeriodicalIF":3.4,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349611/pdf/","platform":"Semanticscholar","paperid":"142079544","RegionNum":4,"RegionCategory":"Biology","PMCID":"OA"}
Pub Date : 2024-08-28 DOI: 10.1093/database/baae084
Jana Batovska, Natasha D Brohier, Peter T Mee, Fiona E Constable, Brendan C Rodoni, Stacey E Lynch
The Australian Biosecurity Genomic Database (ABGD) is a curated collection of reference viral genome sequences based on the Australian National Notifiable Disease List of Terrestrial Animals. It was created to facilitate the screening of high-throughput sequencing (HTS) data for the potential presence of viruses associated with notifiable disease. The database includes a single verified sequence (the exemplar species sequence, where relevant) for each of the 60 virus species across 21 viral families that are associated with or cause these notifiable diseases, as recognized by the World Organisation for Animal Health. The open-source ABGD on GitHub provides usage guidance documents and is intended to support building a culture in Australian HTS communities that promotes the use of quality-assured, standardized, and verified databases for Australia's national biosecurity interests. Future expansion of the database will include the addition of more strains or subtypes for highly variable viruses, viruses causing diseases of aquatic animals, and genomes of other types of pathogens associated with notifiable diseases, such as bacteria. Database URL: https://github.com/ausbiopathgenDB/AustralianBiosecurityGenomicDatabase.
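As a rough illustration of what screening HTS reads against a set of reference sequences involves, here is a minimal k-mer sketch. It is not the workflow from the ABGD guidance documents, which the abstract does not detail. The reference and read sequences are made-up toy strings, and all function names are hypothetical.

```python
from collections import defaultdict

# Made-up toy references in FASTA format; not real viral genomes.
TOY_REFERENCES = """>toy_virus_A
ACGTACGTTAGCCGATAGGCTA
>toy_virus_B
TTGACCGGTATTCGGAACCTGA
"""


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the names of the references that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def screen_reads(reads, index, k, min_shared=2):
    """Count, per reference, the reads sharing at least min_shared k-mers."""
    hits = defaultdict(int)
    for read in reads:
        shared = defaultdict(int)
        for kmer in kmers(read.upper(), k):
            for name in index.get(kmer, ()):
                shared[name] += 1
        for name, count in shared.items():
            if count >= min_shared:
                hits[name] += 1
    return dict(hits)


K = 8
index = build_index(parse_fasta(TOY_REFERENCES), K)
toy_reads = ["CGTTAGCCGATA", "GGGGGGGGGGGG", "GGTATTCGGAAC"]
hits = screen_reads(toy_reads, index, K)
print(hits)
```

The sketch ignores reverse-complement strands, sequencing errors, and divergent strains, so it only conveys the principle; screening real data would call for an established aligner or classifier run against the curated reference set.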
{"title":"The Australian Biosecurity Genomic Database: a new resource for high-throughput sequencing analysis based on the National Notifiable Disease List of Terrestrial Animals.","doi":"10.1093/database/baae084","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2024"},"PeriodicalIF":3.4,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11352597/pdf/","platform":"Semanticscholar","paperid":"142085939","RegionNum":4,"RegionCategory":"Biology","PMCID":"OA"}
Pub Date : 2024-08-28 DOI: 10.1093/database/baae086
YuMin M Loh, Matthew P Su, Kayla G Haruni, Azusa Kamikouchi
Acoustic communication plays an important role during the courtship of many mosquito species. Male mosquitoes show strong attraction to female wing beat frequencies, mediated via spectral matching between female wing beat frequency and male ear mechanical tuning frequency. Such acoustic communication typically occurs within swarms, male-dominated aggregations with species-specific properties. Despite hundreds of relevant publications being available, the lack of a central platform hosting all associated data hinders research efforts and limits cross-species comparisons. Here, we introduce MACSFeD (Mosquito Acoustic Communication and Swarming Features Database), an interactive platform for the exploration of our comprehensive database containing 251 unique reports focusing on different aspects of mosquito acoustic communication, including hearing function, wing beat frequency and phonotaxis, as well as male swarming parameters. MACSFeD serves as an easily accessible, efficient, and robust data visualization tool for mosquito acoustic communication research. We envision that further in-depth studies could arise following the use of this new platform. Database URL: https://minmatt.shinyapps.io/MACSFeD/.
{"title":"MACSFeD-a database of mosquito acoustic communication and swarming features.","doi":"10.1093/database/baae086","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2024"},"PeriodicalIF":3.4,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11352598/pdf/","platform":"Semanticscholar","paperid":"142085937","RegionNum":4,"RegionCategory":"Biology","PMCID":"OA"}