Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21015
Pierre Larmande, Yusha Liu, Xinzhi Yao, Jingbo Xia
Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
Title: OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition. Journal: Genomics and Informatics, volume 19, issue 3, e27. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510865/pdf/
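The dictionary-based annotation step described in the OryzaGP abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the dictionary entries, the identifiers, and the `annotate` helper are hypothetical, and real gene dictionaries would need handling for synonyms and ambiguous names.

```python
import re

# Hypothetical mini-dictionary: surface name -> database identifier.
# These entries are illustrative placeholders, not taken from OryzaGP.
GENE_DICT = {
    "GeneA": "DB:0001",
    "protein B": "DB:0002",
}


def annotate(text, dictionary):
    """Return (start, end, matched_text, identifier) for each dictionary hit."""
    # Put longer names first so a long name wins over a shorter overlapping one.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from the matched text back to its identifier.
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


hits = annotate("Expression of GeneA and protein B was measured.", GENE_DICT)
for start, end, name, ident in hits:
    print(start, end, name, ident)
```

Each hit pairs a character span (the NER part) with a database identifier (the NEN part), which is the kind of output a dictionary annotator contributes to such a dataset.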
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21008
Márcia Barros, Pedro Ruas, Diana Sousa, Ali Haider Bangash, Francisco M Couto
Tracking the most recent advances in coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines, the efficiency of which strongly depends on the availability of curated corpora. However, COVID-19-related corpora are lacking, even more so for languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).
Title: COVID-19 recommender system based on an annotated multilingual corpus. Journal: Genomics and Informatics, volume 19, issue 3, e24. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510867/pdf/
Leptospirosis is a zoonotic disease caused by spirochetes of the genus Leptospira. In Thailand, Leptospira interrogans is a major cause of leptospirosis. Leptospirosis patients present with a wide range of clinical manifestations, from asymptomatic or mild infections to severe illness involving organ failure. To better understand the differences between Leptospira isolates causing mild and severe leptospirosis, Illumina sequencing was used to sequence genomic DNA from both isolates. DNA of Leptospira isolated from two patients, one with mild and one with severe symptoms, was included in this study. Adapters were removed from the paired-end reads, which were then trimmed at a Q30 quality score using Trimmomatic. Trimmed reads were assembled into contigs and scaffolds using SPAdes. Cross-contamination of scaffolds was evaluated with ContEst16S. Prokka, a tool for bacterial genome annotation, was used to annotate sequences from both Leptospira isolates. Predicted amino acid sequences from Prokka were searched against the EggNOG and DAVID databases to characterize gene ontology. In addition, genes from the mild and severe isolates that passed the criterion of e-value < 10e-5 in a BLASTP search against the virulence factor database were compared using a Venn diagram. We found 13 and 12 genes that were unique to the isolates from the mild and severe patients, respectively. The 12 genes in the severe isolate might be virulence factor genes that affect disease severity. However, these genes should be validated in further studies.
Title: Comparative genome characterization of Leptospira interrogans from mild and severe leptospirosis patients. Authors: Songtham Anuntakarun, Vorthon Sawaswong, Rungrat Jitvaropas, Kesmanee Praianantathavorn, Witthaya Poomipak, Yupin Suputtamongkol, Chintana Chirathaworn, Sunchai Payungporn. Journal: Genomics and Informatics, volume 19, issue 3, e31. Pub Date: 2021-09-01; DOI: 10.5808/gi.21037. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510873/pdf/
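The Venn-diagram comparison in the Leptospira abstract amounts to filtering hits by e-value and then taking set intersections and differences. The sketch below illustrates that logic only; the gene names and e-values are hypothetical stand-ins for BLASTP output, and the cutoff is copied as written in the abstract (note that `10e-5` equals `1e-4`).

```python
E_VALUE_CUTOFF = 10e-5  # threshold as written in the abstract (numerically 1e-4)


def passing_genes(hits, cutoff=E_VALUE_CUTOFF):
    """Return the set of genes with at least one hit below the e-value cutoff."""
    return {gene for gene, e_value in hits if e_value < cutoff}


# Hypothetical (gene, e-value) pairs; these are not data from the study.
mild_hits = [("vfA", 1e-30), ("vfB", 2e-8), ("vfC", 0.5)]
severe_hits = [("vfB", 1e-12), ("vfD", 3e-6), ("vfE", 1e-3)]

mild = passing_genes(mild_hits)
severe = passing_genes(severe_hits)

shared = mild & severe          # overlap region of the Venn diagram
unique_mild = mild - severe     # found only in the mild isolate
unique_severe = severe - mild   # found only in the severe isolate

print(sorted(shared), sorted(unique_mild), sorted(unique_severe))
```

In the study, the two "unique" sets correspond to the 13 and 12 genes reported for the mild and severe isolates.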
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21034
Yedukondalu Kollati, Radha Rama Devi Akella, Shaik Mohammad Naushad, Rajesh K Patel, G Bhanuprakash Reddy, Vijaya R Dirisala
In our previous studies, we have demonstrated the association of certain variants of the thyroid-stimulating hormone receptor (TSHR), thyroid peroxidase (TPO), and thyroglobulin (TG) genes with congenital hypothyroidism. Herein, we explored the mechanistic basis for this association using different in silico tools. The mRNA 3'-untranslated region (3'-UTR) plays key roles in gene expression at the post-transcriptional level. In TSHR variants (rs2268477, rs7144481, and rs17630128), the binding affinity of microRNAs (miRs) (hsa-miR-154-5p, hsa-miR-376a-2-5p, hsa-miR-3935, hsa-miR-4280, and hsa-miR-6858-3p) to the 3'-UTR is disrupted, affecting post-transcriptional gene regulation. TPO and TG are the two key proteins necessary for the biosynthesis of thyroid hormones in the presence of iodide and H2O2. Reduced stability of these proteins leads to aberrant biosynthesis of thyroid hormones. Compared to the wild-type TPO protein, the p.S398T variant was found to exhibit less stability and significant rearrangements of intra-atomic bonds affecting the stoichiometry and substrate binding (binding energies, ΔG of wild-type vs. mutant: ‒15 vs. ‒13.8 kcal/mol; and dissociation constant, Kd of wild-type vs. mutant: 7.2E-12 vs. 7.0E-11 M). The missense mutations p.G653D and p.R1999W on the TG protein showed altered ΔG (0.24 kcal/mol and 0.79 kcal/mol, respectively). In conclusion, an in silico analysis of TSHR genetic variants in the 3'-UTR showed that they alter the binding affinities of different miRs. The TPO protein structure and mutant protein complex (p.S398T) are less stable, with potentially deleterious effects. A structural and energy analysis showed that TG mutations (p.G653D and p.R1999W) reduce the stability of the TG protein and affect its structure-functional relationship.
Title: Molecular insights into the role of genetic determinants of congenital hypothyroidism. Journal: Genomics and Informatics, volume 19, issue 3, e29. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510868/pdf/
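The binding energies and dissociation constants quoted for wild-type and p.S398T TPO are linked by the standard thermodynamic relation ΔG = RT ln(Kd). The short check below is a sketch under an assumption: the abstract does not state a temperature, so 298.15 K is used here, and the helper name `delta_g_from_kd` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (M).

    Uses dG = R * T * ln(Kd); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


dg_wild_type = delta_g_from_kd(7.2e-12)  # Kd reported for wild-type TPO
dg_mutant = delta_g_from_kd(7.0e-11)     # Kd reported for the p.S398T variant

print(f"wild-type: {dg_wild_type:.2f} kcal/mol")
print(f"mutant:    {dg_mutant:.2f} kcal/mol")
```

Under this assumption the values come out close to the reported ‒15 and ‒13.8 kcal/mol, so the roughly tenfold increase in Kd corresponds to a loss of about 1.3 kcal/mol of binding energy.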
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21018
Oscar Lithgow-Serrano, Joseph Cornelius, Vani Kanjirangat, Carlos-Francisco Méndez-Cruz, Fabio Rinaldi
Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) Clinical repository, a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice, where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the identified named entities (NEs) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain in COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types, and the NE's type for classifying clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.
Title: Improving classification of low-resource COVID-19 literature by using Named Entity Recognition. Journal: Genomics and Informatics, volume 19, issue 3, e22. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510872/pdf/
Pub Date: 2021-06-01; Epub Date: 2021-06-30; DOI: 10.5808/gi.21019
A M U B Mahfuz, A M Zubair-Bin-Mahfuj, Dibya Joti Podder
Even in the current age of advanced medicine, the prognosis of malignant peritoneal mesothelioma (MPM) remains abysmal. The molecular mechanisms responsible for the initiation and progression of MPM are still largely not understood. Adopting an integrated bioinformatics approach, this study aims to identify the key genes and pathways responsible for MPM. Genes that are differentially expressed in MPM in comparison with the peritoneum of healthy controls were identified by analyzing a microarray gene expression dataset. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of these differentially expressed genes (DEGs) were conducted to gain better insight. A protein-protein interaction (PPI) network of the proteins encoded by the DEGs was constructed using STRING, and hub genes were detected by analyzing this network. Next, the transcription factors and miRNAs that have possible regulatory roles on the hub genes were detected. Finally, survival analyses based on the hub genes were conducted using the GEPIA2 web server. Six hundred six genes were found to be differentially expressed in MPM; 133 are upregulated and 473 are downregulated. By analyzing the STRING-generated PPI network, six dense modules and 12 hub genes were identified. Fifteen transcription factors and 10 miRNAs were identified as having the most extensive regulatory functions on the DEGs. Through bioinformatics analyses, this work provides insight into the potential genes and pathways involved in MPM.
Title: A network-biology approach for identification of key genes and pathways involved in malignant peritoneal mesothelioma. Journal: Genomics and Informatics, volume 19, issue 2, e16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261271/pdf/
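Hub detection in a PPI network is often illustrated with node degree, that is, the number of interaction partners per protein. The abstract does not say which criterion the authors applied, so the sketch below uses degree purely as an assumed example, with hypothetical protein names and edges rather than STRING output.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank nodes of an undirected network by degree, highest first."""
    degree = Counter()
    # frozenset treats (a, b) and (b, a) as one edge; a self-loop collapses to size 1.
    for pair in {frozenset(edge) for edge in edges}:
        if len(pair) != 2:
            continue  # ignore self-interactions
        for node in pair:
            degree[node] += 1
    return degree.most_common()


# Hypothetical interaction list, including one duplicate and one self-loop.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
    ("P2", "P1"), ("P3", "P3"),
]

ranking = degree_ranking(edges)
top_node, top_degree = ranking[0]
print(top_node, top_degree)
```

Taking the first few entries of `ranking` gives candidate hub nodes; other centrality measures could be substituted without changing the overall structure.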
Pub Date: 2021-06-01; Epub Date: 2021-06-30; DOI: 10.5808/gi.21024
Neha Samir Roy, Yong-Wook Ban, Hana Yoo, Rahul Vasudeo Ramekar, Eun Ju Cheong, Nam-Il Park, Jong Kuk Na, Kyong-Cheul Park, Ik-Young Choi
Plant height is an important component of plant architecture and significantly affects crop breeding practices and yield. We studied DNA variations derived from F5 recombinant inbred lines (RILs) with 96.8% homozygous genotypes. Here, we report DNA variations between the normal and dwarf members of four lines harvested from a single seed parent in an F6 RIL population derived from a cross between Glycine max var. Peking and Glycine soja IT182936. Whole genome sequencing was carried out, and the DNA variations in the whole genome were compared between the normal and dwarf samples. We found a large number of DNA variations in both the dwarf and semi-dwarf lines, with one single nucleotide polymorphism (SNP) per at least 3.68 kb in the dwarf lines and 1 SNP per 11.13 kb of the whole genome. This value is 2.18 times higher than the expected DNA variation in the F6 population. A total of 186 SNPs and 241 SNPs were discovered in the coding regions of the dwarf lines 1282 and 1303, respectively, and we discovered 33 homogeneous nonsynonymous SNPs that occurred at the same loci in each set of dwarf and normal soybean. Of them, five SNPs were in the same positions between lines 1282 and 1303. Our results provide important information for improving our understanding of the genetics of soybean plant height and crop breeding. These polymorphisms could be useful genetic resources for plant breeders, geneticists, and biologists for future molecular biology and breeding projects.
Title: Analysis of genome variants in dwarf soybean lines obtained in F6 derived from cross of normal parents (cultivated and wild soybean). Journal: Genomics and Informatics, volume 19, issue 2, e19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261272/pdf/
Pub Date: 2021-06-01; Epub Date: 2021-06-30; DOI: 10.5808/gi.20078
Dhammapal Bharne
Coronavirus disease 2019 (COVID-19) is an ongoing pandemic disease infecting millions of people across the globe. Recent reports of reduced antibody levels and the re-emergence of the disease in recovered patients necessitate an understanding of the pandemic at the core level. Cases of multiple organ failure emphasize the need to consider different organ systems while managing the disease. The present study employed RNA sequencing data to determine the disease-associated differentially regulated genes and their related protein interactions in several organ systems. It signified the importance of early diagnosis and treatment of the disease. A map of protein interactions of multiple organ systems was built, which uncovered CAV1 and CTNNB1 as the top-degree nodes. A core interactions sub-network was analyzed to identify different modules of functional significance. The AR, CTNNB1, CAV1, and PIK3R1 proteins emerged as bridging nodes interconnecting different modules for information flow across several pathways. The present study also highlighted some of the druggable targets to analyze in drug repurposing strategies against the COVID-19 pandemic. Therefore, the protein interactions map and the modular interactions of the differentially regulated genes in multiple organ systems should encourage researchers to investigate novel therapeutics for the COVID-19 pandemic expeditiously.
Title: A protein interactions map of multiple organ systems associated with COVID-19 disease. Journal: Genomics and Informatics, volume 19, issue 2, e14. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261268/pdf/
Pub Date: 2021-06-01 Epub Date: 2021-06-30 DOI: 10.5808/gi.19.2.e1
Taesung Park
In this issue, there are six original articles and one mini review. The first article, by Sohag et al. (Jagannath University, Bangladesh), provides a short review of omics approaches to cardiovascular diseases (CVDs). The authors summarize the genomics, proteomics, transcriptomics, and metabolomics of CVDs and offer a well-organized outlook. The first original article, by Dr. Bharne (University of Hyderabad, India), presents a protein interactions map of multiple organ systems associated with coronavirus disease 2019 (COVID-19). This study appears to be motivated by reports of reduced antibody levels and disease recurrence in recovered COVID-19 patients, which call for an understanding of the pandemic at a fundamental level. Cases of multiple organ failure in patients with COVID-19 have highlighted the need to consider other organ systems. This study used RNA sequencing data to determine disease-associated differentially regulated genes and related protein interactions in multiple organ systems, which implies the importance of early diagnosis and treatment of the disease. RNA sequencing data were obtained from autopsy specimens of lung, heart, jejunum, liver, kidney, intestine, bone marrow, adipose, placenta, and skin from 24 patients who died of COVID-19 infection. The total number of samples in the sequencing data was 88, including five negative control samples. Using significantly expressed genes in different organ systems, protein interactions of multiple organ systems were then mapped, revealing CAV1 and CTNNB1 as the top-degree nodes. A core interactions sub-network was analyzed to identify several functionally important modules, with the AR, CTNNB1, CAV1, and PIK3R1 proteins emerging as bridging nodes that interconnect them. In addition, this study highlighted some of the druggable targets to analyze in drug re-purposing strategies against the COVID-19 pandemic. 
I think the protein interaction maps and modular interactions of differentially regulated genes in multi-organ systems will give researchers clues for rapidly investigating novel therapeutics for the COVID-19 pandemic. The second article, by Sohpal (Beant College of Engineering & Technology, India), presents a comparative study of coronaviruses, including severe acute respiratory syndrome coronavirus 2, severe acute respiratory syndrome coronavirus, and Middle East respiratory syndrome coronavirus, focusing on non-synonymous and synonymous substitutions. Through simulation studies, the nucleotide sequences of closely related strains of respiratory syndrome viruses, codon-by-codon maximum likelihood analysis, z selection, and the divergence time were investigated. The third article, by Mahfuz et al. (University of Development Alternative, Bangladesh), presents a network-biology approach for identifying key genes and pathways involved in malignant peritoneal mesothelioma (MPM). To understand the molecular mechanisms responsible for the initiation and progression of MPM, this study aims to identify the key genes and pathways responsible for MPM. Several bioin
{"title":"Editor's introduction to this issue (G&I 19:2, 2021).","authors":"Taesung Park","doi":"10.5808/gi.19.2.e1","DOIUrl":"https://doi.org/10.5808/gi.19.2.e1","url":null,"abstract":"In this issue, there are six original articles and one mini review. The first article by by Sohag et al. (Jagannath University, Bangladesh) provides a short review on omics approaches to cardiovascular diseases (CVDs). The author summarizes the genomics, proteomics, transcriptomics, and metabolomics in CVDs with a well-organized prospect. The first original article is about a protein interactions map of multiple organ systems associated with coronavirus disease 2019 (COVID-19) disease by Dr. Bharne (University of Hyderabad, India). This study appears to be motivated by reports that reduced antibody levels and disease recurrence in recovered COVID-19 patients require understanding of the epidemic at a key level. Multiple organ failure cases in patients with COVID-19 have highlighted consideration for other organ systems. This study used RNA sequencing data to determine disease-associated differentially regulated genes and related protein interactions in multiple organ systems, which implies the importance of early diagnosis and treatment of the disease. RNA sequencing data were obtained from autopsy specimens of lung, heart, jejunum, liver, kidney, intestine, bone marrow, adipose, placenta, and skin from 24 patients who died of COVID-19 infection. The total number of samples in the sequencing data was 88, including five negative control samples. Using significantly expressed genes in different organ systems, protein interactions of multiple organ systems were then mapped, revealing CAV1 and CTNNB1 as top nodes. A core interactions sub-network was analyzed to identify several functionally important modules such as AR, CTNNB1, CAV1 and PIK3R1 proteins. 
In addition, this study highlighted some of the druggable targets to analyze in drug re-purposing strategies against the COVID-19 pandemic. I think the protein interaction maps and modular interactions of differentially regulated genes in multi-organ systems would provide the clues to researchers to rapidly investigate novel therapeutics for the COVID-19 pandemic. The second article by Sohpal (Beant College of Engineering & Technology, India) performed a comparative study of coronaviruses including severe acute respiratory syndrome coronavirus 2, severe acute respiratory syndrome coronavirus, and Middle East respiratory syndrome coronavirus focusing on non-synonymous and synonymous substitutions Through simulation studies, nucleotide sequence of closely related strains of respiratory syndrome viruses, codon-by-codon with maximum likelihood analysis, z selection and the divergence time were investigated. The third article by Mahfuz et al. (University of Development Alternative, Bangladesh) presented a network-biology approach for identification of key genes and pathways involved in malignant peritoneal mesothelioma (MPM). To understand the molecular mechanisms responsible for the initiation and progression of MPM, this study aims to identify the key genes and pathways responsible for MPM. Several bioin","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e12"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261267/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39145806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01 Epub Date: 2021-06-30 DOI: 10.5808/gi.21033
S June Oh
G protein-coupled receptors (GPCRs), including olfactory receptors, account for the largest group of genes in the human genome and occupy a very important position in signaling systems. Although olfactory receptors, which belong to the broader category of GPCRs, play an important role in monitoring the organism's surroundings, their actual three-dimensional structure has not yet been determined. Therefore, the specific details of the molecular interactions between the receptor and the ligand remain unclear. In this report, the interactions between human olfactory receptor 1A1 and its odorant molecules were simulated using computational methods, and we explored how the chemically simple odorant molecules activate the olfactory receptor.
{"title":"Implications of the simple chemical structure of the odorant molecules interacting with the olfactory receptor 1A1.","authors":"S June Oh","doi":"10.5808/gi.21033","DOIUrl":"https://doi.org/10.5808/gi.21033","url":null,"abstract":"<p><p>G protein-coupled receptors (GPCRs), including olfactory receptors, account for the largest group of genes in the human genome and occupy a very important position in signaling systems. Although olfactory receptors, which belong to the broader category of GPCRs, play an important role in monitoring the organism's surroundings, their actual three-dimensional structure has not yet been determined. Therefore, the specific details of the molecular interactions between the receptor and the ligand remain unclear. In this report, the interactions between human olfactory receptor 1A1 and its odorant molecules were simulated using computational methods, and we explored how the chemically simple odorant molecules activate the olfactory receptor.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 2","pages":"e18"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8261270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39145807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}