Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21048
Jiyon Chu, Juyoun Shin, Shinseok Kang, Sun Shin, Yeun-Jun Chung
Salmonella species are among the major pathogens that cause foodborne illness outbreaks. In this study, we aimed to develop a loop-mediated isothermal amplification (LAMP) assay for the rapid and sensitive detection of Salmonella species. We designed LAMP primers targeting the hilA gene as a universal marker of Salmonella species. A total of seven Salmonella species strains and 11 non-Salmonella pathogen strains from eight different genera were used in this study. All Salmonella strains showed positive amplification signals with the Salmonella LAMP assay; however, there was no non-specific amplification signal for the non-Salmonella strains. The detection limit was 100 femtograms (20 copies per reaction), which was ~1,000 times more sensitive than the detection limits of the conventional polymerase chain reaction (PCR) assay (100 pg). The reaction time for a positive amplification signal was less than 20 minutes, which was less than one-third the time taken while using conventional PCR. In conclusion, our Salmonella LAMP assay accurately detected Salmonella species with a higher degree of sensitivity and greater rapidity than the conventional PCR assay, and it may be suitable for point-of-care testing in the field.
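The mass-to-copy conversion in the abstract above (100 fg, about 20 copies) and the ~1,000-fold comparison with PCR can be checked with a short calculation. This is a minimal sketch, assuming a Salmonella genome of roughly 4.8 Mb and an average mass of 650 g/mol per base pair. Neither value is stated in the abstract, so the result is only a plausibility check, not the authors' calculation.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair


def genome_copies(dna_mass_g: float, genome_size_bp: float) -> float:
    """Estimate how many genome copies a given mass of double-stranded DNA holds."""
    mass_per_genome_g = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_mass_g / mass_per_genome_g


# Assumed genome size (about 4.8 Mb); the abstract does not give the value used.
ASSUMED_GENOME_BP = 4.8e6

lamp_limit_g = 100e-15   # 100 fg, LAMP detection limit reported in the abstract
pcr_limit_g = 100e-12    # 100 pg, conventional PCR detection limit reported in the abstract

copies = genome_copies(lamp_limit_g, ASSUMED_GENOME_BP)
fold = pcr_limit_g / lamp_limit_g

print(f"{copies:.1f} genome copies per reaction")
print(f"{fold:.0f}-fold lower detection limit than conventional PCR")
```

Under these assumptions the estimate lands close to the 20 copies per reaction quoted in the abstract; a different genome size would shift it proportionally.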
{"title":"Rapid and sensitive detection of Salmonella species targeting the hilA gene using a loop-mediated isothermal amplification assay.","authors":"Jiyon Chu, Juyoun Shin, Shinseok Kang, Sun Shin, Yeun-Jun Chung","doi":"10.5808/gi.21048","DOIUrl":"https://doi.org/10.5808/gi.21048","url":null,"abstract":"<p><p>Salmonella species are among the major pathogens that cause foodborne illness outbreaks. In this study, we aimed to develop a loop-mediated isothermal amplification (LAMP) assay for the rapid and sensitive detection of Salmonella species. We designed LAMP primers targeting the hilA gene as a universal marker of Salmonella species. A total of seven Salmonella species strains and 11 non-Salmonella pathogen strains from eight different genera were used in this study. All Salmonella strains showed positive amplification signals with the Salmonella LAMP assay; however, there was no non-specific amplification signal for the non-Salmonella strains. The detection limit was 100 femtograms (20 copies per reaction), which was ~1,000 times more sensitive than the detection limits of the conventional polymerase chain reaction (PCR) assay (100 pg). The reaction time for a positive amplification signal was less than 20 minutes, which was less than one-third the time taken while using conventional PCR. 
In conclusion, our Salmonella LAMP assay accurately detected Salmonella species with a higher degree of sensitivity and greater rapidity than the conventional PCR assay, and it may be suitable for point-of-care testing in the field.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e30"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510866/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21011
Luis Alberto Robles Hernandez, Tiffany J Callahan, Juan M Banda
The use of social media data, such as Twitter, for biomedical research has gradually increased over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, to study the societal implications of interventions, and to examine the sequelae that recovered COVID-19 patients present. However, manually curated social media datasets are difficult to come by because manual annotation is expensive and identifying the correct texts takes considerable effort. When datasets are available, they are usually very small, and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. Having selected the best method for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream use within the biomedical domain.
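Comparing automatic annotations against a gold standard, as described above, usually comes down to precision, recall, and F1 over annotated spans. The abstract does not say which metric or matching rule was used, so the sketch below assumes exact-match scoring; the `Span` layout and the toy tweets are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span layout: (tweet_id, start_offset, end_offset, label)
Span = Tuple[str, int, int, str]


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between predicted and gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only, not taken from the released dataset.
gold = {("t1", 0, 5, "DRUG"), ("t1", 10, 15, "SYMPTOM"), ("t2", 3, 9, "SYMPTOM")}
pred = {("t1", 0, 5, "DRUG"), ("t2", 3, 9, "SYMPTOM"), ("t2", 12, 18, "DRUG")}

p, r, f = precision_recall_f1(pred, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running the same function once per candidate framework and keeping the highest F1 is one simple way to pick the "best method"; relaxed (overlap-based) matching would need a different comparison than set intersection.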
{"title":"A biomedically oriented automatically annotated Twitter COVID-19 dataset.","authors":"Luis Alberto Robles Hernandez, Tiffany J Callahan, Juan M Banda","doi":"10.5808/gi.21011","DOIUrl":"10.5808/gi.21011","url":null,"abstract":"<p><p>The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best-practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. 
Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e21"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510871/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21016
Eisuke Dohi, Ali Haider Bangash
Since only a small number of patients have a rare disease, it is difficult to identify all of the features of these diseases. This is especially true for patients with uncommon presentations of rare diseases. It can also be difficult for patients, their families, and even clinicians to know which of a number of disease phenotypes the patient is exhibiting. To address this issue, during Biomedical Linked Annotation Hackathon 7 (BLAH7), we tried to extract Alexander disease patient data in Portable Document Format. We then visualized the phenotypic diversity of those Alexander disease patients with uncommon presentations. This led us to identify several issues that we need to overcome in our future work.
{"title":"Visualizing the phenotype diversity: a case study of Alexander disease.","authors":"Eisuke Dohi, Ali Haider Bangash","doi":"10.5808/gi.21016","DOIUrl":"https://doi.org/10.5808/gi.21016","url":null,"abstract":"<p><p>Since only a small number of patients have a rare disease, it is difficult to identify all of the features of these diseases. This is especially true for patients uncommonly presenting with rare diseases. It can also be difficult for the patient, their families, and even clinicians to know which one of a number of disease phenotypes the patient is exhibiting. To address this issue, during Biomedical Linked Annotation Hackathon 7 (BLAH7), we tried to extract Alexander disease patient data in Portable Document Format. We then visualized the phenotypic diversity of those Alexander disease patients with uncommon presentations. This led to us identifying several issues that we need to overcome in our future work.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e28"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510876/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21012
Atsuko Yamaguchi, Terue Takatsuki, Yuka Tateisi, Felipe Soares
The coronavirus disease 2019 (COVID-19) pandemic has led to a flood of research papers, and the information they contain is updated with considerable frequency. For society to benefit from this research, it is necessary to promote the sharing of up-to-date knowledge from these papers. However, because most research papers are written in English, it is difficult for people who are not familiar with English medical terms to obtain knowledge from them. To help Japanese speakers access knowledge from COVID-19 papers written in English, we tried to construct a dictionary with an open license by assigning Japanese terms to MeSH unique identifiers (UIDs) annotated to words in the texts of COVID-19 papers. Using this dictionary, 98.99% of all occurrences of MeSH terms in COVID-19 papers were covered. We also created a curated version of the dictionary and uploaded it to PubDictionary for wider use in the PubAnnotation system.
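The coverage figure above counts occurrences rather than distinct terms, so frequent MeSH UIDs weigh more than rare ones. A minimal sketch of that calculation follows; the function name and the placeholder UIDs are hypothetical, and the abstract does not describe the authors' actual tooling.

```python
from collections import Counter
from typing import Iterable, Set


def occurrence_coverage(uid_occurrences: Iterable[str], dictionary_uids: Set[str]) -> float:
    """Fraction of MeSH UID occurrences whose UID has an entry in the dictionary."""
    counts = Counter(uid_occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for uid, n in counts.items() if uid in dictionary_uids)
    return covered / total


# Placeholder identifiers for illustration; these are not real MeSH UIDs.
occurrences = ["UID_A", "UID_A", "UID_B", "UID_C", "UID_D"]
dictionary = {"UID_A", "UID_B", "UID_C"}

coverage = occurrence_coverage(occurrences, dictionary)
print(f"occurrence coverage: {coverage:.2%}")
```

In this toy input three of four distinct UIDs are in the dictionary, but four of five occurrences are covered, which is why occurrence coverage can be much higher than term coverage.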
{"title":"Constructing Japanese MeSH term dictionaries related to the COVID-19 literature.","authors":"Atsuko Yamaguchi, Terue Takatsuki, Yuka Tateisi, Felipe Soares","doi":"10.5808/gi.21012","DOIUrl":"https://doi.org/10.5808/gi.21012","url":null,"abstract":"<p><p>The coronavirus disease 2019 (COVID-19) pandemic has led to a flood of research papers and the information has been updated with considerable frequency. For society to derive benefits from this research, it is necessary to promote sharing up-to-date knowledge from these papers. However, because most research papers are written in English, it is difficult for people who are not familiar with English medical terms to obtain knowledge from them. To facilitate sharing knowledge from COVID-19 papers written in English for Japanese speakers, we tried to construct a dictionary with an open license by assigning Japanese terms to MeSH unique identifiers (UIDs) annotated to words in the texts of COVID-19 papers. Using this dictionary, 98.99% of all occurrences of MeSH terms in COVID-19 papers were covered. We also created a curated version of the dictionary and uploaded it to PubDictionary for wider use in the PubAnnotation system.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e25"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510869/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increased volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community, as a way to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus consists of 12 labels (Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC) applied to 50,018 COVID-19 abstracts in LitCovid. The corpus contains sufficiently abundant information to make it possible to unveil hidden knowledge about the pathological mechanism of COVID-19.
{"title":"LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19.","authors":"Sizhuo Ouyang, Yuxing Wang, Kaiyin Zhou, Jingbo Xia","doi":"10.5808/gi.21013","DOIUrl":"10.5808/gi.21013","url":null,"abstract":"<p><p>Currently, coronavirus disease 2019 (COVID-19) literature has been increasing dramatically, and the increased text amount make it possible to perform large scale text mining and knowledge discovery. Therefore, curation of these texts becomes a crucial issue for Bio-medical Natural Language Processing (BioNLP) community, so as to retrieve the important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system which provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration among multiple useful COVID-19 annotations, we merged three annotations resources to LitCovid data set, and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus consists of 12 labels including Mutation, Species, Gene, Disease from PubTator, GO, CHEBI from OGER, Var, MPA, CPA, NegReg, PosReg, Reg from AGAC, upon 50,018 COVID-19 abstracts in LitCovid. Contain sufficient abundant information being possible to unveil the hidden knowledge in the pathological mechanism of COVID-19.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e23"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510875/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21035
Haeun Lee, Cherl-Joon Lee, Dong Hee Kim, Chun-Sung Cho, Wonseok Shin, Kyudong Han
Digital PCR (dPCR) is a third-generation PCR technology that enables real-time absolute quantification without reference materials. Recently, global diagnostics companies have developed new dPCR equipment. In line with this development, the Lab On An Array (LOAA) dPCR analyzer (Optolane) was launched last year. The LOAA dPCR is a semiconductor chip-based, partitioning-type PCR instrument. It includes a micro-electro-mechanical system into which the sample is injected, partitioning the target gene into 56 to 20,000 wells. As the number of wells increases toward 20,000, the amount of target gene per well is digitized to 0 or 1; because this partitioning follows a Poisson distribution, the LOAA dPCR can perform precise absolute quantification. The LOAA first determines the region of interest before dPCR operation. To exclude invalid wells from the quantification, the LOAA dPCR applies various filtering methods using brightness, slope, baseline, and noise filters. As coronavirus disease 2019 has now spread around the world, the need for point-of-care testing (POCT) diagnostic equipment is increasing. The LOAA dPCR is expected to be suitable for POCT diagnosis because of its compact size and high accuracy. Here, we describe the quantitative principle of the LOAA dPCR and suggest that it can be applied to various fields.
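The Poisson principle mentioned above can be made concrete. In standard dPCR quantification, the mean number of copies per well is estimated from the fraction of wells that stay negative, λ = -ln(1 - p), where p is the fraction of positive wells among the valid ones. The sketch below shows this textbook correction; the abstract does not give the LOAA analyzer's exact implementation, so the function names and the example counts are assumptions.

```python
import math


def copies_per_well(positive_wells: int, valid_wells: int) -> float:
    """Poisson-corrected mean copies per well: lambda = -ln(1 - p)."""
    if valid_wells <= 0:
        raise ValueError("valid_wells must be positive")
    if not 0 <= positive_wells < valid_wells:
        # If every well is positive the estimate diverges (saturated chip).
        raise ValueError("positive_wells must be in [0, valid_wells)")
    negative_fraction = 1.0 - positive_wells / valid_wells
    return -math.log(negative_fraction)


def total_copies(positive_wells: int, valid_wells: int) -> float:
    """Estimated copies across all valid wells (after invalid wells are filtered out)."""
    return copies_per_well(positive_wells, valid_wells) * valid_wells


# Hypothetical run: 5,000 positive wells out of 20,000 valid wells.
lam = copies_per_well(5000, 20000)
estimate = total_copies(5000, 20000)
print(f"lambda = {lam:.4f} copies/well, total = {estimate:.0f} copies")
```

The estimate exceeds the raw count of positive wells because some positive wells hold more than one copy; the correction shrinks as the number of wells grows relative to the number of target molecules.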
{"title":"High-accuracy quantitative principle of a new compact digital PCR equipment: Lab On An Array.","authors":"Haeun Lee, Cherl-Joon Lee, Dong Hee Kim, Chun-Sung Cho, Wonseok Shin, Kyudong Han","doi":"10.5808/gi.21035","DOIUrl":"https://doi.org/10.5808/gi.21035","url":null,"abstract":"<p><p>Digital PCR (dPCR) is the third-generation PCR that enables real-time absolute quantification without reference materials. Recently, global diagnosis companies have developed new dPCR equipment. In line with the development, the Lab On An Array (LOAA) dPCR analyzer (Optolane) was launched last year. The LOAA dPCR is a semiconductor chip-based separation PCR type equipment. The LOAA dPCR includes Micro Electro Mechanical System that can be injected by partitioning the target gene into 56 to 20,000 wells. The amount of target gene per wells is digitized to 0 or 1 as the number of well gradually increases to 20,000 wells because its principle follows Poisson distribution, which allows the LOAA dPCR to perform precise absolute quantification. LOAA determined region of interest first prior to dPCR operation. To exclude invalid wells for the quantification, the LOAA dPCR has applied various filtering methods using brightness, slope, baseline, and noise filters. As the coronavirus disease 2019 has now spread around the world, needs for diagnostic equipment of point of care testing (POCT) are increasing. The LOAA dPCR is expected to be suitable for POCT diagnosis due to its compact size and high accuracy. 
Here, we describe the quantitative principle of the LOAA dPCR and suggest that it can be applied to various fields.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e34"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510877/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39508324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eucalyptus is one of the major plantation species, with a wide variety of industrial uses. Polymorphic and informative simple sequence repeats (SSRs) have a broad range of applications in genetic analysis. In this study, two individuals of Eucalyptus tereticornis (ET217 and ET86) and one individual each of E. camaldulensis (EC17) and E. grandis (EG9) were subjected to whole genome resequencing. Low coverage (10×) genome sequencing was used to find polymorphic SSRs between the individuals. The average number of SSR loci identified was 95,513, and the density of SSRs per Mb ranged from 155.08 in EC17 to 157.39 in EG9. Among all the SSRs detected, the most abundant repeat motifs were di-nucleotide (59.6%-62.5%), followed by tri- (23.7%-27.2%), tetra- (5.2%-5.6%), penta- (5.0%-5.3%), and hexa-nucleotide (2.7%-2.9%). The predominant SSR motif units were AG/CT and AAG/TTC. Computational genome analysis predicted the SSR length variations between the individuals and identified the gene functions of SSR-containing sequences. A selected subset of polymorphic markers was validated in a full-sib family of eucalypts. Additionally, genome-wide characterization of single nucleotide polymorphisms, InDels, and transcriptional regulators was carried out. These variations will be useful in genome-wide association studies and in understanding the molecular mechanisms involved in key economic traits. The genomic resources generated in this study would provide an impetus to integrate genomics in marker-trait associations and breeding of tropical eucalypts.
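To illustrate what SSR detection by motif class involves, the sketch below scans a sequence for perfect di- to hexa-nucleotide repeats with a regular expression. It is not the tool used in the study (the abstract does not name one), and the minimum repeat counts in `MIN_REPEATS` are assumed thresholds chosen only for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed minimum number of repeats per motif length; not taken from the abstract.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect di- to hexa-nucleotide SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "AGAG").
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def motif_class_shares(hits: List[Tuple[int, str, int]]) -> Dict[int, float]:
    """Share of SSRs per motif length (2 = di-, 3 = tri-, and so on)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()} if total else {}


# Toy sequence for illustration only.
example = "TTGCA" + "AG" * 8 + "CCGTA" + "AAG" * 5 + "GTC"
ssrs = find_ssrs(example)
print(ssrs)
print(motif_class_shares(ssrs))
```

Dividing the number of hits by the assembly length in Mb gives a density comparable in form to the per-Mb figures in the abstract, although real pipelines also handle both strands, imperfect repeats, and compound SSRs.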
{"title":"Chromosome-specific polymorphic SSR markers in tropical eucalypt species using low coverage whole genome sequences: systematic characterization and validation.","authors":"Maheswari Patturaj, Aiswarya Munusamy, Nithishkumar Kannan, Ulaganathan Kandasamy, Yasodha Ramasamy","doi":"10.5808/gi.21031","DOIUrl":"https://doi.org/10.5808/gi.21031","url":null,"abstract":"<p><p>Eucalyptus is one of the major plantation species with wide variety of industrial uses. Polymorphic and informative simple sequence repeats (SSRs) have broad range of applications in genetic analysis. In this study, two individuals of Eucalyptus tereticornis (ET217 and ET86), one individual each from E. camaldulensis (EC17) and E. grandis (EG9) were subjected to whole genome resequencing. Low coverage (10×) genome sequencing was used to find polymorphic SSRs between the individuals. Average number of SSR loci identified was 95,513 and the density of SSRs per Mb was from 157.39 in EG9 to 155.08 in EC17. Among all the SSRs detected, the most abundant repeat motifs were di-nucleotide (59.6%-62.5%), followed by tri- (23.7%-27.2%), tetra- (5.2%-5.6%), penta- (5.0%-5.3%), and hexa-nucleotide (2.7%-2.9%). The predominant SSR motif units were AG/CT and AAG/TTC. Computational genome analysis predicted the SSR length variations between the individuals and identified the gene functions of SSR containing sequences. Selected subset of polymorphic markers was validated in a full-sib family of eucalypts. Additionally, genome-wide characterization of single nucleotide polymorphisms, InDels and transcriptional regulators were carried out. These variations will find their utility in genome-wide association studies as well as understanding of molecular mechanisms involved in key economic traits. 
The genomic resources generated in this study would provide an impetus to integrate genomics in marker-trait associations and breeding of tropical eucalypts.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e33"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510864/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21014
Felipe Soares, Yuka Tateisi, Terue Takatsuki, Atsuko Yamaguchi
Previous approaches to creating a controlled vocabulary for Japanese have relied on existing bilingual dictionaries and transformation rules to produce such mappings. However, given the new terms that may have been introduced by coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19-related publications. To this end, we combined manual curation of several bilingual dictionaries with a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach achieved an average accuracy of 63.33% for all terms and 84.51% for drugs and chemicals.
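Ranking candidate translations by mutual information, as described above, can be sketched with pointwise mutual information (PMI) over sentence pairs: a target-language token scores highly when it appears in translated sentences mostly when the source term is present. The abstract does not give the authors' exact formulation, so the PMI variant, the function name, and the toy sentence pairs below are all assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def rank_by_pmi(pairs: List[Tuple[str, List[str]]], source_term: str) -> List[Tuple[str, float]]:
    """Rank target tokens by PMI with source_term over (source sentence, target tokens) pairs."""
    n = len(pairs)
    term_count = 0
    token_count = Counter()   # sentences containing each target token
    joint_count = Counter()   # sentences containing both the term and the token
    for source_sentence, target_tokens in pairs:
        has_term = source_term in source_sentence
        if has_term:
            term_count += 1
        for token in set(target_tokens):
            token_count[token] += 1
            if has_term:
                joint_count[token] += 1
    # PMI = log( p(term, token) / (p(term) * p(token)) )
    scores = {
        token: math.log((joint * n) / (term_count * token_count[token]))
        for token, joint in joint_count.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy machine-translated sentence pairs, already tokenized; for illustration only.
pairs = [
    ("fever and cough", ["患者", "発熱", "咳"]),
    ("patients with fever", ["患者", "発熱"]),
    ("cough in patients", ["患者", "咳"]),
    ("persistent fever", ["発熱"]),
]

ranking = rank_by_pmi(pairs, "fever")
for token, score in ranking:
    print(token, round(score, 3))
```

With real data the target sentences would come from a machine translation system and a Japanese tokenizer, and rare tokens would need a frequency cutoff because PMI overrates them.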
{"title":"O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information.","authors":"Felipe Soares, Yuka Tateisi, Terue Takatsuki, Atsuko Yamaguchi","doi":"10.5808/gi.21014","DOIUrl":"https://doi.org/10.5808/gi.21014","url":null,"abstract":"<p><p>Previous approaches to create a controlled vocabulary for Japanese have resorted to existing bilingual dictionary and transformation rules to allow such mappings. However, given the possible new terms introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. We propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19 related publications in this work. For such, we resorted to manual curation of several bilingual dictionaries and a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented average accuracy of 63.33% for all terms, and 84.51% for drugs and chemicals.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e26"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510863/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.21039
Jeong-An Gim, Kyung-Wan Baek, Young-Sool Hah, Ho Jin Choo, Ji-Seok Kim, Jun-Il Yoo
Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasitic diseases. Although S. libertina has ecological, commercial, and clinical importance, its whole genome has not yet been reported. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole genome of S. libertina and determined its transcriptome for the first time using the Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, repeats were found to make up 53.68% of the genome assembly. The genome data of S. libertina reported in this study will be useful for the identification and conservation of S. libertina in East Asia.
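A k-mer based genome size estimate of the kind mentioned above typically divides the total number of counted k-mers by the depth of the main peak in the k-mer frequency histogram. The sketch below shows that basic calculation on a toy histogram; the abstract does not state the software, the value of k, or the error cutoff, so `min_depth` and the example numbers are assumptions.

```python
from typing import Dict


def estimate_genome_size(kmer_histogram: Dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size as (total k-mers) / (depth of the main histogram peak).

    kmer_histogram maps a depth to the number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in kmer_histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram for illustration only: an error tail, a main peak at depth 20,
# and a small repeat peak at depth 40.
toy_histogram = {1: 1000, 2: 200, 18: 40, 19: 80, 20: 100, 21: 80, 22: 40, 40: 10}

size = estimate_genome_size(toy_histogram)
print(f"estimated genome size: {size:.0f} bases")
```

Dedicated tools fit a model to the histogram instead of taking the raw peak, which matters for heterozygous or repeat-rich genomes such as one with more than half of the assembly in repeats.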
{"title":"Draft genome of Semisulcospira libertina, a species of freshwater snail.","authors":"Jeong-An Gim, Kyung-Wan Baek, Young-Sool Hah, Ho Jin Choo, Ji-Seok Kim, Jun-Il Yoo","doi":"10.5808/gi.21039","DOIUrl":"https://doi.org/10.5808/gi.21039","url":null,"abstract":"<p><p>Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasites. Although S. libertina has ecological, commercial, and clinical importance, its whole-genome has not been reported yet. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole-genome of S. libertina and determined its transcriptome for the first time using Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, a total of 53.68% of repeats were identified in the genome assembly. Genome data of S. libertina reported in this study will be useful for identification and conservation of S. libertina in East Asia.</p>","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e32"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510874/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39509824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-01; Epub Date: 2021-09-30; DOI: 10.5808/gi.19.3.e1
Jin-Dong Kim, Kevin Bretonnel Cohen, Fabio Rinaldi, Zhiyong Lu, Hyun-Seok Park
2021 Korea Genome Organization. This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The special section is dedicated to reporting the achievements of the 7th Biomedical Linked Annotation Hackathon (BLAH7). BLAH is an annual hackathon organized to bring together the biomedical text mining community, with the goal of promoting interoperability among text mining resources. This year, the 7th edition was held in January 2021. Due to the pandemic, it was organized as an online event, with the special theme “coronavirus disease 2019 (COVID-19)”. The goal was to develop text mining resources to help address the pandemic situation. During the hackathon, 47 participants from 11 countries worked on voluntarily organized projects, and the results are reported in this special collection. This section includes seven application notes and one opinion article. The first application note, by Hernandez et al. [1], presents a Twitter dataset that includes more than 120 million “potentially clinically-relevant” tweets. The tweets are automatically annotated for clinically important named entities such as drugs and symptoms. The dataset is released publicly to facilitate research on mining social media data for biomedical and clinical applications. Lithgow-Serrano et al. [2] present named entity annotation of the LitCovid [3] dataset using OntoGene’s Biomedical Entity Recogniser (OGER) [4] and show its effectiveness for document classification. Ouyang et al. [5] present the AGAC annotation [6] added on top of the PubTator [7] and OGER annotations and show that the addition is potentially useful for mining regulatory or causal relationships between biomedical entities. The following three papers represent efforts toward multilingualism in text mining. Barros et al. [8] present a multilingual parallel corpus of PubMed articles for the language pairs English-Portuguese and English-Spanish. Their corpus was annotated for biomedical entities and the relationships between them, and was then used to develop a multilingual recommendation dataset for recommending biomedical entities to the authors of the articles. Yamaguchi et al. [9] and Soares et al. [10] are written by the same set of authors. They developed two versions of a Japanese translation of MeSH terms, one through merging of existing resources and manual curation, and another through an automatic translation method; the results are reported in two separate application notes. Larmande et al. [11] report a revision to OryzaGP [12], a corpus of PubMed articles relevant to rice species, which are automatically annotated for proteins and genes. The last article, by Dohi et al. [13], presents the authors’ opinion on visualizing phenotype diversity, based on their case study of Alexander disease.
{"title":"Editor's introduction to the special section on the 7th Biomedical Linked Annotation Hackathon (BLAH7).","authors":"Jin-Dong Kim, Kevin Bretonnel Cohen, Fabio Rinaldi, Zhiyong Lu, Hyun-Seok Park","doi":"10.5808/gi.19.3.e1","DOIUrl":"https://doi.org/10.5808/gi.19.3.e1","url":null,"abstract":"2021 Korea Genome Organization This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons. org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The special section is dedicated to reporting achievements of the 7th Biomedical Linked Annotation Hackathon (BLAH7). BLAH is an annual hackathon event which is organized to join forces of biomedical text mining for the goal to promote interoperability among text mining resources. This year, the 7th edition was held in January, 2021. Due to the pandemic, it was organized as an online event, with the special theme “coronavirus disease 2019 (COVID-19)”. The goal was to develop text mining resources to help address the pandemic situation. During the hackathon, 47 participants from 11 countries worked on voluntarily organized projects, and the results are reported in this special collection. This section includes seven application notes and one opinion article. The first application note by Hernandez et al. [1] presents a Twitter dataset which includes more than 120 million “potentially clinically-relevant” tweets. The tweets are automatically annotated for clinically important named entities like drugs and symptoms. The dataset is released publicly to facilitate research on mining social media data for biomedical and clinical applications. Lithgow-Serrano et al. [2] presents named entity annotation of the LitCovid [3] dataset using OntoGene’s Biomedical Entity Recogniser (OGER) [4] and shows its effectiveness for document classification. Ouyang et al. 
[5] presents the AGAC annotation [6] added on top of the PubTator [7] and OGER annotations and shows that the addition is potentially useful to mine regulatory or causal relationships between biomedical entities. The following three papers represent efforts for multilingualism of text mining. Barros et al. [8] presents a multilingual parallel corpus of PubMed articles for the language pairs English-Portuguese and English-Spanish. Their corpus was annotated for biomedical entities and also relationships between them, which was then used to develop a multilingual recommendation dataset for recommending biomedical entities to the authors of the articles. Yamaguchi et al. [9] and Soares et al. [10] are written by the same set of authors. They developed two versions of Japanese translation of MeSH terms, one through merging of existing resources and manual curation, and another through an automatic translation method, of which the results are reported in the two separate application notes. Larmande et al. [11] reports a revision to OryzaGP [12], a corpus of PubMed articles relevant to rice species, which are automatically annotated for proteins and genes. The last one by Dohi et al. [13] presents the authors’ opinion after their case study with Alexander disease towards visualizing the phenotype diversity. ","PeriodicalId":36591,"journal":{"name":"Genomics and Informatics","volume":"19 3","pages":"e20"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510870/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39511290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}