Genomics and Informatics: Latest Articles

Rapid and sensitive detection of Salmonella species targeting the hilA gene using a loop-mediated isothermal amplification assay.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21048

Jiyon Chu, Juyoun Shin, Shinseok Kang, Sun Shin, Yeun-Jun Chung

Salmonella species are among the major pathogens that cause foodborne illness outbreaks. In this study, we aimed to develop a loop-mediated isothermal amplification (LAMP) assay for the rapid and sensitive detection of Salmonella species. We designed LAMP primers targeting the hilA gene as a universal marker of Salmonella species. A total of seven Salmonella species strains and 11 non-Salmonella pathogen strains from eight different genera were used in this study. All Salmonella strains showed positive amplification signals with the Salmonella LAMP assay; however, there was no non-specific amplification signal for the non-Salmonella strains. The detection limit was 100 femtograms (20 copies per reaction), which was ~1,000 times more sensitive than the detection limits of the conventional polymerase chain reaction (PCR) assay (100 pg). The reaction time for a positive amplification signal was less than 20 minutes, which was less than one-third the time taken while using conventional PCR. In conclusion, our Salmonella LAMP assay accurately detected Salmonella species with a higher degree of sensitivity and greater rapidity than the conventional PCR assay, and it may be suitable for point-of-care testing in the field.

Volume: 19(3) Pages: e30 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510866/pdf/

Citations: 5
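
The detection-limit figures in the abstract above can be checked with a short calculation. The sketch below converts a DNA mass into genome copies and compares the two limits. The genome length of about 4.8 Mb and the average base-pair mass of 650 g/mol are assumptions that do not come from the abstract, so the result is only a rough consistency check.

```python
# Sketch: convert a DNA mass into genome copies and compare detection limits.
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # ASSUMPTION: approximate average mass of one double-stranded base pair

# ASSUMPTION: genome length of roughly 4.8 Mb; this value is not given in the abstract.
ASSUMED_GENOME_BP = 4.8e6


def genome_copies(mass_g: float, genome_bp: float) -> float:
    """Return the number of genome copies contained in mass_g grams of dsDNA."""
    return mass_g * AVOGADRO / (genome_bp * BP_MASS_G_PER_MOL)


lamp_limit_g = 100e-15  # 100 fg, from the abstract
pcr_limit_g = 100e-12   # 100 pg, from the abstract

copies_at_lamp_limit = genome_copies(lamp_limit_g, ASSUMED_GENOME_BP)
fold_difference = pcr_limit_g / lamp_limit_g

print(f"copies at 100 fg: {copies_at_lamp_limit:.1f}")
print(f"PCR limit / LAMP limit: {fold_difference:.0f}x")
```

Under these assumptions, 100 fg corresponds to roughly 19 genome copies, in line with the approximately 20 copies per reaction reported, and the ratio of the two limits is 1,000.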

A biomedically oriented automatically annotated Twitter COVID-19 dataset.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21011

Luis Alberto Robles Hernandez, Tiffany J Callahan, Juan M Banda

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best-practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

Volume: 19(3) Pages: e21 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510871/pdf/

Citations: 1
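
Comparing an automatic annotator against a manually annotated gold standard, as described in the abstract above, is commonly done with precision, recall, and F1 over annotation spans. The sketch below shows one such scoring, assuming exact-match spans of the form (start, end, label). The spans and labels are invented for illustration, and the abstract does not state which matching criterion the authors used.

```python
# Sketch: exact-match precision / recall / F1 for predicted annotation spans.
# A span is a (start_offset, end_offset, label) tuple.


def precision_recall_f1(gold, predicted):
    """Score predicted spans against gold spans using exact matching."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and system output for one tweet.
gold_spans = [(0, 5, "DRUG"), (10, 15, "SYMPTOM"), (20, 28, "SYMPTOM")]
system_spans = [(0, 5, "DRUG"), (10, 15, "SYMPTOM"), (30, 35, "DRUG"), (40, 44, "DRUG")]

p, r, f1 = precision_recall_f1(gold_spans, system_spans)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Running the same function once per candidate framework gives directly comparable scores, which is one plausible way to select the best method for the large-scale annotation run.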

Visualizing the phenotype diversity: a case study of Alexander disease.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21016

Eisuke Dohi, Ali Haider Bangash

Since only a small number of patients have a rare disease, it is difficult to identify all of the features of these diseases. This is especially true for patients uncommonly presenting with rare diseases. It can also be difficult for the patient, their families, and even clinicians to know which one of a number of disease phenotypes the patient is exhibiting. To address this issue, during Biomedical Linked Annotation Hackathon 7 (BLAH7), we tried to extract Alexander disease patient data in Portable Document Format. We then visualized the phenotypic diversity of those Alexander disease patients with uncommon presentations. This led to us identifying several issues that we need to overcome in our future work.

Citations: 1

Constructing Japanese MeSH term dictionaries related to the COVID-19 literature.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21012

Atsuko Yamaguchi, Terue Takatsuki, Yuka Tateisi, Felipe Soares

The coronavirus disease 2019 (COVID-19) pandemic has led to a flood of research papers and the information has been updated with considerable frequency. For society to derive benefits from this research, it is necessary to promote sharing up-to-date knowledge from these papers. However, because most research papers are written in English, it is difficult for people who are not familiar with English medical terms to obtain knowledge from them. To facilitate sharing knowledge from COVID-19 papers written in English for Japanese speakers, we tried to construct a dictionary with an open license by assigning Japanese terms to MeSH unique identifiers (UIDs) annotated to words in the texts of COVID-19 papers. Using this dictionary, 98.99% of all occurrences of MeSH terms in COVID-19 papers were covered. We also created a curated version of the dictionary and uploaded it to PubDictionary for wider use in the PubAnnotation system.

Citations: 1
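
The coverage figure in the abstract above counts occurrences of MeSH terms rather than distinct terms. The sketch below illustrates the difference with a toy dictionary. The identifiers and Japanese terms are hypothetical placeholders, not real MeSH UIDs or dictionary entries.

```python
# Sketch: occurrence coverage vs. type coverage of a UID -> Japanese term dictionary.


def occurrence_coverage(occurrences, dictionary):
    """Fraction of annotated occurrences whose UID has a dictionary entry."""
    if not occurrences:
        return 0.0
    covered = sum(1 for uid in occurrences if uid in dictionary)
    return covered / len(occurrences)


def type_coverage(occurrences, dictionary):
    """Fraction of distinct UIDs that have a dictionary entry."""
    distinct = set(occurrences)
    if not distinct:
        return 0.0
    return len(distinct & dictionary.keys()) / len(distinct)


# Hypothetical placeholders: real entries would map MeSH UIDs to Japanese terms.
toy_dictionary = {"UID_A": "term_ja_A", "UID_B": "term_ja_B"}
toy_occurrences = ["UID_A", "UID_A", "UID_B", "UID_C", "UID_A"]

occ_cov = occurrence_coverage(toy_occurrences, toy_dictionary)
typ_cov = type_coverage(toy_occurrences, toy_dictionary)
print(f"occurrence coverage: {occ_cov:.2%}, type coverage: {typ_cov:.2%}")
```

Because frequent terms dominate the occurrence count, occurrence coverage can be high even when a noticeable share of rare terms is still missing from the dictionary.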

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21013

Sizhuo Ouyang, Yuxing Wang, Kaiyin Zhou, Jingbo Xia

The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increased volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community, as a way to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. This corpus covers 50,018 COVID-19 abstracts in LitCovid and consists of 12 labels: Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC. The corpus contains sufficiently abundant information to make it possible to unveil hidden knowledge about the pathological mechanism of COVID-19.

Citations: 0
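
Merging annotation layers from several sources onto the same text, as the abstract above describes, can be pictured as combining span lists and looking for places where different sources annotate overlapping text. The sketch below is a minimal illustration with plain tuples. The example sentence and its spans are invented, and this is not the PubAnnotation data format or the authors' pipeline.

```python
# Sketch: merge annotation layers and find spans where different sources overlap.
# An annotation is a (start, end, label, source) tuple over character offsets.

# Label inventory as listed in the abstract: 4 + 2 + 6 = 12 labels.
LABELS_BY_SOURCE = {
    "PubTator": ["Mutation", "Species", "Gene", "Disease"],
    "OGER": ["GO", "CHEBI"],
    "AGAC": ["Var", "MPA", "CPA", "NegReg", "PosReg", "Reg"],
}


def merge_layers(*layers):
    """Combine annotation layers into one list sorted by span position."""
    merged = [ann for layer in layers for ann in layer]
    return sorted(merged, key=lambda ann: (ann[0], ann[1]))


def cross_source_overlaps(merged):
    """Return pairs of annotations from different sources whose spans overlap."""
    pairs = []
    for i, a in enumerate(merged):
        for b in merged[i + 1:]:
            if a[3] != b[3] and a[0] < b[1] and b[0] < a[1]:
                pairs.append((a, b))
    return pairs


# Hypothetical example sentence and annotations, for illustration only.
text = "D614G mutation increases infectivity"
pubtator_layer = [(0, 5, "Mutation", "PubTator")]
agac_layer = [(0, 14, "Var", "AGAC"), (15, 24, "PosReg", "AGAC"), (25, 36, "MPA", "AGAC")]

merged = merge_layers(pubtator_layer, agac_layer)
overlaps = cross_source_overlaps(merged)
for a, b in overlaps:
    print(f"{a[3]}:{a[2]} '{text[a[0]:a[1]]}' overlaps {b[3]}:{b[2]} '{text[b[0]:b[1]]}'")
```

Overlaps of this kind are where a cross-annotated corpus adds information, since the same stretch of text carries labels from more than one annotation scheme.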

High-accuracy quantitative principle of a new compact digital PCR equipment: Lab On An Array.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21035

Haeun Lee, Cherl-Joon Lee, Dong Hee Kim, Chun-Sung Cho, Wonseok Shin, Kyudong Han

Digital PCR (dPCR) is the third generation of PCR and enables real-time absolute quantification without reference materials. Recently, global diagnostics companies have developed new dPCR equipment. In line with this development, the Lab On An Array (LOAA) dPCR analyzer (Optolane) was launched last year. The LOAA dPCR is a semiconductor chip-based, separation-type PCR instrument. It includes a micro-electro-mechanical system into which the sample is injected so that the target gene is partitioned into 56 to 20,000 wells. As the number of wells increases toward 20,000, the amount of target gene per well is digitized to 0 or 1; because this partitioning follows a Poisson distribution, the LOAA dPCR can perform precise absolute quantification. The LOAA first determines the region of interest before dPCR operation. To exclude invalid wells from quantification, the LOAA dPCR applies several filters based on brightness, slope, baseline, and noise. As coronavirus disease 2019 has spread around the world, the need for point-of-care testing (POCT) diagnostic equipment is increasing. The LOAA dPCR is expected to be suitable for POCT diagnosis due to its compact size and high accuracy. Here, we describe the quantitative principle of the LOAA dPCR and suggest that it can be applied to various fields.

Volume: 19(3) Pages: e34 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510877/pdf/

Citations: 3
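
The Poisson principle mentioned in the abstract above is the standard basis of dPCR quantification: if a fraction p of valid wells is positive, the mean number of target copies per well is estimated as -ln(1 - p). The sketch below applies that textbook correction. The well counts are hypothetical, and the instrument's actual software and filtering steps are not reproduced here.

```python
# Sketch: Poisson-corrected absolute quantification from positive / valid well counts.
import math


def mean_copies_per_well(positive_wells: int, valid_wells: int) -> float:
    """Estimate the mean copies per well (lambda) from the fraction of positive wells."""
    if valid_wells <= 0:
        raise ValueError("valid_wells must be positive")
    if not 0 <= positive_wells < valid_wells:
        # If every well is positive the estimate diverges (the chip is saturated).
        raise ValueError("positive_wells must be in [0, valid_wells)")
    return -math.log(1.0 - positive_wells / valid_wells)


def total_copies(positive_wells: int, valid_wells: int) -> float:
    """Estimate the total number of target copies loaded into the valid wells."""
    return mean_copies_per_well(positive_wells, valid_wells) * valid_wells


# Hypothetical run: 5,000 positive wells out of 20,000 wells that passed filtering.
lam = mean_copies_per_well(5000, 20000)
estimated = total_copies(5000, 20000)
print(f"lambda = {lam:.4f} copies/well, total = {estimated:.0f} copies")
```

The estimate exceeds the raw count of positive wells because some wells receive more than one copy. Only wells that pass the validity filters should be counted in `valid_wells`, which is why the filtering step matters for accuracy.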

Chromosome-specific polymorphic SSR markers in tropical eucalypt species using low coverage whole genome sequences: systematic characterization and validation.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21031

Maheswari Patturaj, Aiswarya Munusamy, Nithishkumar Kannan, Ulaganathan Kandasamy, Yasodha Ramasamy

Eucalyptus is one of the major plantation species, with a wide variety of industrial uses. Polymorphic and informative simple sequence repeats (SSRs) have a broad range of applications in genetic analysis. In this study, two individuals of Eucalyptus tereticornis (ET217 and ET86) and one individual each of E. camaldulensis (EC17) and E. grandis (EG9) were subjected to whole genome resequencing. Low coverage (10×) genome sequencing was used to find polymorphic SSRs between the individuals. The average number of SSR loci identified was 95,513, and the density of SSRs per Mb ranged from 155.08 in EC17 to 157.39 in EG9. Among all the SSRs detected, the most abundant repeat motifs were di-nucleotide (59.6%-62.5%), followed by tri- (23.7%-27.2%), tetra- (5.2%-5.6%), penta- (5.0%-5.3%), and hexa-nucleotide (2.7%-2.9%). The predominant SSR motif units were AG/CT and AAG/TTC. Computational genome analysis predicted the SSR length variations between the individuals and identified the gene functions of SSR-containing sequences. A selected subset of polymorphic markers was validated in a full-sib family of eucalypts. Additionally, genome-wide characterization of single nucleotide polymorphisms, InDels, and transcriptional regulators was carried out. These variations will be useful in genome-wide association studies and in understanding the molecular mechanisms involved in key economic traits. The genomic resources generated in this study should provide an impetus to integrate genomics into marker-trait association studies and breeding of tropical eucalypts.

Volume: 19(3) Pages: e33 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510864/pdf/

Citations: 0
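
SSR discovery of the kind summarized in the abstract above amounts to scanning a sequence for short motifs repeated in tandem. The sketch below uses a regular expression for di- to hexa-nucleotide motifs. The threshold of at least four tandem copies is an arbitrary assumption for illustration, since the abstract does not state the detection criteria or the tool that was used.

```python
# Sketch: find di- to hexa-nucleotide SSRs with a regular expression.
import re

# The lazy quantifier tries the shortest motif first, so AGAGAGAG is
# reported as (AG)4 rather than (AGAG)2.
# ASSUMPTION: a motif must occur at least 4 times in tandem to count as an SSR.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(sequence):
    """Return (motif, copies, start, end) for each SSR found in the sequence."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((motif, copies, match.start(), match.end()))
    return hits


def ssr_density_per_mb(n_ssrs, sequence_length_bp):
    """Number of SSR loci per megabase of sequence."""
    return n_ssrs / (sequence_length_bp / 1_000_000)


# Toy sequence containing one (AG)5 and one (AAG)4 repeat.
toy_sequence = "TTGAC" + "AG" * 5 + "CCT" + "AAG" * 4 + "TT"
ssr_hits = find_ssrs(toy_sequence)
for motif, copies, start, end in ssr_hits:
    print(f"({motif}){copies} at {start}-{end}")
```

Comparing the `copies` value for the same locus across individuals is the basic idea behind calling an SSR polymorphic, and `ssr_density_per_mb` corresponds to the per-Mb density the abstract reports.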

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21014

Felipe Soares, Yuka Tateisi, Terue Takatsuki, Atsuko Yamaguchi

Previous approaches to creating a controlled vocabulary for Japanese have relied on existing bilingual dictionaries and transformation rules to allow such mappings. However, given the new terms that may have been introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19-related publications. To this end, we combined manual curation of several bilingual dictionaries with a computational approach based on machine translation of sentences containing such terms and ranking of the possible translations of individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach achieved an average accuracy of 63.33% for all terms and 84.51% for drugs and chemicals.

Citations: 1
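
Ranking candidate translations by mutual information, as the abstract above describes, can be illustrated with pointwise mutual information (PMI) over sentence pairs. In the sketch below, each pair records whether the English source sentence contains the term of interest and which tokens appear in its machine-translated Japanese counterpart. PMI is used here as a simple stand-in, since the abstract does not give the exact formulation, and the sentence pairs are invented.

```python
# Sketch: rank candidate translations of one source term by pointwise mutual information.
import math
from collections import Counter


def rank_by_pmi(sentence_pairs):
    """sentence_pairs: list of (source_has_term, target_tokens) tuples.

    Returns (candidate, pmi) pairs sorted from highest to lowest PMI.
    """
    n = len(sentence_pairs)
    term_count = sum(1 for has_term, _ in sentence_pairs if has_term)
    candidate_count = Counter()
    joint_count = Counter()
    for has_term, tokens in sentence_pairs:
        for token in set(tokens):
            candidate_count[token] += 1
            if has_term:
                joint_count[token] += 1
    scores = {}
    for token, joint in joint_count.items():
        p_joint = joint / n
        p_term = term_count / n
        p_candidate = candidate_count[token] / n
        scores[token] = math.log2(p_joint / (p_term * p_candidate))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for the English term "fever":
# (does the English sentence contain "fever"?, tokens of the Japanese translation)
toy_pairs = [
    (True, ["発熱", "患者"]),
    (True, ["発熱", "咳"]),
    (False, ["患者", "咳"]),
    (False, ["患者"]),
]

ranking = rank_by_pmi(toy_pairs)
for candidate, score in ranking:
    print(f"{candidate}: {score:.3f}")
```

In this toy data, the token that appears only alongside the source term receives the highest score, while tokens that are common everywhere are pushed down the ranking.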

Draft genome of Semisulcospira libertina, a species of freshwater snail.

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.21039

Jeong-An Gim, Kyung-Wan Baek, Young-Sool Hah, Ho Jin Choo, Ji-Seok Kim, Jun-Il Yoo

Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasites. Although S. libertina has ecological, commercial, and clinical importance, its whole-genome has not been reported yet. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole-genome of S. libertina and determined its transcriptome for the first time using Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, a total of 53.68% of repeats were identified in the genome assembly. Genome data of S. libertina reported in this study will be useful for identification and conservation of S. libertina in East Asia.

Citations: 0
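
A k-mer based genome size estimate like the one in the abstract above is often obtained by dividing the total number of k-mers in the reads by the depth at the main peak of the k-mer frequency histogram. The sketch below shows that simple estimator on a toy histogram. The abstract does not say which tool or model the authors used, and the numbers here are invented.

```python
# Sketch: estimate genome size from a k-mer depth histogram.


def estimate_genome_size(histogram, min_depth=3):
    """histogram maps k-mer depth -> number of distinct k-mers seen at that depth.

    K-mers below min_depth are treated as sequencing errors and ignored.
    ASSUMPTION: min_depth=3 is an arbitrary cutoff chosen for illustration.
    """
    solid = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth


# Toy histogram: a low-depth error tail plus a main peak at depth 10.
toy_histogram = {1: 500, 2: 100, 9: 200, 10: 600, 11: 200}
genome_size = estimate_genome_size(toy_histogram)
print(f"estimated genome size: {genome_size:.0f} bp")
```

Real genomes with a high repeat content, such as the 53.68% reported above, produce additional peaks at multiples of the main depth, so dedicated tools fit a fuller model than this single-peak sketch.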

Editor's introduction to the special section on the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Q2 Agricultural and Biological Sciences

Genomics and Informatics

Pub Date : 2021-09-01 Epub Date: 2021-09-30 DOI: 10.5808/gi.19.3.e1

Jin-Dong Kim, Kevin Bretonnel Cohen, Fabio Rinaldi, Zhiyong Lu, Hyun-Seok Park

2021 Korea Genome Organization. This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This special section reports the achievements of the 7th Biomedical Linked Annotation Hackathon (BLAH7). BLAH is an annual hackathon organized to bring together the biomedical text mining community with the goal of promoting interoperability among text mining resources. This year, the 7th edition was held in January 2021. Due to the pandemic, it was organized as an online event with the special theme “coronavirus disease 2019 (COVID-19)”. The goal was to develop text mining resources to help address the pandemic. During the hackathon, 47 participants from 11 countries worked on voluntarily organized projects, and the results are reported in this special collection. This section includes seven application notes and one opinion article. The first application note, by Hernandez et al. [1], presents a Twitter dataset that includes more than 120 million “potentially clinically-relevant” tweets. The tweets are automatically annotated for clinically important named entities such as drugs and symptoms. The dataset is released publicly to facilitate research on mining social media data for biomedical and clinical applications. Lithgow-Serrano et al. [2] present named entity annotation of the LitCovid [3] dataset using OntoGene’s Biomedical Entity Recogniser (OGER) [4] and show its effectiveness for document classification. Ouyang et al. [5] present the AGAC annotation [6] added on top of the PubTator [7] and OGER annotations and show that the addition is potentially useful for mining regulatory or causal relationships between biomedical entities. The following three papers represent efforts toward multilingual text mining. Barros et al. [8] present a multilingual parallel corpus of PubMed articles for the language pairs English-Portuguese and English-Spanish. Their corpus was annotated for biomedical entities and the relationships between them, and was then used to develop a multilingual recommendation dataset for recommending biomedical entities to the authors of the articles. Yamaguchi et al. [9] and Soares et al. [10] are written by the same set of authors. They developed two versions of a Japanese translation of MeSH terms, one through merging existing resources and manual curation and another through an automatic translation method; the results are reported in two separate application notes. Larmande et al. [11] report a revision to OryzaGP [12], a corpus of PubMed articles relevant to rice species, automatically annotated for proteins and genes. The last article, by Dohi et al. [13], presents the authors’ opinion on visualizing phenotype diversity, following their case study of Alexander disease.

Volume: 19(3) Pages: e20 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510870/pdf/

Citations: 0
