
Genomics & Informatics: latest articles

Direct-to-consumer genetic testing
Pub Date : 2019-09-01 DOI: 10.5808/GI.2019.17.3.e34
Jong-Won Kim
Direct-to-consumer (DTC) genetic testing remains a controversial issue, even though the Korean Government is considering expanding it. Given the early history of DTC genetic testing in Korea, preventing its exaggeration and abuse is an important task. The performance and methods of DTC genetic tests have rarely been reported to the scientific or medical community, and their reliability needs to be assessed. Law enforcement needs to address these issues, and the principle of transparency needs to be applied.
Citations: 1
Deep learning for stage prediction in neuroblastoma using gene expression data
Pub Date : 2019-09-01 DOI: 10.5808/GI.2019.17.3.e30
Aron Park, S. Nam
Neuroblastoma is a major cause of cancer death in early childhood, and its timely and correct diagnosis is critical. Gene expression datasets have recently been considered as a powerful tool for cancer diagnosis and subtype classification. However, no attempts have yet been made to apply deep learning using gene expression to neuroblastoma classification, although deep learning has been applied to cancer diagnosis using image data. Taking the International Neuroblastoma Staging System stages as multiple classes, we designed a deep neural network using the gene expression patterns and stages of neuroblastoma patients. Despite a small patient population (n = 280), stage 1 and 4 patients were well distinguished. If it is possible to replicate this approach in a larger population, deep learning could play an important role in neuroblastoma staging.
Citations: 5
Trends in Genomics & Informatics: a statistical review of publications from 2003 to 2018 focusing on the most-studied genes and document clusters
Pub Date : 2019-09-01 DOI: 10.5808/GI.2019.17.3.e25
Jihyeon Kim, Hee-Jo Nam, Hyun-Seok Park
Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. Herein, we conduct a statistical analysis of the publications of Genomics & Informatics over the 16 years since its inception, with a particular focus on issues relating to article categories, word clouds, and the most-studied genes, drawing on recent reviews of the use of word frequencies in journal articles. Trends in the studies published in Genomics & Informatics are discussed both individually and collectively.
Citations: 1
Optimization of a microarray for fission yeast
Pub Date : 2019-09-01 DOI: 10.5808/GI.2019.17.3.e28
Dong-Uk Kim, Minho Lee, Sangjo Han, Miyoung Nam, Sol Lee, Jaewoong Lee, Jihye Woo, Dongsup Kim, K. Hoe
Bar-code (tag) microarrays of yeast gene-deletion collections facilitate the systematic identification of genes required for growth in any condition of interest. Anti-sense strands of amplified bar-codes hybridize with ~10,000 (5,000 each for up- and down-tags) different kinds of sense-strand probes on an array. In this study, we optimized the hybridization processes of an array for fission yeast. Compared to the first version of the array (11 µm, 100K) consisting of three sectors with probe pairs (perfect match and mismatch), the second version (11 µm, 48K) could represent ~10,000 up-/down-tags in quadruplicate along with 1,508 negative controls in quadruplicate and a single set of 1,000 unique negative controls at randomly dispersed positions without mismatch pairs. For PCR, the optimal annealing temperature (maximizing yield and minimizing extra bands) was 58℃ for both tags. Intriguingly, up-tags required 3× higher amounts of blocking oligonucleotides than down-tags. A 1:1 mix ratio between up- and down-tags was satisfactory. A lower temperature (25℃) was optimal for cultivation instead of a normal temperature (30℃) because of extra temperature-sensitive mutants in a subset of the deletion library. Activation of frozen pooled cells for >1 day showed better resolution of intensity than no activation. A tag intensity analysis showed that tag(s) of 4,316 of the 4,526 strains tested were represented at least once; 3,706 strains were represented by both tags, 4,072 strains by up-tags, and 3,950 strains by down-tags. The results indicate that this microarray will be a powerful analytical platform for elucidating currently unknown gene functions.
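The tag counts reported in this abstract can be cross-checked with inclusion-exclusion. The sketch below assumes (an interpretation, not something the abstract states explicitly) that the 4,072 and 3,950 figures count every strain detected by an up-tag or a down-tag respectively, including the 3,706 detected by both. The function name `strains_with_any_tag` is hypothetical.

```python
def strains_with_any_tag(up_total: int, down_total: int, both: int) -> int:
    """Inclusion-exclusion: |up or down| = |up| + |down| - |up and down|."""
    return up_total + down_total - both


# Figures taken from the abstract; their interpretation as inclusive
# totals is an assumption of this sketch.
tested = 4526
up_total, down_total, both = 4072, 3950, 3706

detected = strains_with_any_tag(up_total, down_total, both)
up_exclusive = up_total - both      # strains seen by an up-tag but no down-tag
down_exclusive = down_total - both  # strains seen by a down-tag but no up-tag

print(detected, up_exclusive, down_exclusive)
print(f"coverage: {detected / tested:.1%}")
```

Under that reading the union matches the 4,316 strains reported as represented at least once, which is why the counts are best read as inclusive totals rather than mutually exclusive groups.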
Citations: 0
Identification of neoantigens derived from alternative splicing and RNA modification
Pub Date : 2019-08-22 DOI: 10.5808/GI.2019.17.3.e23
Jiyeon Park, Y. Chung
The acquisition of somatic mutations is the most common event in cancer. Neoantigens expressed from genes with mutations acquired during carcinogenesis can be tumor-specific. Since the immune system recognizes tumor-specific peptides, they are potential targets for personalized neoantigen-based immunotherapy. However, the discovery of druggable neoantigens remains challenging, suggesting that a deeper understanding of the mechanism of neoantigen generation and better strategies to identify them will be required to realize the promise of neoantigen-based immunotherapy. Alternative splicing and RNA editing events are emerging mechanisms leading to neoantigen production. In this review, we outline recent work involving the large-scale screening of neoantigens produced by alternative splicing and RNA editing. We also describe strategies to predict and validate neoantigens from RNA sequencing data.
Citations: 17
FusionScan: accurate prediction of fusion genes from RNA-Seq data
Pub Date : 2019-07-23 DOI: 10.5808/GI.2019.17.3.e26
P. Kim, Y. Jang, Sanghyuk Lee
Identification of fusion genes is of prominent importance in the cancer research field because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identification of fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false-positives, thus making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we have implemented various mapping and filtering strategies to remove false-positives without discarding genuine fusions. In the performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. Simulation tests also demonstrated that FusionScan recovered most true positives without producing an overwhelming number of false-positives regardless of sequencing depth and read length. The computation time was comparable to other leading tools. We also provide several curative means to help users investigate the details of fusion candidates easily. We believe that FusionScan would be a reliable, efficient and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
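For readers less familiar with the metrics quoted in this abstract, the sketch below spells out the standard definitions of precision and recall and combines the two reported rates into an F1 score. The F1 value is derived here for illustration only; the abstract does not report it, and the function names are hypothetical.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted fusions that are genuine."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of genuine fusions that were predicted."""
    return true_pos / (true_pos + false_neg)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Rates as reported in the abstract (60% precision, 79% recall).
reported_precision, reported_recall = 0.60, 0.79
print(f"F1 = {f1_score(reported_precision, reported_recall):.3f}")
```

The harmonic mean penalizes an imbalance between the two rates, so it is a convenient single number when comparing fusion callers that trade false-positives against missed fusions.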
Citations: 9
Towards cross-platform interoperability for machine-assisted text annotation
Pub Date : 2019-06-01 DOI: 10.5808/GI.2019.17.2.e19
Richard Eckart de Castilho, Nancy Ide, Jin-Dong Kim, Jan-Christoph Klie, Keith Suderman
In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.
Citations: 7
Resources for assigning MeSH IDs to Japanese medical terms
Pub Date : 2019-06-01 DOI: 10.5808/GI.2019.17.2.e16
Yuka Tateisi
Medical Subject Headings (MeSH), a medical thesaurus created by the National Library of Medicine (NLM), is a useful resource for natural language processing (NLP). In this article, the current status of the Japanese version of MeSH is reviewed. An online investigation found that Japanese-English dictionaries that assign MeSH information to applicable terms exist, but that they are difficult to access for NLP use because of license restrictions. Here, we investigate an open-source Japanese-English glossary as an alternative method for assigning MeSH IDs to Japanese terms, to obtain preliminary data for an NLP proof-of-concept.
Citations: 4
Improving spaCy dependency annotation and PoS tagging web service using independent NER services
Pub Date : 2019-06-01 DOI: 10.5808/GI.2019.17.2.e21
N. Colic, Fabio Rinaldi
Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.
Citations: 8
PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts
Pub Date : 2019-06-01 DOI: 10.5808/GI.2019.17.2.e15
Jordi Armengol-Estapé, Felipe Soares, M. Marimon, Martin Krallinger
Automatically detecting mentions of pharmaceutical drugs and chemical substances is key for the subsequent extraction of relations of chemicals with other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. The identification of drug mentions is also a prior step for complex event types such as drug dosage recognition, duration of medical treatments or drug repurposing. Formally, this task is known as named entity recognition (NER), meaning automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, for chemical entity recognition (CER), techniques based on hand-crafted rules and graph-based models can provide adequate performance. In recent years, the field of natural language processing has mainly pivoted to deep learning, and state-of-the-art results for most tasks involving natural language are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, while for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network to be able to take into account additional features apart from the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug and CER promoted by the Spanish national plan for the advancement of language technologies (Plan TL).
Citations: 14