Text mining in genomics and systems biology

A. Valencia
Journal: Data and Text Mining in Bioinformatics
Published: 2008-10-30
DOI: 10.1145/1458449.1458453 (https://doi.org/10.1145/1458449.1458453)
Citation count: 6

Abstract

Systems Biology and Genomics projects increasingly need to complement the information available for the analysis of biological systems. This need motivates the integration of information extracted directly from textual sources using Information Extraction and Text Mining approaches. My group has been developing Text Mining approaches and integrating them into large-scale projects alongside other experimental and bioinformatics methods. On this occasion I will present developments related to the characterization of the human mitotic spindle apparatus, carried out in the context of the ENFIN NoE. For these and other applications, it is crucial to have an accurate estimate of the capabilities of current Text Mining systems. The BioCreative II challenge, organized by CNIO, MITRE and NCBI in collaboration with the MINT and IntAct databases (http://biocreative.sourceforge.net; Genome Biology, August 2008 Special Issue), provides such an overview. BioCreative II comprised two tasks: 1) gene name identification and normalization, where many systems achieved a consistent balanced precision/recall of 80%; and 2) protein interaction detection, which was divided into four sub-tasks: a) ranking of publications by their relevance to the experimental determination of protein interactions, b) detection of protein interaction partners in text, c) detection of key sentences describing protein interactions, and d) detection of the experimental technique used to determine the interactions. The results were quite good for publication ranking, detection of experimental methods, and highlighting of relevant sentences, while they pointed to persistent problems in the correct normalization of gene/protein names. Furthermore, BioCreative has channeled the collaboration of several teams into the creation of the first Text Mining meta-server (the BioCreative Meta-server; Leitner et al., Genome Biology 2008, BioCreative special issue). We are now preparing BioCreative III, with a particular focus on fostering the creation of Text Mining systems that can be integrated into genome analysis pipelines and contribute effectively to the understanding of complex biological systems.
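
The abstract reports a "balanced precision/recall" of 80% for gene name identification and normalization without defining the measure. As a minimal sketch, assuming this refers to the usual F1 score (the harmonic mean of precision and recall) computed over sets of normalized gene identifiers, the calculation could look as follows. The function name and the gene identifiers are hypothetical and serve only as illustration; they are not taken from the BioCreative II evaluation.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical normalized gene identifiers for one document (illustration only).
gold = {"GENE:1", "GENE:2", "GENE:3", "GENE:4", "GENE:5"}
predicted = {"GENE:1", "GENE:2", "GENE:3", "GENE:4", "GENE:9"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy case four of the five predicted identifiers are correct and four of the five gold identifiers are found, so precision, recall and F1 all come out at 0.80.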