Background: PubMed is designed to provide rapid, comprehensive retrieval of papers that discuss a given topic. However, because PubMed does not organize the search output further, it is difficult for users to gain an overview of the retrieved literature along non-topical dimensions, to drill down to find individual articles relevant to a particular individual's needs, or to browse the collection.
Results: In this paper, we present Anne O'Tate, a web-based tool that processes articles retrieved from PubMed and displays multiple aspects of the articles to the user, according to pre-defined categories such as the "most important" words found in titles or abstracts; topics; journals; authors; publication years; and affiliations. Clicking on a given item opens a new window that displays all papers that contain that item. One can navigate by drilling down through the categories progressively, e.g., one can first restrict the articles according to author name and then restrict that subset by affiliation. Alternatively, one can expand small sets of articles to display the most closely related articles. We also implemented a novel cluster-by-topic method that generates a concise set of topics covering most of the retrieved articles.
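To make the faceted drill-down concrete, here is a minimal sketch of summarizing a result set by pre-defined categories and then progressively restricting it. This is not the authors' implementation: the records, field names, and sample values below are illustrative assumptions, and Anne O'Tate's own data model and "most important words" ranking are not reproduced.

```python
from collections import Counter

# Hypothetical, simplified article records standing in for a PubMed result set.
articles = [
    {"title": "Gene regulation in yeast", "authors": ["Smith J", "Lee K"],
     "journal": "J Mol Biol", "year": 2006, "affiliation": "Univ A"},
    {"title": "Yeast promoter motifs", "authors": ["Smith J"],
     "journal": "Nucleic Acids Res", "year": 2007, "affiliation": "Univ B"},
    {"title": "Chromatin and transcription", "authors": ["Lee K"],
     "journal": "J Mol Biol", "year": 2007, "affiliation": "Univ A"},
]

def summarize(records, facet):
    """Count how often each value of a facet (journal, year, ...) occurs."""
    values = []
    for rec in records:
        v = rec[facet]
        values.extend(v if isinstance(v, list) else [v])
    return Counter(values)

def drill_down(records, facet, value):
    """Restrict the result set to records matching one facet value."""
    def matches(rec):
        v = rec[facet]
        return value in v if isinstance(v, list) else v == value
    return [rec for rec in records if matches(rec)]

# Summarize by author, then progressively restrict: author -> affiliation.
print(summarize(articles, "authors"))      # Counter({'Smith J': 2, 'Lee K': 2})
by_smith = drill_down(articles, "authors", "Smith J")
print(summarize(by_smith, "affiliation"))  # Counter({'Univ A': 1, 'Univ B': 1})
```

Each drill-down step simply re-runs the same summarization on the restricted subset, which is what lets categories be applied in any order.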
Conclusion: Anne O'Tate is an integrated, generic tool for summarization, drill-down and browsing of PubMed search results that accommodates a wide range of biomedical users and needs. It can be accessed through the Arrowsmith project website (http://arrowsmith.psych.uic.edu). Peer review and editorial matters for this article were handled by Aaron Cohen.
{"title":"Anne O'Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results.","authors":"Neil R Smalheiser, Wei Zhou, Vetle I Torvik","doi":"10.1186/1747-5333-3-2","DOIUrl":"https://doi.org/10.1186/1747-5333-3-2","url":null,"abstract":"<p><strong>Background: </strong>PubMed is designed to provide rapid, comprehensive retrieval of papers that discuss a given topic. However, because PubMed does not organize the search output further, it is difficult for users to grasp an overview of the retrieved literature according to non-topical dimensions, to drill-down to find individual articles relevant to a particular individual's need, or to browse the collection.</p><p><strong>Results: </strong>In this paper, we present Anne O'Tate, a web-based tool that processes articles retrieved from PubMed and displays multiple aspects of the articles to the user, according to pre-defined categories such as the \"most important\" words found in titles or abstracts; topics; journals; authors; publication years; and affiliations. Clicking on a given item opens a new window that displays all papers that contain that item. One can navigate by drilling down through the categories progressively, e.g., one can first restrict the articles according to author name and then restrict that subset by affiliation. Alternatively, one can expand small sets of articles to display the most closely related articles. We also implemented a novel cluster-by-topic method that generates a concise set of topics covering most of the retrieved articles.</p><p><strong>Conclusion: </strong>Anne O'Tate is an integrated, generic tool for summarization, drill-down and browsing of PubMed search results that accommodates a wide range of biomedical users and needs. It can be accessed at 4. Peer review and editorial matters for this article were handled by Aaron Cohen.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-3-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27268810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William A Baumgartner, K Bretonnel Cohen, Lawrence Hunter
Background: Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework, its ability to uncover system-wide characteristics by analyzing component parts, and its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain.
Results: Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision.
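As an illustration of why system, corpus, and correctness measure interact, the following sketch ranks two hypothetical gene mention systems under every corpus/measure combination. The data are toy values invented here; the actual framework is built on UIMA and is far more extensive.

```python
from itertools import product

# Toy gold standard and system outputs: sets of (doc_id, start, end) mention spans.
gold = {("d1", 0, 4), ("d1", 10, 15), ("d2", 3, 8)}
systems = {
    "sysA": {("d1", 0, 4), ("d1", 10, 15), ("d2", 3, 8),
             ("d2", 20, 25), ("d1", 30, 35)},  # high recall, lower precision
    "sysB": {("d1", 0, 4)},                    # high precision, low recall
}

def precision_recall_f1(predicted, reference):
    """Exact-span matching; other correctness measures would relax this."""
    tp = len(predicted & reference)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

corpora = {"corpus1": gold}
measures = {"P": 0, "R": 1, "F1": 2}

# Rank the systems under every corpus/measure combination; note how the
# ranking flips between precision and recall for these two toy systems.
for corpus_name, measure_name in product(corpora, measures):
    scores = {name: precision_recall_f1(out, corpora[corpus_name])[measures[measure_name]]
              for name, out in systems.items()}
    print(corpus_name, measure_name, sorted(scores, key=scores.get, reverse=True), scores)
```

In miniature, this mirrors the paper's finding: which measure (and corpus) is chosen determines which system looks best, and a high-recall, low-precision mention system can still be the more useful input downstream.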
Conclusion: The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.
{"title":"An open-source framework for large-scale, flexible evaluation of biomedical text mining systems.","authors":"William A Baumgartner, K Bretonnel Cohen, Lawrence Hunter","doi":"10.1186/1747-5333-3-1","DOIUrl":"https://doi.org/10.1186/1747-5333-3-1","url":null,"abstract":"<p><strong>Background: </strong>Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain.</p><p><strong>Results: </strong>Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision.</p><p><strong>Conclusion: </strong>The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-3-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27225842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Biological organisms and their components are better conceived within categories based on similarity rather than on identity. Biologists routinely operate with similarity-based concepts such as "model organism" and "motif." There has been little exploration of the characteristics of the similarity-based categories that exist in biology. This study uses the case of the discovery and classification of zinc finger proteins to explore how biological categories based on similarity are represented.
Results: The existence of a category of "zinc finger proteins" rested on (1) a lumpy gradient of similarity, (2) a link between function and structure, (3) the establishment of a range of appearances across systems and organisms, and (4) an evolutionary locus as a historically based common ground.
Conclusion: More systematic application of the idea of similarity-based categorization might eliminate the assumption that biological characteristics can only contribute to narrow categorization of humans. It also raises possibilities for refining data-driven exploration efforts.
{"title":"Generalization through similarity: motif discourse in the discovery and elaboration of zinc finger proteins.","authors":"Celeste Michelle Condit, L Bruce Railsback","doi":"10.1186/1747-5333-2-5","DOIUrl":"https://doi.org/10.1186/1747-5333-2-5","url":null,"abstract":"<p><strong>Background: </strong>Biological organisms and their components are better conceived within categories based on similarity rather than on identity. Biologists routinely operate with similarity-based concepts such as \"model organism\" and \"motif.\" There has been little exploration of the characteristics of the similarity-based categories that exist in biology. This study uses the case of the discovery and classification of zinc finger proteins to explore how biological categories based in similarity are represented.</p><p><strong>Results: </strong>The existence of a category of \"zinc finger proteins\" was based in 1) a lumpy gradient of similarity, 2) a link between function and structure, 3) establishment of a range of appearance across systems and organisms, and 4) an evolutionary locus as a historically based common-ground.</p><p><strong>Conclusion: </strong>More systematic application of the idea of similarity-based categorization might eliminate the assumption that biological characteristics can only contribute to narrow categorization of humans. It also raises possibilities for refining data-driven exploration efforts.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-2-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27029359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Helen L Johnson, William A Baumgartner, Martin Krallinger, K Bretonnel Cohen, Lawrence Hunter
Background: Most biomedical corpora have not been used outside of the lab that created them, despite the fact that the availability of the gold-standard evaluation data they provide is one of the rate-limiting factors for the progress of biomedical text mining. Data suggest that one major factor affecting the use of a corpus outside of its home laboratory is the format in which it is distributed. This paper tests the hypothesis that corpus refactoring - changing the format of a corpus without altering its semantics - is a feasible goal, namely that it can be accomplished with a semi-automatable process and in a time-efficient way. We used simple text processing methods and limited human validation to convert the Protein Design Group corpus into two new formats: WordFreak and embedded XML. We tracked the total time expended and the success rates of the automated steps.
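A minimal sketch of the embedded-XML half of such a conversion, under the assumption of a simple standoff input (sentence text plus character offsets of mentions). The record schema here is invented for illustration; the real Protein Design Group corpus format and the WordFreak conversion are more involved, and in the paper's spirit a human validator would spot-check the automatic output.

```python
from xml.sax.saxutils import escape

# Toy record in an assumed standoff format: sentence text plus
# (start, end, type) character offsets. Not the corpus's real schema.
record = {
    "text": "RecA binds single-stranded DNA.",
    "mentions": [(0, 4, "protein")],
}

def to_embedded_xml(rec):
    """Wrap each annotated span in an element, escaping the surrounding text."""
    text, parts, last = rec["text"], [], 0
    for start, end, tag in sorted(rec["mentions"]):
        parts.append(escape(text[last:start]))
        parts.append("<%s>%s</%s>" % (tag, escape(text[start:end]), tag))
        last = end
    parts.append(escape(text[last:]))
    return "<sentence>%s</sentence>" % "".join(parts)

print(to_embedded_xml(record))
# -> <sentence><protein>RecA</protein> binds single-stranded DNA.</sentence>
```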
Results: The refactored corpus is available for download at the BioNLP SourceForge website http://bionlp.sourceforge.net. The total time expended was just over three person-weeks, consisting of about 102 hours of programming time (much of which is one-time development cost) and 20 hours of manual validation of automatic outputs. Additionally, the steps required to refactor any corpus are presented.
Conclusion: We conclude that refactoring of publicly available corpora is a technically and economically feasible method for increasing the usage of data already available for evaluating biomedical language processing systems.
{"title":"Corpus refactoring: a feasibility study.","authors":"Helen L Johnson, William A Baumgartner, Martin Krallinger, K Bretonnel Cohen, Lawrence Hunter","doi":"10.1186/1747-5333-2-4","DOIUrl":"https://doi.org/10.1186/1747-5333-2-4","url":null,"abstract":"<p><strong>Background: </strong>Most biomedical corpora have not been used outside of the lab that created them, despite the fact that the availability of the gold-standard evaluation data that they provide is one of the rate-limiting factors for the progress of biomedical text mining. Data suggest that one major factor affecting the use of a corpus outside of its home laboratory is the format in which it is distributed. This paper tests the hypothesis that corpus refactoring - changing the format of a corpus without altering its semantics - is a feasible goal, namely that it can be accomplished with a semi-automatable process and in a time-effcient way. We used simple text processing methods and limited human validation to convert the Protein Design Group corpus into two new formats: WordFreak and embedded XML. We tracked the total time expended and the success rates of the automated steps.</p><p><strong>Results: </strong>The refactored corpus is available for download at the BioNLP SourceForge website http://bionlp.sourceforge.net. The total time expended was just over three person-weeks, consisting of about 102 hours of programming time (much of which is one-time development cost) and 20 hours of manual validation of automatic outputs. Additionally, the steps required to refactor any corpus are presented.</p><p><strong>Conclusion: </strong>We conclude that refactoring of publicly available corpora is a technically and economically feasible method for increasing the usage of data already available for evaluating biomedical language processing systems.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-2-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40962167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nanotechnology research has lately been of intense interest because of its perceived potential for many diverse fields of science. Nanotechnology's tools have found application in diverse fields, from biology to device physics. By the 1990s, there was a concerted effort in the United States to develop a national initiative to promote such research. The success of this effort led to a significant influx of resources and interest in nanotechnology and nanobiotechnology and to the establishment of centralized research programs and facilities. Further government initiatives (at federal, state, and local levels) have firmly cemented these disciplines as 'big science,' with efforts increasingly concentrated at select laboratories and centers. In many respects, these trends mirror certain changes in academic science over the past twenty years, with a greater emphasis on applied science and research that can be more directly utilized for commercial applications. We also compare the National Nanotechnology Initiative and its successors to the Human Genome Project, another large-scale, government-funded initiative. These precedents made shifts in nanotechnology easier for researchers to accept, as they followed trends already established within most fields of science. Finally, these trends are examined in the design of technologies for detection and treatment of cancer, through the Alliance for Nanotechnology in Cancer initiative of the National Cancer Institute. Federal funding of these nanotechnology initiatives has enabled expansion into diverse fields and provided the impetus for broadening the scope of research in several fields, especially biomedicine, though the ultimate utility and impact of all these efforts remain to be seen.
{"title":"Nano-Bio-Genesis: tracing the rise of nanotechnology and nanobiotechnology as 'big science'.","authors":"Rajan P Kulkarni","doi":"10.1186/1747-5333-2-3","DOIUrl":"https://doi.org/10.1186/1747-5333-2-3","url":null,"abstract":"<p><p> Nanotechnology research has lately been of intense interest because of its perceived potential for many diverse fields of science. Nanotechnology's tools have found application in diverse fields, from biology to device physics. By the 1990s, there was a concerted effort in the United States to develop a national initiative to promote such research. The success of this effort led to a significant influx of resources and interest in nanotechnology and nanobiotechnology and to the establishment of centralized research programs and facilities. Further government initiatives (at federal, state, and local levels) have firmly cemented these disciplines as 'big science,' with efforts increasingly concentrated at select laboratories and centers. In many respects, these trends mirror certain changes in academic science over the past twenty years, with a greater emphasis on applied science and research that can be more directly utilized for commercial applications.We also compare the National Nanotechnology Initiative and its successors to the Human Genome Project, another large-scale, government funded initiative. These precedents made acceptance of shifts in nanotechnology easier for researchers to accept, as they followed trends already established within most fields of science. Finally, these trends are examined in the design of technologies for detection and treatment of cancer, through the Alliance for Nanotechnology in Cancer initiative of the National Cancer Institute. Federal funding of these nanotechnology initiatives has allowed for expansion into diverse fields and the impetus for expanding the scope of research of several fields, especially biomedicine, though the ultimate utility and impact of all these efforts remains to be seen.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-2-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26830006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kristina M Hettne, Marissa de Mos, Anke G J de Bruijn, Marc Weeber, Scott Boyer, Erik M van Mulligen, Montserrat Cases, Jordi Mestres, Johan van der Lei
Background: Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to be reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds has applied advanced information retrieval methods, in the form of text mining and entity relationship tools, to review the current literature, with the intention of generating new insights into the molecular mechanisms underlying a complex disorder. As an example of such a disorder, Complex Regional Pain Syndrome (CRPS) was chosen. CRPS is a painful and debilitating syndrome with a complex etiology that to a considerable extent remains to be unraveled, resulting in suboptimal diagnosis and treatment.
Results: A text mining based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFkappaB) as a possible central mediator in both the initiation and progression of CRPS.
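The following sketch illustrates the general idea of such a "simple network analysis": build a co-occurrence network from the entities text-mined per abstract, then rank nodes by degree so that a hub stands out as a candidate central mediator. The gene symbols and entity sets are invented for illustration and are not the study's actual mining output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity sets extracted from three abstracts.
abstracts_entities = [
    {"NFKB1", "TNF", "IL6"},
    {"NFKB1", "TNF"},
    {"NFKB1", "SP1"},
]

# Count each co-occurring pair as an (undirected) edge.
edges = Counter()
for entities in abstracts_entities:
    for a, b in combinations(sorted(entities), 2):
        edges[(a, b)] += 1

# Degree = number of distinct neighbors in the network.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common())  # NFKB1 has the highest degree in this toy network
```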
Conclusion: The result shows the added value of a multidisciplinary approach combined with information retrieval in hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.
{"title":"Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in complex regional pain syndrome.","authors":"Kristina M Hettne, Marissa de Mos, Anke G J de Bruijn, Marc Weeber, Scott Boyer, Erik M van Mulligen, Montserrat Cases, Jordi Mestres, Johan van der Lei","doi":"10.1186/1747-5333-2-2","DOIUrl":"https://doi.org/10.1186/1747-5333-2-2","url":null,"abstract":"<p><strong>Background: </strong>Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds has applied advanced information retrieval methods, in the form of text mining and entity relationship tools, to review the current literature, with the intention to generate new insights into the molecular mechanisms underlying a complex disorder. As an example of such a disorder the Complex Regional Pain Syndrome (CRPS) was chosen. CRPS is a painful and debilitating syndrome with a complex etiology that is still unraveled for a considerable part, resulting in suboptimal diagnosis and treatment.</p><p><strong>Results: </strong>A text mining based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFkappaB) as a possible central mediator in both the initiation and progression of CRPS.</p><p><strong>Conclusion: </strong>The result shows the added value of a multidisciplinary approach combined with information retrieval in hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-2-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26705394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data management and integration are complicated and ongoing problems that will require commitment of resources and expertise from the various biological science communities. Primary components of successful cross-scale integration are smooth information management and migration from one context to another. We call for a broadening of the definition of bioinformatics and bioinformatics training to span biological disciplines and biological scales. Training programs are needed that educate a new kind of informatics professional, Biological Information Specialists, to work in collaboration with various discipline-specific research personnel. Biological Information Specialists are an extension of the informationist movement that began within library and information science (LIS) over 30 years ago as a professional position to fill a gap in clinical medicine. These professionals will help advance science by improving access to scientific information and by freeing scientists who are not interested in data management to concentrate on their science.
{"title":"Biological information specialists for biological informatics.","authors":"P Bryan Heidorn, Carole L Palmer, Dan Wright","doi":"10.1186/1747-5333-2-1","DOIUrl":"https://doi.org/10.1186/1747-5333-2-1","url":null,"abstract":"<p><p>Data management and integration are complicated and ongoing problems that will require commitment of resources and expertise from the various biological science communities. Primary components of successful cross-scale integration are smooth information management and migration from one context to another. We call for a broadening of the definition of bioinformatics and bioinformatics training to span biological disciplines and biological scales. Training programs are needed that educate a new kind of informatics professional, Biological Information Specialists, to work in collaboration with various discipline-specific research personnel. Biological Information Specialists are an extension of the informationist movement that began within library and information science (LIS) over 30 years ago as a professional position to fill a gap in clinical medicine. These professionals will help advance science by improving access to scientific information and by freeing scientists who are not interested in data management to concentrate on their science.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-2-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26549396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco M Couto, Mário J Silva, Vivian Lee, Emily Dimmer, Evelyn Camon, Rolf Apweiler, Harald Kirsch, Dietrich Rebholz-Schuhmann
Background: Annotation of proteins with gene ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and precious, but it is time-consuming. As a result, most proteins carry automatically generated, uncurated annotations rather than curated ones. Text-mining systems that use literature for automatic annotation have been proposed, but they do not satisfy the high quality expectations of curators.
Results: In this paper we describe an approach that links uncurated annotations to text extracted from literature. The selection of the text is based on the similarity of the text to the term from the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins.
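A hedged sketch of the two ingredients just described: score candidate sentences against the GO term name of an uncurated annotation, and use the GO hierarchy to check that an extracted term is related to the annotated one. A naive Jaccard word overlap and a toy hierarchy fragment stand in for GOAnnotator's actual similarity measure and GO access; only the GO IDs and their is-a links below are real.

```python
# Toy fragment of the GO is-a hierarchy; a real tool would query GO itself.
go_parents = {
    "GO:0006355": "GO:0010468",  # regulation of transcription -> regulation of gene expression
    "GO:0010468": "GO:0008150",  # -> biological_process
}

def ancestors(term):
    """Walk up the toy is-a chain, collecting all ancestors of a GO term."""
    found = set()
    while term in go_parents:
        term = go_parents[term]
        found.add(term)
    return found

def jaccard(a, b):
    """Naive word-overlap similarity (no tokenization cleanup) between texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def best_evidence(term_name, sentences):
    """Pick the sentence most lexically similar to the GO term name."""
    return max(sentences, key=lambda s: jaccard(term_name, s))

sentences = [
    "The protein regulates transcription of target genes.",
    "Crystals were grown at 4 degrees.",
]
print(best_evidence("regulation of transcription", sentences))

# Hierarchy check: accept an extracted term only if it is related (here, an
# ancestor) to the term from the uncurated annotation.
print("GO:0010468" in (ancestors("GO:0006355") | {"GO:0006355"}))  # True
```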
Conclusion: The GO curators assessed GOAnnotator with a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to those from the uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for the efficient support of GO curators. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.
{"title":"GOAnnotator: linking protein GO annotations to evidence text.","authors":"Francisco M Couto, Mário J Silva, Vivian Lee, Emily Dimmer, Evelyn Camon, Rolf Apweiler, Harald Kirsch, Dietrich Rebholz-Schuhmann","doi":"10.1186/1747-5333-1-19","DOIUrl":"https://doi.org/10.1186/1747-5333-1-19","url":null,"abstract":"<p><strong>Background: </strong>Annotation of proteins with gene ontology (GO) terms is ongoing work and a complex task. Manual GO annotation is precise and precious, but it is time-consuming. Therefore, instead of curated annotations most of the proteins come with uncurated annotations, which have been generated automatically. Text-mining systems that use literature for automatic annotation have been proposed but they do not satisfy the high quality expectations of curators.</p><p><strong>Results: </strong>In this paper we describe an approach that links uncurated annotations to text extracted from literature. The selection of the text is based on the similarity of the text to the term from the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins.</p><p><strong>Conclusion: </strong>The GO curators assessed GOAnnotator with a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to only select GO terms similar to GO terms from uncurated annotations in GOA. Our approach is the first one to achieve high precision, which is crucial for the efficient support of GO curators. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-1-19","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26454294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The annual James Arthur lecture series on the Evolution of the Human Brain was inaugurated at the American Museum of Natural History in 1932, through a bequest from a successful manufacturer with a particular interest in mechanisms. Karl Pribram's thirty-ninth lecture of the series, delivered in 1970, was a seminal event that heralded much of the research agenda touching on the evolution of human uniqueness that has since been pursued by representatives of diverse disciplines.
Discussion: In his James Arthur lecture Pribram raised questions about the coding of information in the brain and about the complex association between language, symbol, and the unique human cognitive system. These questions are as pertinent today as in 1970. The emergence of modern human symbolic cognition is often viewed as a gradual, incremental process, governed by inexorable natural selection and propelled by the apparent advantages of increasing intelligence. However, numerous theoretical considerations render such a scenario implausible, and an examination of the pattern of acquisition of behavioral and anatomical novelties in human evolution indicates that, throughout, major change was both sporadic and rare. What is more, modern bony anatomy and brain size were apparently both achieved well before we have any evidence for symbolic behavior patterns. This suggests that the biological substrate underlying the symbolic thought so distinctive of Homo sapiens today was acquired exaptively, long before its potential was actually put to use. In that case, we need to look for the agent, perforce a cultural one, that stimulated the adoption of symbolic thought patterns. That stimulus may well have been the spontaneous invention of articulate language.
{"title":"Karl Pribram, The James Arthur lectures, and what makes us human.","authors":"Ian Tattersall","doi":"10.1186/1747-5333-1-15","DOIUrl":"https://doi.org/10.1186/1747-5333-1-15","url":null,"abstract":"<p><strong>Background: </strong>The annual James Arthur lecture series on the Evolution of the Human Brain was inaugurated at the American Museum of Natural History in 1932, through a bequest from a successful manufacturer with a particular interest in mechanisms. Karl Pribram's thirty-ninth lecture of the series, delivered in 1970, was a seminal event that heralded much of the research agenda, since pursued by representatives of diverse disciplines, that touches on the evolution of human uniqueness.</p><p><strong>Discussion: </strong>In his James Arthur lecture Pribram raised questions about the coding of information in the brain and about the complex association between language, symbol, and the unique human cognitive system. These questions are as pertinent today as in 1970. The emergence of modern human symbolic cognition is often viewed as a gradual, incremental process, governed by inexorable natural selection and propelled by the apparent advantages of increasing intelligence. However, there are numerous theoretical considerations that render such a scenario implausible, and an examination of the pattern of acquisition of behavioral and anatomical novelties in human evolution indicates that, throughout, major change was both sporadic and rare. What is more, modern bony anatomy and brain size were apparently both achieved well before we have any evidence for symbolic behavior patterns. This suggests that the biological substrate underlying the symbolic thought that is so distinctive of Homo sapiens today was exaptively achieved, long before its potential was actually put to use. In which case we need to look for the agent, perforce a cultural one, that stimulated the adoption of symbolic thought patterns. That stimulus may well have been the spontaneous invention of articulate language.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-1-15","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26413299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In 1970, Karl Pribram took on the immense challenge of asking the question, what makes us human? Nearly four decades later, the most significant finding has been the undeniable realization of how incredibly subtle and fine-scaled the unique biological features of our species must be. The recent explosion in the availability of large-scale sequence data, however, and the consequent emergence of comparative genomics, are rapidly transforming the study of human evolution. The field of comparative genomics is allowing us to reach unparalleled resolution, reframing our questions in reference to DNA sequence, the very unit on which evolution operates. But like any reductionist approach, it comes at a price. Comparative genomics may provide the necessary resolution for identifying rare DNA sequence differences in a vast sea of conservation, but ultimately we will have to face the challenge of figuring out how DNA sequence divergence translates into phenotypic divergence. Our goal here is to provide a brief outline of the major findings made in the study of human brain evolution since the Pribram lecture, focusing specifically on the field of comparative genomics. We then discuss the broader implications of these findings and the future challenges that are in store.
{"title":"What makes us human: revisiting an age-old question in the genomic era.","authors":"Nitzan Mekel-Bobrov, Bruce T Lahn","doi":"10.1186/1747-5333-1-18","DOIUrl":"https://doi.org/10.1186/1747-5333-1-18","url":null,"abstract":"<p><p>In 1970, Karl Pribram took on the immense challenge of asking the question, what makes us human? Nearly four decades later, the most significant finding has been the undeniable realization of how incredibly subtle and fine-scaled the unique biological features of our species must be. The recent explosion in the availability of large-scale sequence data, however, and the consequent emergence of comparative genomics, are rapidly transforming the study of human evolution. The field of comparative genomics is allowing us to reach unparalleled resolution, reframing our questions in reference to DNA sequence--the very unit that evolution operates on. But like any reductionist approach, it comes at a price. Comparative genomics may provide the necessary resolution for identifying rare DNA sequence differences in a vast sea of conservation, but ultimately we will have to face the challenge of figuring out how DNA sequence divergence translates into phenotypic divergence. Our goal here is to provide a brief outline of the major findings made in the study of human brain evolution since the Pribram lecture, focusing specifically on the field of comparative genomics. We then discuss the broader implications of these findings and the future challenges that are in store.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1747-5333-1-18","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"26413216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}