Journal of biomedical discovery and collaboration: Latest Articles

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation.
Pub Date : 2016-04-06 DOI: 10.5210/disco.v7i0.6654
Neil R Smalheiser, Gary Bonifield

In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same individual (using the 2009 Author-ity author name disambiguation dataset as a gold standard). The two metrics are only modestly correlated with each other (r = 0.50), indicating that they capture different aspects of term usage. The article-based metric provides a measure of semantic relatedness, and MeSH term pairs that co-occur more often than expected by chance may reflect relations between the two terms. In contrast, the author metric is indicative of how individuals practice science, and may have value for author name disambiguation and studies of scientific discovery. We have calculated article metrics for all MeSH terms appearing in at least 25 articles in MEDLINE (as of 2014) and author metrics for MeSH terms published as of 2009. The dataset is freely available for download and can be queried at http://arrowsmith.psych.uic.edu/arrowsmith_uic/mesh_pair_metrics.html. Handling editor: Elizabeth Workman, MLIS, PhD.
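
The idea of term pairs that "co-occur more often than expected by chance" can be illustrated with a small sketch. This is not the authors' metric: the abstract does not give their formula, so the observed/expected ratio under an independence assumption, the function name `cooccurrence_ratios`, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratios(records):
    """Return the observed/expected co-occurrence ratio for each term pair.

    records: list of sets of MeSH terms, one set per article (toy data here).
    The expected count assumes the two terms are assigned independently,
    which is an illustrative choice, not the published method.
    """
    n = len(records)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        term_counts.update(terms)
        # Sort so each unordered pair is always stored under the same key.
        pair_counts.update(combinations(sorted(terms), 2))
    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = term_counts[a] * term_counts[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Hypothetical article records, each reduced to its set of MeSH terms.
records = [
    {"Humans", "Pneumonia", "Mortality"},
    {"Humans", "Pneumonia"},
    {"Humans", "MEDLINE"},
    {"MEDLINE", "Medical Subject Headings"},
]
ratios = cooccurrence_ratios(records)
```

A ratio above 1 means the pair appears together more often than independence would predict; a ratio below 1 means less often. An author-based variant could, in principle, use one set of terms per disambiguated author instead of one per article.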

Citations: 0
The language of discovery.
Pub Date : 2011-06-17 DOI: 10.5210/disco.v6i0.3634
Wiley Souba

Discovery, as a public attribution, and discovering, the act of conducting research, are experiences that entail "languaging" the unknown. This distinguishing property of language - its ability to bring forth, out of the unspoken realm, new knowledge, original ideas, and novel thinking - is essential to the discovery process. In sharing their ideas and views, scientists create co-negotiated linguistic distinctions that prompt the revision of established mental maps and the adoption of new ones. While scientific mastery entails command of the conversational domain unique to a specific discipline, there is an emerging conversational domain that must be mastered that goes beyond the language unique to any particular specialty. Mastery of this new conversational domain gives researchers access to their hidden mental maps that limit their ways of thinking about and doing science. The most effective scientists use language to recontextualize their approach to problem-solving, which triggers new insights (previously unavailable) that result in new discoveries. While language is not a replacement for intuition and other means of knowing, when we try to understand what's outside of language we have to use language to do so.

Citations: 0
Bias associated with mining electronic health records.
Pub Date : 2011-06-06 DOI: 10.5210/disco.v6i0.3581
George Hripcsak, Charles Knirsch, Li Zhou, Adam Wilcox, Genevieve Melton

Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce distorted results. We found that a naïve approach approximated the gold standard, but errors on a minority of cases shifted mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.

Citations: 74
Literature-based Resurrection of Neglected Medical Discoveries.
Pub Date : 2011-04-20 DOI: 10.5210/disco.v6i0.3515
Don R Swanson

It is possible to find in the medical literature many articles that have been neglected or ignored, in some cases for many years, but which are worth bringing to light because they report unusual findings that may be of current scientific interest. Resurrecting previously published but neglected hypotheses that have merit might be overlooked because it would seem to lack the novelty of "discovery" -- but the potential value of so doing is hardly arguable. Finding neglected hypotheses may not only be of great practical value but may also afford the opportunity to study the structure of such hypotheses in the hope of illuminating the more general problem of hypothesis generation.

Citations: 17
A cognitive task analysis of a visual analytic workflow: Exploring molecular interaction networks in systems biology.
Pub Date : 2011-03-21 DOI: 10.5210/disco.v6i0.3410
Barbara Mirel, Felix Eichinger, Benjamin J Keller, Matthias Kretzler

Background: Bioinformatics visualization tools are often not robust enough to support biomedical specialists’ complex exploratory analyses. Tools need to accommodate the workflows that scientists actually perform for specific translational research questions. To understand and model one of these workflows, we conducted a case-based, cognitive task analysis of a biomedical specialist’s exploratory workflow for the question: What functional interactions among gene products of high throughput expression data suggest previously unknown mechanisms of a disease?

Results: From our cognitive task analysis, four complementary representations of the targeted workflow were developed. They include usage scenarios, flow diagrams, a cognitive task taxonomy, and a mapping between cognitive tasks and user-centered visualization requirements. The representations capture the flows of cognitive tasks that led a biomedical specialist to inferences critical to hypothesizing. We created representations at levels of detail that could strategically guide visualization development, and we confirmed this by making a trial prototype based on user requirements for a small portion of the workflow.

Conclusions: Our results imply that visualizations should make available to scientific users “bundles of features” consonant with the compositional cognitive tasks purposefully enacted at specific points in the workflow. We also highlight certain aspects of visualizations that: (a) need more built-in flexibility; (b) are critical for negotiating meaning; and (c) are necessary for essential metacognitive support.

Citations: 0
NEMO: Extraction and normalization of organization names from PubMed affiliations.
Siddhartha Reddy Jonnalagadda, Philip Topham

Background: Today, there are more than 18 million articles related to biomedical research indexed in MEDLINE, and information derived from them could be used effectively to save the great amount of time and resources spent by government agencies in understanding the scientific landscape, including key opinion leaders and centers of excellence. Associating biomedical articles with organization names could significantly benefit the pharmaceutical marketing industry, health care funding agencies and public health officials, and could be useful for other scientists in normalizing author names, automatically creating citations, indexing articles and identifying potential resources or collaborators. The large amount of extracted information helps in disambiguating organization names using machine-learning algorithms.

Results: We propose NEMO, a system for extracting organization names from the affiliation field and normalizing them to a canonical organization name. Our parsing process involves multi-layered rule matching with multiple dictionaries. The system achieves an F-score of more than 98% in extracting organization names. Our normalization process involves clustering based on local sequence alignment metrics and local learning based on finding connected components. High precision was also observed in normalization.

Conclusion: NEMO is the missing link in associating each biomedical paper and its authors with an organization name in its canonical form and the geopolitical location of the organization. This research could potentially help in analyzing large social networks of organizations for landscaping a particular topic, improving the performance of author disambiguation, adding weak links in the co-author network of authors, augmenting NLM's MARS system for correcting errors in the OCR output of the affiliation field, and automatically indexing PubMed citations with the normalized organization name and country. Our system, which has a graphical user interface, is available for download along with this paper.
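
The normalization step described in the Results (clustering on a local sequence alignment metric, then taking connected components) might be sketched roughly as follows. This is not NEMO's implementation: the abstract does not state its scoring scheme, dictionaries, or thresholds, so the Smith-Waterman scoring parameters, the normalization by the shorter string, the 0.8 threshold, and all function names are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b):
    """Alignment score divided by the best score the shorter string could reach."""
    a, b = a.lower(), b.lower()
    shortest = min(len(a), len(b))
    return local_alignment_score(a, b) / (2 * shortest) if shortest else 0.0


def cluster_names(names, threshold=0.8):
    """Group names into connected components of the 'similar enough' graph."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


# Hypothetical affiliation strings, already reduced to organization names.
names = [
    "Arizona State University",
    "Arizona State Univ",
    "Mayo Clinic",
    "Mayo Clinic Rochester",
]
clusters = cluster_names(names)
```

Normalizing by the shorter string keeps abbreviations close to their full forms, but it also means a very short generic name such as "University" would link to almost everything; a real system would need additional rules to guard against that.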

Pub Date : 2010-10-04
Citations: 0
EpiphaNet: An Interactive Tool to Support Biomedical Discoveries.
Trevor Cohen, G Kerr Whitfield, Roger W Schvaneveldt, Kavitha Mukund, Thomas Rindflesch

Background: EpiphaNet is an interactive knowledge discovery system which enables researchers to visually explore sets of relations extracted from MEDLINE using a combination of language processing techniques. In this paper, we discuss the theoretical and methodological foundations of the system, and evaluate the utility of the models that underlie it for literature-based discovery. In addition, we present a summary of results drawn from a qualitative analysis of over six hours of interaction with the system by basic medical scientists.

Results: The system is able to simulate open and closed discovery, and is shown to generate associations that are both surprising and interesting within the area of expertise of the researchers concerned.

Conclusions: EpiphaNet provides an interactive visual representation of associations between concepts, which is derived from distributional statistics drawn from across the spectrum of biomedical citations in MEDLINE. This tool is available online, providing biomedical scientists with the opportunity to identify and explore associations of interest to them.

Pub Date : 2010-09-21
Citations: 0
Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers.
Heather Piwowar, Wendy Chapman

Background: The ability to locate publicly available gene expression microarray datasets effectively and efficiently facilitates the reuse of these potentially valuable resources. Centralized biomedical databases allow users to query dataset metadata descriptions, but these annotations are often too sparse and diverse to allow complex and accurate queries. In this study we examined the ability of PubMed article identifiers to locate publicly available gene expression microarray datasets, and investigated whether the retrieved datasets were representative of publicly available datasets found through statements of data sharing in the associated research articles.

Results: In a recent article, Ochsner and colleagues identified 397 studies that had generated gene expression microarray data. Their search of the full text of each publication for statements of data sharing revealed 203 publicly available datasets, including 179 in the Gene Expression Omnibus (GEO) or ArrayExpress databases. Our scripted search of GEO and ArrayExpress for PubMed identifiers of the same 397 studies returned 160 datasets, including six not found by the original search for data sharing statements. As a proportion of datasets found by either method, the search for data sharing statements identified 91.4% of the 209 publicly available datasets, compared to only 76.6% found by our search carried out using PubMed identifiers. Searching GEO or ArrayExpress alone retrieved 63.2% and 46.9% of all available datasets, respectively. There was no difference in the type of datasets found by PubMed identifier searches in terms of research theme or the technology used. However, the studies identified were more likely to have larger sample sizes, to be more frequently cited, and to be published in higher-impact journals.

Conclusions: Searching database entries using PubMed identifiers can identify the majority of publicly available datasets, but caution is required when this method is used to collect data for policy evaluation since studies in low impact journals are disproportionately excluded. We urge authors of all datasets to complete the citation fields for their dataset submissions once publication details are known, thereby ensuring their work has maximum visibility and can contribute to subsequent studies.
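
The recall figure for the PubMed identifier search can be reproduced from the counts quoted in the Results. The sketch below only restates that arithmetic; the function and variable names are hypothetical, and the two counts are taken directly from the abstract.

```python
def recall(found, total):
    """Fraction of all known datasets that a search method retrieved."""
    return found / total


total_datasets = 209  # datasets found by either method (from the abstract)
pubmed_id_hits = 160  # datasets returned by the PubMed identifier search

pubmed_recall_pct = round(100 * recall(pubmed_id_hits, total_datasets), 1)
print(pubmed_recall_pct)  # 76.6
```

The printed value matches the 76.6% reported for the PubMed identifier search.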

Pub Date : 2010-03-28
Citations: 0
MLTrends: Graphing MEDLINE term usage over time.
Gareth A Palidwor, Miguel A Andrade-Navarro

The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time-consuming. The authors' MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. MLTrends may be used at: http://www.ogic.ca/mltrends.

Pub Date : 2010-01-25 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990277/pdf/
Citations: 0
MLTrends: Graphing MEDLINE term usage over time
Pub Date : 2010-01-22 DOI: 10.5210/DISCO.V5I0.2680
Gareth A. Palidwor, Miguel Andrade
The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and use of scientific language may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time-consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. Terms considered are individual words or quoted phrases that may be combined using Boolean operators. For multiple terms simultaneously, MLTrends can plot the number of MEDLINE records per year whose titles or abstracts match each queried term. The MEDLINE database is stored and indexed on the MLTrends server, allowing queries to be completed and graphs generated in less than one second. Queries may be performed on all titles and/or abstracts in MEDLINE and can include stop words. The resulting graphs may be normalized by total publications or words per year to facilitate term usage comparison between years. This makes MLTrends a powerful tool for rapid graphical evaluation of the evolution of biomedical research and language. MLTrends may be used at: http://www.ogic.ca/mltrends
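The per-year counting and normalization described in this abstract can be illustrated with a minimal Python sketch. This is not the MLTrends implementation: the function names, the `(year, title, abstract)` record layout, and the toy records are all hypothetical, and plain case-insensitive substring matching stands in for the word, quoted-phrase, and Boolean query handling the abstract mentions.

```python
from collections import Counter


def yearly_term_counts(records, term):
    """Count records per year whose title or abstract contains `term`.

    Assumption: simple case-insensitive substring matching, not the
    word/phrase/Boolean matching described for MLTrends.
    """
    term = term.lower()
    counts = Counter()
    for year, title, abstract in records:
        if term in title.lower() or term in abstract.lower():
            counts[year] += 1
    return counts


def normalize_by_total(term_counts, total_counts):
    """Divide each year's matching count by that year's total record count."""
    return {
        year: term_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


# Hypothetical toy records: (year, title, abstract)
records = [
    (1990, "Gene cloning methods", ""),
    (1990, "Cardiac surgery outcomes", ""),
    (2000, "Microarray analysis of gene expression", ""),
    (2000, "Microarray normalization", ""),
    (2000, "Cardiac imaging", ""),
    (2000, "Protein folding", ""),
]

totals = Counter(year for year, _, _ in records)
hits = yearly_term_counts(records, "microarray")
print(normalize_by_total(hits, totals))  # {1990: 0.0, 2000: 0.5}
```

Dividing by the yearly total is what makes years comparable: a raw count of 2 in a year with 4 records and a raw count of 2 in a year with 400 records would otherwise look identical on the graph.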
Citations: 13