Inferring appropriate eligibility criteria in clinical trial protocols without labeled data

Angelo C. Restificar, S. Ananiadou
DOI: 10.1145/2390068.2390074 (https://doi.org/10.1145/2390068.2390074)
Venue: Data and Text Mining in Bioinformatics
Published: 2012-10-29
Citations: 13

Abstract

We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol that itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'| << |D|, that a user has initially identified as relevant (e.g., via a query interface), our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this view in a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value: the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value, or score. Our experiments indicate that the proposed method performs 8 and 9 times better, for inclusion and exclusion criteria respectively, than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on average 75% and 70% similar to the correct eligibility criteria, for inclusion and exclusion criteria respectively.
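The scoring step can be illustrated with a minimal sketch. It assumes one reading of "probability-weighted sum of the topic proportions": score(c) = sum over topics k of P(k | c) * theta_k, where theta_k is the proportion of topic k inferred from the sample documents D'. The abstract does not give the exact formula, so this reading, the function names, and all numbers below are illustrative assumptions, not the authors' implementation. The LDA and logistic regression steps are taken as already done; their outputs are supplied as plain lists.

```python
from typing import Dict, List, Sequence, Tuple


def expected_value_score(topic_probs: Sequence[float],
                         topic_proportions: Sequence[float]) -> float:
    """Probability-weighted sum of the sample topic proportions (assumed form)."""
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("one probability is needed per topic")
    return sum(p * theta for p, theta in zip(topic_probs, topic_proportions))


def rank_criteria(candidates: Dict[str, Sequence[float]],
                  topic_proportions: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (criterion, score) pairs sorted from highest to lowest score."""
    scored = [(text, expected_value_score(probs, topic_proportions))
              for text, probs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical values for illustration only; none come from the paper.
# theta: topic proportions inferred from the sample documents D' (step 1).
theta = [0.7, 0.2, 0.1]

# Per-topic membership probabilities for each candidate criterion, standing in
# for the outputs of per-topic logistic regression models (step 2). Because the
# models are assumed independent, each row need not sum to 1.
candidates = {
    "Age 18 years or older": [0.9, 0.1, 0.2],
    "No prior chemotherapy": [0.2, 0.8, 0.1],
    "Able to swallow tablets": [0.1, 0.1, 0.9],
}

ranking = rank_criteria(candidates, theta)
for text, score in ranking:
    print(f"{score:.2f}  {text}")
```

In this toy input, the criterion most strongly associated with the dominant sample topic (proportion 0.7) ranks first, which matches the intuition stated in the abstract.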