Analysing the History of Autism Spectrum Disorder Using Topic Models

Adham Beykikhoshk, Dinh Q. Phung, Ognjen Arandjelovic, S. Venkatesh
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), published 2016-10-01. DOI: 10.1109/DSAA.2016.65 (https://doi.org/10.1109/DSAA.2016.65)
Citations: 8

Abstract

We describe a novel framework for discovering the underlying topics of a longitudinal collection of scholarly data and for tracking their lifetime and popularity over time. Unlike social media or news data, where the underlying topics evolve over time, in science the nuances within topics give rise to new scientific directions. We therefore model longitudinal literature data with a new approach whose topics remain identifiable over the course of time. Existing approaches that fix the topics over time either disregard the time dimension or treat it as an exchangeable covariate, while those that model time naturally do not share topics across epochs. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that topics are shared across epochs and across the documents within each epoch. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to the public, which helps other researchers analyse our results or apply the model to their own data collections.
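The bookkeeping implied by the abstract (consecutive epochs, a topic set shared by every epoch, and per-topic popularity and lifetime) can be illustrated with a minimal sketch. This is not the authors' model or released code: it performs no Bayesian inference and does not implement the recurrent Chinese restaurant franchise. The function names, the `(year, topic_id)` input format, and the notion of a single dominant topic per document are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def assign_epoch(year, start_year, epoch_length):
    """Map a publication year to a zero-based index of consecutive epochs."""
    return (year - start_year) // epoch_length


def topic_popularity(docs, start_year, epoch_length):
    """Return {epoch: {topic: proportion}} over one topic set shared by all epochs.

    docs: iterable of (year, topic_id) pairs. topic_id is a hypothetical
    dominant topic per document, standing in for the output of inference.
    """
    docs = list(docs)
    counts = defaultdict(Counter)
    for year, topic in docs:
        counts[assign_epoch(year, start_year, epoch_length)][topic] += 1

    # The same (static) topic identities are reported in every epoch,
    # so a topic that is absent in an epoch simply has proportion 0.
    topics = sorted({topic for _, topic in docs})
    popularity = {}
    for epoch in sorted(counts):
        total = sum(counts[epoch].values())
        popularity[epoch] = {t: counts[epoch][t] / total for t in topics}
    return popularity


def lifetime(popularity, topic):
    """Return (first_epoch, last_epoch) in which the topic has non-zero share."""
    active = [e for e, shares in popularity.items() if shares[topic] > 0]
    return (min(active), max(active)) if active else None


# Toy input (invented for illustration, not data from the paper).
toy_docs = [(1990, "a"), (1992, "a"), (1996, "b"), (1997, "a"), (2003, "b")]
toy_popularity = topic_popularity(toy_docs, start_year=1990, epoch_length=5)
```

Because topic identities are held fixed across epochs, a topic's trajectory is just its row of proportions over time, which is what makes lifetime and popularity directly comparable between epochs in this simplified view.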