Temporal information extraction from mental health records to identify duration of untreated psychosis.

Journal of Biomedical Semantics (IF 2; Zone 3, Engineering & Technology; Q3, Mathematical & Computational Biology). Pub Date: 2020-03-10. DOI: 10.1186/s13326-020-00220-2

Natalia Viani, Joyce Kam, Lucia Yin, André Bittar, Rina Dutta, Rashmi Patel, Robert Stewart, Sumithra Velupillai

Published in Journal of Biomedical Semantics, vol. 11, no. 1, p. 2. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7063705/pdf/

Abstract

Background: Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. DUP estimation requires knowledge about when psychosis symptoms first started (symptom onset), and when psychosis treatment was initiated. Electronic health records (EHRs) represent a useful resource for retrospective clinical studies on DUP, but the core information underlying this construct is most likely to lie in free text, meaning it is not readily available for clinical research. Natural Language Processing (NLP) offers a means of addressing this problem by automatically extracting relevant information in a structured form. As a first step, it is important to identify appropriate documents, i.e., those that are likely to include the information of interest. Next, temporal information extraction methods are needed to identify time references for early psychosis symptoms. This NLP challenge requires solving three different tasks: time expression extraction, symptom extraction, and temporal "linking". In this study, we focus on the first step, using two relevant EHR datasets.
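As an illustration of the construct described above, DUP can be thought of as the interval between symptom onset and treatment initiation. The following is a minimal sketch under that assumption only; the function name and the example dates are hypothetical and are not taken from the study.

```python
from datetime import date


def duration_of_untreated_psychosis(symptom_onset: date, treatment_start: date) -> int:
    """Return the DUP in days, assuming both dates are already known.

    Hypothetical helper: in practice these dates would have to be
    extracted from free text, which is the hard part of the task.
    """
    if treatment_start < symptom_onset:
        raise ValueError("treatment start precedes symptom onset")
    return (treatment_start - symptom_onset).days


# Illustrative (made-up) dates: onset on 1 May 2011, treatment on 15 November 2011.
dup_days = duration_of_untreated_psychosis(date(2011, 5, 1), date(2011, 11, 15))
print(dup_days)
```

The calculation itself is trivial once both dates are available in structured form, which is why the extraction and normalization of time expressions is the focus.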

Results: We applied a rule-based NLP system for time expression extraction that we had previously adapted to a corpus of mental health EHRs from patients with a diagnosis of schizophrenia (first referrals). We extended this work by applying this NLP system to a larger set of documents and patients, to identify additional texts that would be relevant for our long-term goal, and developed a new corpus from a subset of these new texts (early intervention services). Furthermore, we added normalized value annotations ("2011-05") to the annotated time expressions ("May 2011") in both corpora. The finalized corpora were used for further NLP development and evaluation, with promising results (normalization accuracy 71-86%). To highlight the specificities of our annotation task, we also applied the final adapted NLP system to a different temporally annotated clinical corpus.
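To make the normalization step concrete, the sketch below maps month-year expressions such as "May 2011" to the value "2011-05" and computes a simple normalization accuracy against gold values. It is an assumed, simplified illustration: the single regular-expression rule, the function names, and the accuracy definition are hypothetical and do not reproduce the rule-based system used in the study.

```python
import re

# Month names mapped to month numbers (1-12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# One illustrative rule: a month name followed by a four-digit year.
MONTH_YEAR = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE
)


def normalize_month_year(text: str) -> list[tuple[str, str]]:
    """Return (surface expression, normalized value) pairs found in text."""
    return [
        (m.group(0), f"{m.group(2)}-{MONTHS[m.group(1).lower()]:02d}")
        for m in MONTH_YEAR.finditer(text)
    ]


def normalization_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of gold values that the predicted values match exactly."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


print(normalize_month_year("Symptoms were first reported in May 2011."))
```

A real system needs many more rules (relative expressions, ages, durations), which is where domain adaptation matters.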

Conclusions: Developing domain-specific methods is crucial to address complex NLP tasks such as symptom onset extraction and retrospective calculation of duration of a preclinical syndrome. To the best of our knowledge, this is the first clinical text resource annotated for temporal entities in the mental health domain.


Source journal

Journal of Biomedical Semantics (Mathematical & Computational Biology)

CiteScore: 4.20

Self-citation rate: 5.30%

Review time: 30 weeks

About the journal: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas: Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.