Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting

Bidirectional RNN for Medical Event Detection in Electronic Health Records

Pub Date : 2016-06-01 DOI: 10.18653/v1/N16-1056

Abhyuday N. Jagannatha, Hong Yu

Sequence labeling for the extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics, including pharmacovigilance and drug surveillance. The state-of-the-art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explore recurrent neural network frameworks and show that they significantly outperform the CRF models.
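
To make "features calculated from fixed context windows" concrete, here is a minimal sketch of window-based feature extraction of the kind a CRF tagger might consume. The function name `window_features`, the `PAD` symbol, the feature key format, and the example sentence are all hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def window_features(tokens: List[str], index: int, size: int = 2) -> Dict[str, str]:
    """Collect the lowercased tokens within `size` positions of tokens[index]."""
    features = {}
    for offset in range(-size, size + 1):
        position = index + offset
        inside = 0 <= position < len(tokens)
        features[f"word[{offset:+d}]"] = tokens[position].lower() if inside else PAD
    return features


# Made-up example sentence, used only for illustration.
tokens = ["Patient", "developed", "rash", "after", "starting", "penicillin"]
features = window_features(tokens, index=2, size=2)
print(features)
```

With a window of two tokens on each side, the features for "rash" never include "penicillin", which sits three positions away. A recurrent network reads the whole sequence, so in principle it can draw on context that such a fixed window leaves out.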

Citations: 271

Pub Date : 2015-06-05

Masoud Rouhizadeh, Richard Sproat, Jan van Santen

Restrictive and repetitive behavior (RRB) is a core symptom of autism spectrum disorder (ASD) and is manifest in language. Based on this, we expect children with autism to talk about fewer topics, and more repeatedly, during their conversations. We thus hypothesize a higher semantic overlap ratio between dialogue turns in children with ASD compared to those with typical development (TD). Participants in this study include children ages 4-8, 44 with TD and 25 with ASD without language impairment. We apply several semantic similarity metrics to the children's dialogue turns in semi-structured conversations with examiners. We find that children with ASD have significantly more semantically overlapping turns than children with TD, across different turn intervals. These results support our hypothesis, and could provide a convenient and robust ASD-specific behavioral marker.
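
The idea of a semantic overlap ratio across turn intervals can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the abstract does not name its similarity metrics, so plain Jaccard similarity over word types stands in for them, and the `threshold` value, the function names, and the example turns are hypothetical.

```python
from typing import List, Set


def word_types(turn: str) -> Set[str]:
    """Lowercase a turn and split it on whitespace into a set of word types."""
    return set(turn.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared word types divided by all word types."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_ratio(turns: List[str], interval: int = 1, threshold: float = 0.15) -> float:
    """Fraction of turn pairs `interval` apart whose similarity reaches `threshold`."""
    pairs = [(turns[i], turns[i + interval]) for i in range(len(turns) - interval)]
    if not pairs:
        return 0.0
    hits = sum(jaccard(word_types(x), word_types(y)) >= threshold for x, y in pairs)
    return hits / len(pairs)


# Made-up turns, used only for illustration.
child_turns = [
    "i like trains",
    "trains go fast",
    "my dog is brown",
    "the dog likes trains",
]
print(overlap_ratio(child_turns, interval=1))
print(overlap_ratio(child_turns, interval=2))
```

Varying `interval` corresponds to comparing turns that are further apart in the conversation, which is one way to read "across different turn intervals".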

Citations: 0

Automated morphological analysis of clinical language samples.

Pub Date : 2015-06-05

Kyle Gorman, Steven Bedrick, Géza Kiss, Eric Morley, Rosemary Ingham, Metrah Mohammad, Katina Papadakis, Jan P H van Santen

Quantitative analysis of clinical language samples is a powerful tool for assessing and screening developmental language impairments, but requires extensive manual transcription, annotation, and calculation, resulting in error-prone results and clinical underutilization. We describe a system that performs the automated morphological analysis needed to calculate statistics such as the mean length of utterance in morphemes (MLUM), so that these statistics can be computed directly from orthographic transcripts. Estimates of MLUM computed by this system are closely comparable to those produced by manual annotation. Our system can be used in conjunction with other automated annotation techniques, such as maze detection. This work represents an important first step towards increased automation of language sample analysis, and towards attendant benefits of automation, including greater clinical utilization and reduced variability in care delivery.
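
As a rough illustration of the statistic itself, MLUM is the total number of morphemes divided by the number of utterances. The sketch below assumes a toy suffix-stripping morpheme counter; the `SUFFIXES` list, the function names, and the sample utterances are hypothetical and are far cruder than the morphological analysis the abstract describes.

```python
from typing import List

# Hypothetical toy suffix list; a real analyzer needs a lexicon and handles irregular forms.
SUFFIXES = ("ing", "ed", "s")


def count_morphemes(word: str) -> int:
    """Count one morpheme for the stem, plus one if a listed suffix is present."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return 2
    return 1


def mlum(utterances: List[str]) -> float:
    """Mean length of utterance in morphemes: total morphemes / number of utterances."""
    if not utterances:
        return 0.0
    total = sum(count_morphemes(w) for u in utterances for w in u.lower().split())
    return total / len(utterances)


# Made-up orthographic transcript lines, used only for illustration.
sample = ["the dog barked", "she is running", "cats sleep"]
print(mlum(sample))
```

The toy counter miscounts many words (for instance, it would treat "this" as stem plus suffix), which is exactly the kind of error a proper morphological analyzer is meant to avoid.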

Citations: 0

Automated morphological analysis of clinical language samples

Pub Date : 2015-06-05 DOI: 10.3115/v1/W15-1213

Kyle Gorman, Steven Bedrick, G. Kiss, E. Morley, Rosemary Ingham, Metrah Mohammed, Katina Papadakis, J. V. Santen

Quantitative analysis of clinical language samples is a powerful tool for assessing and screening developmental language impairments, but requires extensive manual transcription, annotation, and calculation, resulting in error-prone results and clinical underutilization. We describe a system that performs the automated morphological analysis needed to calculate statistics such as the mean length of utterance in morphemes (MLUM), so that these statistics can be computed directly from orthographic transcripts. Estimates of MLUM computed by this system are closely comparable to those produced by manual annotation. Our system can be used in conjunction with other automated annotation techniques, such as maze detection. This work represents an important first step towards increased automation of language sample analysis, and towards attendant benefits of automation, including greater clinical utilization and reduced variability in care delivery.

Citations: 7

Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1214

Masoud Rouhizadeh, R. Sproat, J. Santen

Restrictive and repetitive behavior (RRB) is a core symptom of autism spectrum disorder (ASD) and is manifest in language. Based on this, we expect children with autism to talk about fewer topics, and more repeatedly, during their conversations. We thus hypothesize a higher semantic overlap ratio between dialogue turns in children with ASD compared to those with typical development (TD). Participants in this study include children ages 4-8, 44 with TD and 25 with ASD without language impairment. We apply several semantic similarity metrics to the children's dialogue turns in semi-structured conversations with examiners. We find that children with ASD have significantly more semantically overlapping turns than children with TD, across different turn intervals. These results support our hypothesis, and could provide a convenient and robust ASD-specific behavioral marker.

Citations: 13

Anafora: A Web-based General Purpose Annotation Tool.

Pub Date : 2013-06-01

Wei-Te Chen, Will Styler

Anafora is a newly developed, open-source, web-based text annotation tool built to be lightweight, flexible, easy to use, and capable of annotating with a variety of schemas, simple and complex. Anafora allows secure web-based annotation of any plaintext file with both spanned (e.g. named entity or markable) and relation annotations, as well as adjudication for both types of annotation. Anafora offers automatic set assignment and progress tracking, centralized and human-editable XML annotation schemas, and file-based storage and organization of data in a human-readable single-file XML format.

Citations: 0

Distributional semantic models for the evaluation of disordered language.

Pub Date : 2013-06-01

Masoud Rouhizadeh, Emily Prud'hommeaux, Brian Roark, Jan van Santen

Atypical semantic and pragmatic expression is frequently reported in the language of children with autism. Although this atypicality often manifests itself in the use of unusual or unexpected words and phrases, the rate of use of such unexpected words is rarely directly measured or quantified. In this paper, we use distributional semantic models to automatically identify unexpected words in narrative retellings by children with autism. The classification of unexpected words is sufficiently accurate to distinguish the retellings of children with autism from those with typical development. These techniques demonstrate the potential of applying automated language analysis techniques to clinically elicited language data for diagnostic purposes.
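
One way to picture how a distributional semantic model could flag unexpected words is the sketch below. It is only an assumption-laden illustration: the abstract does not specify the model, so a simple count-based co-occurrence model with cosine similarity stands in for it, and the function names, the `threshold` value, and the tiny reference corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Represent each word by counts of the words seen within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def is_unexpected(word: str, expected: List[str], vectors: Dict[str, Counter],
                  threshold: float = 0.8) -> bool:
    """Flag `word` if it is not similar enough to any word expected in the story."""
    target = vectors.get(word, Counter())
    best = max((cosine(target, vectors.get(e, Counter())) for e in expected), default=0.0)
    return best < threshold


# Made-up reference corpus and expected story vocabulary, used only for illustration.
corpus = [
    ["the", "dog", "chased", "the", "ball"],
    ["the", "cat", "chased", "the", "ball"],
    ["the", "rocket", "reached", "orbit"],
]
vectors = cooccurrence_vectors(corpus)
expected_words = ["dog", "ball"]
for candidate in ["cat", "rocket"]:
    print(candidate, is_unexpected(candidate, expected_words, vectors))
```

In this toy corpus "cat" occurs in the same contexts as "dog", so it is treated as expected, while "rocket" shares little context with the story vocabulary and is flagged.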

Citations: 0

Hello, Who is Calling?: Can Words Reveal the Social Nature of Conversations?

Pub Date : 2012-01-01

Anthony Stark, Izhak Shafran, Jeffrey Kaye

This study aims to infer the social nature of conversations from their content automatically. To place this work in context, our motivation stems from the need to understand how social disengagement affects cognitive decline or depression among older adults. For this purpose, we collected a comprehensive and naturalistic corpus comprising all the incoming and outgoing telephone calls from 10 subjects over the duration of a year. As a first step, we learned a binary classifier to filter out business-related conversations, achieving an accuracy of about 85%. This classification task provides a convenient tool to probe the nature of telephone conversations. We evaluated the utility of openings and closings in differentiating personal calls, and found that empirical results on a large corpus do not support the hypotheses by Schegloff and Sacks that personal conversations are marked by unique closing structures. For classifying different types of social relationships such as family vs. other, we investigated features related to language use (entropy), a hand-crafted dictionary (LIWC), and topics learned using unsupervised latent Dirichlet models (LDA). Our results show that the posteriors over topics from LDA provide consistently higher accuracy (60-81%) compared to LIWC or language use features in distinguishing different types of conversations.
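
The "language use (entropy)" feature can be illustrated with the Shannon entropy of a conversation's word distribution. This is a minimal sketch under the assumption that entropy is computed over unigram frequencies; the abstract does not give the exact definition, and the function name and example token lists are hypothetical.

```python
import math
from collections import Counter
from typing import List


def word_entropy(tokens: List[str]) -> float:
    """Shannon entropy, in bits, of the unigram distribution over `tokens`."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up token lists, used only for illustration.
repetitive = ["hello", "hello", "hello", "hello"]
varied = ["hello", "how", "are", "you"]
print(word_entropy(repetitive))
print(word_entropy(varied))
```

A conversation that keeps reusing the same word has zero entropy, while one that spreads its tokens evenly over four distinct words has two bits, so the value gives a single-number summary of lexical variety.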

Pages : 112-119 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3886719/pdf/nihms399627.pdf

Citations: 0