Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

A Personalized Predictive Framework for Multivariate Clinical Time Series via Adaptive Model Selection.

Pub Date : 2017-11-01 DOI: 10.1145/3132847.3132859

Zitao Liu, Milos Hauskrecht

Building an accurate predictive model of clinical time series for a patient is critical for understanding the patient's condition, its dynamics, and optimal patient management. Unfortunately, this process is not straightforward. First, patient-specific variations are typically large, and population-based models derived or learned from many different patients are often unable to support accurate predictions for each individual patient. Moreover, the time series observed for one patient at any point in time may be too short to learn a high-quality patient-specific model from the patient's own data alone. To address these problems, we propose, develop, and experiment with a new adaptive forecasting framework for building multivariate clinical time series models for a patient and for supporting patient-specific predictions. The framework relies on an adaptive model switching approach that, at any point in time, selects the most promising time series model from a pool of many possible models, and consequently combines the advantages of population, patient-specific, and short-term individualized predictive models. We demonstrate that the adaptive model switching framework is a very promising approach to personalized time series prediction, and that it is able to outperform predictions based on pure population and patient-specific models, as well as other patient-specific model adaptation strategies.
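The model switching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a univariate series, a pool of two hypothetical forecasters (`population` and `patient`), and a selection rule based on mean absolute error over a sliding window, none of which are specified in the abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Hypothetical patient-specific model: repeat the latest observation."""
    return history[-1]


def make_constant(value: float) -> Forecaster:
    """Hypothetical population model: always predict one fixed value."""
    def forecast(history: Sequence[float]) -> float:
        return value
    return forecast


def switching_forecast(
    series: Sequence[float],
    models: Dict[str, Forecaster],
    window: int = 3,
) -> List[Tuple[str, float]]:
    """For each step t >= 1, pick the model with the lowest recent error.

    Returns one (model_name, prediction) pair per predicted step.
    Ties, including the first step with no error history, go to the
    model listed first in `models`.
    """
    errors: Dict[str, List[float]] = {name: [] for name in models}
    chosen: List[Tuple[str, float]] = []

    for t in range(1, len(series)):
        history = series[:t]
        preds = {name: model(history) for name, model in models.items()}

        def recent_error(name: str) -> float:
            recent = errors[name][-window:]
            return sum(recent) / len(recent) if recent else 0.0

        best = min(models, key=recent_error)
        chosen.append((best, preds[best]))

        # Once the true value is observed, score every model in the pool.
        for name, pred in preds.items():
            errors[name].append(abs(pred - series[t]))

    return chosen


if __name__ == "__main__":
    pool = {"population": make_constant(10.0), "patient": last_value}
    for name, pred in switching_forecast([10.0, 10.0, 20.0, 20.0, 20.0, 20.0], pool):
        print(name, pred)
```

In this toy run the selector stays with the population model until the patient's values drift away from the population value, then switches to the patient-specific model once its windowed error is lower.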

Pages : 1169-1177

Citations: 9

A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.

Pub Date : 2017-11-01 DOI: 10.1145/3132847.3132989

Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers sets of candidate concepts for PICO elements and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or may rely on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.

Pages : 1519-1528

Citations: 0

FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Pub Date : 2016-10-01 DOI: 10.1145/2983323.2983828

Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han

Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.
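Graph-based label propagation, which this abstract names as the estimation step, can be sketched generically as follows. This is a textbook-style propagation with clamped seed nodes, not FacetGist's joint optimization. The graph, the concept mentions, the edge weights, and the two facet labels are hypothetical examples chosen for illustration.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def propagate_labels(
    graph: Graph,
    seeds: Dict[str, str],
    labels: List[str],
    iterations: int = 20,
) -> Dict[str, str]:
    """Spread seed labels over a weighted graph and return one label per node.

    Seed nodes keep their label. Every other node repeatedly takes the
    weighted average of its neighbors' label scores.
    """
    scores = {node: {label: 0.0 for label in labels} for node in graph}
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if node in seeds or total == 0:
                updated[node] = scores[node]  # clamped seed or isolated node
                continue
            updated[node] = {
                label: sum(w * scores[nbr][label] for nbr, w in neighbors.items()) / total
                for label in labels
            }
        scores = updated

    return {node: max(labels, key=lambda lab: s[lab]) for node, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical concept mentions linked by co-occurrence weights.
    graph = {
        "support vector machine": {"SVM classifier": 2.0},
        "SVM classifier": {"support vector machine": 2.0, "macro-F1": 1.0},
        "macro-F1": {"SVM classifier": 1.0, "F1 score": 2.0},
        "F1 score": {"macro-F1": 2.0},
    }
    seeds = {"support vector machine": "technique", "F1 score": "evaluation"}
    print(propagate_labels(graph, seeds, ["technique", "evaluation"]))
```

Each unlabeled mention ends up with the facet of the seed it is most strongly connected to, which is the intuition behind estimating the facet of each concept mention from graph structure.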

Pages : 871-880

Citations: 20

Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis.

Pub Date : 2016-10-01 DOI: 10.1145/2983323.2983793

Hongkun Yu, Jingbo Shang, Meichun Hsu, Malú Castellanos, Jiawei Han

Users often write reviews on different themes involving linguistic structures with complex sentiments. The sentiment polarity of a word can differ across themes. Moreover, contextual valence shifters may change sentiment polarity depending on the contexts in which they appear. Neither challenge can be modeled effectively and explicitly in traditional sentiment analysis. Studying both phenomena requires multi-theme sentiment analysis at the word level, which is very interesting but significantly more challenging than overall polarity classification. To simultaneously resolve the multi-theme and sentiment shifting problems, we propose a data-driven framework to enable both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters. The framework formulates multi-theme sentiment by factorizing the review sentiments with theme/word embeddings and then derives the shifter effect learning problem as a logistic regression. The improvement in sentiment polarity classification accuracy demonstrates not only the importance of multi-theme and sentiment shifting, but also the effectiveness of our framework. Human evaluations and case studies further show the success of multi-theme word sentiment predictions and automatic effect quantification of contextual valence shifters.

Pages : 939-948

Citations: 13

Medical Question Answering for Clinical Decision Support.

Pub Date : 2016-10-01 DOI: 10.1145/2983323.2983819

Travis R Goodwin, Sanda M Harabagiu

The goal of modern Clinical Decision Support (CDS) systems is to provide physicians with information relevant to their management of patient care. When faced with a medical case, a physician asks questions about the diagnosis, the tests, or the treatments that should be administered. Recently, the TREC-CDS track has addressed this challenge by evaluating results of retrieving relevant scientific articles in which the answers to medical questions in support of CDS can be found. Although retrieving relevant medical articles instead of identifying the answers was believed to be an easier task, state-of-the-art results are not yet sufficiently promising. In this paper, we present a novel framework for answering medical questions in the spirit of TREC-CDS by first discovering the answer and then selecting and ranking scientific articles that contain the answer. Answer discovery is the result of probabilistic inference that operates on a probabilistic knowledge graph, automatically generated by processing the medical language of large collections of electronic medical records (EMRs). The probabilistic inference of answers combines knowledge from medical practice (EMRs) with knowledge from medical research (scientific articles). It also takes into account the medical knowledge automatically discerned from the medical case description. We show that this novel form of medical question answering (Q/A) produces very promising results: (a) it identifies the answers accurately and (b) it improves medical article ranking by 40%.

Pages : 297-306

Citations: 0

A Mixtures-of-Trees Framework for Multi-Label Classification.

Pub Date : 2014-01-01 DOI: 10.1145/2661829.2661989

Charmgil Hong, Iyad Batal, Milos Hauskrecht

We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods.
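The structure described in this abstract can be written out as a generic mixture of tree-structured conditional models. The notation below is an assumption made for illustration and is not taken from the paper: K components, mixture weights \pi_k, trees T_k, d labels, and pa_k(i) for the parent of label i in tree T_k (empty for the root). The abstract does not say whether the weights depend on X.

```latex
P(Y \mid X) = \sum_{k=1}^{K} \pi_k \, P(Y \mid X, T_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

P(Y \mid X, T_k) = \prod_{i=1}^{d} P\bigl(y_i \mid X, \, y_{\mathrm{pa}_k(i)}\bigr)
```

Each component conditions every label on at most one other label, which keeps a single tree computationally cheap. The sum over several trees lets the model capture label dependencies that no single tree can represent.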

Citations: 0

A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values

Pub Date : 2014-01-01 DOI: 10.1145/2661829.2661966

Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, K. Fleischmann, An-Shou Cheng

This paper describes a probabilistic latent variable model that is designed to detect human values, such as justice or freedom, that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; the values reflected by each sentence are then estimated by aggregating the values associated with each word. The model can determine the human values for a word in light of the influence of the previous word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection containing 102 manually annotated documents focusing on one contentious political issue, Net neutrality, and achieved the highest reported classification effectiveness for this task. We also compared our proposed classifier with a second human annotator. The comparison shows that the effectiveness of the proposed classifier is statistically comparable with that of human annotators.

Pages : 1489-1498

Citations: 10

Dyadic Event Attribution in Social Networks with Mixtures of Hawkes Processes.

Pub Date : 2013-01-01 DOI: 10.1145/2505515.2505609

Liangda Li, Hongyuan Zha

In many applications in social network analysis, it is important to model the interactions and infer the influence between pairs of actors, leading to the problem of dyadic event modeling, which has attracted increasing interest recently. In this paper we focus on the problem of dyadic event attribution, an important missing data problem in dyadic event modeling where one needs to infer the missing actor-pairs of a subset of dyadic events based on their observed timestamps. Existing work either uses fixed model parameters and heuristic rules for event attribution, or assumes the dyadic events across actor-pairs are independent. To address those shortcomings, we propose a probabilistic model based on mixtures of Hawkes processes that simultaneously tackles event attribution and network parameter inference, taking into consideration the dependency among dyadic events that share at least one actor. We also investigate using additive models to incorporate regularization to avoid overfitting. Our experiments on both synthetic data and real-world data sets on international armed conflicts suggest that the proposed method can significantly improve accuracy compared with the state of the art for dyadic event attribution.
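To make the attribution problem concrete, here is a minimal sketch built on the standard exponential-kernel Hawkes intensity, lambda(t) = mu + alpha * sum over past events s of exp(-beta * (t - s)). It is a deliberate simplification, not the authors' model: each actor-pair is treated as an independent process with shared, fixed parameters, whereas the abstract describes learning parameters and modeling dependencies among pairs that share an actor. The parameter values and actor names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def hawkes_intensity(
    t: float, history: Sequence[float], mu: float, alpha: float, beta: float
) -> float:
    """Exponential-kernel Hawkes intensity at time t given past event times."""
    return mu + alpha * sum(math.exp(-beta * (t - s)) for s in history if s < t)


def attribute_event(
    t: float,
    histories: Dict[Pair, Sequence[float]],
    mu: float = 0.1,
    alpha: float = 0.5,
    beta: float = 1.0,
) -> Dict[Pair, float]:
    """Probability that an unlabeled event at time t belongs to each actor-pair.

    Each pair's share is its intensity at t divided by the total intensity,
    so pairs with recent activity receive more of the probability mass.
    """
    intensities = {
        pair: hawkes_intensity(t, events, mu, alpha, beta)
        for pair, events in histories.items()
    }
    total = sum(intensities.values())
    return {pair: lam / total for pair, lam in intensities.items()}


if __name__ == "__main__":
    histories = {("A", "B"): [0.0, 1.0], ("C", "D"): []}
    for pair, prob in attribute_event(1.5, histories).items():
        print(pair, round(prob, 3))
```

Because the pair ("A", "B") has two recent events, its intensity at t = 1.5 is well above the baseline, and it receives most of the attribution probability.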

Pages : 1667-1672

Citations: 34

Preprocessing of informal mathematical discourse in context of controlled natural language

Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

Pub Date : 2012-10-29 DOI: 10.1145/2396761.2398487

Raúl Ernesto Gutiérrez de Piñerez Reyes, Juan Francisco Díaz-Frías

Informal Mathematical Discourse (IMD) is characterized by a mixture of natural language and symbolic expressions, as found in textbooks, mathematical publications, and mathematical proofs. We focus on IMD processing at the low level of discourse. In this paper, we propose a preprocessing phase that precedes IMD structure analysis within the context of Controlled Natural Language (CNL). Our contribution lies in IMD processing and the use of machine learning. First, we present a CNL, a pure corpus, and a Mathematical Treebank for processing IMD. Second, we present a preprocessing phase for IMD analysis that includes connective disambiguation and verb treatment. Finally, we obtain satisfactory results on input text parsing using a statistical parsing model. We will carry these results forward to the classification of argumentative informal practices via low-level discourse in IMD processing.


Citations: 3

Enabling Ontology Based Semantic Queries in Biomedical Database Systems.

Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

Pub Date : 2012-01-01 DOI: 10.1145/2396761.2398715

Shuai Zheng, Fusheng Wang, James Lu, Joel Saltz

While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology-based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and to the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system, DBOntoLink, that provides unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be specified directly and naturally as extended functions of the database query languages, with no programming needed. DBOntoLink is adaptable to different ontologies through customization and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real-world biomedical database of semantically annotated medical images.



Citations: 0