Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: latest articles
Building an accurate predictive model of clinical time series for a patient is critical for understanding the patient's condition, its dynamics, and optimal patient management. Unfortunately, this process is not straightforward. First, patient-specific variations are typically large, and population-based models derived or learned from many different patients are often unable to support accurate predictions for each individual patient. Moreover, the time series observed for one patient at any point in time may be too short to learn a high-quality patient-specific model from the patient's own data alone. To address these problems, we propose, develop, and experiment with a new adaptive forecasting framework for building multivariate clinical time series models for a patient and for supporting patient-specific predictions. The framework relies on an adaptive model switching approach that, at any point in time, selects the most promising time series model out of a pool of many possible models and consequently combines the advantages of population, patient-specific, and short-term individualized predictive models. We demonstrate that the adaptive model switching framework is a very promising approach to personalized time series prediction, and that it is able to outperform predictions based on pure population and patient-specific models, as well as other patient-specific model adaptation strategies.
A Personalized Predictive Framework for Multivariate Clinical Time Series via Adaptive Model Selection. Zitao Liu, Milos Hauskrecht. Published 2017-11-01, pp. 1169-1177. DOI: https://doi.org/10.1145/3132847.3132859
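The model switching idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the three forecasters (a constant population mean, a running patient mean, and a last-value predictor), the window size, and the mean-absolute-error selection rule are all assumptions chosen for illustration, and the series is univariate for brevity.

```python
from typing import Callable, Dict, Sequence, Tuple

# A forecaster maps the observed history to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def make_population_model(population_mean: float) -> Forecaster:
    """Hypothetical population model: always predicts a fixed population mean."""
    return lambda history: population_mean


def patient_mean(history: Sequence[float]) -> float:
    """Hypothetical patient-specific model: mean of the patient's own history."""
    return sum(history) / len(history)


def last_value(history: Sequence[float]) -> float:
    """Hypothetical short-term individualized model: repeats the last observation."""
    return history[-1]


def switching_forecast(
    history: Sequence[float],
    models: Dict[str, Forecaster],
    window: int = 3,
) -> Tuple[str, float]:
    """Pick the model with the lowest mean absolute one-step-ahead error
    over the last `window` observations, then forecast with it.

    Assumes `history` holds at least one observation.
    """
    start = max(1, len(history) - window)
    best_name, best_err = None, float("inf")
    for name, model in models.items():
        # Replay each model on prefixes of the history and score it.
        errs = [abs(model(history[:t]) - history[t]) for t in range(start, len(history))]
        err = sum(errs) / len(errs) if errs else 0.0
        if err < best_err:  # ties keep the earlier model in the pool
            best_name, best_err = name, err
    return best_name, models[best_name](history)


models = {
    "population": make_population_model(5.0),
    "patient": patient_mean,
    "short_term": last_value,
}
print(switching_forecast([10.0, 10.5, 11.0, 11.5, 12.0], models))
```

Re-running the selection at every time step lets the choice change as the patient's series grows, which is the property the abstract attributes to model switching.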
Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace
We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers sets of candidate concepts for PICO elements and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or may rely on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.
A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation. Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace. Published 2017-11-01, pp. 1519-1528. DOI: 10.1145/3132847.3132989. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5752318/pdf/nihms927025.pdf
Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han
Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.
FacetGist: Collective Extraction of Document Facets in Large Technical Corpora. Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han. Published 2016-10-01, pp. 871-880. DOI: https://doi.org/10.1145/2983323.2983828
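The graph-based label propagation step mentioned in the abstract above can be sketched generically as follows. This is plain label propagation with clamped seed nodes, not FacetGist's joint optimization over a heterogeneous network: the toy graph, the edge weights, the facet indices, and the fixed iteration count are all assumptions made for illustration.

```python
from typing import Dict, List


def propagate_labels(
    adjacency: List[List[float]],
    seeds: Dict[int, int],
    num_labels: int,
    iterations: int = 20,
) -> List[int]:
    """Spread label scores from seed nodes to their neighbors.

    adjacency[i][j] is a non-negative edge weight between nodes i and j.
    seeds maps a node index to its known label index; seed scores stay fixed.
    Returns the highest-scoring label index for every node.
    """
    n = len(adjacency)
    scores = [[0.0] * num_labels for _ in range(n)]
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            total_weight = sum(adjacency[i])
            if i in seeds or total_weight == 0:
                # Seeds are clamped; isolated nodes keep their scores.
                new_scores.append(scores[i][:])
                continue
            # Weighted average of the neighbors' current label scores.
            new_scores.append([
                sum(adjacency[i][j] * scores[j][k] for j in range(n)) / total_weight
                for k in range(num_labels)
            ])
        scores = new_scores

    return [max(range(num_labels), key=lambda k: row[k]) for row in scores]


# Hypothetical facets: 0 = "technique", 1 = "dataset".
# Four concept mentions in a chain; the middle edge is weak.
graph = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(propagate_labels(graph, seeds={0: 0, 3: 1}, num_labels=2))
```

Each unlabeled mention ends up with the facet of the seed it is most strongly connected to, which is the intuition behind estimating the facet of each concept mention from its graph neighborhood.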
Hongkun Yu, Jingbo Shang, Meichun Hsu, Malú Castellanos, Jiawei Han
Users often write reviews on different themes involving linguistic structures with complex sentiments. The sentiment polarity of a word can be different across themes. Moreover, contextual valence shifters may change sentiment polarity depending on the contexts that they appear in. Neither challenge can be modeled effectively and explicitly in traditional sentiment analysis. Studying both phenomena requires multi-theme sentiment analysis at the word level, which is very interesting but significantly more challenging than overall polarity classification. To simultaneously resolve the multi-theme and sentiment shifting problems, we propose a data-driven framework to enable both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters. The framework formulates multi-theme sentiment by factorizing the review sentiments with theme/word embeddings and then derives the shifter effect learning problem as a logistic regression. The improvement in sentiment polarity classification accuracy demonstrates not only the importance of multi-theme and sentiment shifting, but also the effectiveness of our framework. Human evaluations and case studies further show the success of multi-theme word sentiment predictions and automatic effect quantification of contextual valence shifters.
Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis. Hongkun Yu, Jingbo Shang, Meichun Hsu, Malú Castellanos, Jiawei Han. Published 2016-10-01, pp. 939-948. DOI: https://doi.org/10.1145/2983323.2983793
The goal of modern Clinical Decision Support (CDS) systems is to provide physicians with information relevant to their management of patient care. When faced with a medical case, a physician asks questions about the diagnosis, the tests, or treatments that should be administered. Recently, the TREC-CDS track has addressed this challenge by evaluating results of retrieving relevant scientific articles where the answers to medical questions in support of CDS can be found. Although retrieving relevant medical articles instead of identifying the answers was believed to be an easier task, state-of-the-art results are not yet sufficiently promising. In this paper, we present a novel framework for answering medical questions in the spirit of TREC-CDS by first discovering the answer and then selecting and ranking scientific articles that contain the answer. Answer discovery is the result of probabilistic inference which operates on a probabilistic knowledge graph, automatically generated by processing the medical language of large collections of electronic medical records (EMRs). The probabilistic inference of answers combines knowledge from medical practice (EMRs) with knowledge from medical research (scientific articles). It also takes into account the medical knowledge automatically discerned from the medical case description. We show that this novel form of medical question answering (Q/A) produces very promising results: (a) it identifies the answers accurately and (b) it improves medical article ranking by 40%.
Medical Question Answering for Clinical Decision Support. Travis R Goodwin, Sanda M Harabagiu. Published 2016-10-01, pp. 297-306. DOI: 10.1145/2983323.2983819. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530755/pdf/nihms864927.pdf
We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods.
A Mixtures-of-Trees Framework for Multi-Label Classification. Charmgil Hong, Iyad Batal, Milos Hauskrecht. Published 2014-01-01, pp. 211-220. DOI: 10.1145/2661829.2661989. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410801/pdf/nihms679948.pdf
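As a reading aid, the mixture described in the abstract above can be written in the following generic form. The notation is an assumption, not taken from the paper: K is the number of trees, \lambda_k are mixture weights, T_k is the k-th tree, d is the number of labels, and \pi_k(i) is the parent of label i in T_k (empty for the root).

```latex
P(\mathbf{Y} \mid \mathbf{X})
  = \sum_{k=1}^{K} \lambda_k \, P(\mathbf{Y} \mid \mathbf{X}, T_k),
\qquad
P(\mathbf{Y} \mid \mathbf{X}, T_k)
  = \prod_{i=1}^{d} P\bigl(y_i \mid \mathbf{X}, y_{\pi_k(i)}\bigr),
\qquad
\lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1 .
```

Under this form, each tree conditions a label on at most one other label, which keeps every component cheap to evaluate, while the weighted sum over several trees can capture label dependencies that no single tree represents.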
Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, K. Fleischmann, An-Shou Cheng
This paper describes a probabilistic latent variable model that is designed to detect human values such as justice or freedom that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; values reflected by each sentence are then estimated by aggregating values associated with each word. The model can determine the human values for a word in light of the influence of the previous word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection containing 102 manually annotated documents focusing on one contentious political issue, net neutrality, achieving the highest reported classification effectiveness for this task. We also compared the proposed classifier with a second human annotator; its effectiveness is statistically comparable to that of human annotators.
A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values. Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, K. Fleischmann, An-Shou Cheng. Published 2014-01-01, pp. 1489-1498. DOI: https://doi.org/10.1145/2661829.2661966
In many applications in social network analysis, it is important to model the interactions and infer the influence between pairs of actors, leading to the problem of dyadic event modeling, which has attracted increasing interest recently. In this paper we focus on the problem of dyadic event attribution, an important missing data problem in dyadic event modeling where one needs to infer the missing actor-pairs of a subset of dyadic events based on their observed timestamps. Existing work either uses fixed model parameters and heuristic rules for event attribution, or assumes that the dyadic events across actor-pairs are independent. To address those shortcomings, we propose a probabilistic model based on mixtures of Hawkes processes that simultaneously tackles event attribution and network parameter inference, taking into consideration the dependency among dyadic events that share at least one actor. We also investigate using additive models to incorporate regularization to avoid overfitting. Our experiments on both synthetic and real-world data sets on international armed conflicts suggest that the proposed method can significantly improve accuracy compared with the state of the art for dyadic event attribution.
Dyadic Event Attribution in Social Networks with Mixtures of Hawkes Processes. Liangda Li, Hongyuan Zha. Published 2013-01-01, pp. 1667-1672. DOI: https://doi.org/10.1145/2505515.2505609
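To make the Hawkes-process idea in the abstract above concrete, here is a minimal sketch of a self-exciting intensity and a naive attribution rule built on it. It is a simplification, not the paper's model: the exponential kernel, the shared parameters mu, alpha and beta, and the independence of actor pairs are assumptions for illustration, whereas the paper models dependency among pairs that share an actor.

```python
import math
from typing import Dict, Hashable, Sequence


def hawkes_intensity(
    t: float,
    event_times: Sequence[float],
    mu: float,
    alpha: float,
    beta: float,
) -> float:
    """Intensity at time t: base rate mu plus an exponentially decaying
    boost of size alpha from every earlier event."""
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )


def attribution_weights(
    t: float,
    histories: Dict[Hashable, Sequence[float]],
    mu: float,
    alpha: float,
    beta: float,
) -> Dict[Hashable, float]:
    """Naive attribution of an unlabeled event at time t: each actor pair
    gets a weight proportional to its own intensity at t.

    Requires mu > 0 so the weights can be normalized.
    """
    intensities = {
        pair: hawkes_intensity(t, times, mu, alpha, beta)
        for pair, times in histories.items()
    }
    total = sum(intensities.values())
    return {pair: lam / total for pair, lam in intensities.items()}


# Hypothetical event histories (timestamps) for two actor pairs.
histories = {("A", "B"): [0.0, 0.5], ("C", "D"): []}
print(attribution_weights(1.0, histories, mu=0.5, alpha=0.8, beta=1.0))
```

A pair with recent activity receives more of the weight, which is why observed timestamps alone carry information about the missing actor pairs.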
Raúl Ernesto Gutiérrez de Piñerez Reyes, Juan Francisco Díaz-Frías
Informal Mathematical Discourse (IMD) is characterized by the mixture of natural language and symbolic expressions in the context of textbooks, publications in mathematics and mathematical proof. We focus on IMD processing at the low level of discourse. In this paper, we propose a preprocessing phase that precedes IMD structure analysis within the context of Controlled Natural Language (CNL). Our contribution lies in IMD processing and the use of machine learning: first, we present a CNL, a pure corpus, and a Mathematical Treebank for processing IMD; second, we present a preprocessing phase for IMD analysis with connective disambiguation and verb treatment; finally, we obtain satisfactory results on input text parsing using a statistical parsing model. We will carry these results forward to the classification of argumentative informal practices via low-level discourse in IMD processing.
Preprocessing of informal mathematical discourse in context of controlled natural language. Raúl Ernesto Gutiérrez de Piñerez Reyes, Juan Francisco Díaz-Frías. Published 2012-10-29, pp. 1632-1636. DOI: https://doi.org/10.1145/2396761.2398487
While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system DBOntoLink to provide unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be directly and naturally specified as extended functions of the database query languages without any programming needed. DBOntoLink is adaptable to different ontologies through customizations and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real world biomedical database with semantically annotated medical image annotations.
Enabling Ontology Based Semantic Queries in Biomedical Database Systems. Shuai Zheng, Fusheng Wang, James Lu, Joel Saltz. Published 2012-01-01, pp. 2651-2654. DOI: 10.1145/2396761.2398715. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567445/pdf/nihms-436207.pdf