Xugang Ye, Zijie Qi, Xinying Song, Xiaodong He, Dan Massey
Modeling text semantic similarity via neural network approaches has significantly improved performance on a range of information retrieval tasks in recent studies. However, these neural-network-based latent semantic models are mostly trained on simple user-behavior logs such as clicked (query, document) pairs, and all clicked pairs are assumed to be uniformly positive examples. The existing method for learning the model parameters therefore does not differentiate between data samples that may reflect different relevance information. In this paper, we relax this assumption and propose a new learning method, built on a generalized loss function, that captures the subtle relevance differences among training samples when a more granular label structure is available. We apply it to the Xbox One's movie search task, where session-based user-behavior information is available and the granular relevance differences of training samples are derived from the session logs. Compared with the existing method, our generalized loss function demonstrates superior test performance as measured by several user-engagement metrics. It also yields a significant performance lift when the score computed from our model is used as a semantic-similarity feature in the gradient boosted decision tree model widely used in modern search engines.
"Generalized Learning of Neural Network Based Semantic Similarity Models and Its Application in Movie Search." Xugang Ye, Zijie Qi, Xinying Song, Xiaodong He, Dan Massey. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.34
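As a rough illustration of the idea above: the standard softmax-style loss over a clicked document and its unclicked competitors can be generalized by weighting each clicked pair with a graded relevance label, so that not every click counts as an equally positive example. The sketch below is a minimal pure-Python version of that shape; the cosine-plus-softmax form, the smoothing factor `gamma`, and the single-weight scheme are assumptions for illustration, not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def generalized_loss(query_vec, doc_vecs, weights, gamma=10.0):
    """Weighted negative log-likelihood over a candidate set.

    doc_vecs[0] is the clicked document; the rest are unclicked
    candidates. weights[0] grades how positive the clicked pair is
    (1.0 = fully relevant); the classic uniform-positive loss is the
    special case where every weight is 1.
    """
    sims = [gamma * cosine(query_vec, d) for d in doc_vecs]
    # log-sum-exp with max-shift for numerical stability
    m = max(sims)
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    # The per-sample weight scales the positive pair's contribution.
    return -weights[0] * (sims[0] - log_z)
```

A clicked pair judged only half-relevant (weight 0.5) contributes half the gradient signal of a fully relevant one, which is the granularity the uniform-positive assumption discards.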
Lack of global knowledge of land-cover changes limits our understanding of the Earth system, hinders natural resource management, and compounds risks. Remote sensing data provide an opportunity to automatically detect and monitor land-cover changes. Although changes in land cover can be observed from remote sensing time series, most traditional change point detection algorithms do not perform well due to the unique properties of remote sensing data, such as noise, missing values, and seasonality. We propose an online change point detection method that addresses these challenges. Using an independent validation set, we show that the proposed method outperforms four baseline methods in both testing regions, which have ecologically diverse features.
"Online Change Detection Algorithm for Noisy Time-Series: An Application to Near-Real-Time Burned Area Mapping." Xi C. Chen, Vipin Kumar, James H. Faghmous. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.237
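To make the three challenges concrete (noise, missing values, seasonality), here is a deliberately simple online detector: it tracks a running mean per seasonal position to remove seasonality, skips missing observations, and accumulates positive residuals CUSUM-style until a threshold trips. This is an illustrative sketch of the problem setting, not the paper's algorithm, and the threshold and period are assumed parameters.

```python
def online_change_detector(stream, period=12, threshold=5.0):
    """Online change points for a noisy seasonal stream.

    stream: iterable of floats, with None for missing observations.
    Returns the time indices at which a change is declared.
    """
    season_sum = [0.0] * period
    season_cnt = [0] * period
    cusum = 0.0
    changes = []
    for t, x in enumerate(stream):
        if x is None:                      # tolerate missing values
            continue
        s = t % period
        if season_cnt[s] > 0:
            # residual relative to this seasonal position's mean
            resid = x - season_sum[s] / season_cnt[s]
            cusum = max(0.0, cusum + resid)
            if cusum > threshold:
                changes.append(t)
                cusum = 0.0                # restart after a detection
        season_sum[s] += x
        season_cnt[s] += 1
    return changes
```

On a seasonal series with an upward level shift, the seasonal means absorb the repeating pattern while the CUSUM statistic responds to the shift within a couple of steps.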
In this paper, we show how Foursquare check-ins can be used to understand the behavior of tourists in ways that would be hard to achieve with traditional methods such as surveys. To that end, we analyze the behavior of tourists and residents in four popular cities on four continents: London, New York, Rio de Janeiro, and Tokyo. We perform a spatio-temporal study of the behavior of these two classes of users (tourists and residents). We find, for instance, that some locations have features more correlated with tourists' behavior, and that even in places frequented by both tourists and residents there are clear distinctions between the behavior patterns of the two groups. Our study also makes it possible to identify which sights are popular, and when. Our results could be useful in several scenarios, for example, in developing new place-recommendation systems for tourists or in helping city planners better support tourists in their cities.
"Beyond Sights: Large Scale Study of Tourists' Behavior Using Foursquare Data." A. Ferreira, Thiago H. Silva, A. Loureiro. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.234
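A study like this first has to split users into the two classes. A common heuristic in check-in studies, shown below as a sketch, labels a user a tourist if the span between their first and last check-in within a city is short; the 21-day threshold and the function shape are assumptions for illustration, not the paper's exact rule.

```python
from datetime import datetime

def classify_visitors(checkins, max_tourist_days=21):
    """Split users into 'tourist' vs 'resident' by check-in span.

    checkins: list of (user_id, iso_timestamp) pairs for one city.
    Returns a dict mapping each user_id to a class label.
    """
    spans = {}
    for user, ts in checkins:
        t = datetime.fromisoformat(ts)
        lo, hi = spans.get(user, (t, t))
        spans[user] = (min(lo, t), max(hi, t))
    return {
        user: "tourist" if (hi - lo).days < max_tourist_days else "resident"
        for user, (lo, hi) in spans.items()
    }
```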
Aditi Adhikari, V. Zheng, Hong Cao, Miao Lin, Yuan Fang, K. Chang
The shopping experience is important for both residents and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As its key functionality, IntelligShop provides an augmented reality interface: people simply point their ubiquitous smartphones at mall retailers, and IntelligShop automatically recognizes the retailers and fetches their online reviews from various sources (including blogs, forums, and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems: robust feature learning to support heterogeneous smartphones in localization, and learning to query for automatically gathering retailer content from the Web for augmented reality. We demonstrate the system's effectiveness via a test bed established in a real mall in Singapore.
"IntelligShop: Enabling Intelligent Shopping in Malls through Location-Based Augmented Reality." Aditi Adhikari, V. Zheng, Hong Cao, Miao Lin, Yuan Fang, K. Chang. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.103
Over the past few years, the rapid emergence of massive open online courses (MOOCs) has sparked a great deal of research interest in MOOC data analytics. Dropout prediction, i.e., identifying students at risk of dropping out of a course, is an important problem to study due to the high attrition rates common on many MOOC platforms. Recently proposed methods for dropout prediction apply relatively simple machine learning techniques such as support vector machines and logistic regression, using features that reflect student activities such as lecture video watching and forum participation on a MOOC platform during the course period. Since these features are captured continuously for each student over a period of time, dropout prediction is essentially a time series prediction problem. By regarding dropout prediction as a sequence classification problem, we propose several temporal models for solving it. In particular, in extensive experiments conducted on two MOOCs offered on Coursera and edX, a recurrent neural network (RNN) model with long short-term memory (LSTM) cells beats the baseline methods, as well as our other proposed methods, by a large margin.
"Temporal Models for Predicting Student Dropout in Massive Open Online Courses." Mi Fei, D. Yeung. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.174
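The sequence-classification framing can be sketched concretely: the model reads one activity vector per week and its final hidden state is mapped to a dropout probability. Below is the forward pass of a minimal vanilla RNN in pure Python; the paper's model is an LSTM, which adds gating to this same recurrence, and all weight shapes here are illustrative assumptions.

```python
import math

def rnn_dropout_score(weekly_features, W_in, W_h, w_out, b=0.0):
    """Forward pass of a tiny vanilla RNN for sequence classification.

    weekly_features: list of per-week activity vectors for one student.
    W_in: hidden x input weight rows; W_h: hidden x hidden weight rows;
    w_out: output weights over the final hidden state.
    Returns a dropout probability in (0, 1).
    """
    h = [0.0] * len(W_h)
    for x in weekly_features:                       # one step per week
        h = [math.tanh(sum(wi * xi for wi, xi in zip(W_in[j], x))
                       + sum(wh * hj for wh, hj in zip(W_h[j], h)))
             for j in range(len(W_h))]
    # logistic output on the final hidden state
    z = sum(wo * hj for wo, hj in zip(w_out, h)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Training these weights (e.g., by backpropagation through time) is what the paper's experiments evaluate; the sketch only shows why the weekly features form a sequence rather than a flat feature bag.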
The prominence of social media such as Twitter and Facebook has led to huge data collections over which event detection yields useful results. An important dimension of event detection is location estimation for detected events. Social media provide a variety of clues to location, such as geographical annotations from smart devices, the location field of the user profile, and the content of the message. Among these clues, message content requires the most processing effort, yet it is generally the most informative. In this paper, we focus on extracting location names, i.e., toponym recognition, from social media messages. We propose a hybrid system that uses both rule-based and machine learning techniques to extract toponyms from tweets. Conditional Random Fields (CRF) are used as the machine learning tool, and features such as part-of-speech tags and a conjunction window are defined to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns and to provide a simple level of normalization to handle the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
"Toponym Recognition in Social Media for Estimating the Location of Events." M. Sagcan, P. Senkul. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.167
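The flavor of the rule-based part can be shown with a small regex pass: capitalized phrases following location prepositions are plausible toponym candidates, and a light normalization step squeezes the letter-repetition typical of informal tweets. This is a sketch only; the paper's rule set and the CRF component are richer than these two rules.

```python
import re

# Capitalized phrase(s) following a location preposition.
TOPONYM_PATTERN = re.compile(
    r"\b(?:in|at|from|near|to)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)

def extract_toponym_candidates(tweet):
    """Return capitalized phrases that follow location prepositions."""
    # Simple normalization for informal text: squeeze runs of three or
    # more repeated letters down to one ("sooo" -> "so").
    text = re.sub(r"(.)\1{2,}", r"\1", tweet)
    return TOPONYM_PATTERN.findall(text)
```

In the full system such candidates would be features or rule firings feeding the CRF, not final answers.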
Community Question Answering (CQA) is a popular online service where people ask and answer questions. Recently, with the accumulation of users and content on CQA platforms, answer quality has aroused wide concern. Expert finding has been proposed as one way to address this problem; it aims at finding suitable answerers who can give high-quality answers. In this paper, we formalize expert finding as a learning-to-rank task by leveraging user feedback on answers (i.e., answer votes) as the "relevance" labels. For this task, we present a listwise learning-to-rank approach, referred to as ListEF. Observing that questions in CQA are relatively short and usually attached with tags, we propose a tagword topic model (TTM) to derive high-quality topical representations of questions. Based on TTM, we develop a COmpetition-based User exPertise Extraction (COUPE) method to capture user expertise features for given questions. We adopt the widely used listwise learning-to-rank method LambdaMART to train the ranking function. Finally, for a given question, we rank candidate users in descending order of the scores computed by the trained ranking function and select the highest-ranked users as candidate experts. Experimental results on Stack Overflow show that both TTM and the ListEF approach are effective, with significant improvements over state-of-the-art methods.
"Exploiting User Feedback for Expert Finding in Community Question Answering." Xiang Cheng, Shuguang Zhu, Gang Chen, Sen Su. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.181
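The final routing step described above is straightforward to sketch: score every candidate answerer with the trained ranking function and keep the top of the list. In the sketch below, `score_fn` stands in for the trained LambdaMART model (any callable works), and the function name and signature are assumptions for illustration.

```python
def route_question(question_features, candidates, score_fn, top_k=3):
    """Return the top_k candidate experts for a question.

    question_features: representation of the question (e.g., TTM topics).
    candidates: iterable of candidate user ids.
    score_fn: callable (question_features, candidate) -> float,
              standing in for the trained ranking function.
    """
    ranked = sorted(candidates,
                    key=lambda c: score_fn(question_features, c),
                    reverse=True)
    return ranked[:top_k]
```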
Algorithmic composition refers to composing a melody automatically using algorithms. Many methods have been proposed for this task. Among them, a novel idea is to exploit the correlation between the pitches of melodies and the tones of lyrics. Unfortunately, the existing method that adopts this idea suffers from several severe shortcomings, so the merits of the idea are not fully realized. In this paper, we propose a new technique to capture this correlation based on the concepts of pitch trends and tone trends. Building on this technique, we design a new melody-composition algorithm called Trend-MC that avoids the shortcomings of the existing method. We have also developed a software tool with the Trend-MC algorithm at its core, and we demonstrate that it can compose pleasant melodies from input lyrics.
"Trend-MC: A Melody Composer by Constructing from Frequent Trend-Based Patterns." Cheng Long, R. C. Wong, R. W. Sze. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.165
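The pitch-trend / tone-trend correspondence can be illustrated with a toy composer that moves the melody in the same direction as each syllable's tone trend. The actual Trend-MC algorithm mines frequent trend-based patterns; the scale choice, step size, and labels below are assumptions that only show the underlying idea.

```python
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]   # C major scale, MIDI numbers

def compose(tone_trends, start_index=3):
    """Map each lyric tone trend to a pitch move on the scale.

    tone_trends: list of 'up' / 'down' / 'flat', one per syllable.
    Returns the melody as a list of MIDI pitch numbers.
    """
    idx, melody = start_index, [SCALE[start_index]]
    for trend in tone_trends:
        step = {"up": 1, "down": -1, "flat": 0}[trend]
        # clamp to the scale so the melody stays in range
        idx = max(0, min(len(SCALE) - 1, idx + step))
        melody.append(SCALE[idx])
    return melody
```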
In batch systems, monitoring information at the level of individual jobs is crucial to optimize resource utilization and prevent misuse. However, the usage of network resources in particular is difficult to track. Understanding usage patterns in modern computing clusters requires more detailed monitoring than existing solutions provide. Monitoring at the job level leads to dynamic graphs of processes with attached time series data, e.g., of network resource usage. Using clustering, common usage patterns can be identified and outliers detected. This work provides an overview of ongoing efforts to cluster dynamic graphs in the context of distributed streams of monitoring events.
"Clustering Evolving Batch System Jobs for Online Anomaly Detection." E. Kuehn. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.219
A. Mondal, I. Chaturvedi, Dipankar Das, Rajiv Bajpai, Sivaji Bandyopadhyay
The continuous sophistication of clinical information processing motivates the development of a WordNet-like dictionary for medical events that conveys valuable information (e.g., event definitions, sense-based contextual descriptions, polarity) to both experts (e.g., medical practitioners) and non-experts (e.g., patients) in their respective fields. This paper reports the enrichment of medical terms, such as identifying and describing events, times, and the relations between them in clinical text, by employing three lexical resources: a seed list of medical events collected from SemEval 2015 Task 6, WordNet, and an English medical dictionary. In particular, we develop WordNet for Medical Events (WME), which uses contextual information for word sense disambiguation of medical terms and reduces the communication gap between doctors and patients. We propose two approaches (sequential and combined) for identifying the proper sense of a medical event based on each of the three types of text. Polarity lexicons, e.g., SentiWordNet, the Affect Word List, and Taboada's adjective list, are used to implement polarity-based word sense disambiguation of medical events from their glosses as extracted from the lexical resources. The proposed WME outperforms a previously proposed Lesk word sense disambiguation method by 10-20%.
"Lexical Resource for Medical Events: A Polarity Based Approach." A. Mondal, I. Chaturvedi, Dipankar Das, Rajiv Bajpai, Sivaji Bandyopadhyay. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), November 14, 2015. doi:10.1109/ICDMW.2015.170
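For context on the baseline the paper improves upon, here is the classic simplified Lesk idea: choose the sense whose gloss shares the most words with the surrounding context, and carry the chosen sense's polarity along with it. This sketch is the baseline, not WME itself, and the sense-dictionary format is an assumption for illustration.

```python
def lesk_sense(context_words, senses):
    """Simplified Lesk word sense disambiguation with polarity.

    context_words: words surrounding the ambiguous medical term.
    senses: dict mapping sense name -> (gloss text, polarity label).
    Returns the (sense name, polarity) with the largest gloss/context
    word overlap.
    """
    context = set(w.lower() for w in context_words)
    best, best_overlap = None, -1
    for name, (gloss, polarity) in senses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = (name, polarity), overlap
    return best
```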