2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Publications
Generalized Learning of Neural Network Based Semantic Similarity Models and Its Application in Movie Search
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.34
Xugang Ye, Zijie Qi, Xinying Song, Xiaodong He, Dan Massey
Recent studies have shown that modeling text semantic similarity with neural networks significantly improves performance on a range of information retrieval tasks. However, these neural network based latent semantic models are mostly trained on simple user behavior logs such as clicked (query, document) pairs, and all clicked pairs are assumed to be uniformly positive examples. As a result, the existing method for learning the model parameters does not differentiate between data samples that may carry different relevance information. In this paper, we relax this assumption and propose a new learning method that uses a generalized loss function to capture the subtle relevance differences among training samples when a more granular label structure is available. We apply it to the Xbox One movie search task, where session-based user behavior information is available and the granular relevance differences among training samples are derived from the session logs. Compared with the existing method, our generalized loss function achieves superior test performance as measured by several user-engagement metrics. It also yields a significant performance lift when the score computed by our model is used as a semantic similarity feature in a gradient boosted decision tree model, a model class widely used in modern search engines.
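The abstract does not state the generalized loss itself. As a minimal sketch of the general idea only, assume each clicked pair carries a hypothetical session-derived relevance weight; the usual softmax negative log-likelihood over one clicked document and several unclicked ones can then be weighted per sample. The function names, the scaling factor `gamma`, and the weights below are illustrative assumptions, not the paper's formulation.

```python
import math

def softmax_nll(pos_score, neg_scores, gamma=10.0):
    """Negative log-likelihood of the clicked document under a softmax
    over similarity scores (clicked document first, then unclicked ones)."""
    logits = [gamma * pos_score] + [gamma * s for s in neg_scores]
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[0]

def weighted_loss(samples, gamma=10.0):
    """samples: list of (pos_score, neg_scores, weight).
    The weight is a hypothetical relevance grade derived from session logs;
    setting every weight to 1.0 recovers the uniform-positive assumption."""
    total_weight = sum(w for _, _, w in samples)
    return sum(w * softmax_nll(p, n, gamma) for p, n, w in samples) / total_weight

# Hypothetical batch: a strongly engaged click (weight 1.0) and a weak one (0.3).
batch = [(0.9, [0.2, 0.1], 1.0), (0.4, [0.3, 0.5], 0.3)]
print(weighted_loss(batch))
```

With unit weights the function reduces to the plain average over clicked pairs, so the weighting is the only place where granular labels enter this sketch.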
Citations: 5
Online Change Detection Algorithm for Noisy Time-Series: An Application to Near-Real Time Burned Area Mapping
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.237
Xi C. Chen, Vipin Kumar, James H. Faghmous
The lack of global knowledge of land-cover changes limits our understanding of the earth system, hinders natural resource management, and compounds risks. Remote sensing data provide an opportunity to automatically detect and monitor land-cover changes. Although changes in land cover can be observed in remote sensing time series, most traditional change point detection algorithms perform poorly because of the unique properties of remote sensing data, such as noise, missing values, and seasonality. We propose an online change point detection method that addresses these challenges. Using an independent validation set, we show that the proposed method outperforms four baseline methods in both testing regions, which have ecologically diverse features.
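The abstract names three obstacles (noise, missing values, seasonality) without describing the detector. The following is a generic illustration, not the authors' algorithm: each new observation is compared with the median of earlier values at the same seasonal phase, missing values are skipped, and a change is reported only after several consecutive deviations so that a single noisy reading is ignored. The function name, `threshold`, and `run_length` are assumptions made for this sketch.

```python
import statistics

def detect_change(series, period, threshold, run_length=2):
    """Process observations in arrival order and return the index where a run
    of `run_length` consecutive valid deviations starts, or None.
    `series` may contain None for missing values."""
    history = {}      # seasonal phase -> past values accepted as normal
    run = 0           # number of consecutive deviating observations
    run_start = None  # index of the first observation in the current run
    for i, x in enumerate(series):
        if x is None:
            continue  # a missing value neither extends nor resets the run
        past = history.setdefault(i % period, [])
        if past:
            deviation = abs(x - statistics.median(past))
            if deviation > threshold:
                if run == 0:
                    run_start = i
                run += 1
                if run >= run_length:
                    return run_start
                continue  # keep suspicious values out of the baseline
            run = 0
        past.append(x)
    return None

# Two clean seasonal cycles, then a drop interrupted by one missing value.
signal = [1, 5, 9, 5, 1, 5, 9, 5, 1, 0, None, 0, 0]
print(detect_change(signal, period=4, threshold=2.0))
```

Using the per-phase median as the baseline is one simple way to handle seasonality and outliers together; other robust estimators would serve the same role.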
Citations: 4
Beyond Sights: Large Scale Study of Tourists' Behavior Using Foursquare Data
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.234
A. Ferreira, Thiago H. Silva, A. Loureiro
In this paper, we show how Foursquare check-ins can be used to understand aspects of tourist behavior that would be hard to capture with traditional methods such as surveys. To that end, we analyze the behavior of tourists and residents in four popular cities on four continents: London, New York, Rio de Janeiro, and Tokyo. We perform a spatio-temporal study of the behavior of these two classes of users (tourists and residents). We find, for instance, that some locations have features that are more strongly correlated with tourists' behavior, and that even in places frequented by both tourists and residents there are clear distinctions between the behavior patterns of the two groups. Our study also makes it possible to identify which sights are popular and when. Our results could be useful in several cases, for example, in developing new place recommendation systems for tourists or in helping city planners better support tourists in their cities.
Citations: 15
IntelligShop: Enabling Intelligent Shopping in Malls through Location-Based Augmented Reality
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.103
Aditi Adhikari, V. Zheng, Hong Cao, Miao Lin, Yuan Fang, K. Chang
The shopping experience is important for both citizens and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As its key functionality, IntelligShop provides an augmented reality interface: people simply point their smartphones at mall retailers, and IntelligShop automatically recognizes the retailers and fetches their online reviews from various sources (including blogs, forums, and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems: robust feature learning to support heterogeneous smartphones in localization, and learning to query for automatically gathering retailer content from the Web for augmented reality. We demonstrate the system's effectiveness via a test bed established in a real mall in Singapore.
Citations: 4
Temporal Models for Predicting Student Dropout in Massive Open Online Courses
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.174
Mi Fei, D. Yeung
Over the past few years, the rapid emergence of massive open online courses (MOOCs) has sparked a great deal of research interest in MOOC data analytics. Dropout prediction, or identifying students at risk of dropping out of a course, is an important problem to study due to the high attrition rate commonly found on many MOOC platforms. The methods proposed recently for dropout prediction apply relatively simple machine learning methods like support vector machines and logistic regression, using features that reflect such student activities as lecture video watching and forum activities on a MOOC platform during the study period of a course. Since the features are captured continuously for each student over a period of time, dropout prediction is essentially a time series prediction problem. By regarding dropout prediction as a sequence classification problem, we propose some temporal models for solving it. In particular, based on extensive experiments conducted on two MOOCs offered on Coursera and edX, a recurrent neural network (RNN) model with long short-term memory (LSTM) cells beats the baseline methods as well as our other proposed methods by a large margin.
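To make the sequence-classification framing concrete, here is a minimal forward pass of a standard LSTM cell in plain Python, followed by a logistic output on the final hidden state. It is a sketch under stated assumptions: the feature layout (one vector of activity counts per week), the parameter dictionary, and the function names are hypothetical, and no training is shown. The abstract does not describe the authors' architecture beyond "RNN with LSTM cells".

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell.
    params[gate] = (W, U, b) for gate in 'i', 'f', 'o', 'g', where W maps the
    input, U maps the previous hidden state, and b is the bias vector."""
    def affine(gate):
        W, U, b = params[gate]
        return [
            sum(w * xi for w, xi in zip(W[k], x))
            + sum(u * hi for u, hi in zip(U[k], h))
            + b[k]
            for k in range(len(h))
        ]
    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def dropout_probability(sequence, params, w_out, b_out, hidden_size):
    """Run the cell over a student's weekly feature vectors and map the
    final hidden state to a probability with a logistic output unit."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)

# Placeholder (untrained) weights: 3 hypothetical features per week, 2 hidden units.
zeros = lambda rows, cols: [[0.0] * cols for _ in range(rows)]
params = {gate: (zeros(2, 3), zeros(2, 2), [0.0, 0.0]) for gate in "ifog"}
weeks = [[4.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(dropout_probability(weeks, params, [1.0, 1.0], 0.0, 2))  # 0.5 with all-zero weights
```

Unlike an SVM or logistic regression on a flattened feature vector, the recurrent state lets the order of the weekly observations influence the prediction.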
Citations: 192
Toponym Recognition in Social Media for Estimating the Location of Events
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.167
M. Sagcan, P. Senkul
The prominence of social media such as Twitter and Facebook has led to a huge collection of data over which event detection provides useful results. An important dimension of event detection is location estimation for detected events. Social media provides a variety of clues for location, such as geographical annotation from smart devices, the location field in the user profile, and the content of the message. Among these clues, message content takes more effort to process, yet it is generally more informative. In this paper, we focus on the extraction of location names, i.e., toponym recognition, from social media messages. We propose a hybrid system that uses both rule-based and machine-learning-based techniques to extract toponyms from tweets. Conditional Random Fields (CRF) are used as the machine learning tool, and features such as part-of-speech tags and a conjunction window are defined in order to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns and to provide a simple level of normalization that handles the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
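The rule-based half of such a hybrid system can be illustrated with a few regular expressions. This is only a sketch: the patterns below (a locative preposition followed by capitalized words, plus URL and hashtag cleanup as the "simple normalization") are hypothetical examples, not the rules used in the paper, and the CRF half is not shown.

```python
import re

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")
# A locative preposition followed by one or more capitalized words.
TOPONYM = re.compile(r"\b(?:in|at|near|from)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")

def normalize(tweet):
    """Light normalization: drop URLs, unwrap hashtags, collapse whitespace."""
    tweet = URL.sub("", tweet)
    tweet = HASHTAG.sub(r"\1", tweet)
    return re.sub(r"\s+", " ", tweet).strip()

def rule_based_toponyms(tweet):
    """Return candidate toponyms matched by the preposition pattern."""
    return TOPONYM.findall(normalize(tweet))

print(rule_based_toponyms("Huge fire near Taksim Square right now http://t.co/x"))
print(rule_based_toponyms("Concert at #Ankara tonight"))
```

Normalizing before matching matters here: the second tweet yields a candidate only because the hashtag marker is removed first. Patterns this simple will also match capitalized non-places, which is one reason to combine them with a learned model.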
Citations: 11
Exploiting User Feedback for Expert Finding in Community Question Answering
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.181
Xiang Cheng, Shuguang Zhu, Gang Chen, Sen Su
Community Question Answering (CQA) is a popular online service where people ask and answer questions. Recently, as users and content have accumulated on CQA platforms, answer quality has become a widespread concern. Expert finding, which aims to find suitable answerers who can give high-quality answers, has been proposed as one way to address this problem. In this paper, we formalize expert finding as a learning to rank task by using the user feedback on answers (i.e., the votes on answers) as the "relevance" labels. To accomplish this task, we present a listwise learning to rank approach, which we refer to as ListEF. In the ListEF approach, recognizing that questions in CQA are relatively short and usually accompanied by tags, we propose a tagword topic model (TTM) to derive high-quality topical representations of questions. Based on TTM, we develop a COmpetition-based User exPertise Extraction (COUPE) method to capture user expertise features for given questions. We adopt the widely used listwise learning to rank method LambdaMART to train the ranking function. Finally, for a given question, we rank candidate users in descending order of the scores calculated by the trained ranking function and select the highest-ranked users as candidate experts. Experimental results on Stack Overflow show that both TTM and ListEF are effective, with significant improvements over state-of-the-art methods.
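The final ranking step, and the way vote labels can feed a listwise quality measure, can be sketched as follows. The scores stand in for the output of a trained ranking function (LambdaMART training itself is not shown), and NDCG is used here as a common listwise measure; the abstract does not say which metric the authors report. The user names and grades are hypothetical, and votes are assumed to be bucketed into small non-negative grades.

```python
import math

def rank_candidates(scores, top_k):
    """scores: dict mapping user -> score from a trained ranking function.
    Return the top_k users in descending order of score."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def dcg(gains):
    """Discounted cumulative gain with the usual (2^g - 1) / log2(rank + 1) form."""
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg(ranked_users, grades, k):
    """grades: dict mapping user -> non-negative relevance grade (from votes)."""
    gains = [grades.get(u, 0) for u in ranked_users[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

# Hypothetical question with three candidate answerers.
grades = {"ann": 3, "bob": 1, "cy": 0}
scores = {"ann": 0.9, "bob": 0.4, "cy": 0.7}
ranking = rank_candidates(scores, top_k=3)
print(ranking, ndcg(ranking, grades, k=3))
```

A ranking that swaps the two lower-graded users is penalized only slightly, because the discount concentrates the measure on the top positions.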
Citations: 7
Trend-MC: A Melody Composer by Constructing from Frequent Trend-Based Patterns
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.165
Cheng Long, R. C. Wong, R. W. Sze
Algorithmic composition refers to the process of composing a melody automatically using algorithms. Many methods have been proposed for this task. Among them, a novel idea is to use the correlation between the pitches of melodies and the tones of lyrics for melody composition. Unfortunately, the existing method that adopts this idea suffers from several severe shortcomings, so the merits of the idea are not fully exploited. In this paper, we propose a new technique that captures this correlation based on the concepts of pitch trends and tone trends. Based on this technique, we design a new melody composition algorithm called Trend-MC, which avoids the shortcomings of the existing method. We also developed software with the Trend-MC algorithm at its core. We demonstrate that the software can compose pleasant melodies given lyrics as input.
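One plausible reading of a "trend" is the direction of change between consecutive values, which applies equally to a pitch sequence and to a sequence of lyric tone levels. The sketch below encodes a sequence as rises, falls, and repeats and counts frequent fixed-length trend patterns. The encoding, the function names, and the example values are assumptions for illustration; the abstract does not define the paper's trend representation or its pattern-mining procedure.

```python
from collections import Counter

def to_trend(values):
    """Map a numeric sequence to its trend string:
    'U' for a rise, 'D' for a fall, 'S' for no change."""
    return "".join(
        "U" if b > a else "D" if b < a else "S"
        for a, b in zip(values, values[1:])
    )

def frequent_patterns(trend_strings, length, min_count):
    """Count every trend substring of the given length across all sequences
    and keep those that occur at least min_count times."""
    counts = Counter(
        t[i:i + length]
        for t in trend_strings
        for i in range(len(t) - length + 1)
    )
    return {p: n for p, n in counts.items() if n >= min_count}

# Hypothetical pitch sequences (MIDI note numbers) from two phrases.
phrases = [[60, 62, 64, 62], [55, 57, 59, 59]]
trends = [to_trend(p) for p in phrases]
print(trends, frequent_patterns(trends, length=2, min_count=2))
```

Working with trends rather than absolute values makes patterns comparable across keys and registers, which is what allows pitch movement and tone movement to be related at all.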
Citations: 1
Clustering Evolving Batch System Jobs for Online Anomaly Detection
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.219
E. Kuehn
In batch systems, monitoring information at the level of individual jobs is crucial for optimizing resource utilization and preventing misuse. However, the usage of network resources in particular is difficult to track. Understanding usage patterns in modern computing clusters requires more detailed monitoring than existing solutions provide. Monitoring at the job level yields dynamic graphs of processes with attached time series data, e.g., network resource usage. Clustering can then be used to identify common usage patterns and detect outliers. This work provides an overview of ongoing efforts to cluster dynamic graphs in the context of distributed streams of monitoring events.
Citations: 1
Lexical Resource for Medical Events: A Polarity Based Approach
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.170
A. Mondal, I. Chaturvedi, Dipankar Das, Rajiv Bajpai, Sivaji Bandyopadhyay
The growing sophistication of clinical information processing motivates the development of a WordNet-like dictionary for medical events that conveys valuable information (e.g., event definition, sense-based contextual description, polarity) to experts (e.g., medical practitioners) and non-experts (e.g., patients) in their respective fields. This paper reports the enrichment of medical terms, such as identifying and describing events, times, and the relations between them in clinical text, by employing three lexical resources: a seed list of medical events collected from SemEval 2015 Task-6, WordNet, and an English medical dictionary. In particular, we develop WordNet for Medical Events (WME), which uses contextual information for word sense disambiguation of medical terms and reduces the communication gap between doctors and patients. We propose two approaches (Sequential and Combined) for identifying the proper sense of a medical event based on each of the three types of texts. Polarity lexicons, e.g., SentiWordNet, Affect Word List, and Taboda's adjective list, are used to implement polarity-based word sense disambiguation of the medical events from their glosses as extracted from the lexical resources. The proposed WME outperformed a previously proposed Lesk word sense disambiguation method by a margin in the range of 10-20%.
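For context on the baseline mentioned above, the following is a simplified Lesk sketch: it picks the sense whose gloss shares the most non-stopword tokens with the context. It is not the polarity-based WME method, whose details the abstract does not give. The glosses, sense labels, and stopword list are hypothetical, and tokenization is a plain whitespace split with no punctuation handling.

```python
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "to", "and", "or", "is", "was", "for"})

def tokens(text):
    """Lowercased whitespace tokens with stopwords removed."""
    return {w.lower() for w in text.split()} - STOPWORDS

def simplified_lesk(context, glosses):
    """glosses: dict mapping sense label -> gloss text.
    Return the sense whose gloss overlaps most with the context;
    ties go to the sense listed first."""
    context_words = tokens(context)
    return max(glosses, key=lambda sense: len(context_words & tokens(glosses[sense])))

# Hypothetical glosses for the medical term "discharge".
glosses = {
    "release": "release of a patient from hospital care",
    "secretion": "fluid that flows out of a wound or body opening",
}
print(simplified_lesk("the patient was ready for discharge from hospital", glosses))
print(simplified_lesk("yellow fluid discharge from the wound", glosses))
```

Because the choice rests only on word overlap, short or generic glosses often tie, which is the kind of weakness that additional signals such as polarity are meant to address.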
临床信息处理的不断复杂化促使像wordnet这样的医学事件词典的发展,以便向各自领域的专家(例如医疗从业者)和非专家(例如患者)传达有价值的信息(例如事件定义,基于感觉的上下文描述,极性等)。本文利用从SemEval 2015 Task-6中收集的医学事件种子列表、WordNet和英语医学词典三种不同的词汇资源,报道了临床文本中识别和描述事件、时间及其之间关系等医学术语的丰富。特别地,我们开发了用于医学事件的WordNet (WME),它使用上下文信息来消除医学术语的词义歧义,减少了医生和病人之间的沟通差距。我们提出了两种方法(顺序和组合),用于根据三种类型的文本中的每一种来确定医学事件的正确含义。利用极性词汇如SentiWordNet、Affect Word List和Taboda’s形容词List,实现了从词汇资源中提取医学事件的词汇表中基于极性的词义消歧。提出的WME在10-20%的范围内优于先前提出的Lesk词义消歧。
Citations: 19