Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining: latest articles

A data mining driven risk profiling method for road asset management

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488204

D. Emerson, J. Weligamage, R. Nayak

Road surface skid resistance has been shown to have a strong relationship to road crash risk. However, the current method of using investigatory levels to identify crash-prone roads is problematic, as it may fail to identify risky roads outside of the norm. The proposed method uses data mining to analyse a complex and formerly impenetrable volume of road and crash data. It rapidly identifies roads with an elevated crash rate, potentially due to a skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values that occurs during network-wide crash analysis, and allows risk assessment of the major proportion of roads without skid resistance values.

Citations: 1

LAICOS: an open source platform for personalized social web search

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487705

Mohamed Reda Bouadjenek, Hakim Hacid, M. Bouzeghoub

In this paper, we introduce LAICOS, a social Web search engine, as a contribution to the growing area of Social Information Retrieval (SIR). Social information and personalization are at the heart of LAICOS. On the one hand, the social context of documents is added as a layer on top of the textual content traditionally used for indexing, to provide Personalized Social Document Representations. On the other hand, the social context of users is used in the query expansion process through the Personalized Social Query Expansion (PSQE) framework proposed in our earlier work. We describe the different components of the system, which relies on social bookmarking systems as a source of social information for personalizing and enhancing the IR process. We show how the internal structure of the indexes and the query expansion process operate using social information.

Citations: 19

Mining evidences for named entity disambiguation

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487681

Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan

Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. With a specific modeling of "background topic" and "unknown entities", our model is able to harvest useful evidence out of noisy information. Experimental results show that our proposed method significantly outperforms the state-of-the-art approaches, boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.

Citations: 101

Extracting social events for learning better information diffusion models

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487584

Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu

Learning of the information diffusion model is a fundamental problem in the study of information diffusion in social networks. Existing approaches learn the diffusion models from events in social networks. However, events in social networks may have different underlying reasons. Some of them may be caused by the social influence inside the network, while others may reflect external trends in the "real world". Most existing work on the learning of diffusion models does not distinguish the events caused by the social influence from those caused by external trends. In this paper, we extract social events from data streams in social networks, and then use the extracted social events to improve the learning of information diffusion models. We propose a LADP (Latent Action Diffusion Path) model to incorporate the information diffusion model with the model of external trends, and then design an EM-based algorithm to infer the diffusion probabilities, the external trends and the sources of events efficiently.

Citations: 30

A transfer learning based framework of crowd-selection on twitter

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487708

Zhou Zhao, D. Yan, Wilfred Ng, Shi Gao

Crowd selection is essential to crowdsourcing applications, since choosing the right workers with particular expertise to carry out crowdsourced tasks is extremely important. The central problem is simple but tricky: given a crowdsourced task, who are the most knowledgeable users to ask? In this demo, we show our framework, which tackles the problem of crowdsourced task assignment on Twitter according to the social activities of its users. Since user profiles on Twitter do not reveal user interests and skills, we transfer knowledge from categorized Yahoo! Answers datasets to learn user expertise. Then, we select the right crowd for certain tasks based on user expertise. We study the effectiveness of our system using extensive user evaluation. We further invite the attendees to participate in a game called "Whom to Ask on Twitter?", which helps convey our ideas in an interactive manner. Our crowd selection system can be accessed at the following URL: http://webproject2.cse.ust.hk:8034/tcrowd/.

Citations: 32

Learning to question: leveraging user preferences for shopping advice

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487653

Mahashweta Das, G. D. F. Morales, A. Gionis, Ingmar Weber

We present ShoppingAdvisor, a novel recommender system that helps users shop for technical products. ShoppingAdvisor leverages both user preferences and technical product attributes in order to generate its suggestions. The system elicits user preferences via a tree-shaped flowchart, where each node is a question to the user. At each node, ShoppingAdvisor suggests a ranking of products matching the preferences of the user, and this ranking is progressively refined along the path from the tree's root to one of its leaves. In this paper we show (i) how to learn the structure of the tree, i.e., which questions to ask at each node, and (ii) how to produce a suitable ranking at each node. First, we adapt the classical top-down strategy for building decision trees in order to find the best user attribute to ask about at each node. Unlike decision trees, ShoppingAdvisor partitions the user space rather than the product space. Second, we show how to employ a learning-to-rank approach in order to learn, for each node of the tree, a ranking of products appropriate to the users who reach that node. We experiment with two real-world datasets for cars and cameras, and a synthetic one. We use mean reciprocal rank to evaluate ShoppingAdvisor, and show how the performance increases by more than 50% along the path from root to leaf. We also show how collaborative recommendation algorithms such as k-nearest neighbor benefit from the feature selection done by the ShoppingAdvisor tree. Our experiments show that ShoppingAdvisor produces good quality interpretable recommendations, while requiring less input from users and being able to handle the cold-start problem.

Citations: 23
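
The ShoppingAdvisor abstract above names mean reciprocal rank (MRR) as its evaluation metric. As a minimal sketch of how that metric is commonly computed, assuming each user has exactly one relevant product and that rankings are plain lists of product identifiers (the function names and the toy data are illustrative, not taken from the paper):

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranking: Sequence[Hashable], relevant: Hashable) -> float:
    """Return 1 / (1-based position of the relevant item), or 0.0 if it is absent."""
    for position, item in enumerate(ranking, start=1):
        if item == relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Hashable],
) -> float:
    """Average the reciprocal rank over all (ranking, relevant item) pairs."""
    if len(rankings) != len(relevants):
        raise ValueError("need exactly one relevant item per ranking")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


# Hypothetical toy data: two users, each with one product they actually chose.
rankings = [["camera_a", "camera_b", "camera_c"],
            ["camera_b", "camera_a", "camera_c"]]
chosen = ["camera_a", "camera_a"]
score = mean_reciprocal_rank(rankings, chosen)  # (1/1 + 1/2) / 2
```

A higher score means the chosen product tends to appear nearer the top of the list, which is the sense in which a ranking can improve from root to leaf.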

Automatic selection of social media responses to news

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487659

Tadej Štajner, B. Thomee, Ana-Maria Popescu, M. Pennacchiotti, A. Jaimes

Social media responses to news have increasingly gained in importance as they can enhance a consumer's news reading experience, promote information sharing and aid journalists in assessing their readership's response to a story. Given that the number of responses to an online news article may be huge, a common challenge is that of selecting only the most interesting responses for display. This paper addresses this challenge by casting message selection as an optimization problem. We define an objective function which jointly models the messages' utility scores and their entropy. We propose a near-optimal solution to the underlying optimization problem, which leverages the submodularity property of the objective function. Our solution first learns the utility of individual messages in isolation and then produces a diverse selection of interesting messages by maximizing the defined objective function. The intuition behind our work is that an interesting selection of messages contains diverse, informative, opinionated and popular messages referring to the news article, written mostly by users who have authority on the topic. This intuition is embodied in a rich set of content, social and user features capturing the aforementioned aspects. We evaluate our approach through both human and automatic experiments, and demonstrate that it outperforms the state of the art. Additionally, we perform an in-depth analysis of the annotated "interesting" responses, shedding light on the subjectivity around the selection process and the perception of interestingness.

Citations: 38
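
The abstract above describes selecting messages by maximizing a submodular objective that combines per-message utility with diversity. The sketch below illustrates the general greedy technique for such objectives; it is not the paper's method. In particular, the diversity term here is a simple token-coverage count standing in for the entropy term the abstract mentions, and `greedy_select`, `diversity_weight`, and the toy data are all hypothetical. For monotone submodular objectives under a cardinality constraint, greedy selection of this kind is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Dict, List, Set


def greedy_select(
    messages: Dict[str, Set[str]],
    utility: Dict[str, float],
    k: int,
    diversity_weight: float = 1.0,
) -> List[str]:
    """Greedily pick up to k message ids maximizing utility plus token coverage.

    messages maps a message id to its set of tokens; utility maps a message id
    to a non-negative score. Coverage (number of distinct tokens covered) is an
    assumed stand-in for a diversity term and is monotone submodular.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(messages)

    def marginal_gain(message_id: str) -> float:
        # Utility is modular; coverage only rewards tokens not yet covered.
        new_tokens = messages[message_id] - covered
        return utility[message_id] + diversity_weight * len(new_tokens)

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (first id in sorted order wins).
        best = max(sorted(candidates), key=marginal_gain)
        selected.append(best)
        covered |= messages[best]
        candidates.remove(best)
    return selected


# Hypothetical toy data: m1 and m2 are near-duplicates, m3 covers a new topic.
toy_messages = {"m1": {"tax", "vote"}, "m2": {"tax", "vote"}, "m3": {"sports"}}
toy_utility = {"m1": 1.0, "m2": 0.9, "m3": 0.5}
picked = greedy_select(toy_messages, toy_utility, k=2)
```

In the toy data, the second pick favours `m3` over the higher-utility `m2` because `m2` adds no new tokens once `m1` is selected, which is the diversity effect the objective is meant to produce.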

An online system with end-user services: mining novelty concepts from tv broadcast subtitles

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487715

Mika Rautiainen, Jouni Sarvanko, A. Heikkinen, M. Ylianttila, V. Kostakos

Better tools for content-based access of video are needed to improve access to time-continuous video data. In particular, information about linear TV broadcast programs has been limited to program guides that provide short, manually written overviews of the program content. Recent developments in the digitalization of TV broadcasting and the emergence of web-based services for catch-up and on-demand viewing open up new possibilities for accessing data. In this paper we introduce our data mining system and accompanying services for summarizing Finnish DVB broadcast streams from seven national channels. We describe how novelty concepts can be mined from DVB subtitles to augment the web-based "Catch-Up TV Guide" and "Novelty Cloud" TV services. Furthermore, our system allows accessing media fragments as Picture Quotes via generated word lists and provides content-based recommendations to find new programs with content similar to user-selected programs. Our index consists of over 180 000 programs that are used to recommend relevant programs. The service has been under development and available online since 2010. It has registered over 5000 user sessions.

Citations: 0

Multi-label classification by mining label and instance correlations from heterogeneous information networks

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487577

Xiangnan Kong, Bokai Cao, Philip S. Yu

Multi-label classification is prevalent in many real-world applications, where each example can be associated with a set of multiple labels simultaneously. The key challenge of multi-label classification comes from the large space of all possible label sets, which is exponential in the number of candidate labels. Most previous work focuses on exploiting correlations among different labels to facilitate the learning process. It is usually assumed that the label correlations are given beforehand or can be derived directly from data samples by counting their label co-occurrences. However, in many real-world multi-label classification tasks, the label correlations are not given and can be hard to learn directly from data samples within a moderate-sized training set. Heterogeneous information networks can provide abundant knowledge about relationships among different types of entities, including data samples and class labels. In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process. By mining the linkage structure of heterogeneous information networks, multiple types of relationships among different class labels and data samples can be extracted. We can then use these relationships to effectively infer the correlations among different class labels in general, as well as the dependencies among the label sets of data examples inter-connected in the network. Empirical studies on real-world tasks demonstrate that the performance of multi-label classification can be effectively boosted using heterogeneous information networks.

Citations: 78
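
The abstract above mentions the common baseline of deriving label correlations "by counting their label co-occurrences" in the training samples. A minimal sketch of that baseline, not of the network-based method the paper proposes, might look as follows; the function name and the toy label sets are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def label_cooccurrence(label_sets: Iterable[Set[str]]) -> Counter:
    """Count how often each unordered pair of labels appears on the same example."""
    counts: Counter = Counter()
    for labels in label_sets:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy training set: each example carries a set of labels.
toy_label_sets = [
    {"sports", "news"},
    {"news", "politics"},
    {"sports", "news", "politics"},
]
pair_counts = label_cooccurrence(toy_label_sets)
```

With only a handful of examples per label pair, counts like these are noisy, which is the limitation the abstract points to for moderate-sized training sets.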

Big data analytics for healthcare

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Pub Date : 2013-08-11 DOI: 10.1145/2487575.2506178

Jimeng Sun, C. Reddy

Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. In this tutorial, we introduce the characteristics and related mining challenges in dealing with big medical data. Many of those insights come from the medical informatics community, which is highly related to data mining but focuses on biomedical specifics. We survey various related papers from data mining venues as well as medical informatics venues to share with the audience key problems and trends in healthcare analytics research, with applications ranging from clinical text mining, predictive modeling, survival analysis, patient similarity and genetic data analysis to public health. The tutorial will include several case studies dealing with some of the important healthcare applications.

Citations: 92
