
Latest articles from Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

More influence means less work: fast latent Dirichlet allocation by influence scheduling
Mirwaes Wahabzada, K. Kersting, A. Pilz, C. Bauckhage
There have recently been considerable advances in fast inference for (online) latent Dirichlet allocation (LDA). While it is widely recognized that the scheduling of documents in stochastic optimization, and in turn in LDA, may have significant consequences, this issue remains largely unexplored. Instead, practitioners schedule documents essentially uniformly at random, perhaps because this is easy to implement and because clear guidelines on scheduling documents are lacking. In this work, we address this issue and propose to schedule documents that exert a disproportionately large influence on the topics of the corpus for an update before less influential ones. More precisely, we justify forming mini-batches by sampling documents at random with a bias toward those with higher norms. On several real-world datasets, including 3M articles from Wikipedia and 8M from PubMed, we demonstrate that the resulting influence-scheduled LDA can handily analyze massive document collections and find topic models as good as or better than those found with online LDA, often in a fraction of the time.
DOI: https://doi.org/10.1145/2063576.2063944, pp. 2273-2276, published 2011-10-24
Citations: 8
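The norm-biased mini-batch sampling mentioned in the influence-scheduling abstract above can be illustrated with a minimal sketch. The abstract does not say which norm is used or whether sampling is with replacement, so the Euclidean norm of a bag-of-words count vector and sampling with replacement are assumptions made here, and the names `document_norms` and `sample_minibatch` are hypothetical. This is not the authors' implementation.

```python
import math
import random


def document_norms(docs):
    """Euclidean norm of each document's word-count vector (assumed norm)."""
    return [math.sqrt(sum(c * c for c in doc.values())) for doc in docs]


def sample_minibatch(docs, batch_size, rng):
    """Draw document indices with probability proportional to their norm.

    Sampling is with replacement, which is an assumption of this sketch.
    """
    norms = document_norms(docs)
    return rng.choices(range(len(docs)), weights=norms, k=batch_size)


# Toy corpus: each document is a mapping from word to count.
docs = [
    {"topic": 1},
    {"topic": 3, "model": 4},
    {"lda": 6, "online": 8},
]

rng = random.Random(0)  # fixed seed so the run is repeatable
batch = sample_minibatch(docs, batch_size=1000, rng=rng)
counts = [batch.count(i) for i in range(len(docs))]
print(document_norms(docs), counts)
```

Documents with larger norms are drawn more often, so they reach the topic update earlier on average than under uniform scheduling.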
Examining the "leftness" property of Wikipedia categories
Karl Gyllstrom, Marie-Francine Moens
Wikipedia's rich category structure has helped make it one of the largest semantic taxonomies in existence, a property that has been central to much recent research. However, Wikipedia's category representation is simplistic: an article contains a single list of categories, with no data about their relative importance. We investigate the ordering of category lists to determine how a category's position in the list correlates with its relevance to the article and overall significance. We identify a number of interesting connections between a category's position and its persistence within the article, age, popularity, size, and descriptiveness.
DOI: https://doi.org/10.1145/2063576.2063953, pp. 2309-2312, published 2011-10-24
Citations: 4
Large-scale behavioral targeting with a social twist
Kun Liu, Lei Tang
Behavioral targeting (BT) is a widely used technique for online advertising. It leverages information collected on an individual's web-browsing behavior, such as page views, search queries and ad clicks, to select and display the ads most relevant to the user. With the proliferation of social networks, it is possible to relate the behavior of individuals to that of their social connections. Although the similarity among connected individuals is well established (i.e., homophily), it is still not clear whether and how we can leverage the activities of one's friends for behavioral targeting, or whether forecasts derived from such social information are more accurate than those of standard behavioral targeting models. In this paper, we strive to answer these questions by evaluating the predictive power of social data across 60 consumer domains on a large online network of over 180 million users over a period of two and a half months. To the best of our knowledge, this is the most comprehensive study of social data in the context of behavioral targeting, and it is conducted on an unprecedented scale. Our analysis offers interesting insights into the value of social data for developing the next generation of targeting services.
DOI: https://doi.org/10.1145/2063576.2063838, pp. 1815-1824, published 2011-10-24
Citations: 34
A pretopological framework for the automatic construction of lexical-semantic structures from texts
G. Cleuziou, D. Buscaldi, Vincent Levorato, G. Dias
We present in this paper a new approach for the automatic generation of lexical structures from texts. This tedious task is based on the strong hypothesis that simple statistical observations on textual usages can provide pieces of semantics about the lexicon. Using only such "naive" observations, we propose a (pre)topological framework to formalize and combine various hypotheses on textual data usages and then to derive a structure similar to usual lexical knowledge bases such as WordNet. In addition, we consider the evaluation problem for the obtained lexical structures; a multi-level evaluation strategy is proposed that measures the fit between a given reference structure and automatically generated structures from different points of view: intrinsic/structural and application-based. The evaluation strategy is then used to quantify the contribution of the new structuring approach with respect to the corresponding solution proposed by (Sanderson et al. 2000) on two case studies that differ in the domain and the size of the lexicon.
DOI: https://doi.org/10.1145/2063576.2063990, pp. 2453-2456, published 2011-10-24
Citations: 10
Statistical information retrieval modelling: from the probability ranking principle to recent advances in diversity, portfolio theory, and beyond
Jun Wang, Kevyn Collins-Thompson
Statistical modelling of Information Retrieval (IR) systems is a key driving force in the development of the IR field. The goal of this tutorial is to provide a comprehensive and up-to-date introduction to statistical IR modelling. We take a fresh and systematic perspective from the viewpoint of portfolio theory of IR and risk management. A unified treatment and new insights will be given to reflect the recent developments of considering the ranked retrieval results as a whole. Recent research progress in diversification, risk management, and portfolio theory will be covered, in addition to classic methods such as Maron and Kuhns' Probabilistic Indexing, Robertson-Sparck Jones model (and the resulting BM25 formula) and language modelling approaches. The tutorial also reviews the resulting practical algorithms of risk-aware query expansion, diverse ranking, IR metric optimization as well as their performance evaluations. Practical IR applications such as web search, multimedia retrieval, and collaborative filtering are also introduced, as well as discussion of new opportunities for future research and applications that intersect among information retrieval, knowledge management, and databases.
DOI: https://doi.org/10.1145/2063576.2064033, pp. 2603-2604, published 2011-10-24
Citations: 0
Collaborative exploratory search in real-world context
Naoki Tani, Danushka Bollegala, N. P. Chandrasiri, Keisuke Okamoto, Kazunari Nawa, S. Iitsuka, Y. Matsuo
We propose Collaborative Exploratory Search (CES), an integration of dialog analysis and web search that involves multiparty collaboration to accomplish an exploratory information retrieval goal. Given a real-time dialog between users on a single topic, we define CES as the task of automatically detecting the topic of the dialog and retrieving task-relevant web pages to support the dialog. To recognize the task of the dialog, we apply the Author-Topic model as a topic model. Then, attribute extraction is applied to the dialog to obtain the attributes of the task. Finally, a specific search query is generated to identify the task-relevant information. We implement and evaluate the CES system for commercial in-vehicle conversation. We also develop an iPad application that listens to conversations among users and continuously retrieves relevant web pages. Our experimental results reveal that the proposed method outperforms existing methods, which demonstrates the potential usefulness of collaborative exploratory search at practically usable accuracy levels.
DOI: https://doi.org/10.1145/2063576.2063909, pp. 2137-2140, published 2011-10-24
Citations: 7
Spreadsheet-based complex data transformation
Vu Hung, B. Benatallah, Régis Saint-Paul
Spreadsheets are used by millions of users as a routine all-purpose data management tool. It is now increasingly necessary for external applications and services to consume spreadsheet data. In this paper, we investigate the problem of transforming spreadsheet data to structured formats required by these applications and services. Unlike prior methods, we propose a novel approach in which transformation logic is embedded into a familiar and expressive spreadsheet-like formula mapping language. Popular transformation patterns provided by transformation languages and mapping tools, that are relevant to spreadsheet-based data transformation, are supported in the language via formulas. Consequently, the language avoids cluttering the source spreadsheets with transformations and turns out to be helpful when multiple schemas are targeted. We implemented a prototype and evaluated the benefits of our approach via experiments in a real application. The experimental results confirmed the benefits of our approach.
DOI: https://doi.org/10.1145/2063576.2063829, pp. 1749-1754, published 2011-10-24
Citations: 20
Social ranking for spoken web search
Shrey Sahay, Nitendra Rajput, Niketan Pansare
Spoken Web is an alternative Web for low-literacy users in the developing world. People can create audio content over the phone and share it on the Spoken Web. This enables easy creation of locally relevant content. Even on the World Wide Web in developed regions, the recent increase in traffic is due to locally relevant content created on social networking sites. This paper argues that content search and ranking in this new scenario need a fresh look. The generic model of using in-links to rank such content is not an appropriate measure of content relevance in a collaborative Web 2.0 world. This paper aims to bring social context into Spoken Web ranking. We formulate a relationship function between the query-creator and the content-creator and use it as one measure of the content's relevance to the user. The relationship function uses the geographical location of the two people and their prior browsing preferences as parameters to determine the relationship between the two users. Further, we determine the trustworthiness of the content based on how well the content creator is accepted by the social network. We use these two features in addition to the term-frequency - inverse-term-frequency match to rank the search results in the context of the query-creator's social network and provide a more specific and socially relevant result to the user.
DOI: https://doi.org/10.1145/2063576.2063840, pp. 1835-1840, published 2011-10-24
Citations: 2
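The ranking idea in the Spoken Web abstract above, which combines a text-match score with a relationship score and a trust score, can be sketched as a weighted sum. Everything specific here is an assumption for illustration: the abstract does not give the form of the relationship function, the trust measure, or how the features are combined, so the exponential distance decay, the Jaccard overlap of preferences, the weights, and the names `relationship`, `social_score` and `rank` are all hypothetical.

```python
import math


def relationship(loc_a, loc_b, prefs_a, prefs_b, scale=100.0):
    """Hypothetical relationship score in [0, 1].

    Averages geographic closeness (exponential decay of planar distance)
    and the Jaccard overlap of browsing preferences.
    """
    distance = math.dist(loc_a, loc_b)
    closeness = math.exp(-distance / scale)
    union = prefs_a | prefs_b
    overlap = len(prefs_a & prefs_b) / len(union) if union else 0.0
    return 0.5 * closeness + 0.5 * overlap


def social_score(text_match, rel, trust, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of text match, relationship and trust (assumed weights)."""
    w_text, w_rel, w_trust = weights
    return w_text * text_match + w_rel * rel + w_trust * trust


def rank(results, user_loc, user_prefs):
    """Return result ids ordered from highest to lowest social score."""
    scored = []
    for r in results:
        rel = relationship(user_loc, r["creator_loc"], user_prefs, r["creator_prefs"])
        scored.append((social_score(r["text_match"], rel, r["trust"]), r["id"]))
    return [rid for _, rid in sorted(scored, reverse=True)]


# Two results with identical text match and trust; only the creators differ.
results = [
    {"id": "far", "text_match": 0.6, "trust": 0.5,
     "creator_loc": (500.0, 0.0), "creator_prefs": {"sports"}},
    {"id": "near", "text_match": 0.6, "trust": 0.5,
     "creator_loc": (1.0, 0.0), "creator_prefs": {"farming", "weather"}},
]
ranking = rank(results, user_loc=(0.0, 0.0), user_prefs={"farming", "weather"})
print(ranking)
```

With equal text match and trust, the result from the nearby creator with shared preferences ranks first, which is the behavior the abstract motivates.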
Privacy protected knowledge management in services with emphasis on quality data
Debapriyo Majumdar, R. Catherine, S. Ikbal, Karthik Venkat Ramanan
Improving the productivity of practitioners through effective knowledge management and delivering high-quality service in the Application Management Services (AMS) domain are key focus areas for all IT services organizations. One source of historical knowledge in AMS is the large amount of resolved problem ticket data, which is often confidential and immensely valuable, but the majority of which is of very poor quality. In this paper, we present a knowledge management tool that detects the quality of information present in problem tickets and enables effective knowledge search in tickets by prioritizing quality data in the search ranking. The tool facilitates leveraging knowledge across different AMS accounts while preserving data privacy by masking client-confidential information. It also extracts several relevant entities contained in the noisy unstructured text entered in the tickets and presents them to the users. We present several experimental evaluations and a pilot study conducted with an AMS account, which show that our tool is effective and leads to substantial improvement in the productivity of practitioners.
通过有效的知识管理和在应用程序管理服务(AMS)领域提供高质量的服务来提高从业人员的生产力,是所有IT服务组织关注的关键领域。AMS中历史知识的一个来源是大量已解决的问题单数据,这些数据通常是机密的,非常有价值,但其中大多数质量非常差。在本文中,我们提出了一种知识管理工具,它可以检测问题票证中存在的信息质量,并通过在搜索排名中优先考虑质量数据来实现票证中的有效知识搜索。该工具通过屏蔽客户机密信息来促进跨不同AMS帐户的知识利用,同时保护数据隐私。它还提取了门票中输入的嘈杂的非结构化文本中包含的几个相关实体,并将它们呈现给用户。我们提出了几个实验评估和试点研究进行了一个AMS帐户,这表明我们的工具是有效的,并导致从业者的生产力大幅提高。
DOI: https://doi.org/10.1145/2063576.2063848; pages 1889-1894; published 2011-10-24
Citations: 7
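The abstract above mentions two mechanisms: ranking tickets so that higher-quality ones come first, and masking client-confidential text before it is shown. The following is only a minimal sketch of how these two ideas could fit together, not the authors' tool. The regular expressions, the word-count quality score, and the function names `mask`, `quality`, `relevance`, and `search` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for confidential tokens (assumption: a real
# deployment would use a client-specific list, not just these two).
MASKS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]


def mask(text):
    """Replace every match of each confidential pattern with a placeholder."""
    for pattern, token in MASKS:
        text = pattern.sub(token, text)
    return text


def quality(text):
    """Toy quality score in [0, 1]: more words score higher, capped at 20 words."""
    return min(len(text.split()), 20) / 20


def relevance(query, text):
    """Fraction of query terms that occur as words in the ticket text."""
    terms = query.lower().split()
    words = set(re.findall(r"\w+", text.lower()))
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def search(query, tickets, top_k=3):
    """Rank matching tickets by relevance * quality and return them masked."""
    scored = [(relevance(query, t) * quality(t), mask(t)) for t in tickets]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]


tickets = [
    "db timeout fixed",
    "Database timeout on 10.0.0.5: raised the pool size, restarted the "
    "listener, and mailed ops@example.com with the steps",
    "printer jam",
]
for hit in search("timeout", tickets):
    print(hit)
```

Multiplying relevance by quality is one simple way to let a detailed resolution note outrank a terse one that matches the query equally well; the paper may combine the two signals differently.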
Robust video fingerprinting based on hierarchical symmetric difference feature
Jungho Lee, Seungjae Lee, Yong-seok Seo, Won-young Yoo
The piracy of copyrighted digital content over the Internet infringes copyrights and damages the digital content industry. Accordingly, technologies for identifying and monitoring content on online services, such as fingerprinting, are becoming more valuable as digital content sharing grows explosively. This paper proposes a robust video fingerprinting feature for identifying a modified video clip in a large-scale database. A hierarchical symmetric difference feature is proposed to provide efficient video fingerprinting. The feature is pairwise independent and robust against various video modifications such as compression, resizing, and cropping. Moreover, videos that have undergone a transformation such as flipping or mirroring can be identified simply by reordering the bit pattern of the fingerprints. The proposed feature is evaluated extensively on a 6,482-hour database, and the experimental results show that the proposed fingerprinting is efficient and robust against various modifications.
DOI: https://doi.org/10.1145/2063576.2063897; pages 2089-2092; published 2011-10-24
Citations: 6
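The fingerprinting abstract above states that flipped or mirrored videos can be matched by reordering fingerprint bits. The toy example below illustrates that property only. It is not the hierarchical symmetric difference feature from the paper, whose definition the abstract does not give. The assumed feature compares each grid block with its vertically mirrored partner, and all function names are hypothetical.

```python
import random


def block_means(frame, rows, cols):
    """Mean intensity of each cell in a rows x cols grid over a 2-D frame.

    Assumes the frame height and width are divisible by rows and cols.
    """
    bh, bw = len(frame) // rows, len(frame[0]) // cols
    return [
        [
            sum(frame[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw)) / (bh * bw)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def toy_fingerprint(frame, rows=4, cols=4):
    """One bit per upper-half block: 1 if it is brighter than the block
    mirrored across the horizontal mid-line (toy feature, an assumption)."""
    m = block_means(frame, rows, cols)
    return [
        [1 if m[r][c] > m[rows - 1 - r][c] else 0 for c in range(cols)]
        for r in range(rows // 2)
    ]


def mirror_frame(frame):
    """Left-right mirror of a frame."""
    return [row[::-1] for row in frame]


def mirror_bits(bits):
    """Reorder fingerprint bits the same way a left-right mirror reorders columns."""
    return [row[::-1] for row in bits]


def hamming(a, b):
    """Number of differing bits between two fingerprints of equal shape."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
frame = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
fp = toy_fingerprint(frame)
# Distance between the mirrored frame's fingerprint and the reordered original.
print(hamming(toy_fingerprint(mirror_frame(frame)), mirror_bits(fp)))
```

Because a left-right mirror only swaps grid columns, the mirrored clip can be looked up by permuting the stored bits rather than by computing and storing a second fingerprint per clip.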