Proceedings of the 21st ACM international conference on Information and knowledge management: latest papers

PicAlert!: a system for privacy-aware image classification and retrieval
Sergej Zerr, Stefan Siersdorfer, Jonathon S. Hare
Photo publishing in social networks and other Web 2.0 applications has become very popular due to the pervasive availability of cheap digital cameras, powerful batch upload tools and a huge amount of storage space. A portion of uploaded images is of a highly sensitive nature, disclosing many details of users' private lives. We have developed a web service which can detect private images within a user's photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.
DOI: https://doi.org/10.1145/2396761.2398735
Published: 2012-10-29
Citations: 35
Incorporating variability in user behavior into systems based evaluation
Ben Carterette, E. Kanoulas, Emine Yilmaz
Click logs present a wealth of evidence about how users interact with a search system. This evidence has been used for many things: learning rankings, personalizing, evaluating effectiveness, and more. But it is almost always distilled into point estimates of feature or parameter values, ignoring what may be the most salient feature of users: their variability. No two users interact with a system in exactly the same way, and even a single user may interact with results for the same query differently depending on information need, mood, time of day, and a host of other factors. We present a Bayesian approach to using logs to compute posterior distributions for probabilistic models of user interactions. Since they are distributions rather than point estimates, they naturally capture variability in the population. We show how to cluster posterior distributions to discover patterns of user interactions in logs, and discuss how to use the clusters to evaluate search engines according to a user model. Because the approach is Bayesian, our methods can be applied to very large logs (such as those possessed by Web search engines) as well as very small (such as those found in almost any other setting).
DOI: https://doi.org/10.1145/2396761.2396782
Published: 2012-10-29
Citations: 27
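
The contrast drawn in the abstract above, between a point estimate and a posterior distribution, can be illustrated with a minimal sketch. It assumes the simplest possible user model, a Beta-Bernoulli model of one user's click probability with a uniform prior. That model, the names `BetaPosterior` and `click_posterior`, and the log counts are all assumptions made for illustration; they are not the probabilistic user models or the clustering step of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) distribution over one user's click probability."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total * total * (total + 1.0))


def click_posterior(clicks: int, views: int,
                    prior_alpha: float = 1.0,
                    prior_beta: float = 1.0) -> BetaPosterior:
    """Conjugate update: Beta prior combined with Bernoulli click observations."""
    if clicks < 0 or views < clicks:
        raise ValueError("need 0 <= clicks <= views")
    return BetaPosterior(prior_alpha + clicks, prior_beta + (views - clicks))


# Hypothetical log summaries (clicks, result views) for a small and a large log.
small_log = click_posterior(clicks=3, views=10)
large_log = click_posterior(clicks=300, views=1000)
print(small_log.mean, small_log.variance)
print(large_log.mean, large_log.variance)
```

Both posteriors have a similar mean, but the small log yields a much wider distribution, which is the kind of uncertainty a single point estimate would hide.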
Discovering logical knowledge for deep question answering
Zhao Liu, Xipeng Qiu, L. Cao, Xuanjing Huang
Most open-domain question answering systems achieve better performance with large corpora, such as the Web, by taking advantage of information redundancy. However, explicit answers are not always mentioned in the corpus; many answers are only implicitly contained and can only be deduced by inference. In this paper, we propose an approach to discover logical knowledge for deep question answering, which automatically extracts knowledge in an unsupervised, domain-independent manner from background texts and reasons out implicit answers to the questions. First, we use semantic role labeling to transform natural language expressions into predicates in first-order logic. Then we use association analysis to uncover the implicit relations among these predicates and build propositions for inference. Since our knowledge is drawn from different sources, we use Markov logic to merge multiple knowledge bases without resolving their inconsistencies. Our experiments show that these propositions can improve the performance of question answering significantly.
DOI: https://doi.org/10.1145/2396761.2398544
Published: 2012-10-29
Citations: 4
Multiview hierarchical bayesian regression model and application to online advertising
Tianbing Xu, Ruofei Zhang, Zhen Guo
With the development of Web applications, large-scale data are now widespread; they are not only getting richer, but also ubiquitously interconnected with users and other objects in various ways, which brings about multi-view data with implicit structure. In this paper, we propose a novel hierarchical Bayesian mixture regression model, which discovers and then exploits the relationships among multiple views of the data to perform various machine learning tasks. A stochastic EM inference and learning algorithm is derived, and a parallel implementation in the Hadoop MapReduce [9] paradigm is developed to scale up the learning. We apply the developed model and algorithm to click-through-rate (CTR) prediction and campaign targeting recommendation in online advertising to measure its effectiveness. The experiments on both synthetic data and large-scale ad serving data from a real-world online advertising exchange demonstrate the superior CTR prediction accuracy of our method compared to existing state-of-the-art methods. The results also show that our model can recommend high-performance targeting features for online advertising campaigns.
DOI: https://doi.org/10.1145/2396761.2396825
Published: 2012-10-29
Citations: 2
MEET: a generalized framework for reciprocal recommender systems
Lei Li, Tao Li
Reciprocal recommender systems are systems from which users can obtain recommendations of other individuals by satisfying the preferences of both parties involved. Unlike traditional user-item recommendation, reciprocal recommenders focus on the preferences of both parties simultaneously, as well as on some special properties of reciprocity. In this paper, we propose MEET, a generalized framework for reciprocal recommendation, in which we model the correlations of users as a bipartite graph that maintains both local and global "reciprocal" utilities. The local utility captures users' mutual preferences, whereas the global utility manages the overall quality of the entire reciprocal network. Extensive empirical evaluation on two real-world data sets (online dating and online recruiting) demonstrates the effectiveness of our proposed framework compared with existing recommendation algorithms. Our analysis also provides deep insights into the special aspects of reciprocal recommenders that differentiate them from user-item recommender systems.
DOI: https://doi.org/10.1145/2396761.2396770
Published: 2012-10-29
Citations: 51
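
To make the notion of mutual preference in the abstract above concrete, here is a minimal sketch of a reciprocal score. The abstract does not define the utilities of MEET, so the choice of a harmonic mean, the function names `reciprocal_score` and `rank_candidates`, and the preference values are all assumptions for illustration. The sketch covers only a pairwise (local) score and does not model a global utility over the whole network.

```python
def reciprocal_score(pref_a_to_b: float, pref_b_to_a: float) -> float:
    """Harmonic mean of two directed preference scores in [0, 1].

    The harmonic mean is an assumed aggregation: it is low whenever
    either side is uninterested, unlike an arithmetic mean.
    """
    for p in (pref_a_to_b, pref_b_to_a):
        if not 0.0 <= p <= 1.0:
            raise ValueError("preferences must lie in [0, 1]")
    total = pref_a_to_b + pref_b_to_a
    if total == 0.0:
        return 0.0
    return 2.0 * pref_a_to_b * pref_b_to_a / total


def rank_candidates(user, candidates, preference):
    """Sort candidates by mutual (not one-sided) preference, best first."""
    return sorted(
        candidates,
        key=lambda c: reciprocal_score(preference[(user, c)], preference[(c, user)]),
        reverse=True,
    )


# Hypothetical directed preferences between a job seeker and two employers.
preference = {
    ("seeker", "employer_x"): 0.9, ("employer_x", "seeker"): 0.1,
    ("seeker", "employer_y"): 0.6, ("employer_y", "seeker"): 0.6,
}
ranking = rank_candidates("seeker", ["employer_x", "employer_y"], preference)
print(ranking)
```

A one-sided recommender would put `employer_x` first because the seeker likes it most; the reciprocal score prefers `employer_y`, where the interest is mutual.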
Importance weighted passive learning
Shuaiqiang Wang, Xiaoming Xi, Yilong Yin
Importance weighted active learning (IWAL) introduces a weighting scheme that measures the importance of each instance in order to correct the sampling bias between the probability distributions of the training and test datasets. However, the weighting scheme of IWAL involves the distribution of the test data, which can be estimated straightforwardly in active learning by interactively querying users for labels of selected test instances, but is difficult to estimate in conventional learning, where there are no interactions with users (referred to as passive learning). In this paper, we investigate the insufficient sampling bias problem, i.e., bias that occurs only because of insufficient samples while the sampling process itself is unbiased. To this end, we present two assumptions on the sampling bias, based on which we propose a practical weighting scheme for the empirical loss function in conventional passive learning, and present IWPL, an importance weighted passive learning framework. Furthermore, we provide IWSVM, an importance weighted SVM, for validation. Extensive experiments demonstrate significant advantages of IWSVM on benchmark and synthetic datasets.
DOI: https://doi.org/10.1145/2396761.2398611
Published: 2012-10-29
Citations: 0
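
The idea of weighting the empirical loss function, mentioned in the abstract above, can be sketched generically as a weighted average of per-instance losses. The abstract does not state how IWPL derives its weights, so the weights below are simply given as input; the hinge loss, the function names, and the sample values are assumptions for illustration only.

```python
from typing import Sequence


def hinge_loss(score: float, label: int) -> float:
    """Standard hinge loss for a label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)


def weighted_empirical_loss(weights: Sequence[float],
                            scores: Sequence[float],
                            labels: Sequence[int]) -> float:
    """Importance-weighted average of per-instance hinge losses."""
    if not (len(weights) == len(scores) == len(labels)):
        raise ValueError("weights, scores and labels must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * hinge_loss(s, y)
                   for w, s, y in zip(weights, scores, labels))
    return weighted / total_weight


# Hypothetical classifier scores for two positive training instances.
scores = [2.0, -0.5]
labels = [1, 1]
uniform = weighted_empirical_loss([1.0, 1.0], scores, labels)
reweighted = weighted_empirical_loss([1.0, 3.0], scores, labels)
print(uniform, reweighted)
```

Giving the misclassified second instance three times the weight raises the loss from 0.75 to 1.125, so a learner minimizing the weighted loss would pay more attention to the region that instance represents.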
SHB 2012: international workshop on smart health and wellbeing
Christopher C. Yang, Hsinchun Chen, H. Wactlar, Combi Carlo, Xuning Tang
The Smart Health and Wellbeing workshop is organized to develop a platform for authors to discuss fundamental principles, algorithms or applications of intelligent data acquisition, processing and analysis of healthcare data. We are particularly interested in information and knowledge management papers, in which the approaches are accompanied by an in-depth experimental evaluation with real world data. This paper provides an overview of the workshop and the accepted contributions.
DOI: https://doi.org/10.1145/2396761.2398756
Published: 2012-10-29
Citations: 0
Full-text citation analysis: enhancing bibliometric and scientific publication ranking
Xiaozhong Liu, Jinsong Zhang, Chun Guo
The goal of this paper is to use innovative text and graph mining algorithms along with full-text citation analysis and topic modeling to enhance classical bibliometric analysis and publication ranking. By utilizing citation contexts extracted from a large number of full-text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author-contributed keyword. We then use the publication/citation topic distributions to generate a citation graph with vertex prior and edge transition probability distributions. The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. Based on 104 topics (labeled with keywords) and their review papers, the cited publications of each review paper are treated as "important publications" for ranking evaluation. The results show that full-text citation and publication content prior topic distributions, along with the PageRank algorithm, can significantly enhance bibliometric analysis and scientific publication ranking performance for academic IR systems.
DOI: https://doi.org/10.1145/2396761.2398555
Published: 2012-10-29
Citations: 21
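
PageRank with vertex priors and edge transition probabilities, as named in the abstract above, can be sketched with plain power iteration. This is a generic prior-weighted PageRank, not the paper's implementation: the damping factor, the handling of dangling nodes, the function name `pagerank_with_priors`, and the tiny citation graph are assumptions. The sketch expects the prior to sum to 1 and every non-empty row of transition probabilities to sum to 1.

```python
def pagerank_with_priors(transitions, prior, damping=0.85, iterations=100):
    """Power iteration for PageRank with a vertex prior.

    transitions: {source: {target: probability}}, each non-empty row sums to 1.
    prior: {node: probability}, sums to 1; used for teleportation and for
           redistributing the score of nodes without outgoing edges.
    """
    nodes = list(prior)
    scores = dict(prior)
    for _ in range(iterations):
        # Teleportation step follows the prior instead of a uniform jump.
        nxt = {n: (1.0 - damping) * prior[n] for n in nodes}
        for src in nodes:
            out = transitions.get(src, {})
            if out:
                for dst, p in out.items():
                    nxt[dst] += damping * scores[src] * p
            else:
                # Dangling node: spread its score according to the prior.
                for n in nodes:
                    nxt[n] += damping * scores[src] * prior[n]
        scores = nxt
    return scores


# Hypothetical citation graph for one topic: A and B cite C, C cites A.
transitions = {"A": {"C": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
prior = {"A": 1.0 / 3.0, "B": 1.0 / 3.0, "C": 1.0 / 3.0}
scores = pagerank_with_priors(transitions, prior)
print(sorted(scores, key=scores.get, reverse=True))
```

In a topic-specific setting, one such run per topic would use that topic's prior and transition probabilities, so the same publication can rank differently across topics.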
Predicting CTR of new ads via click prediction
Alexander Kolesnikov, Yury Logachev, V. A. Topinskiy
Predicting the CTR of ads on the search result page is an urgent topic, because choosing the right advertisement greatly affects the revenue of the search engine and advertisers as well as user satisfaction. For ads with a large click history it is quite clear how to predict CTR by utilizing statistical data. But for new ads with a poor click history such an approach is not robust or reliable. We suggest a model for predicting the CTR of such new ads. In contrast to previous models for predicting the CTR of new ads, our model uses events (clicks and skips) instead of the observed CTR. In addition, we have implemented several novel features that increased the performance of our model. Offline and online experiments on a real search engine system demonstrated that our model outperforms the baseline and the approaches suggested in previous papers.
DOI: https://doi.org/10.1145/2396761.2398688
Published: 2012-10-29
Citations: 8
Weighted linear kernel with tree transformed features for malware detection
Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, A. Nucci, P. Tan
Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.
DOI: https://doi.org/10.1145/2396761.2398622
Published: 2012-10-29
Citations: 11
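
A weighted linear kernel of the kind named in the abstract above has the general form k(x, z) = sum_j w_j * x_j * z_j. The sketch below shows only that form. How the paper constructs the tree transformed features and chooses the weights is not stated in the abstract, so the binary indicator features, the weight values, and the function names are assumptions for illustration.

```python
def weighted_linear_kernel(x, z, weights):
    """k(x, z) = sum_j w_j * x_j * z_j with non-negative feature weights."""
    if not (len(x) == len(z) == len(weights)):
        raise ValueError("x, z and weights must have equal length")
    if any(w < 0.0 for w in weights):
        # Non-negative weights keep the kernel positive semi-definite.
        raise ValueError("weights must be non-negative")
    return sum(w * a * b for w, a, b in zip(weights, x, z))


def gram_matrix(points, weights):
    """Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel."""
    return [[weighted_linear_kernel(p, q, weights) for q in points]
            for p in points]


# Hypothetical binary indicator features for two network flows.
flows = [[1, 0, 1], [1, 1, 1]]
weights = [0.5, 2.0, 1.0]
gram = gram_matrix(flows, weights)
print(gram)
```

The resulting Gram matrix is symmetric, and a feature with a larger weight contributes more to the similarity between two flows that share it.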