
Proceedings of the 21st ACM international conference on Information and knowledge management: latest publications

DUBMMSM'12: international workshop on data-driven user behavioral modeling and mining from social media
J. Mahmud, James Caverlee, Jeffrey Nichols, J. O'Donovan, Michelle X. Zhou
Massive amounts of data are being generated on social media sites, such as Twitter and Facebook. This data can be used to better understand people, such as their personality traits, perceptions, and preferences, and predict their behavior. This deeper understanding of users and their behaviors can benefit a wide range of intelligent applications, such as advertising, social recommender systems, and personalized knowledge management. These applications will also benefit individual users themselves by optimizing their experiences across a wide variety of domains, such as retail, healthcare, and education. Since mining and understanding user behavior from social media often requires interdisciplinary effort, including machine learning, text mining, human-computer interaction, and social science, our workshop aims to bring together researchers and practitioners from multiple fields to discuss the creation of deeper models of individual users by mining the content that they publish and the social networking behavior that they exhibit.
DOI: 10.1145/2396761.2398751 (published 2012-10-29)
Citations: 1
Author-conference topic-connection model for academic network search 学术网络搜索的作者-会议-主题连接模型
Jianwen Wang, Xiaohua Hu, Xinhui Tu, Tingting He
This paper proposes a novel topic model, the Author-Conference Topic-Connection (ACTC) model, for academic network search. The ACTC model extends the author-conference-topic (ACT) model by adding the subject of the conference and the latent mapping information between subjects and topics. It simultaneously models topical aspects of papers, authors, and conferences with two latent topic layers: a subject layer corresponding to the conference topic, and a topic layer corresponding to the word topic. Each author is associated with a multinomial distribution over conference subjects (e.g., KM, DB, IR for CIKM 2012); the conference (CIKM 2012) and the topics are then generated from a sampled subject, and the words are generated from the sampled topics. We conduct experiments on a data set with 8,523 authors, 22,487 papers, and 1,243 conferences from the well-known Arnetminer website, and train the model with different numbers of subjects and topics. For a qualitative evaluation, we compare ACTC with three other models, LDA, Author-Topic (AT), and ACT, in academic search services. Experiments show that ACTC can effectively capture the semantic connection between different types of information in academic networks and perform well in expert searching and conference searching.
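The abstract's generative story (author draws a subject; the subject generates topics; topics generate words) can be sketched as below. All distributions, vocabularies, and names here are toy assumptions for illustration, not the paper's trained parameters.

```python
import random

# Hypothetical toy parameters for the ACTC generative process (invented):
author_subject = {"alice": {"IR": 0.7, "DB": 0.3}}        # author -> P(subject)
subject_topic = {"IR": {"ranking": 0.6, "query": 0.4},
                 "DB": {"storage": 0.8, "query": 0.2}}    # subject -> P(topic)
topic_word = {"ranking": ["score", "rank"],
              "query": ["term", "match"],
              "storage": ["index", "page"]}               # topic -> vocabulary

def sample(dist):
    """Draw a key from a dict of probabilities."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k  # guard against floating-point rounding

def generate_token(author):
    subject = sample(author_subject[author])  # author draws a conference subject
    topic = sample(subject_topic[subject])    # subject generates a word topic
    word = random.choice(topic_word[topic])   # topic generates the word
    return subject, topic, word

print(generate_token("alice"))
```

In the actual model these distributions would be inferred from the corpus (e.g., by Gibbs sampling); the sketch only shows the direction of the dependencies the abstract describes.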
DOI: 10.1145/2396761.2398597 (published 2012-10-29)
Citations: 18
Theme chronicle model: chronicle consists of timestamp and topical words over each theme
N. Kawamae
This paper presents a topic model that discovers the correlation patterns in a given time-stamped document collection and how these patterns evolve over time. Our proposal, the theme chronicle model (TCM), divides traditional topics into temporal and stable topics to detect the change of each theme over time; previous topic models ignore these differences and characterize trends as merely bursts of topics. TCM introduces a theme topic (stable topic), a trend topic (temporal topic), timestamps, and a latent switch variable in each token to realize these differences. Its topic layers allow TCM to capture not only word co-occurrence patterns in each theme, but also word co-occurrence patterns at any given time in each theme as trends. Experiments on various data sets show that the proposed model is useful as a generative model to discover fine-grained, tightly coherent topics, takes advantage of previous models, and then assigns values for new documents.
DOI: 10.1145/2396761.2398573 (published 2012-10-29)
Citations: 6
MAGIK: managing completeness of data
Ognjen Savkovic, Paramita Mirza, Sergey Paramonov, W. Nutt
MAGIK demonstrates how to use meta-information about the completeness of a database to assess the quality of the answers returned by a query. The system holds so-called table-completeness (TC) statements, by which one can express that a table is partially complete, that is, it contains all facts about some aspect of the domain. Given a query, MAGIK determines from such meta-information whether the database contains sufficient data for the query answer to be complete. If, according to the TC statements, the database content is not sufficient for a complete answer, MAGIK explains which further TC statements are needed to guarantee completeness. MAGIK extends and complements theoretical work on modeling and reasoning about data completeness by providing the first implementation of a reasoner. The reasoner operates by translating completeness reasoning tasks into logic programs, which are executed by an answer set engine.
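The core check the abstract describes, deciding whether the available table-completeness (TC) statements guarantee a complete query answer, can be illustrated with a toy sketch. The representation below (tables and equality conditions as dicts) is invented for illustration; the real system translates such reasoning tasks into answer set programs.

```python
# A TC statement says: table X is complete for all rows matching a condition.
# Here (invented example): the `pupil` table is complete for school 'HS1'.
tc_statements = [("pupil", {"school": "HS1"})]

def answer_is_complete(table, selection):
    """An equality-selection query is guaranteed complete if some TC
    statement covers every row the selection could return, i.e. the
    selection implies the TC statement's condition."""
    for tc_table, tc_cond in tc_statements:
        if tc_table == table and all(selection.get(k) == v
                                     for k, v in tc_cond.items()):
            return True
    return False

print(answer_is_complete("pupil", {"school": "HS1", "grade": "2"}))  # True
print(answer_is_complete("pupil", {"school": "HS2"}))                # False
```

In the second case a MAGIK-style system would go further and report which additional TC statement (here, one covering school 'HS2') would make the answer complete.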
DOI: 10.1145/2396761.2398741 (published 2012-10-29)
Citations: 9
CloST: a hadoop-based storage system for big spatio-temporal data analytics
Haoyu Tan, Wuman Luo, L. Ni
During the past decade, various GPS-equipped devices have generated a tremendous amount of data with time and location information, which we refer to as big spatio-temporal data. In this paper, we present the design and implementation of CloST, a scalable big spatio-temporal data storage system to support data analytics using Hadoop. The main objective of CloST is to avoid scanning the whole dataset when a spatio-temporal range is given. To this end, we propose a novel data model which has special treatments on three core attributes: an object id, a location, and a time. Based on this data model, CloST hierarchically partitions data using all core attributes, which enables efficient parallel processing of spatio-temporal range scans. According to the data characteristics, we devise a compact storage structure which reduces the storage size by an order of magnitude. In addition, we propose scalable bulk loading algorithms capable of incrementally adding new data into the system. We conduct our experiments using a very large GPS log dataset and the results show that CloST has fast data loading speed, desirable scalability in query processing, as well as high data compression ratio.
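The hierarchical partitioning idea (object id, then time, then location) can be sketched as follows. The shard count, bucket sizes, and grid layout below are invented assumptions, not CloST's actual parameters; the point is only that a range scan needs to touch just the partitions whose keys intersect the query range.

```python
# Invented layout: 16 object-id shards, one-hour time buckets,
# 0.1-degree spatial grid cells.
SHARDS = 16
TIME_BUCKET = 3600   # seconds
GRID = 0.1           # degrees

def partition_key(obj_id, lat, lon, ts):
    """Hierarchical key: object shard -> time bucket -> spatial cell."""
    return (obj_id % SHARDS,
            ts // TIME_BUCKET,
            (int(lat / GRID), int(lon / GRID)))

def scan_partitions(obj_ids, t0, t1):
    """(shard, time-bucket) pairs a scan over [t0, t1) must read,
    instead of scanning the whole dataset."""
    return {(o % SHARDS, b)
            for o in obj_ids
            for b in range(t0 // TIME_BUCKET, (t1 - 1) // TIME_BUCKET + 1)}

print(partition_key(42, 22.35, 114.17, 7200))
print(scan_partitions([42], 0, 7200))
```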
DOI: 10.1145/2396761.2398589 (published 2012-10-29)
Citations: 66
On compressing weighted time-evolving graphs
Wei Liu, Andrey Kan, Jeffrey Chan, J. Bailey, C. Leckie, J. Pei, K. Ramamohanarao
Existing graph compression techniques mostly focus on static graphs. However, for many practical graphs, such as social networks, the edge weights frequently change over time. This phenomenon raises the question of how to compress dynamic graphs while maintaining most of their intrinsic structural patterns at each time snapshot. In this paper we show that the encoding cost of a dynamic graph is proportional to the heterogeneity of a three-dimensional tensor that represents the dynamic graph. We propose an effective algorithm that compresses a dynamic graph by reducing the heterogeneity of its tensor representation, and at the same time also maintains a maximum lossy compression error at any time stamp of the dynamic graph. The bounded compression error benefits compressed graphs in that they retain good approximations of the original edge weights, and hence properties of the original graph (such as shortest paths) are well preserved. To the best of our knowledge, this is the first work that compresses weighted dynamic graphs with bounded lossy compression error at any time snapshot of the graph.
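To make the error guarantee concrete: a dynamic graph can be held as a 3-D tensor W[t][i][j] of edge weights, and any lossy transform that moves each entry by at most eps preserves the bound the abstract states. The naive quantization below is only an illustration of that bound under invented parameters; the paper's algorithm reduces tensor heterogeneity far more cleverly.

```python
# Toy bounded-error compression of a dynamic-graph tensor (illustrative only).
EPS = 0.5

def quantize(w, eps=EPS):
    """Snap a weight to the nearest multiple of 2*eps, so the per-entry
    lossy error is at most eps at every time snapshot."""
    step = 2 * eps
    return round(w / step) * step

def compress(tensor):
    return [[[quantize(w) for w in row] for row in snapshot]
            for snapshot in tensor]

# Two snapshots of a 2-node weighted graph (made-up weights):
W = [[[0.9, 1.1], [2.4, 0.0]],   # t = 0
     [[1.0, 1.2], [2.6, 0.1]]]   # t = 1
C = compress(W)

# Fewer distinct values -> cheaper to encode; error stays within EPS.
print(sorted({c for snap in C for row in snap for c in row}))
```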
DOI: 10.1145/2396761.2398630 (published 2012-10-29)
Citations: 28
Constructing test collections by inferring document relevance via extracted relevant information
Shahzad Rajput, Matthew Ekstrand-Abueg, Virgil Pavlu, J. Aslam
The goal of a typical information retrieval system is to satisfy a user's information need---e.g., by providing an answer or information "nugget"---while the actual search space of a typical information retrieval system consists of documents---i.e., collections of nuggets. In this paper, we characterize this relationship between nuggets and documents and discuss applications to system evaluation. In particular, for the problem of test collection construction for IR system evaluation, we demonstrate a highly efficient algorithm for simultaneously obtaining both relevant documents and relevant information. Our technique exploits the mutually reinforcing relationship between relevant documents and relevant information, yielding document-based test collections whose efficiency and efficacy exceed those of typical Cranfield-style test collections, while also generating sets of highly relevant information.
DOI: 10.1145/2396761.2396783 (published 2012-10-29)
Citations: 15
Do ads compete or collaborate?: designing click models with full relationship incorporated
Xin Xin, Irwin King, Ritesh Agrawal, Michael R. Lyu, Heyan Huang
Traditionally, click models predict the click-through rate (CTR) of an advertisement (ad) independently of other ads. Recent research, however, indicates that the CTR of an ad depends not only on the quality of the ad itself but also on that of the neighboring ads. Using historical click-through data of a commercially available ad server, we identify two types (competing and collaborating) of influences among sponsored ads and further propose a novel click model, the Full Relation Model (FRM), which explicitly models dependencies between ads. On test data, FRM shows significant improvement in CTR prediction as compared to earlier click models.
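The competing/collaborating distinction can be illustrated with a minimal neighbor-aware CTR sketch: a logistic score over the ad's own quality plus a signed term for its neighbors, negative for competition, positive for collaboration. All weights below are invented for illustration and are not FRM's actual parameterization.

```python
import math

# Invented weights: a negative neighbor weight models competing ads.
W_SELF, W_NEIGHBOR, BIAS = 2.0, -0.8, -1.5

def ctr(own_quality, neighbor_qualities):
    """Logistic CTR that depends on the ad and on its neighbors."""
    z = (BIAS
         + W_SELF * own_quality
         + W_NEIGHBOR * sum(neighbor_qualities))
    return 1 / (1 + math.exp(-z))

alone = ctr(0.9, [])
crowded = ctr(0.9, [0.8, 0.7])
# With a negative neighbor weight, strong neighbors suppress predicted CTR:
assert crowded < alone
```

Flipping the sign of the neighbor weight would model the collaborating case, where related neighboring ads lift each other's CTR.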
DOI: 10.1145/2396761.2398528 (published 2012-10-29)
Citations: 1
Exploiting enriched contextual information for mobile app classification
Hengshu Zhu, Huanhuan Cao, Enhong Chen, Hui Xiong, Jilei Tian
A key step in mobile app usage analysis is to classify apps into some predefined categories. However, it is a nontrivial task to effectively classify mobile apps due to the limited contextual information available for the analysis. To this end, in this paper, we propose an approach to first enrich the contextual information of mobile apps by exploiting additional Web knowledge from the Web search engine. Then, inspired by the observation that different types of mobile apps may be relevant to different real-world contexts, we also extract some contextual features for mobile apps from the context-rich device logs of mobile users. Finally, we combine all the enriched contextual information into a Maximum Entropy model for training a mobile app classifier. The experimental results based on 443 mobile users' device logs clearly show that our approach outperforms two state-of-the-art benchmark methods by a significant margin.
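A Maximum Entropy classifier over text features is equivalent to multinomial logistic regression; the final classification step can be sketched as below. The categories, features, and hand-set weights are toy assumptions, not trained values from the paper.

```python
import math
from collections import Counter

# Hand-set feature weights per category (invented for illustration;
# a real MaxEnt model would learn these from labeled data).
WEIGHTS = {
    "game":   Counter({"play": 1.2, "level": 0.9}),
    "travel": Counter({"map": 1.1, "hotel": 1.3}),
}

def classify(context_words):
    """Score each category by a weighted feature sum, then softmax."""
    feats = Counter(context_words)
    scores = {c: sum(w[f] * n for f, n in feats.items())
              for c, w in WEIGHTS.items()}
    z = sum(math.exp(s) for s in scores.values())  # softmax normalizer
    probs = {c: math.exp(s) / z for c, s in scores.items()}
    return max(probs, key=probs.get), probs

label, probs = classify(["play", "level", "play"])
print(label)  # → game
```

In the approach the abstract describes, the input words would come from the enriched context (Web search snippets plus device-log features) rather than the app name alone.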
DOI: 10.1145/2396761.2398484 (published 2012-10-29)
Citations: 76
Cager: a framework for cross-page search
Zhumin Chen, Byron J. Gao, Qi Kang
Existing search engines use the page as the unit of retrieval. They typically return a ranked list of pages, each being a search result containing the query keywords. This within-one-page constraint disallows utilization of relationship information that is often available and greatly beneficial. To utilize relationship information and improve search precision, we explore cross-page search, where each answer is a logical page consisting of multiple closely related pages that collectively contain the query keywords. We have implemented a prototype, Cager, providing cross-page search and visualization over a real dataset.
DOI: 10.1145/2396761.2398733 (published 2012-10-29)
Citations: 0