
Proceedings of the 21st ACM International Conference on Information and Knowledge Management: latest articles

Mining high utility itemsets without candidate generation
Mengchi Liu, Jun-Feng Qu
High utility itemsets are sets of items with high utility, such as profit, in a database. Mining them efficiently plays a crucial role in many real-life applications and is an important research issue in data mining. To identify high utility itemsets, most existing algorithms first generate candidate itemsets by overestimating their utilities, and subsequently compute the exact utilities of these candidates. These algorithms generate a very large number of candidates, most of which turn out not to be high utility once their exact utilities are computed. In this paper, we propose an algorithm, called HUI-Miner (High Utility Itemset Miner), for high utility itemset mining. HUI-Miner uses a novel structure, called utility-list, to store both the utility information about an itemset and the heuristic information for pruning the search space of HUI-Miner. By avoiding the costly generation and utility computation of numerous candidate itemsets, HUI-Miner can efficiently mine high utility itemsets from the utility-lists constructed from a mined database. We compared HUI-Miner with the state-of-the-art algorithms on various databases, and experimental results show that HUI-Miner outperforms these algorithms in terms of both running time and memory consumption.
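To make the mining target concrete, the sketch below computes itemset utilities by brute force. It assumes the usual definition in this area (utility of an item in a transaction is quantity times unit profit, summed over the transactions that contain the whole itemset). It is not HUI-Miner and does not model the utility-list, whose fields the abstract does not describe; the data and the names `PROFIT`, `TRANSACTIONS`, `itemset_utility` and `high_utility_itemsets` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: unit profit per item, and purchased quantities per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
]


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_util):
    """Brute-force enumeration of all itemsets (assumption: illustration only).

    This only shows what a high utility itemset is; it is exponential in the
    number of items and is not the algorithm proposed in the paper.
    """
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_util:
                result[itemset] = u
    return result
```

In this toy data the itemset {a, b} has utility 9 while its subset {b} has utility 6, so utility is not anti-monotone the way support is. That is why the existing algorithms mentioned in the abstract prune with overestimated utilities rather than with exact ones.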
DOI: https://doi.org/10.1145/2396761.2396773
Published: 2012-10-29
Citations: 594
An evaluation of corpus-driven measures of medical concept similarity for information retrieval
B. Koopman, G. Zuccon, P. Bruza, Laurianne Sitbon, Michael Lawley
Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human-judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
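The comparison against human judgements can be illustrated with a correlation coefficient. The abstract does not say which coefficient was used, so Pearson's r below is an assumption, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores.

    Assumption: Pearson's r is used here only as an example; the abstract
    does not name the coefficient behind the reported value.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)
```

Here `xs` would hold the similarity a measure assigns to each concept pair and `ys` the corresponding human rating; a value near 1 means the measure ranks and scales pairs much as the assessors do.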
DOI: https://doi.org/10.1145/2396761.2398661
Published: 2012-10-29
Citations: 44
Information-complete and redundancy-free keyword search over large data graphs
Byron J. Gao, Zhumin Chen, Qi Kang
Keyword search over graphs has a wide array of applications in querying structured, semi-structured and unstructured data. Existing models typically use minimal trees or bounded subgraphs as query answers. While such models emphasize relevancy, they suffer from incompleteness of information and redundancy among answers, making it difficult for users to effectively explore query answers. To overcome these drawbacks, we propose a novel cluster-based model, where query answers are relevancy-connected clusters. A cluster is a subgraph induced from a maximal set of relevancy-connected nodes. Such clusters are coherent and relevant, yet complete and redundancy-free. They can be of arbitrary shape, in contrast to the sphere-shaped bounded subgraphs in existing models. We also propose an efficient search algorithm and a corresponding graph index for large, disk-resident data graphs.
DOI: https://doi.org/10.1145/2396761.2398712
Published: 2012-10-29
Citations: 1
Finding top k most influential spatial facilities over uncertain objects
Liming Zhan, Ying Zhang, W. Zhang, Xuemin Lin
Uncertainty is inherent in many important applications, such as location-based services (LBS), sensor monitoring and radio-frequency identification (RFID). Recently, considerable research effort has been put into the field of uncertainty-aware spatial query processing. In this paper, we study the problem of finding the top k most influential facilities over a set of uncertain objects, which is an important spatial query in the above applications. Based on the maximal utility principle, we propose a new ranking model to identify the top k most influential facilities, which carefully captures the influence of facilities on the uncertain objects. Using two uncertain object indexing techniques, R-tree and U-Quadtree, we propose effective and efficient algorithms that follow the filtering and verification paradigm, which significantly improves performance in terms of CPU and I/O costs. Comprehensive experiments on real datasets demonstrate the effectiveness and efficiency of our techniques.
DOI: https://doi.org/10.1145/2396761.2396878
Published: 2012-10-29
Citations: 18
Time feature selection for identifying active household members
P. Campos, Alejandro Bellogín, F. Díez, Iván Cantador
Popular online rental services such as Netflix and MoviePilot often manage household accounts. A household account is usually shared by various users who live in the same house, but in general does not provide a mechanism by which current active users are identified, and thus leads to considerable difficulties for making effective personalized recommendations. The identification of the active household members, defined as the discrimination of the users from a given household who are interacting with a system (e.g. an on-demand video service), is thus an interesting challenge for the recommender systems research community. In this paper, we formulate the above task as a classification problem, and address it by means of global and local feature selection methods and classifiers that only exploit time features from past item consumption records. The results obtained from a series of experiments on a real dataset show that some of the proposed methods are able to select relevant time features, which allow simple classifiers to accurately identify active members of household accounts.
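As a rough illustration of classifying the active user from time features alone, the sketch below derives two features from a timestamp and predicts the member most often seen for that feature value. The chosen features (day of week, hour of day), the majority-vote classifier, and the names `time_features`, `fit` and `predict` are all assumptions for illustration; the abstract does not list the features or classifiers that were evaluated.

```python
from collections import Counter, defaultdict
from datetime import datetime


def time_features(timestamp):
    """Map an ISO-8601 timestamp to (day of week, hour of day); Monday is 0."""
    dt = datetime.fromisoformat(timestamp)
    return dt.weekday(), dt.hour


def fit(records):
    """Count, for each feature value, how often each household member was active.

    records: iterable of (timestamp, user_id) pairs from one household account.
    """
    counts = defaultdict(Counter)
    for timestamp, user in records:
        counts[time_features(timestamp)][user] += 1
    return counts


def predict(counts, timestamp, default=None):
    """Return the member most often seen at this (weekday, hour), if any."""
    seen = counts.get(time_features(timestamp))
    return seen.most_common(1)[0][0] if seen else default


# Hypothetical consumption history for one shared account.
HISTORY = [
    ("2012-10-01T08:00:00", "alice"),
    ("2012-10-08T08:30:00", "alice"),
    ("2012-10-06T21:00:00", "bob"),
    ("2012-10-13T21:45:00", "bob"),
]
```

With this history, `predict(fit(HISTORY), "2012-10-15T08:10:00")` falls on a Monday morning and yields `"alice"`. Feature selection in the sense of the abstract would then decide which such time features are worth keeping.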
DOI: https://doi.org/10.1145/2396761.2398628
Published: 2012-10-29
Citations: 8
A tensor encoding model for semantic processing
Mike Symonds, P. Bruza, Laurianne Sitbon, I. Turner
This paper develops and evaluates an enhanced corpus-based approach for semantic processing. Corpus-based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.
DOI: https://doi.org/10.1145/2396761.2398617
Published: 2012-10-29
Citations: 5
Accelerating locality preserving nonnegative matrix factorization
Guanhong Yao, Deng Cai
Matrix factorization techniques have been frequently applied in information retrieval, computer vision and pattern recognition. Among them, Non-negative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. Locality Preserving Non-negative Matrix Factorization (LPNMF) is a recently proposed graph-based NMF extension which tries to preserve the intrinsic geometric structure of the data. Compared with the original NMF, LPNMF has more discriminating power on data representation thanks to its geometrical interpretation and outstanding ability to discover the hidden topics. However, the computational complexity of LPNMF is O(n^3), where n is the number of samples. In this paper, we propose a novel approach called Accelerated LPNMF (A-LPNMF) to solve the computational issue of LPNMF. Specifically, A-LPNMF selects p (p ≪ n) landmark points from the data and represents all the samples as sparse linear combinations of these landmarks. The non-negative factors, which incorporate the geometric structure, can then be efficiently computed. Experimental results on real data sets demonstrate the effectiveness and efficiency of our proposed method.
DOI: https://doi.org/10.1145/2396761.2398618
Published: 2012-10-29
Citations: 1
PLEAD 2012: politics, elections and data
Ingmar Weber, A. Popescu, M. Pennacchiotti
What is the role of the Internet in politics in general and during campaigns in particular? And what is the role of large amounts of user data in all of this? In the 2008 U.S. presidential campaign the Democrats were far more successful than the Republicans in utilizing online media for mobilization, co-ordination and fundraising. For the first time, social media and the Internet played a fundamental role in political campaigns. However, technical research in this area has been surprisingly limited and fragmented. The goal of this workshop is to bring together, for the first time, researchers working at the intersection of social network analysis, computational social science and political science, to share and discuss their ideas in a common forum, and to inspire further developments in this growing, fascinating field. The workshop has Filippo Menczer as keynote speaker; it includes technical presentations of accepted papers and concludes with a panel discussion where scientists and media experts from different fields can interact and share views.
DOI: https://doi.org/10.1145/2396761.2398759
Published: 2012-10-29
Citations: 4
Sort-based query-adaptive loading of R-trees
Daniar Achakeev, B. Seeger, P. Widmayer
Bulk-loading of R-trees has been an important problem in academia and industry for more than twenty years. Current algorithms create R-trees without any information about the expected query profile. However, query profiles are extremely useful for the design of efficient indexes. In this paper, we address this deficiency and present query-adaptive algorithms for building R-trees optimally designed for a given query profile. Since optimal R-tree loading is NP-hard (even without tuning the structure to a query profile), we provide efficient, easy to implement heuristics. Our sort-based algorithms for query-adaptive loading consist of two steps: First, sorting orders are identified resulting in better R-trees than those obtained from standard space-filling curves. Second, for a given sorting order, we propose a dynamic programming algorithm for generating R-trees in linear runtime. Our experimental results confirm that our algorithms generally create significantly better R-trees than the ones obtained from standard sort-based loading algorithms, even when the query profile is unknown.
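The general shape of sort-based loading can be sketched as follows: sort the rectangles by some key, cut the sorted sequence into nodes, and repeat on the resulting bounding boxes until one root remains. This is a simplified sketch under stated assumptions. It cuts into fixed-size runs instead of using the dynamic programming step described in the abstract, and the sort key is supplied by the caller rather than derived from a query profile. The names `mbr`, `pack_level` and `bulk_load` are hypothetical.

```python
def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def pack_level(rects, capacity):
    """Cut an already sorted sequence into consecutive nodes of at most `capacity` entries."""
    return [rects[i:i + capacity] for i in range(0, len(rects), capacity)]


def bulk_load(rects, capacity, key):
    """Build the tree bottom-up and return its levels, leaves first.

    Each level is a list of node bounding boxes. Assumption: fixed-size cuts
    and a caller-supplied sort key stand in for the paper's optimized choices.
    """
    if not rects:
        raise ValueError("need at least one rectangle")
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    level = sorted(rects, key=key)
    levels = []
    while True:
        level = [mbr(node) for node in pack_level(level, capacity)]
        levels.append(level)
        if len(level) == 1:
            return levels
        level = sorted(level, key=key)


def x_center(rect):
    """Example sort key (hypothetical): centre of the rectangle along the x axis."""
    return (rect[0] + rect[2]) / 2
```

Swapping `x_center` for a space-filling-curve key gives the standard sort-based loaders the abstract uses as its point of comparison; the quality of the resulting tree depends heavily on that ordering and on where the cuts fall.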
DOI: https://doi.org/10.1145/2396761.2398577
Published: 2012-10-29
Citations: 25
Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using wikipedia
Xun Wang, Lei Wang, Jiwei Li, Sujian Li
Summarization and keyword selection are two important tasks in the NLP community. Although both aim to summarize the source articles, they are usually treated separately, one working with sentences and the other with words. In this paper, we propose a two-level graph-based ranking algorithm to generate summaries and extract keywords at the same time. Previous work has reached a consensus that an important sentence is composed of important keywords. In this paper, we further study the mutual impact between them through context analysis. We use Wikipedia to build a two-level concept-based graph, instead of a traditional term-based graph, to express their homogeneous and heterogeneous relationships. We run PageRank and HITS on the graph to adjust both homogeneous and heterogeneous relationships. This yields a more reasonable relatedness value for key sentence selection and keyword selection. We evaluate our algorithm on the TAC 2011 data set. The traditional term-based approach achieves a score of 0.255 in ROUGE-1 and a score of 0.037 in ROUGE-2, and our approach improves them to 0.323 and 0.048 respectively.
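For readers unfamiliar with graph-based ranking, a minimal PageRank by power iteration is sketched below. It ranks nodes of a single plain graph; the two-level concept graph built from Wikipedia and the combination with HITS are not reproduced. The function name `pagerank`, its parameters, and the example graph are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    graph: dict mapping each node to a list of successor nodes; every
    successor must also appear as a key. Returns a dict node -> score.
    Assumption: a generic single-level graph, not the paper's two-level one.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without outgoing edges spreads its score over all nodes.
            targets = graph[v] or nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical sentence graph: an edge points to a sentence it supports.
SENTENCE_GRAPH = {"s1": ["s2"], "s2": ["s1"], "s3": ["s1"]}
```

In this toy graph `s1` receives links from both other sentences and ends up with the highest score, which is the intuition behind picking key sentences (or keywords) by their rank.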
DOI: https://doi.org/10.1145/2396761.2398706. Published: 2012-10-29.
Citations: 10