
Proceedings of the 22nd ACM international conference on Information & Knowledge Management: latest articles

Mining entity attribute synonyms via compact clustering
Yanen Li, B. Hsu, ChengXiang Zhai, Kuansan Wang
Entity attribute values, such as "lord of the rings" for movie.title or "infant" for shoe.gender, are atomic components of entity expressions. Discovering alternative surface forms of attribute values is important for improving entity recognition and retrieval. In this work, we propose a novel compact clustering framework to jointly identify synonyms for a set of attribute values. The framework can integrate signals from multiple information sources into a similarity function between attribute values, and the weights of these signals are optimized in an unsupervised manner. Extensive experiments across multiple domains demonstrate the effectiveness of our clustering framework for mining entity attribute synonyms.
DOI: 10.1145/2505515.2505608 (https://doi.org/10.1145/2505515.2505608). Published 2013-10-27.
Citations: 12
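The entry above describes folding several signals into one weighted similarity function and clustering attribute values with it. The sketch below illustrates only that general shape, under stated assumptions: the two signals (character-trigram overlap and overlap of hypothetical clicked URLs), the fixed weights, and the single-link threshold clustering are illustrative stand-ins, not the paper's compact clustering algorithm or its unsupervised weight optimization.

```python
from itertools import combinations

def char_ngrams(text, n=3):
    """Character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(x, y, signals, weights):
    """Weighted average of per-signal similarities (each assumed to lie in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * fn(x, y) for name, fn in signals.items()) / total

def threshold_clusters(values, sim, threshold):
    """Merge every pair with sim >= threshold (single-link clustering via union-find)."""
    parent = {v: v for v in values}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x, y in combinations(values, 2):
        if sim(x, y) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for v in values:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical signals: surface-form overlap and overlap of clicked result URLs.
clicks = {
    "lord of the rings": {"url1", "url2"},
    "lotr": {"url1", "url2"},
    "the hobbit": {"url3"},
}
signals = {
    "surface": lambda x, y: jaccard(char_ngrams(x), char_ngrams(y)),
    "clicks": lambda x, y: jaccard(clicks.get(x, set()), clicks.get(y, set())),
}
weights = {"surface": 0.3, "clicks": 0.7}  # fixed here; the paper optimizes them without supervision

clusters = threshold_clusters(
    list(clicks),
    lambda x, y: combined_similarity(x, y, signals, weights),
    threshold=0.5,
)
print(clusters)  # [['lord of the rings', 'lotr'], ['the hobbit']]
```

Single-link merging was chosen only because it is short; it tends to chain clusters together, which is exactly the kind of drift a "compact" clustering objective is meant to avoid.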
Learning to rank for question routing in community question answering
Zongcheng Ji, Bin Wang
This paper focuses on the problem of Question Routing (QR) in Community Question Answering (CQA), which aims to route newly posted questions to the potential answerers who are most likely to answer them. Traditional methods for this problem consider only the text similarity between the newly posted question and the user profile, while ignoring important statistical features, including question-specific and user-specific statistical features. Moreover, traditional methods are based on unsupervised learning, into which it is not easy to introduce such rich features. This paper proposes a general framework for QR based on learning to rank. Training sets consisting of triples (q, asker, answerers) are first collected. Then, by using the intrinsic relationships between the asker and the answerers in each CQA session to capture the intrinsic labels/orders of the users' expertise on question q, two methods, an SVM-based method and a RankingSVM-based method, are presented to learn models from the training set using different example-creation processes. Finally, the potential answerers are ranked using the trained models. Extensive experiments on a real-world CQA dataset from Stack Overflow show that both proposed methods outperform the traditional query likelihood language model (QLLM) as well as the state-of-the-art Latent Dirichlet Allocation based model (LDA). Specifically, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and achieves the best performance.
DOI: 10.1145/2505515.2505670 (https://doi.org/10.1145/2505515.2505670). Published 2013-10-27.
Citations: 55
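One way to picture the RankingSVM-style example creation in the question-routing entry is to turn each (q, asker, answerers) session into pairwise difference vectors. This sketch assumes that every answerer should outrank the asker on q, uses made-up two-dimensional features, and trains a plain pairwise perceptron in place of an actual RankingSVM solver; none of the function or feature names come from the paper.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_examples(sessions, features):
    """Turn (question, asker, answerers) sessions into labelled difference vectors.

    Assumption: on question q, each answerer should be ranked above the asker.
    features[(q, user)] is a hypothetical per-(question, user) feature vector.
    """
    examples = []
    for q, asker, answerers in sessions:
        for answerer in answerers:
            d = [a - b for a, b in zip(features[(q, answerer)], features[(q, asker)])]
            examples.append((d, +1))
            examples.append(([-v for v in d], -1))  # mirrored pair keeps classes balanced
    return examples

def train_pairwise_perceptron(examples, dim, epochs=20, lr=0.1):
    """Linear model on difference vectors; a simple stand-in for RankingSVM's hinge loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:  # pair is misordered or tied under w
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def rank_users(w, candidates):
    """Order candidate answerers by linear score, best first."""
    return sorted(candidates, key=lambda u: dot(w, candidates[u]), reverse=True)

# Toy features: [text similarity to q, past answers in q's tags] (hypothetical).
features = {
    ("q1", "alice"): [0.2, 1.0], ("q1", "bob"): [0.6, 5.0], ("q1", "carol"): [0.5, 3.0],
    ("q2", "dave"): [0.1, 0.0], ("q2", "bob"): [0.7, 6.0],
}
sessions = [("q1", "alice", ["bob", "carol"]), ("q2", "dave", ["bob"])]

examples = pairwise_examples(sessions, features)
w = train_pairwise_perceptron(examples, dim=2)
new_question = {"erin": [0.3, 2.0], "frank": [0.8, 7.0], "gina": [0.1, 0.5]}
print(rank_users(w, new_question))  # ['frank', 'erin', 'gina']
```

Training on differences rather than on individual users is what makes this pairwise: the model only learns which of two users should come first, which matches the ordering information a session actually provides.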
Generating informative snippet to maximize item visibility
Mahashweta Das, Habibur Rahman, Gautam Das, Vagelis Hristidis
The widespread use and growing popularity of online collaborative content sites have created rich resources that users consult to make purchasing decisions on items such as e-commerce products and restaurants. Ideally, a user wants to decide quickly whether an item is desirable from the list of items returned by her search query. This has created new challenges for producers/manufacturers (e.g., Dell) and retailers (e.g., Amazon, eBay) of such items, who must compose succinct summaries of web item descriptions, henceforth referred to as snippets, that are likely to maximize the items' visibility among users. We exploit user feedback available on collaborative content sites in the form of tags to identify the most important item attributes that must be highlighted in an item snippet. We investigate the problem of finding the top-k snippets for an item that are likely to maximize the probability that the user preference (available in the form of a search query) is satisfied. Since a search query returns multiple relevant items, we also study the problem of finding the best diverse set of snippets for these items in order to maximize the probability that a user likes at least one of the top items. We develop an exact top-k algorithm for each problem and perform detailed experiments on synthetic data and on real data crawled from the web to demonstrate the utility of our problem formulations and the effectiveness of our solutions.
DOI: 10.1145/2505515.2505606 (https://doi.org/10.1145/2505515.2505606). Published 2013-10-27.
Citations: 2
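Under a deliberately simplified model (assumed here, not taken from the paper), a snippet is a set of m tags, and a user is satisfied when every attribute in her query appears in the snippet. Top-k snippet selection can then be written as an exhaustive search weighted by a query log. The item, tags, and query weights below are hypothetical, and the paper's exact top-k algorithms avoid this brute-force enumeration.

```python
from itertools import combinations

def satisfaction_probability(snippet, query_log):
    """Share of (weighted) queries whose requested attributes all appear in the snippet."""
    total = sum(query_log.values())
    hit = sum(weight for query, weight in query_log.items() if query <= snippet)
    return hit / total if total else 0.0

def top_k_snippets(item_attrs, query_log, m, k):
    """Score every m-attribute snippet of the item exhaustively and keep the k best.

    Exact under this simplified model, but it enumerates C(n, m) candidates.
    """
    scored = []
    for combo in combinations(sorted(item_attrs), m):
        scored.append((satisfaction_probability(frozenset(combo), query_log), combo))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties broken alphabetically
    return scored[:k]

# Hypothetical tags for a laptop and a hypothetical weighted query log.
item_attrs = {"lightweight", "long battery", "touchscreen", "cheap"}
query_log = {
    frozenset({"lightweight"}): 3,
    frozenset({"long battery", "lightweight"}): 2,
    frozenset({"cheap"}): 4,
    frozenset({"touchscreen", "cheap"}): 1,
}
best = top_k_snippets(item_attrs, query_log, m=2, k=2)
print(best)  # [(0.7, ('cheap', 'lightweight')), (0.5, ('cheap', 'touchscreen'))]
```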
pEDM: online-forecasting for smart energy analytics
Lars Dannecker, Philipp J. Rösch, Ulrike Fischer, Gordon Gaumnitz, Wolfgang Lehner, Gregor Hackenbroich
Continuous balancing of energy demand and supply is a fundamental prerequisite for the stability of energy grids and requires accurate forecasts of electricity consumption and production at any point in time. Today's Energy Data Management (EDM) systems already provide accurate predictions, but typically employ a very time-consuming and inflexible forecasting process. However, emerging trends such as intra-day trading and an increasing share of renewable energy sources require higher forecasting efficiency. Additionally, the wide variety of applications in the energy domain poses different requirements with respect to runtime and accuracy and thus requires flexible control of the forecasting process. To address this issue, we introduce a novel online forecasting process as part of our EDM system, pEDM. The online forecasting process rapidly provides forecast results and iteratively refines them over time. Thus, we avoid long calculation times and allow applications to adapt the process to their needs. Our evaluation shows that the online forecasting process offers a very efficient and flexible way of providing forecasts to the requesting applications.
DOI: 10.1145/2505515.2505588 (https://doi.org/10.1145/2505515.2505588). Published 2013-10-27.
Citations: 2
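The "fast first answer, then iterative refinement" behaviour described for pEDM can be illustrated with a Python generator that yields a forecast immediately and then yields only improvements as more model parameters are tried. Simple exponential smoothing and the alpha grid are stand-ins chosen for brevity; they are assumptions, not pEDM's forecasting models or its parameter optimizer.

```python
def ses_fit(series, alpha):
    """Simple exponential smoothing: return (one-step-ahead SSE, next-period forecast)."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2          # error of the forecast made before seeing y
        level = alpha * y + (1 - alpha) * level
    return sse, level

def online_forecast(series, alphas):
    """Yield (alpha, sse, forecast): a first answer immediately, then only improvements."""
    best = None
    for alpha in alphas:
        sse, forecast = ses_fit(series, alpha)
        if best is None or sse < best[1]:
            best = (alpha, sse, forecast)
            yield best

demand = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]   # hypothetical hourly load
alphas = [0.5, 0.1, 0.9, 0.3, 0.7]               # coarse values first

quick = next(online_forecast(demand, alphas))    # a caller with a tight deadline stops here
refined = list(online_forecast(demand, alphas))  # a caller with more time consumes everything
print(quick[2], refined[-1][2])
```

Because the caller decides how far to iterate the generator, the runtime/accuracy trade-off stays with the requesting application, which is the flexibility the abstract emphasises.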
An empirical study of top-n recommendation for venture finance
T. Stone, Weinan Zhang, Xiaoxue Zhao
This paper concerns the task of top-N investment opportunity recommendation in the domain of venture finance. Specifically, within venture finance we are interested in the investment activity of venture capital (VC) firms and their investment partners. We have access to a dataset of recorded venture financings (i.e., investments) by VCs and their investment partners in private US companies. This research was undertaken in partnership with Correlation Ventures, a venture capital firm that is pioneering the use of predictive analytics to better inform investment decision making. This paper undertakes a detailed empirical study and data analysis, and then demonstrates the efficacy of recommender systems in this novel application domain.
DOI: 10.1145/2505515.2507882 (https://doi.org/10.1145/2505515.2507882). Published 2013-10-27.
Citations: 20
A hybrid approach for privacy-preserving processing of kNN queries in mobile database systems
Shixin Tian, Ying Cai, Q. Zheng
In mobile object database systems, both query issuers and queried objects are subject to location privacy intrusion. One solution to this problem is to have users reduce their location resolution when making location updates. Such location cloaking allows mobile objects to achieve a desired level of protection, but may not produce accurate query results. Alternatively, one can apply cryptographic techniques such as secure multiparty computation to compute the spatial relationships among mobile objects without requiring the objects to disclose their locations at all. This strategy produces high-quality query results but is in general computation-intensive, especially when a large number of mobile objects are involved. In this paper, we present a hybrid approach that mitigates this dilemma. Our idea is to compute approximate query results based on cloaked location information and then refine the query results by applying homomorphic encryption. We demonstrate that this approach can be used for efficient and privacy-preserving processing of kNN queries and evaluate its performance through simulation.
DOI: 10.1145/2505515.2507814 (https://doi.org/10.1145/2505515.2507814). Published 2013-10-27.
Citations: 3
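The coarse, cloaking-based step of the hybrid kNN approach can be sketched with a standard min/max-distance filter: if each object reports only a rectangle, the k-th smallest maximum distance bounds the true k-th neighbour distance from above, so any object whose minimum distance exceeds that bound can be discarded. The rectangles are hypothetical, the query point is assumed exact, and the homomorphic-encryption refinement that the paper applies to the surviving candidates is not shown.

```python
import math

def min_dist(q, r):
    """Smallest possible distance from point q to any location inside rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)

def max_dist(q, r):
    """Largest possible distance from q to a location inside r (its farthest corner)."""
    dx = max(abs(q[0] - r[0]), abs(q[0] - r[2]))
    dy = max(abs(q[1] - r[1]), abs(q[1] - r[3]))
    return math.hypot(dx, dy)

def knn_candidates(q, cloaks, k):
    """Objects that may be among q's k nearest neighbours, knowing only cloaking rectangles.

    The k-th smallest max_dist bounds the true k-th neighbour distance from above,
    so any object whose min_dist exceeds that bound can be pruned safely.
    """
    if len(cloaks) <= k:
        return sorted(cloaks)
    bound = sorted(max_dist(q, r) for r in cloaks.values())[k - 1]
    return sorted(obj for obj, r in cloaks.items() if min_dist(q, r) <= bound)

# Hypothetical cloaked locations as (xmin, ymin, xmax, ymax).
cloaks = {
    "a": (1.0, 1.0, 2.0, 2.0),
    "b": (-3.0, -1.0, -2.0, 1.0),
    "c": (5.0, 5.0, 6.0, 6.0),
    "d": (-1.0, 2.0, 1.0, 3.0),
}
print(knn_candidates((0.0, 0.0), cloaks, k=2))  # ['a', 'b', 'd']
```

Only the surviving candidates (three of four here) would then need the expensive exact comparison, which is where the cost saving of a hybrid scheme comes from.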
Assessing sparse information extraction using semantic contexts
Peipei Li, Haixun Wang, Hongsong Li, Xindong Wu
One important assumption of information extraction is that extractions occurring more frequently are more likely to be correct. Sparse information extraction is challenging because, no matter how big a corpus is, some extractions are supported by only a small amount of evidence in the corpus. A pioneering work known as REALM learns HMMs to model the context of a semantic relationship for assessing extractions. This is quite costly, and the semantics it reveals for the context are not explicit. In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction. We use a large semantic network consisting of millions of concepts, entities, and attributes to explicitly model the context of semantic relationships. Experiments show that our approach improves the F-score of extraction by at least 11.2% over state-of-the-art HMM-based approaches while being more efficient.
DOI: 10.1145/2505515.2505598 (https://doi.org/10.1145/2505515.2505598). Published 2013-10-27.
Citations: 4
Automated probabilistic modeling for relational data
Sameer Singh, T. Graepel
Probabilistic graphical model representations of relational data provide a number of desirable features, such as inference of missing values, error detection, data visualization, and probabilistic answers to relational queries. However, adoption has been slow because users are expected to have a high level of expertise both in probability and in the application domain. Instead of requiring a domain expert to specify the probabilistic dependencies of the data, we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. The resulting model contains customized distributions for the attributes, latent variables that cluster the records, and factors that represent the foreign-key links, while still allowing efficient inference. Experiments on synthetic and real-world data demonstrate the accuracy of the model and the scalability of inference.
DOI: 10.1145/2505515.2507828 (https://doi.org/10.1145/2505515.2507828). Published 2013-10-27.
Citations: 11
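To make the idea of deriving a model from the schema concrete, the sketch below lists the factors such a model might contain: one latent cluster variable per table, one per-cluster distribution per attribute, and one link factor per foreign key. The Table class, the type-to-distribution mapping, and the factor notation are hypothetical illustrations; the paper's actual distributions and inference procedure are not reproduced.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: dict                                      # column name -> SQL type name
    foreign_keys: dict = field(default_factory=dict)   # column name -> referenced table
    primary_key: str = "id"

# Hypothetical choice of per-cluster distribution family for each column type.
FAMILY_FOR_TYPE = {"TEXT": "Categorical", "INTEGER": "Poisson", "REAL": "Gaussian"}

def build_model_skeleton(tables):
    """List the factors of a latent-cluster model derived purely from the schema."""
    factors = []
    for t in tables:
        z = f"Z[{t.name}]"                             # latent cluster of each record
        factors.append(f"{z} ~ Categorical(pi[{t.name}])")
        for col, col_type in t.columns.items():
            if col == t.primary_key or col in t.foreign_keys:
                continue                               # keys are structure, not attributes
            family = FAMILY_FOR_TYPE.get(col_type, "Categorical")
            factors.append(f"{t.name}.{col} | {z} ~ {family}")
        for col, parent in t.foreign_keys.items():
            factors.append(f"link({z}, Z[{parent}]) via {t.name}.{col}")
    return factors

schema = [
    Table("movie", {"id": "INTEGER", "genre": "TEXT", "budget": "REAL"}),
    Table("rating", {"id": "INTEGER", "movie_id": "INTEGER", "stars": "INTEGER"},
          foreign_keys={"movie_id": "movie"}),
]
for factor in build_model_skeleton(schema):
    print(factor)
```

The point of the exercise is that nothing in the output required domain knowledge: every factor follows mechanically from column types and key declarations.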
Supporting exploratory people search: a study of factor transparency and user control
Shuguang Han, Daqing He, Jiepu Jiang, Zhen Yue
People search has been an active research topic in recent years. Related work includes expert finding, collaborator recommendation, link prediction, and social matching. However, the diverse objectives and exploratory nature of these tasks make it difficult to develop a single flexible people-search method that works for every task. In this project, we developed PeopleExplorer, an interactive people search system that supports exploratory search tasks when looking for people. In the system, users can specify their task objectives by selecting and adjusting key criteria. Three criteria were considered: content relevance, candidate authoritativeness, and social similarity between the user and the candidates. This project represents a first attempt to add transparency to exploratory people search and to give users full control over the search process. The system was evaluated through an experiment in which 24 participants undertook four different tasks. The results show that, with comparable time and effort, users of our system performed significantly better on their people search tasks than those using the baseline system. Users of our system also exhibited many distinctive behaviors in query reformulation and candidate selection. We found that users' general perceptions of the three criteria varied across tasks, which confirms our assumptions about modeling task differences and user variance in people search systems.
DOI: 10.1145/2505515.2505684 (https://doi.org/10.1145/2505515.2505684). Published 2013-10-27.
Citations: 29
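The PeopleExplorer abstract describes ranking people by three user-adjustable criteria: content relevance, candidate authoritativeness, and social similarity to the user. The sketch below illustrates that idea under several assumptions. Each criterion is assumed to be pre-scored in [0, 1], and the criteria are assumed to be combined by a weighted linear sum. The abstract does not say how PeopleExplorer actually combines them. The names `Candidate` and `rank_candidates` are hypothetical, and the weights stand in for the user's criterion settings.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relevance: float  # content relevance to the query (assumed pre-normalized to [0, 1])
    authority: float  # candidate authoritativeness (assumed in [0, 1])
    social: float     # social similarity between user and candidate (assumed in [0, 1])


def rank_candidates(candidates, w_relevance=1.0, w_authority=1.0, w_social=1.0):
    """Hypothetical ranker: sort candidates by a normalized weighted sum of the
    three criteria. The weights model user-adjusted criterion importance."""
    weights = (w_relevance, w_authority, w_social)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")

    def score(c):
        # Dividing by the weight total keeps scores in [0, 1]; it does not change the order.
        return (w_relevance * c.relevance
                + w_authority * c.authority
                + w_social * c.social) / total

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidate pool, for illustration only.
pool = [
    Candidate("A", relevance=0.9, authority=0.2, social=0.1),
    Candidate("B", relevance=0.5, authority=0.9, social=0.3),
    Candidate("C", relevance=0.4, authority=0.3, social=0.95),
]

if __name__ == "__main__":
    # An expert-finding-style task: stress relevance and authority, ignore social ties.
    print([c.name for c in rank_candidates(pool, 1.0, 1.0, 0.0)])
    # A social-matching-style task: rank purely by social similarity.
    print([c.name for c in rank_candidates(pool, 0.0, 0.0, 1.0)])
```

A linear combination was chosen here only because it makes the effect of each adjustment easy to see. Different weight settings reorder the same pool, which reflects the abstract's point that different people-search tasks call for different criteria.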
AKBC 2013: third workshop on automated knowledge base construction
Fabian M. Suchanek, S. Riedel, Sameer Singh, P. Talukdar
The AKBC 2013 workshop aims to be a venue of excellence and vision in the area of knowledge base construction. This year's workshop will feature keynotes by ten leading researchers in the field, from institutions including Google, Microsoft, Stanford, and CMU. The submissions focus on visionary ideas rather than experimental evaluation. Nineteen accepted papers will be presented as posters, and nine exceptional papers will also be highlighted as spotlight talks. In this way, the workshop aims to provide a lively forum for discussion about automated knowledge base construction.
DOI: 10.1145/2505515.2505806 · Published: 2013-10-27
Citations: 8
Journal: Proceedings of the 22nd ACM international conference on Information & Knowledge Management