
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): Latest Papers

The Dawn of today's popular domains: A study of the archived German Web over 18 years
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910901
Helge Holzmann, W. Nejdl, Avishek Anand
The Web has been around and maturing for 25 years. The popular websites of today have undergone vast changes during this period, with a few being there almost since the beginning and many new ones becoming popular over the years. This makes it worthwhile to take a look at how these sites have evolved and what they might tell us about the future of the Web. We therefore embarked on a longitudinal study spanning almost the whole period of the Web, based on data collected by the Internet Archive starting in 1996, to retrospectively analyze how the popular Web as of now has evolved over the past 18 years. For our study we focused on the German Web, specifically on the top 100 most popular websites in 17 categories. This paper presents a selection of the most interesting findings in terms of volume, size as well as age of the Web. While related work in the field of Web Dynamics has mainly focused on change rates and analyzed datasets spanning less than a year, we looked at the evolution of websites over 18 years. We found that around 70% of the pages we investigated are younger than a year, with an observed exponential growth in age as well as in size up to now. If this growth rate continues, the number of pages from the popular domains will almost double in the next two years. In addition, we give insights into our data set, provided by the Internet Archive, which hosts the largest and most complete Web archive as of today.
Citations: 20
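As a rough reading of the doubling claim in the abstract above (Holzmann et al.), the implied yearly growth can be worked out under the assumption of a constant exponential rate, which the abstract suggests but does not quantify. The symbols $N_0$, $N(t)$, and $r$ are introduced here for illustration only, and "almost double" is treated as an upper bound of exact doubling.

```latex
N(t) = N_0 e^{rt}, \qquad \frac{N(2)}{N_0} = e^{2r} \le 2
\;\Longrightarrow\; r \le \frac{\ln 2}{2} \approx 0.347 \text{ per year},
\qquad e^{r} \le \sqrt{2} \approx 1.41
```

Under that assumption, page counts growing by up to roughly 41% per year would be consistent with the statement.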
Music information seeking via social Q&A: An analysis of questions in Music StackExchange community
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910914
Hengyi Fu, Yun Fan
In this paper we report preliminary findings based on a quantitative analysis of data from a music social Q&A site, Music StackExchange, focusing on real-life music information needs, uses, and seeking. Eight major topic categories and a two-level taxonomy for question type/intent and the characteristics of questions in each category are presented. Our findings suggest that Q&A sites are a fruitful resource for identifying users' music information needs, how these needs are expressed, and intended uses for the information. On Music StackExchange, users' questioning behaviors were motivated by the recognition of knowledge gaps, lack of resources, need for others' opinions, or interest in research issues, spanning different topics. This study is explorative in nature and the results could improve the understanding of everyday life music information seeking. The findings can inform music librarians and general-purpose music information systems designers of the needs, requirements, and approaches to enhance music related controlled vocabularies, and improve search engines and online knowledge sharing communities to categorize and provide users with more relevant music information.
Citations: 8
Preliminary exploration of the effect of time constraint on search interactions on webpages
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925463
Chang Liu, Tao Xu
This study explored the effect of time constraint on searchers' interactions during two kinds of tasks through a user experiment. The results demonstrated that users did not tend to accelerate their reading or decision speed under time constraint, but instead selected fewer pages to read, i.e., visited fewer content pages and search result pages (SERPs); they also made more mouse clicks but fewer keystrokes per page when searching under time constraint. The results also showed that time constraint affected search interactions on pages differently for the two types of tasks. The results have implications for the design of digital library systems that take into account users' time constraints or time pressure.
Citations: 1
Issues of dealing with fluid data in digital libraries
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2926738
Soo-yeon Hwang, M. Cragin, M. Lesk, Yu-Hung Lin, Daniel O'Connor
This panel discusses the issues of dealing with fluid data and curating new data in digital libraries.
Citations: 0
Improving similar document retrieval using a recursive pseudo relevance feedback strategy
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925468
Kyle Williams, C. Lee Giles
We present a recursive pseudo relevance feedback strategy for improving retrieval performance in similarity search. The strategy recursively searches on search results returned for a given query and produces a tree that is used for ranking. Experiments on the Reuters 21578 and WebKB datasets show how the strategy leads to a significant improvement in similarity search performance.
Citations: 2
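The abstract by Williams and Giles above describes searching recursively on the results of a query and ranking from the resulting tree, but gives no further detail. The following is a minimal sketch of that general idea, not the authors' implementation. The bag-of-words cosine similarity, the helper names (`vectorize`, `search`, `build_tree`, `rank`), the parameters `k` and `depth`, and the scoring rule (summing the product of similarities along each tree path) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def vectorize(text):
    """Hypothetical helper: bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_id, vectors, k):
    """Return up to k (doc_id, score) pairs most similar to the query document."""
    q = vectors[query_id]
    scored = [(cosine(q, v), d) for d, v in vectors.items() if d != query_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(d, s) for s, d in scored[:k] if s > 0]


def build_tree(query_id, vectors, k, depth, weight=1.0, path=()):
    """Recursively search on each result; a node is (doc_id, path_weight, children)."""
    if depth == 0:
        return []
    nodes = []
    for doc_id, score in search(query_id, vectors, k):
        if doc_id in path:  # do not walk back to an ancestor
            continue
        w = weight * score  # assumed scoring: product of similarities on the path
        children = build_tree(doc_id, vectors, k, depth - 1, w, path + (query_id,))
        nodes.append((doc_id, w, children))
    return nodes


def rank(query_id, vectors, k=2, depth=2):
    """Flatten the tree and rank documents by their summed path weights."""
    totals = defaultdict(float)
    stack = build_tree(query_id, vectors, k, depth)
    while stack:
        doc_id, w, children = stack.pop()
        totals[doc_id] += w
        stack.extend(children)
    return sorted(totals, key=lambda d: (-totals[d], d))


docs = {
    "q": "oil prices rise",
    "a": "oil prices fall sharply",
    "b": "crude oil market",
    "c": "crude market rally",
    "d": "football match result",
}
vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
print(rank("q", vectors))
```

In this toy corpus, document `c` shares no terms with the query document `q` and is reached only through the second-level search on `b`, which is the kind of effect a feedback strategy is meant to produce.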
Physical samples and digital libraries
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2926736
Unmil Karadkar, K. Lehnert, C. Lenhardt
Research in disciplines such as the earth and biological sciences depends on the availability of representative physical samples that have been collected at substantial cost and effort, some of which are irreplaceable. The EarthCube iSamples (Internet of Samples in the Earth Sciences) Research Coordination Network (RCN), funded by the National Science Foundation, aims to connect physical samples and sample collections across the Earth Sciences with digital data infrastructures to revolutionize their utility in the support of science. The goal of this workshop is to attract a broad audience comprising earth scientists and other scientists working with physical samples, data curators, and computer and information scientists to learn from each other about the requirements of physical as well as digital sample and collection management.
Citations: 1
Evaluating the quality of educational answers in community question-answering
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910900
Long T. Le, C. Shah, Erik Choi
Community Question-Answering (CQA), where questions and answers are generated by peers, has become a popular method of information seeking in online environments. While the content repositories created through CQA sites have been used widely to support general purpose tasks, using them as online digital libraries that support educational needs is an emerging practice. Horizontal CQA services, such as Yahoo! Answers, and vertical CQA services, such as Brainly, are aiming to help students improve their learning process by answering their educational questions. In these services, receiving high-quality answers to a question is a critical factor not only for user satisfaction, but also for supporting learning. However, the questions are not necessarily answered by experts, and the askers may not have enough knowledge and skill to evaluate the quality of the answers they receive. This could be problematic when students build their own knowledge base by applying inaccurate information or knowledge acquired from online sources. Using moderators could alleviate this problem. However, a moderator's evaluation of answer quality may be inconsistent because it is based on their subjective assessments. Employing human assessors may also be insufficient due to the large amount of content available on a CQA site. To address these issues, we propose a framework for automatically assessing the quality of answers. This is achieved by integrating different groups of features (personal, community-based, textual, and contextual) to build a classification model and determine what constitutes answer quality. To test this evaluation framework, we collected more than 10 million educational answers posted by more than 3 million users on Brainly's United States and Poland sites. The experiments conducted on these datasets show that the model using Random Forest (RF) achieves more than 83% accuracy in identifying high-quality answers. In addition, the findings indicate that personal and community-based features have more predictive power in assessing answer quality. Our approach also achieves high values on other key metrics such as F1-score and area under the ROC curve. The work reported here can be useful in many other contexts where providing automatic quality assessment in a digital repository of textual information is paramount.
Citations: 34
A methodology to evaluate triple confidence and detect incorrect triples in knowledge bases
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925456
Haihua Xie, Xiaoqing Lu, Zhi Tang, Mao Ye
The accuracy of the contents of a knowledge base determines the effectiveness of knowledge service applications; thus, it is necessary to evaluate the confidence of triples when a knowledge base is built. This study introduces a generic computational methodology to compute the confidence values of triples in knowledge bases and detect potentially incorrect ones for further verification. The major contributions of the proposed methodology are as follows: (1) a process to compute the confidence values of triples is designed; (2) new algorithms are proposed to adjust the term frequency and inverse document frequency values of each triple; (3) a method is presented to build a support vector machine (SVM) classifier, based on the selected triples, for incorrect triple detection.
Citations: 1
Leveraging tweet ranking in an optimization framework for tweet timeline generation
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925453
Lili Yao, Feifan Fan, Yansong Feng, Dongyan Zhao
When users search in Twitter, they are overloaded every time with a mass of microblog posts that are not particularly informative and lack meaningful organization. Therefore, it is helpful to produce a summarized tweet timeline about the topic. Tweet timeline generation (TTG) is a task that aims at selecting a small set of representative tweets to generate a meaningful timeline. In this paper, we introduce an optimization framework to jointly model the relevance, novelty and coverage of the tweet timeline, including an effective tweet ranking algorithm. Extensive experiments on the public TREC 2014 dataset demonstrate that our method can achieve very competitive results against state-of-the-art TTG systems.
Citations: 0
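The abstract by Yao et al. above names relevance and novelty as competing objectives in timeline selection without giving the formulation. To illustrate that trade-off only, here is a minimal sketch using a greedy, maximal-marginal-relevance style selection, which is a swapped-in stand-in and not the authors' optimization framework. The Jaccard similarity, the function name `select_timeline`, the `trade_off` parameter, and the assumption that input tweets are already in chronological order are all hypothetical; coverage is not modeled here.

```python
def tokens(text):
    """Hypothetical helper: lowercase word set for one tweet or query."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_timeline(query, tweets, size, trade_off=0.7):
    """Greedily pick tweets that are relevant to the query but not redundant.

    Each step picks the candidate maximizing
    trade_off * relevance - (1 - trade_off) * redundancy,
    where redundancy is the highest overlap with any tweet already selected.
    """
    q = tokens(query)
    toks = [tokens(t) for t in tweets]
    selected = []
    candidates = list(range(len(tweets)))

    def gain(i):
        relevance = jaccard(q, toks[i])
        redundancy = max((jaccard(toks[i], toks[j]) for j in selected), default=0.0)
        return trade_off * relevance - (1 - trade_off) * redundancy

    while candidates and len(selected) < size:
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    # Input order is assumed chronological, so restore it for the timeline.
    return [tweets[i] for i in sorted(selected)]


tweets = [
    "earthquake hits city center",
    "earthquake hits city center today",
    "rescue teams arrive after earthquake",
    "new phone released",
]
timeline = select_timeline("earthquake city", tweets, size=2)
print(timeline)
```

With these toy inputs the near-duplicate second tweet is passed over in favor of the rescue update, because its overlap with the first selected tweet outweighs its higher relevance.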
Knowledge extraction for literature review
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925441
T. Erekhinskaya, Mithun Balakrishna, M. Tatu, Steven D. Werner, D. Moldovan
Researchers in all domains need to keep abreast of recent scientific advances. Finding relevant publications and reviewing them is a labor-intensive task that lacks efficient automatic tools to support it. Current tools are limited to standard keyword-based search systems that return potentially relevant documents and then leave the user with the monumental task of sifting through them. In this paper, we present a semantic-driven system that automatically extracts the most important knowledge from a publication and reduces the effort required for the literature review. The system extracts key findings from biomedical papers in PubMed, populates a predefined template and displays it. This allows the user to get the key ideas of the content even before opening or downloading the publication.
Citations: 7