Australasian Document Computing Symposium: Latest Articles
Crisis management knowledge from social media
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537740
K. Kreiner, A. Immonen, H. Suominen
More and more crisis managers, crisis communicators and laypeople use Twitter and other social media to provide or seek crisis information. In this paper, we focus on the retrospective conversion of human-safety-related data into crisis management knowledge. First, we study how Twitter data can be classified into the seven categories of the United Nations Development Program Security Model (i.e., Food, Health, Politics, Economic, Personal, Community, and Environment). We conclude that these topic categories are applicable, and that supplementing them with a classification of individual authors into more generic sources of data (i.e., Official authorities, Media, and Laypeople) allows curating data and assessing crisis maturity. Second, we introduce automated classifiers, based on supervised learning and decision rules, for both tasks and evaluate their correctness. This evaluation uses two datasets collected during the Queensland floods and the NZ earthquake crises of 2011. The topic classifier performs well in the major categories (i.e., 120 to 190 training instances) of Economic (F = 0.76) and Community (F = 0.67), while in the minor categories (i.e., 0 to 60 training instances) the results are more modest (F ≤ 0.41). The source classifier shows excellent results (F ≥ 0.83) in all categories.
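The F values in this abstract are presumably F-measures combining precision and recall; the abstract does not say which variant is used. As a minimal sketch, assuming the balanced F1 (the harmonic mean of precision and recall) and a hypothetical helper name `f1_score`, the measure can be computed from per-category counts:

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure for one category (assumed variant: F1)."""
    if true_pos == 0:
        # No correct predictions: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

A category with few training instances tends to yield few true positives, which pulls both precision and recall down and is consistent with the lower scores reported for the minor categories.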
Citations: 8
Using eye tracking for evaluating web search interfaces
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537747
H. A. Maqbali, Falk Scholer, J. Thom, Mingfang Wu
Using eye tracking in the evaluation of web search interfaces can provide rich information on users' information search behaviour, particularly on how users interact with the different informative components of a search results screen. One of the main issues affecting the use of eye tracking in research is the quality of the captured eye movements (calibration). In this paper, we therefore propose a method for determining the quality of calibration, since the existing eye tracking system (Tobii Studio) does not provide any criteria for this aspect. Another issue addressed in this paper is the adaptation of gaze direction. We display a black screen for 3 seconds between screens so that the previous screen does not affect the user's gaze direction on the next screen. A further issue when employing eye tracking in the evaluation of web search interfaces is the selection of an appropriate filter for the raw gaze-point data. In our studies, we filtered this data by removing noise, identifying gaze points that occur in Areas of Interest (AOIs), optimising gaze data and identifying viewed AOIs.
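The step of mapping gaze points to AOIs and deciding which AOIs were viewed can be sketched as follows. This is an illustrative simplification, not the authors' filter: the rectangular `AOI` class, the `viewed_aois` function, and the `min_hits` threshold are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular Area of Interest in screen coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def viewed_aois(gaze_points, aois, min_hits=3):
    """Return names of AOIs hit by at least `min_hits` gaze points.

    Each gaze point is assigned to the first AOI that contains it;
    points outside every AOI are ignored as noise.
    """
    hits = {}
    for x, y in gaze_points:
        for aoi in aois:
            if aoi.contains(x, y):
                hits[aoi.name] = hits.get(aoi.name, 0) + 1
                break
    return [aoi.name for aoi in aois if hits.get(aoi.name, 0) >= min_hits]
```

Requiring several hits before an AOI counts as viewed is one simple way to avoid treating a single stray sample as attention; the appropriate threshold would depend on the tracker's sampling rate.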
Citations: 3
An enterprise search paradigm based on extended query auto-completion: do we still need search and navigation?
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537743
D. Hawking, K. Griffiths
Enterprise query auto-completion (QAC) can allow website or intranet visitors to satisfy a need more efficiently than traditional searching and browsing. The limited scope of an enterprise makes it possible to satisfy a high proportion of information needs through completion. Further, the availability of structured sources of completions such as product catalogues compensates for sparsity of log data. Extended forms (X-QAC) can give access to information that is inaccessible via a conventional crawled index. We show that it can be guaranteed that for every suggestion there is a prefix which causes it to appear in the top k suggestions. Using university query logs and structured lists, we quantify the significant keystroke savings attributable to this guarantee (worst case). Such savings may be of particular value for mobile devices. A user experiment showed that a staff lookup task took an average of 61% longer with a conventional search interface than with an X-QAC system. Using wine catalogue data we demonstrate a further extension which allows a user to home in on desired items in faceted-navigation style. We also note that advertisements can be triggered from QAC. Given the advantages and power of X-QAC systems, we envisage that websites and intranets of the [near] future will provide less navigation and rely less on conventional search.
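To illustrate the top-k guarantee and the keystroke savings it implies, here is a minimal sketch. It assumes one possible ranking rule (an exact match is ranked first, then candidates by descending score), which is enough to make the guarantee hold; the abstract does not state how the authors achieve it, and the function names and scores are hypothetical.

```python
def top_k(prefix, scored, k):
    """Up to k completions of `prefix`: exact match first, then by score."""
    matches = [s for s in scored if s.startswith(prefix)]
    matches.sort(key=lambda s: (s != prefix, -scored[s], s))
    return matches[:k]


def shortest_prefix(target, scored, k):
    """Shortest typed prefix that puts `target` in the top k suggestions.

    Because an exact match is ranked first, typing the whole target
    always succeeds (for k >= 1), so a prefix is guaranteed to exist.
    """
    for length in range(1, len(target) + 1):
        if target in top_k(target[:length], scored, k):
            return target[:length]
    raise ValueError("target is not a known suggestion")
```

The keystroke saving for a suggestion is then `len(target) - len(shortest_prefix(target, scored, k))`, and the worst case over all suggestions is the figure of interest.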
Citations: 13
Conditional collocation in Japanese
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537736
Takumi Sonoda, T. Miura
Collocation analysis is aimed at Natural Language Processing (NLP). From a linguistic perspective, collocation provides us with a way to place words close together in a natural manner. With this approach, we can examine the deep structure of semantics through words and their situation. Although there have been some investigations based on co-occurrence, there has been little discussion of conditional collocation. In this investigation, we discuss a computational approach to extracting conditional collocations using data mining and statistical techniques.
Citations: 0
Managing short postings lists
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537738
A. Trotman, Xiangfei Jia, Matt Crane
Previous work has examined space-saving and throughput-increasing techniques for long postings lists in an inverted file search engine. In this contribution we show that highly sporadic terms (terms that occur in 1 or 2 documents) are a high proportion of the unique terms in the collection and that these terms are seen in queries. The previously known space-saving method of storing their short postings lists in the vocabulary is compared with storing them in the postings file. We quantify the saving as about 6.5%, with no loss in precision, and suggest the adoption of this technique.
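The idea of keeping short postings lists inside the vocabulary can be sketched as below. This is an in-memory illustration under stated assumptions, not the authors' on-disk layout: the tuple-tagged entries, the flat `postings_file` list standing in for the postings file, and the `inline_limit` parameter are all hypothetical.

```python
def build_index(postings, inline_limit=2):
    """Split postings into a vocabulary and a flat postings store.

    Terms occurring in at most `inline_limit` documents keep their
    document IDs directly in the vocabulary entry; longer lists are
    appended to `postings_file` and referenced by (offset, length).
    """
    vocabulary = {}
    postings_file = []  # stands in for the separate postings file
    for term, doc_ids in postings.items():
        if len(doc_ids) <= inline_limit:
            vocabulary[term] = ("inline", tuple(doc_ids))
        else:
            vocabulary[term] = ("offset", len(postings_file), len(doc_ids))
            postings_file.extend(doc_ids)
    return vocabulary, postings_file


def lookup(term, vocabulary, postings_file):
    """Return the document IDs for `term`, wherever they are stored."""
    entry = vocabulary.get(term)
    if entry is None:
        return []
    if entry[0] == "inline":
        return list(entry[1])
    _, start, length = entry
    return postings_file[start:start + length]
```

For an inline term the lookup never touches the postings store, which is where a space (and potentially an access) saving would come from when such terms dominate the vocabulary.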
Citations: 4
Visual summarisation of text for surveillance and situational awareness in hospitals
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537739
H. Suominen, L. Hanlen
Nosocomial infections (NIs, any infection that a patient contracts in a healthcare institution) cost 100,000 lives and five billion dollars per year for 300 million Americans alone. Surveillance in hospitals holds the potential of reducing NI rates by more than thirty per cent, but performing this task by hand is impossible at the scale of every appointment, examination, intervention, and other event in healthcare. Narratives in patient records can indicate NIs, and their automated processing could scale out surveillance. This paper describes a text summarisation system for NI surveillance and situational awareness in hospitals. The system is a cascaded sentence, report, and patient classifier. It generates three types of visual summaries for an input of patient narratives and ward maps: cross-sectional statuses at the same point of time, longitudinal trends in time, and highlighted text to see the textual evidence leading to a given status or trend. This gives evidence for and against a given NI at the levels of hospitals, wards, patients, reports, and sentences. The system has excellent recall and precision (e.g., 0.95 and 0.71 for reports) in summarisation for the subset of NIs from fungal species on 1,880 authentic records of 527 patients from 3 hospitals. To demonstrate the system design, we have developed an iPad-compatible mobile web application and a simulation with eighteen patients on three medical wards in one hospital during one month with 61 records in total. The design is extendable to other summarisation tasks.
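The cascade from sentences to reports to patients can be sketched as follows. The abstract does not describe the classifiers or the aggregation rule, so this sketch makes loud assumptions: a placeholder keyword matcher stands in for the sentence classifier, and a report or patient is flagged as soon as any lower-level unit is flagged. All names and the cue list are hypothetical.

```python
def classify_sentence(sentence, cues):
    """Placeholder sentence classifier: true if any cue word occurs."""
    text = sentence.lower()
    return any(cue in text for cue in cues)


def classify_report(sentences, cues):
    """Return the evidence sentences; a report is positive if any exist."""
    return [s for s in sentences if classify_sentence(s, cues)]


def classify_patient(reports, cues):
    """Aggregate report-level results into a patient-level status.

    `reports` maps a report ID to its list of sentences. Returns the
    patient status and the evidence sentences per positive report, so
    the text leading to the status can be highlighted.
    """
    evidence = {}
    for report_id, sentences in reports.items():
        hits = classify_report(sentences, cues)
        if hits:
            evidence[report_id] = hits
    return bool(evidence), evidence
```

Keeping the evidence sentences alongside each decision is what would let a summary link a patient or ward status back to the highlighted text that produced it.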
Citations: 0
Exploring the magic of WAND
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537744
M. Petri, J. Culpepper, Alistair Moffat
Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND processing strategy is one way of reducing the amount of computation necessary when executing such a query. The value of WAND has now been validated in a wide range of studies, and has become one of the key baselines against which all new top-k processing algorithms are benchmarked. However, most previous implementations of WAND-based retrieval approaches have been in the context of the BM25 Okapi similarity scoring regime. Here we measure the performance of WAND in the context of the alternative Language Model similarity score computation, and find that the dramatic efficiency gains reported in previous studies are no longer achievable. That is, when the primary goal of a retrieval system is to maximize effectiveness, WAND is relatively unhelpful in terms of attaining the secondary objective of maximizing query throughput rates. However, the BM-WAND algorithm does in fact help reduce the percentage of postings to be scored, but with additional computational overhead. We explore a variety of tradeoffs between scoring metric and processing regime and present new insight into how score-safe algorithms interact with rank scoring.
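For readers unfamiliar with WAND, the pivot-based skipping it relies on can be sketched as follows. This is a textbook-style simplification written for illustration, not the implementation evaluated in the paper: it assumes non-negative per-posting scores, uses each list's maximum score as its upper bound, re-sorts cursors on every step, requires `k >= 1`, and the function name is hypothetical.

```python
import heapq


def wand_top_k(postings, k):
    """Top-k documents by summed term scores, skipping hopeless documents.

    `postings` maps term -> list of (doc_id, score) sorted by doc_id.
    Returns (results, scored): results is a list of (score, doc_id) in
    descending score order; scored counts fully scored documents.
    """
    lists = [pl for pl in postings.values() if pl]
    upper = [max(score for _, score in pl) for pl in lists]  # per-term bound
    pos = [0] * len(lists)  # cursor into each postings list
    heap = []               # min-heap holding the current top k
    scored = 0
    while True:
        live = sorted((i for i in range(len(lists)) if pos[i] < len(lists[i])),
                      key=lambda i: lists[i][pos[i]][0])
        theta = heap[0][0] if len(heap) == k else float("-inf")
        # Pivot: first cursor at which accumulated upper bounds beat theta.
        acc, pivot = 0.0, None
        for rank, i in enumerate(live):
            acc += upper[i]
            if acc > theta:
                pivot = rank
                break
        if pivot is None:
            break  # no remaining document can enter the top k
        pivot_doc = lists[live[pivot]][pos[live[pivot]]][0]
        if lists[live[0]][pos[live[0]]][0] == pivot_doc:
            # Every cursor on pivot_doc is at the front: score it fully.
            score = 0.0
            for i in live:
                doc, s = lists[i][pos[i]]
                if doc != pivot_doc:
                    break
                score += s
                pos[i] += 1
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before pivot_doc cannot beat theta: skip them.
            for i in live[:pivot]:
                while pos[i] < len(lists[i]) and lists[i][pos[i]][0] < pivot_doc:
                    pos[i] += 1
    return sorted(heap, key=lambda t: (-t[0], t[1])), scored
```

The amount of skipping depends on how tight the upper bounds are relative to the threshold, which is one plausible reason the choice of similarity function affects how much WAND helps.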
Citations: 40
Power walk: revisiting the random surfer
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537749
L. Park, S. Simoff
Measurement of graph centrality provides us with an indication of the importance or popularity of each vertex in a graph. When dealing with graphs that are not centrally controlled (such as the Web, social networks and academic citation graphs), a centrality measure must 1) correlate with vertex importance/popularity, 2) scale well in terms of computation, and 3) be difficult to manipulate by individuals. The Random Surfer probability transition model, combined with Eigenvalue Centrality, produced PageRank, which has been shown to satisfy the required properties. Existing centrality measures (including PageRank) make the assumption that all directed edges are positive, implying an endorsement. Recent work on sentiment analysis has shown that this assumption is not valid. In this article, we introduce a new method of transitioning a graph, called Power Walk, that can successfully compute centrality scores for graphs with real weighted edges. We show that it satisfies the desired properties, and that its computation time and centrality ranking are similar to those obtained when using the Random Surfer model for non-negative matrices. Finally, stability and convergence analysis shows us that both stability and convergence when using the power method are dependent on the Power Walk parameter β.
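As background for the baseline, the Random Surfer model solved with the power method can be sketched as below. This shows standard PageRank on non-negative edge weights only; it does not implement Power Walk, whose transition rule and parameter β are not given in the abstract. The function name, the damping value 0.85, and the uniform treatment of dangling vertices are conventional assumptions.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a non-negative adjacency matrix.

    adjacency[i][j] is the weight of the edge from vertex i to vertex j.
    Vertices without out-links spread their rank uniformly.
    """
    n = len(adjacency)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [(1.0 - damping) / n] * n  # random jump component
        for i, row in enumerate(adjacency):
            out_weight = sum(row)
            if out_weight == 0:
                for j in range(n):
                    new_rank[j] += damping * rank[i] / n
            else:
                for j, weight in enumerate(row):
                    new_rank[j] += damping * rank[i] * weight / out_weight
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank
```

Normalising each row by its out-weight only yields a valid probability distribution when weights are non-negative, which is the limitation that motivates a different transition model for real-weighted edges.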
Citations: 1
Towards information retrieval evaluation with reduced and only positive judgements
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537748
Diego Mollá Aliod, David Martínez, Iman Amini
This paper proposes a document distance-based approach to automatically expand the number of available relevance judgements when those are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published clinical systematic review. We show that evaluations based on these expanded relevance judgements are more reliable than those using only the initially available judgements. We also show the impact of such an evaluation approach as the number of initial judgements decreases.
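One way to picture distance-based expansion of positive judgements is sketched below. The abstract does not specify the distance measure or the expansion rule, so everything here is an assumption made for illustration: cosine distance over whitespace-tokenised term counts, a fixed `max_distance` threshold, and the function names.

```python
import math
from collections import Counter


def cosine_distance(text_a, text_b):
    """1 - cosine similarity of raw term-count vectors (assumed measure)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm = (math.sqrt(sum(c * c for c in vec_a.values()))
            * math.sqrt(sum(c * c for c in vec_b.values())))
    return 1.0 - dot / norm if norm else 1.0


def expand_judgements(positives, candidates, docs, max_distance=0.5):
    """Add candidates lying within `max_distance` of any judged positive.

    `docs` maps document ID -> text; `positives` holds judged-relevant IDs.
    """
    expanded = set(positives)
    for cand in candidates:
        if cand in expanded:
            continue
        if any(cosine_distance(docs[cand], docs[pos]) <= max_distance
               for pos in positives):
            expanded.add(cand)
    return expanded
```

With fewer initial positives there are fewer reference points for the distance test, so the behaviour of any such rule as the seed set shrinks is exactly what an evaluation would need to measure.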
Citations: 3
Malformed UTF-8 and spam
Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537746
Matt Crane, A. Trotman, Richard A. O'Keefe
In this paper we discuss some of the document encoding errors that were found when scaling our indexer and search engine up to large collections crawled from the web, such as ClueWeb09. We describe the encoding errors, what effect they could have on indexing and searching, how they are processed within our indexer and search engine, and how they relate to the quality of the page as measured by another method.
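Detecting malformed UTF-8 in a crawled page can be sketched with Python's strict decoder. This is an illustration only, not the handling used in the authors' indexer; the function name is hypothetical, and how errors are grouped into a count follows the Python decoder's own reporting.

```python
def count_malformed_utf8(data: bytes) -> int:
    """Count the decoding errors reported by a strict UTF-8 decode.

    After each error, decoding resumes just past the offending bytes,
    so well-formed text between errors is not counted.
    """
    errors = 0
    start = 0
    while start < len(data):
        try:
            data[start:].decode("utf-8")
            break  # the remainder is well-formed
        except UnicodeDecodeError as exc:
            errors += 1
            start += exc.end  # exc.end is relative to the decoded slice
    return errors
```

A per-page count like this could then be compared against an independent page-quality score to see whether encoding errors and low quality go together.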
Citations: 1