The Task: Distinguishing Tasks and Sessions in Legal Information Retrieval

G. Wiggers, G. Zuccon
Published in: Proceedings of the 26th Australasian Document Computing Symposium
Publication date: 2022-12-15
DOI: 10.1145/3572960.3572983 (https://doi.org/10.1145/3572960.3572983)

Abstract

Legal information retrieval (IR) is a form of professional search often associated with high recall. Information seeking in this context can consist of a single query with no clicks (known as updating behaviour), a literature review in which a complex Boolean query, crafted over several iterations, is run and all returned documents are inspected, or a seeking task spanning days or weeks that consists of multiple queries interleaved with other tasks. Analysis of query logs is paramount to improving current legal IR systems, in particular the system we are associated with, the Dutch Legal Intelligence IR system. This analysis, however, requires the ability to automatically identify which of a user's queries relate to the same search goal or, in other words, to the same search task. The current practice of defining sessions (a set of user interactions with the IR system with no more than 30 minutes between user actions) and equating a session with a search task might prove ineffective given the characteristics of this user group. In this paper we provide an initial analysis of a subset of the query log from the Dutch Legal Intelligence IR system, comprising 970 queries issued by 10 users within the space of 1 year. From this query log, we use the 30-minute heuristic to define sessions and extract 126 sessions, ranging from 1 to 71 sessions per user. We then independently annotate the query log to manually identify search tasks: this activity leads to the identification of 55 tasks, ranging from 1 to 21 tasks per user. In doing this, we highlight how the currently employed heuristic is not adequate for extracting a user's search queries that are related to the same search task. We also show why tasks are more informative than sessions with regard to legal information retrieval. We further describe the potential of using characteristics such as Levenshtein distance, common words and string matching for automated task classification.
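
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each user's log is available as a list of timestamps, applies the 30-minute rule to cut that list into sessions, and computes the three query-pair characteristics the abstract names (Levenshtein distance, common words, string matching). The function names, the feature dictionary and the example queries are all hypothetical, and how such features would feed a task classifier is left open, as it is in the abstract.

```python
from datetime import datetime, timedelta

# The 30-minute heuristic described in the abstract.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one user's action timestamps into sessions.

    A new session starts whenever more than `gap` has passed
    since the previous action (hypothetical helper).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def query_features(q1, q2):
    """Pairwise features for two queries (an assumed feature set)."""
    a, b = q1.lower(), q2.lower()
    return {
        "levenshtein": levenshtein(a, b),
        "common_words": len(set(a.split()) & set(b.split())),
        "substring_match": a in b or b in a,
    }


if __name__ == "__main__":
    day = datetime(2022, 1, 10)
    times = [day.replace(hour=9, minute=0),
             day.replace(hour=9, minute=20),
             day.replace(hour=9, minute=50),   # exactly 30 min: same session
             day.replace(hour=10, minute=30)]  # 40 min gap: new session
    print([len(s) for s in split_sessions(times)])
    # Made-up queries, for illustration only.
    print(query_features("rent termination", "rent termination notice"))
```

A gap of exactly 30 minutes keeps two actions in the same session here, matching the "no more than 30 minutes" wording. Two queries issued days apart can still share words or a substring, which is why features like these can link queries that the time-based rule separates.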