Proceedings of the 24th Australasian Document Computing Symposium: Latest Papers

Taking Risks with Confidence
Pub Date : 2019-12-05 DOI: 10.1145/3372124.3372125
R. Benham, Ben Carterette, Alistair Moffat, J. Culpepper
Risk-based evaluation is a failure analysis tool that can be combined with traditional effectiveness metrics to ensure that the improvements observed are consistent across topics when comparing systems. Here we explore the stability of confidence intervals in inference-based risk measurement, extending previous work to five different commonly used inference testing techniques. Using the Robust04 and TREC Core 2017 NYT corpora, we show that risk inferences using parametric methods appear to disagree with their non-parametric counterparts, warranting further investigation. Additionally, we explore how the number of topics being evaluated affects confidence interval stability, and find that more than 50 topics appear to be required before risk-sensitive comparison results are consistent across different inference testing frameworks.
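As a rough illustration of inference-based risk measurement, the sketch below computes a risk-adjusted mean of per-topic score differences and a percentile bootstrap confidence interval around it. The abstract does not say which risk measure or which five inference techniques the authors use, so the URisk-style formula, the bootstrap, the `alpha` penalty, and the sample data are all assumptions chosen for illustration.

```python
import random


def urisk(deltas, alpha=1.0):
    """Risk-adjusted mean of per-topic differences (system minus baseline).

    Losses are weighted by (1 + alpha); alpha = 0 gives the plain mean.
    This URisk-style definition is an assumption, not taken from the abstract.
    """
    wins = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    return (wins - (1.0 + alpha) * losses) / len(deltas)


def bootstrap_ci(deltas, alpha=1.0, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for urisk, resampling topics with replacement."""
    rng = random.Random(seed)
    n = len(deltas)
    stats = sorted(
        urisk([deltas[rng.randrange(n)] for _ in range(n)], alpha)
        for _ in range(resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * resamples)]
    upper = stats[min(resamples - 1, int((1.0 - tail) * resamples))]
    return lower, upper


# Hypothetical per-topic score differences between a system and a baseline.
deltas = [0.10, -0.05, 0.20, 0.00, -0.15, 0.05, 0.30, -0.02]
point = urisk(deltas, alpha=1.0)
low, high = bootstrap_ci(deltas, alpha=1.0)
print(f"URisk = {point:.5f}, 95% CI = [{low:.5f}, {high:.5f}]")
```

With only eight topics the interval is wide, which is consistent in spirit with the abstract's observation that small topic sets give unstable risk inferences.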
Citations: 4
Towards Automatically Classifying Case Law Citation Treatment Using Neural Networks
Pub Date : 2019-12-05 DOI: 10.1145/3372124.3372128
Daniel Locke, G. Zuccon
In common law legal systems, judges decide issues between parties (legal decision or case law) by reference to previous decisions that consider similar factual situations. Accordingly, these decisions typically feature rich citation networks, i.e., a new decision frequently cites previous relevant decisions (citation). These citations may, in varying degrees, express that a cited decision is applicable, not applicable, or no longer current law. Such a treatment label is important to a lawyer's process of determining whether a case is proper law. These labels serve as a matter of convenience in citation indices, enabling lawyers to prioritise which decisions to examine in order to understand the current state of the law. They also prove useful in other areas such as prioritisation for manual summarisation of cases, where not all cases can be summarised, and automatic summarisation, or, potentially, as a ranking feature in case law retrieval. While a lawyer can determine the treatment of a cited case by reading a decision, this is time-consuming and can increase legal costs. Currently, not all newly decided cases feature these treatment labels. Further, older cases typically do not. Given the large number of new legal decisions each year, manual annotation of such treatment is not feasible. In this paper, we explore the effectiveness of neural network architectures for identifying case law citation treatment and importance (whether a case is important to a lawyer's reasoning process). We find that these tasks are very difficult and that various methods for text classification perform poorly. For this reason, we address the task of citation importance more comprehensively, while limiting our examination of the task of citation treatment to modelling the problem and highlighting the intrinsic difficulty of the task. We make a test dataset available at github.com/ielab/caselaw-citations to stimulate further research that tackles this challenging problem. We also contribute a range of word embeddings learned over a large amount of processed case law text.
Citations: 3
Proceedings of the 24th Australasian Document Computing Symposium
Pub Date : 2019-12-05 DOI: 10.1145/3372124
Robert A. Allen, L. Azzopardi
Citations: 0
Differences in language use: Insights from job and talent search
Pub Date : 2019-12-05 DOI: 10.1145/3372124.3372127
Bahar Salehi, B. Kazimipour, Timothy Baldwin
Search platforms can have more than one type of user, e.g., those who provide and those who consume content. As an example, in a job/talent search platform, content providers are: (1) job seekers who provide CVs, and (2) hirers who provide job advertisements; content consumers, on the other hand, are: (3) job seekers searching for specific jobs, and (4) hirers/recruiters searching for candidates to fill particular positions. As a result, there are four types of users, each with potentially different patterns of language use. In this paper, we compare the language used by different groups of users in job/talent search, by way of word embeddings pre-trained over documents associated with distinct types of users. In doing so, we investigate whether there are systematic shifts/mismatches in vocabulary or in the use of the same term, and consider the implications for an integrated search solution. Our experiments unearth significant differences in language use, and also show strong agreement between the results of our intrinsic and extrinsic comparisons of word embeddings.
Citations: 0
Character Profiling in Low-Resource Language Documents
Pub Date : 2019-12-05 DOI: 10.1145/3372124.3372129
Tak-sum Wong, J. Lee
This paper focuses on automatic character profiling (connecting "who", "what" and "when") in literary documents. This task is especially challenging for low-resource languages, since off-the-shelf tools for named entity recognition, syntactic parsing and other natural language processing tasks are rarely available. We investigate the impact of human annotation on automatic profiling. Based on a Medieval Chinese corpus, experimental results show that even a relatively small amount of word segmentation, part-of-speech and dependency annotation can improve accuracy in named entity recognition and in identifying character-verb associations, but not character-toponym associations.
Citations: 0
Learning Image Information for eCommerce Queries
Pub Date : 2019-04-29 DOI: 10.1145/3372124.3372126
U. Porwal
Computing similarity between a query and a document is fundamental in any information retrieval system. In search engines, computing query-document similarity is an essential step in both the retrieval and ranking stages. In eBay search, a document is an item, and the query-item similarity can be computed by comparing different facets of the query-item pair. Query text can be compared with the text of the item title. Likewise, a category constraint applied to the query can be compared with the listing category of the item. However, images are a signal that is usually present in items but not in the query. Images are one of the most intuitive signals used by users to determine the relevance of an item given a query. Including this signal when estimating the similarity of a query-item pair is likely to improve the relevance of the search engine. We propose a novel way of deriving image information for queries. We attempt to learn image information for queries from item images instead of generating explicit image features or an image for queries. We use canonical correlation analysis (CCA) to learn a new subspace in which projecting the original data gives us a new query and item representation. We hypothesize that this new query representation will also carry image information about the query. We estimate the query-item similarity using a vector space model and report the performance of the proposed method on eBay's search data. We show an 11.89% relevance improvement over the baseline using Area Under the Receiver Operating Characteristic curve (AUROC) as the evaluation metric. We also show a 3.1% relevance improvement over the baseline with Area Under the Precision Recall Curve (AUPRC).
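The scoring and evaluation steps described above can be sketched in a few lines: cosine similarity as the vector space model, and AUROC computed from pairwise comparisons of relevant and non-relevant scores. This sketch does not perform CCA; it assumes the query and item vectors have already been projected into the shared subspace, and the vectors and relevance labels are hypothetical values for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def auroc(scores, labels):
    """Probability that a relevant pair outscores a non-relevant one; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical projected vectors: one query and four candidate items.
query = [1.0, 0.0]
items = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.0, 1.0]]
labels = [1, 0, 1, 0]  # 1 = relevant to the query, 0 = not relevant

scores = [cosine(query, item) for item in items]
result = auroc(scores, labels)
print(f"AUROC = {result:.2f}")
```

The pairwise AUROC is quadratic in the number of scored pairs, which is fine for a sketch; a rank-based computation would be the usual choice on search-scale data.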
Citations: 1