从自由文本执法数据中提取语义信息结构

James R. Johnson, Anita Miller, L. Khan, B. Thuraisingham
{"title":"从自由文本执法数据中提取语义信息结构","authors":"James R. Johnson, Anita Miller, L. Khan, B. Thuraisingham","doi":"10.1109/ISI.2012.6284291","DOIUrl":null,"url":null,"abstract":"A detective distributes information on a current case to his law enforcement peers. He quickly receives a computer generated response with leads identified within hundreds of thousands of previously distributed free text documents from thousands of other detectives. The challenges lie in the nature of free text - unstructured formats, confusing word usage, cut-andpaste additions, abbreviations, inserted html/xml tags, multimedia content, and domain-specific terminology. This research proposes a new data structure, the semantic information structure, which encapsulates the extracted content information on classes of information such as people, vehicles, events, organizations, objects, and locations as well as the contextual information about the connections and measures to enable prioritization of files containing related pieces of content. The structure is organized to be a result of automated natural language processing methods that extract entities, expanded entity phrases and their links which are driven by ontologies, DLSafe rules, abductive hypotheses and semantic composition. Importance and significance measures aid in prioritization.","PeriodicalId":199734,"journal":{"name":"2012 IEEE International Conference on Intelligence and Security Informatics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Extracting semantic information structures from free text law enforcement data\",\"authors\":\"James R. Johnson, Anita Miller, L. Khan, B. Thuraisingham\",\"doi\":\"10.1109/ISI.2012.6284291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A detective distributes information on a current case to his law enforcement peers. He quickly receives a computer generated response with leads identified within hundreds of thousands of previously distributed free text documents from thousands of other detectives. The challenges lie in the nature of free text - unstructured formats, confusing word usage, cut-andpaste additions, abbreviations, inserted html/xml tags, multimedia content, and domain-specific terminology. This research proposes a new data structure, the semantic information structure, which encapsulates the extracted content information on classes of information such as people, vehicles, events, organizations, objects, and locations as well as the contextual information about the connections and measures to enable prioritization of files containing related pieces of content. The structure is organized to be a result of automated natural language processing methods that extract entities, expanded entity phrases and their links which are driven by ontologies, DLSafe rules, abductive hypotheses and semantic composition. Importance and significance measures aid in prioritization.\",\"PeriodicalId\":199734,\"journal\":{\"name\":\"2012 IEEE International Conference on Intelligence and Security Informatics\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE International Conference on Intelligence and Security Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISI.2012.6284291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Intelligence and Security Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISI.2012.6284291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

一名侦探将当前案件的信息分发给他的执法同僚。他很快就收到了计算机生成的回复,其中包含从数千名其他侦探先前分发的数十万份免费文本文件中识别出的线索。挑战在于自由文本的本质——非结构化格式、令人困惑的单词用法、剪切和粘贴添加、缩写、插入的html/xml标记、多媒体内容和特定于领域的术语。本研究提出了一种新的数据结构,即语义信息结构,它将提取的内容信息封装在诸如人、车辆、事件、组织、对象和位置等信息类别上,以及关于连接和度量的上下文信息,从而能够对包含相关内容的文件进行优先级排序。该结构被组织为自动自然语言处理方法的结果,这些方法提取实体、扩展实体短语及其链接,这些链接由本体、DLSafe规则、溯因假设和语义组合驱动。重要性和意义度量有助于确定优先级。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Extracting semantic information structures from free text law enforcement data
A detective distributes information on a current case to his law enforcement peers. He quickly receives a computer generated response with leads identified within hundreds of thousands of previously distributed free text documents from thousands of other detectives. The challenges lie in the nature of free text - unstructured formats, confusing word usage, cut-andpaste additions, abbreviations, inserted html/xml tags, multimedia content, and domain-specific terminology. This research proposes a new data structure, the semantic information structure, which encapsulates the extracted content information on classes of information such as people, vehicles, events, organizations, objects, and locations as well as the contextual information about the connections and measures to enable prioritization of files containing related pieces of content. The structure is organized to be a result of automated natural language processing methods that extract entities, expanded entity phrases and their links which are driven by ontologies, DLSafe rules, abductive hypotheses and semantic composition. Importance and significance measures aid in prioritization.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Detecting criminal networks: SNA models are compared to proprietary models Securing cyberspace: Identifying key actors in hacker communities Emergency decision support using an agent-based modeling approach Payment card fraud: Challenges and solutions Extracting action knowledge in security informatics
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1