Language independent text summarization of western European languages using shape coding of text elements

A. Saleh, L. Weigang
2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), published 2017-07-29. DOI: 10.1109/FSKD.2017.8393116. Citations: 2.

Abstract

The majority of text summarization techniques in the literature depend, in one way or another, on language-dependent pre-structured lexicons, databases, taggers and/or parsers. Such techniques require prior knowledge of the language of the text being summarized. In this paper, we propose an extractive text summarization tool, the UnB Language Independent Text Summarizer (UnB-LITS), which performs text summarization in a language-independent manner. The new model relies on intrinsic characteristics of the text being summarized rather than on its language, and thus eliminates the need for language-dependent lexicons, databases, taggers or parsers. Within this tool, we develop an innovative way of coding the shapes of text elements (words, n-grams, sentences and paragraphs), and we propose language-independent algorithms that normalize words and perform relative stemming or lemmatization. The proposed algorithms and the Shape-Coding routine enable UnB-LITS to extract intrinsic features of document elements and score them statistically, producing a representative extractive summary independent of the document language. In this paper, we focus on single-document summarization of Western European languages. The tool was tested on hundreds of documents written in English, Portuguese, French and Spanish and showed better performance than both the results reported in the literature and commercial summarizers.
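The abstract does not specify how the Shape-Coding routine encodes text elements, so the sketch below is only a hypothetical illustration of the general idea of "coding the shape" of a word without any lexicon. It uses the generic word-shape encoding common in NLP feature engineering (upper case to `X`, lower case to `x`, digits to `d`, other characters kept), which is an assumption and not the authors' method. The function names `shape_code` and `shape_sequence` are hypothetical. Because it relies only on Unicode character classes, accented letters such as `ê` or `ç` are handled the same way in any of the four tested languages.

```python
import re


def shape_code(token: str, collapse: bool = True) -> str:
    """Hypothetical word-shape encoding (not the paper's exact routine).

    Maps each character to a class symbol using only Unicode properties,
    so no language-specific lexicon is needed:
      upper-case letter -> 'X', lower-case letter -> 'x', digit -> 'd',
      anything else (punctuation, hyphens) -> kept unchanged.
    If collapse is True, runs of the same symbol are merged, e.g. 'Xxxxx' -> 'Xx'.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("X")
        elif ch.islower():
            symbols.append("x")
        elif ch.isdigit():
            symbols.append("d")
        else:
            symbols.append(ch)
    code = "".join(symbols)
    if collapse:
        # Replace any run of a repeated symbol with a single occurrence.
        code = re.sub(r"(.)\1+", r"\1", code)
    return code


def shape_sequence(sentence: str) -> list[str]:
    """Shape-code every whitespace-separated token of a sentence (assumed tokenization)."""
    return [shape_code(tok) for tok in sentence.split()]


if __name__ == "__main__":
    print(shape_code("UnB-LITS"))                  # XxX-X
    print(shape_code("UnB-LITS", collapse=False))  # XxX-XXXX
    print(shape_code("Português"))                 # Xx
    print(shape_sequence("Tested on 4 languages."))  # ['Xx', 'x', 'd', 'x.']
```

Collapsing runs makes the code insensitive to word length, so capitalized words, acronyms and numbers fall into a small set of shape classes regardless of language; whether UnB-LITS collapses runs, or codes n-grams, sentences and paragraphs the same way, is not stated in the abstract.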