Towards a flexible open-source software library for multi-layered scholarly textual studies: An Arabic case study dealing with semi-automatic language processing

A. D. Grosso, Ouafae Nahli
Published in: 2014 Third IEEE International Colloquium in Information Science and Technology (CIST), 2014-10-01
DOI: 10.1109/CIST.2014.7016633 (https://doi.org/10.1109/CIST.2014.7016633)

Abstract

This paper presents the general model of the Computational and Collaborative Philology Library (CoPhiLib) together with a case study. CoPhiLib is an ongoing initiative at the Institute for Computational Linguistics (ILC) of the National Research Council (CNR), Pisa, Italy. The library, designed and organized as a reusable, abstract, open-source software component, aims to meet the needs of multilingual and cross-lingual analysis by exposing common Application Programming Interfaces (APIs). The core modules, written in Java, form the groundwork of a Web platform designed to address the needs of textual scholarship. The Web application, implemented according to the Java Enterprise specifications, focuses on multi-layered analysis for the study of literary documents and related multimedia sources. The ambitious aim is to manage textual resources while abstracting from the particular language involved and decoupling from the specific requirements of individual projects. This goal is pursued through agile methodologies and through suitable use case modeling, design patterns, and component-based architectures. The reusability and flexibility of the system have been tested on an Arabic case study: the system lets users choose the morphological engine (such as AraMorph or Al-Khalil) and the linguistic granularity (i.e., with or without declension). Finally, the application enables the construction of annotated resources that can serve as training sets for statistical engines.
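
To make the idea of a selectable morphological engine behind a common API more concrete, here is a minimal Java sketch. It is not the CoPhiLib API: every name in it (EngineSelectionSketch, MorphologicalEngine, StubEngine, EngineRegistry, Granularity, Analysis) is hypothetical, and the stub engine does no real morphological analysis. It only assumes a toy input in which the case ending has already been separated from the stem with a "+" sign. A real adapter for AraMorph or Al-Khalil would sit behind the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class EngineSelectionSketch {

    /** Linguistic granularity chosen by the user (hypothetical names). */
    public enum Granularity { WITH_DECLENSION, WITHOUT_DECLENSION }

    /** Result of analysing one token; caseEnding is empty when declension is not requested. */
    public static final class Analysis {
        public final String engine;
        public final String stem;
        public final Optional<String> caseEnding;

        public Analysis(String engine, String stem, Optional<String> caseEnding) {
            this.engine = engine;
            this.stem = stem;
            this.caseEnding = caseEnding;
        }
    }

    /** Common interface that every engine adapter would implement (assumption). */
    public interface MorphologicalEngine {
        String name();
        Analysis analyze(String token, Granularity granularity);
    }

    /** Placeholder engine: splits a pre-segmented toy token at its last '+'. */
    public static final class StubEngine implements MorphologicalEngine {
        private final String name;

        public StubEngine(String name) { this.name = name; }

        @Override public String name() { return name; }

        @Override public Analysis analyze(String token, Granularity granularity) {
            int cut = token.lastIndexOf('+');
            String stem = cut < 0 ? token : token.substring(0, cut);
            // Report the ending only when the caller asked for declension.
            Optional<String> ending =
                (cut >= 0 && granularity == Granularity.WITH_DECLENSION)
                    ? Optional.of(token.substring(cut + 1))
                    : Optional.empty();
            return new Analysis(name, stem, ending);
        }
    }

    /** Registry that lets client code pick an engine by name at run time. */
    public static final class EngineRegistry {
        private final Map<String, MorphologicalEngine> engines = new LinkedHashMap<>();

        public void register(MorphologicalEngine engine) {
            engines.put(engine.name(), engine);
        }

        public MorphologicalEngine select(String name) {
            MorphologicalEngine engine = engines.get(name);
            if (engine == null) {
                throw new IllegalArgumentException("Unknown engine: " + name);
            }
            return engine;
        }
    }

    /** Builds a registry holding two interchangeable placeholder engines. */
    public static EngineRegistry demoRegistry() {
        EngineRegistry registry = new EngineRegistry();
        registry.register(new StubEngine("stub-engine-a"));
        registry.register(new StubEngine("stub-engine-b"));
        return registry;
    }

    public static void main(String[] args) {
        EngineRegistry registry = demoRegistry();
        Analysis a = registry.select("stub-engine-a")
                             .analyze("kitAb+u", Granularity.WITH_DECLENSION);
        System.out.println(a.engine + ": stem=" + a.stem
                + ", ending=" + a.caseEnding.orElse("(none)"));
    }
}
```

Because callers depend only on the interface, swapping one engine for another or changing the granularity is a matter of passing a different name or enum value. That is one plausible reading of the decoupling the abstract describes, not a description of the actual implementation.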