The Encoding of Avestan - Problems and Solutions

J. Gippert
Journal: J. Lang. Technol. Comput. Linguistics
Published: 2012-07-01
DOI: 10.21248/jlcl.27.2012.160 (https://doi.org/10.21248/jlcl.27.2012.160)

Abstract

‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion, which was allegedly founded by Zarathushtra, also known as Zoroaster, around the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). Today, the complete Avestan corpus is available, together with elaborate search functions and an extended version of the subcorpus of the so-called ‘Yasna’, which covers a great deal of the attestation of variant readings. Right from the beginning of their computational work on the Avesta, the compilers had to cope with the fact that its texts have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions still in use today. It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used. As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here.
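
For orientation, the contrast between the 1980s situation and the present one can be illustrated with a small sketch. It is not taken from the TITUS project; it only assumes that the Avestan script now occupies the Unicode block U+10B00 to U+10B3F and that the Python standard library's `unicodedata` module knows that block. The helper names `describe` and `is_avestan` are invented for this illustration.

```python
import unicodedata

# Assumption: the Avestan block in Unicode spans U+10B00..U+10B3F.
# This block did not exist in the 1980s, hence the need for transcription.
AVESTAN_START, AVESTAN_END = 0x10B00, 0x10B3F


def describe(code_point):
    """Return (character name, bidirectional class) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch, "<unassigned>"), unicodedata.bidirectional(ch)


def is_avestan(ch):
    """True if the single character lies inside the Avestan block."""
    return AVESTAN_START <= ord(ch) <= AVESTAN_END


if __name__ == "__main__":
    # List the first few code points of the block with their properties.
    for cp in range(AVESTAN_START, AVESTAN_START + 4):
        name, bidi = describe(cp)
        print(f"U+{cp:05X}  {name}  bidi={bidi}")
```

The bidirectional class reported for the letters reflects the right-to-left writing direction mentioned above; a transcription into Latin characters, by contrast, runs left to right and needs no such property.
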

1 The Avestan script and its transcription

1.1 Early western approaches to the Avestan script and its transcription

The Avestan script has been known to western scholarship since the 17th century, when the first accounts of the religion of the ‘Parsees’, i.e., Zoroastrians living in India and Iran, were published. The first notable description of the script is found in the travel report by Jean Chardin, who sojourned in Iran in 1673–77; in the 1711 edition of his report, the author provides an ‘alphabet of the ancient Persians’, together with a lithographed table contrasting the characters of the Avestan script with their Perso-Arabian equivalents; cf. the extract illustrated in Fig. 1.