
J. Lang. Technol. Comput. Linguistics: Latest Articles

The Encoding of Avestan - Problems and Solutions
Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.160
J. Gippert
‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion which was allegedly founded by Zarathushtra, also known as Zoroaster, by about the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building that were undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). Today, the complete Avestan corpus is available, together with elaborate search functions and an extended version of the subcorpus of the so-called ‘Yasna’, which covers a great deal of the attestation of variant readings. Right from the beginning of their computational work concerning the Avesta, the compilers had to cope with the fact that the texts contained in it have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions used until today. It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used. As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here.
1 The Avestan script and its transcription
1.1 Early western approaches to the Avestan script and its transcription
The Avestan script has been known to western scholarship since the 17th century, when the first accounts of the religion of the ‘Parsees’, i.e., Zoroastrians living in India and Iran, were published. The first notable description of the script is found in the travel report by JEAN CHARDIN, who sojourned in Iran in 1673–7; in the 1711 edition of his report, the author provides an ‘alphabet of the ancient Persians’, together with a lithographed table contrasting the characters of the Avestan script with their Perso-Arabian equivalents; cf. the extract illustrated in Fig. 1.
Citations: 0
Old Lithuanian Reference Corpus (SLIEKKAS) and Automated Grammatical Annotation
Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.165
Jolanta Gelumbeckaite, Mindaugas Sinkunas, Vytautas Zinkevicius
Citations: 4
Digitalisierung historischer Glossare zur automatisierten Vorannotation von Textkorpora am Beispiel des Altdeutschen [Digitising historical glossaries for the automated pre-annotation of text corpora, using Old German as an example]
Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.162
Roland Mittmann
In the pre-digital age, glossaries were indispensable for locating words and word forms within texts. Today their data can be merged automatically with the corresponding texts in order to enrich those texts with further information. Given the historical typography and the often ambiguous markup of information, a manual approach is the most effective way to carry out the digitisation of the glossaries that this requires. Depending on how the glossary is structured and on the type and density of transmission of the text it covers, different challenges and problems arise. These are illustrated using the digitisation of the glossaries for Old High German and Old Saxon.
Citations: 2
Manuelle Abgleichung bei automatisierter Vorannotation: Das Tagging grammatischer Kategorien im Referenzkorpus Altdeutsch [Manual checking of automated pre-annotation: tagging grammatical categories in the Old German Reference Corpus]
Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.163
S. Linde
Citations: 1
The Annotation of Morphology, Syntax and Information Structure in a Multilayered Diachronic Corpus
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.143
Kristin Bech, K. Eide
Citations: 5
Slate - A Tool for Creating and Maintaining Annotated Corpora
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.149
D. Kaplan, R. Iida, K. Nishina, T. Tokunaga
Research trends of the last five years show that richly annotated corpora inspire novel research. These corpora are indispensable for advancing research, but their increasing complexity also makes them more difficult to manage and maintain; what is needed is a way to manage the annotation project in its entirety. However, annotation project management has received little attention, with tools predominantly focusing on single-document annotation. Therefore, we define a list of corpus creation and management needs for annotation systems, and then introduce our multi-purpose annotation and management system Slate, using a case study to show how it addresses these needs and how essential project management is to creating good corpora.
Citations: 25
More, Faster: Accelerated Corpus Annotation with Statistical Taggers
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.154
Arne Skjærholt
We present our experiments with annotating a Latin corpus using an assisted annotation procedure where the corpus to be annotated is preannotated by a statistical tagger. This assisted procedure gives a notable reduction in annotator error compared to the unassisted annotation of previous annotation efforts, even with a huge tagset (1 000 tags) and modest tagger accuracy due to limited training data and domain effects.
Citations: 6
Korpuslinguistik in der linguistischen Lehre: Erfolge und Misserfolge [Corpus linguistics in linguistics teaching: successes and failures]
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.141
Noah Bubenhofer
Although it is indispensable for university training in linguistics to introduce students to the theory and methods of corpus linguistics, teachers face a number of problems in doing so, because the students' technical and methodological know-how is often very heterogeneous. It also proves important to be able to get students enthusiastic about corpus-linguistic work by introducing them to attractive illustrative material. In what follows, I use several examples to show which topics in semantics, text linguistics, discourse analysis and cultural analysis can usefully be treated with corpus-linguistic methods. In addition, I try to derive users' needs regarding the methods and tools of corpus linguistics from the usage behaviour observed for my online introduction to corpus linguistics.
Citations: 3
Musisque Deoque: Text Retrieval on Critical Editions
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.152
M. Manca, L. Spinazzè, P. Mastandrea, L. Tessarolo, Federico Boschetti
The Musisque Deoque Project (MQDQ) aims at creating a digital archive of Latin poetry, from its origins to the late Italian Renaissance, equipped with critical apparatus and various exegetical and linguistic information. This project is focused on the study of synchronic and diachronic intertextuality as illustrated, e.g., in Cicu (2005). For this reason, we give strong attention to formal and material aspects of the text that actually played a relevant role in the poetical tradition. The fixed text of printed critical editions, which aim at a reconstruction as close as possible to the lost originals, provides just a snapshot of the tradition, which is intrinsically dynamic, and gives the modern reader a distorted image of what an ancient text in fact was. The fully searchable digital collections currently available are based on traditional critical editions, which are, as we just said, authoritarian texts; this authoritarianism is emphasized by the conversion from printed text to database, because the critical apparatus is usually cut away and the reader has no way to check a variant different from the one the editor put in the main text, often dubitanter, simply because he had to choose a variant. Limiting lexical searches to the editor's choices unavoidably leads to both false positives and false negatives, which need to be verified against the printed critical editions. False positives are due to possibly wrong emendations by modern and contemporary scholars, which the text retrieval systems return among the genuine occurrences, whereas false negatives are the plausible variants excluded by editors prejudiced against specific linguistic and stylistic phenomena (such as short-term repetition, systematically emended by philologists of the last centuries). The purpose of Musisque Deoque is to overcome these limitations by retrieving not only the key words quoted in the reference edition, but also the variants lying in the critical apparatus. In this way, further knowledge can emerge about the itinerary travelled by ancient works through the subsequent ages up to Humanism and the Renaissance.
Citations: 5
From Old Texts to Modern Spellings: An Experiment in Automatic Normalisation
Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.147
Iris Hendrickx, Rita Marquilhas
We aim to tackle the problem of spelling variation in a corpus of personal Portuguese letters from the 16th to the 20th century. We investigated the extent to which the task of normalising Portuguese spelling can be accomplished automatically. We adapted VARD2 (Baron and Rayson, 2008), a statistical tool for normalising spelling, for use with the Portuguese language and studied its performance over four different time periods. Our results showed that VARD2 performed best on the older letters and worst on the most modern ones. In an extrinsic evaluation, we measured the usefulness of automatic normalisation for the linguistic task of automatic POS-tagging and showed that automatic normalisation of spelling helps improve the performance of the POS-tagger.
Citations: 33