This article analyzes the citation network of the Babylonian Talmud, building on an earlier article that we published (Satlow and Sperling 2022). The article has three goals. Our first goal is to show how an ontology-based information extraction system combined with pattern matching can successfully extract structured data from a very complicated, unstructured text. Our second goal is to extend our previous analysis and demonstrate how citation data might lead to wider conclusions about redactional patterns. In addition to highlighting the citation tendencies of different tractates (which could indicate different redactors for those tractates), we hypothesize that a source document originating in the circle of Rav Yehudah bar Yehezkel existed and was used by at least some redactors, and that the character of Rabbi Zeira deserves further attention as an important figure connecting different nodes on the network. Our third goal is to outline an analytical workflow that could be helpful to other historical projects in the digital humanities.
Social network analysis of the Babylonian Talmud. Michael L Satlow, Michael Sperling. Digital Scholarship in the Humanities, 2024-07-18. https://doi.org/10.1093/llc/fqae037
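The idea of a figure "connecting different nodes on the network" can be illustrated with a small sketch. This is not the authors' method: it assumes citations are treated as undirected links and uses one simple notion of a connector, namely a node whose removal splits the network into more pieces. The names `components` and `connectors` and the single-letter node labels are hypothetical placeholders, not data from the article.

```python
from collections import defaultdict


def components(nodes, edges):
    """Count connected components among `nodes`, treating edges as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            cur = stack.pop()
            for nxt in adj[cur]:
                if nxt in nodes and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def connectors(edges):
    """Return nodes whose removal increases the number of components."""
    nodes = {n for edge in edges for n in edge}
    base = components(nodes, edges)
    found = []
    for n in sorted(nodes):
        rest = nodes - {n}
        kept = [(a, b) for a, b in edges if n not in (a, b)]
        if components(rest, kept) > base:
            found.append(n)
    return found


# Toy citation pairs (hypothetical): two clusters joined only through "X".
toy_edges = [("A", "B"), ("A", "X"), ("B", "X"),
             ("X", "C"), ("X", "D"), ("C", "D")]
print(connectors(toy_edges))
```

On real citation data one would more likely use a weighted, directed graph and a centrality measure; the sketch only shows what "connecting" can mean structurally.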
The aim of this article is to offer a systematic review of digital studies that provide new research perspectives on ancient classical theatre. The undeniable progress of computational analysis in the service of traditional textual interpretation is helping scholars to study in greater depth, and to interpret in greater detail, the classical linguistic corpora that have come down to us through the manuscript tradition. The new model of digital research is being integrated not only into the field of information technologies but also into the field of e-learning, where we can already observe the implementation of a new educational model. Based on the digital processing of data on Greco-Roman theatre, a systematic review is presented, following the methodological principles of the PRISMA statement [Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2009 (2020)].
Ancient classical theatre from the digital humanities: a systematic review 2010–21. Roxana Beatriz Martínez Nieto, Monika Dabrowska. Digital Scholarship in the Humanities, 2024-06-29. https://doi.org/10.1093/llc/fqae033
Social media has become part of people's daily lives. Among the several ways in which users communicate, liking and sharing images stands out. Each image a user shares can be analyzed in terms of both aesthetics and personality traits. Recent studies have shown that personality traits affect personalized image aesthetics assessment. In this article, we study the same relationship from the opposite direction: we evaluate the impact of image aesthetics on personality traits to check whether the relation also holds in this form. To that end, we use a two-stage architecture that leverages image aesthetics to predict users' personality traits. The first stage is a multi-task deep learning model consisting of an encoder/decoder whose core is a Swin Transformer. The second stage combines image aesthetics and personality traits with an attention mechanism for personality trait prediction. The proposed method achieved an average Spearman Rank Order Correlation Coefficient (SROCC) of 0.776 for image aesthetics on the Flickr-AES database and an average SROCC of 0.6730 on the PsychoFlickr database, outperforming related state-of-the-art (SOTA) studies. Taking the influence of image aesthetics into account, the second stage improved on the average accuracy of the first stage by 7.02 per cent.
Personality prediction via multi-task transformer architecture combined with image aesthetics. Shahryar Salmani Bajestani, Mohammad Mahdi Khalilzadeh, Mahdi Azarnoosh, Hamid Reza Kobravi. Digital Scholarship in the Humanities, 2024-06-22. https://doi.org/10.1093/llc/fqae034
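For readers unfamiliar with the SROCC metric reported above, the following is a minimal sketch of how a Spearman rank correlation can be computed: rank both score lists (ties share an average rank), then take the Pearson correlation of the ranks. The function names and the toy scores are illustrative assumptions; they are not the authors' evaluation code or data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def srocc(predicted, reference):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(reference))


# Toy scores (hypothetical): the predictions preserve the reference ordering.
predicted_scores = [0.2, 0.5, 0.4, 0.9]
reference_scores = [1, 3, 2, 4]
print(srocc(predicted_scores, reference_scores))
```

Because only the ordering matters, a model can reach a high SROCC even when its raw scores sit on a different scale from the reference ratings.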
Over the last decade, a plethora of training datasets have been compiled for use in language-based machine perception and in human-centered AI, alongside research regarding their compilation methods. From a primarily linguistic perspective, we add to these studies in two ways. First, we provide an overview of sixty-six training datasets used in automatic image, video, and audio captioning, examining their compilation methods with a metadata analysis. Second, we delve into the annotation process of crowdsourced datasets with an interest in understanding the linguistic factors that affect the form and content of the captions, such as contextualization and perspectivation. With a qualitative content analysis, we examine the annotator instructions of a selection of eleven datasets. Drawing from various theoretical frameworks that help assess the effectiveness of the instructions, we discuss the visual and textual presentation of the instructions, as well as the perspective-guidance that is an essential part of the language instructions. While our analysis indicates that some standards in the formulation of instructions seem to have formed in the field, we also identified various recurring issues that potentially hinder the readability and comprehensibility of the instructions and, therefore, caption quality. To enhance readability, we emphasize the importance of text structure, organization of the information, consistent use of typographical cues, and clarity of language use. Last, engaging with previous research, we assess the compilation of both web-sourced and crowdsourced captioning datasets from various perspectives, discussing factors affecting the diversity of the datasets.
Language-based machine perception: linguistic perspectives on the compilation of captioning datasets. Laura Hekanaho, Maija Hirvonen, Tuomas Virtanen. Digital Scholarship in the Humanities, 2024-06-22. https://doi.org/10.1093/llc/fqae029
This article addresses the problematic authorship of The Constitutions of the Free-Masons (1723), a work traditionally associated with James Anderson. Using stylometry, we examine whether and, if so, where John T. Desaguliers, the prime mover of early English institutionalized Freemasonry, contributed to this publication. Our corpus includes writings by Anderson, Desaguliers, and two contemporary Freemasons used as distractors. The transcribed works contain texts from different genres and of varying lengths. In our methodology, we employ a wide range of robust, multivariate, unsupervised, and cross-validated supervised tests, verified through significance testing, which can hopefully contribute to the establishment of standards for historical authorship attribution. Our results suggest, in line with historical evidence, that the legendary history of the Constitutions was most likely primarily authored by Anderson. However, several of the Charges, including the first one, ‘Concerning God and religion’, one of the most disputed texts in the history of Freemasonry, are closer to the style of Desaguliers. The General Regulations concerning the organization of the lodges, hitherto attributed to George Payne, played a fundamental role in spreading Freemasonry worldwide. Our analyses show that fifteen of the thirty-nine regulations are stylistically close to Anderson, while five align more closely with Desaguliers’ style. The authorship of the rest remains inconclusive, partly due to the insufficient length of texts by Payne. These novel findings are also supported by a close reading of the Constitutions and other contemporary primary sources.
Who wrote the first Constitutions of Freemasonry? Róbert Péter, Alejandro Napolitano Jawerbaum. Digital Scholarship in the Humanities, 2024-06-01. https://doi.org/10.1093/llc/fqae023
Quantitative measurement and automatic processing pose a significant problem in determining the markers of the manipulative potential of media texts, since linguistic indicators are the basis of machine parameterization. The purpose of the research is to analyse the main language parameters of manipulativeness in media discourse that can be identified using machine learning. To achieve the research goals, the following methods were used: the systemic method, content analysis, computer modelling, and the comparative method. The results show that the following language indicators can serve as computer classification parameters: use of the subjunctive mood of verbs, capital letters, high frequency of the ‘not’ particle, punctuation marks, rhetorical questions or exclamations, use of quotation marks for the purpose of irony, double negative sentences, use of the word ‘no’, and verbal structures calling to action. To this end, Python software was implemented that allowed texts to be analysed and visualized in algorithmic and lexical-vocabulary ways. In addition, integrating the Python tool made it possible to use language transformation markers that formed linguistic patterns in the analysed text. The list of parameters for diagnosing manipulative texts is non-exhaustive, which underscores the possibility of machine measurement of the manipulative component of mass media discourse.
Parameterization of manipulative media discourse: possibilities and problems of automatic diagnosis. Maigul Shakenova, Dybys Tashimkhanova, Gulvira Shaikova, Ulzhan Ospanova, Olga Popovich. Digital Scholarship in the Humanities, 2024-05-08. https://doi.org/10.1093/llc/fqae024
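Several of the surface indicators listed in the abstract above (capital letters, the ‘not’ particle, the word ‘no’, question and exclamation marks, quotation marks) are straightforward to count. The sketch below shows one way such counts could be turned into a feature dictionary for a classifier. It is an illustration under stated assumptions, not the authors' Python software: the function name, the feature names, the English-only tokenization, and the sample sentence are all hypothetical, and it does not attempt the harder indicators such as subjunctive mood or irony.

```python
import re


def manipulation_features(text):
    """Count simple surface markers in `text` and return them as a dict."""
    # Crude English word tokenizer; a real pipeline would need a proper one.
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    total = max(len(words), 1)  # avoid division by zero on empty input
    return {
        # Words written entirely in capitals (single letters ignored).
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        # Share of tokens that are the particle 'not'.
        "not_rate": lowered.count("not") / total,
        "no_count": lowered.count("no"),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        # Spans in straight or curly double quotes; irony is NOT detected.
        "quoted_spans": len(re.findall(r'"[^"]+"|“[^”]+”', text)),
    }


# Hypothetical sample sentence, not taken from the article's corpus.
sample = 'They are NOT telling you the "truth"! Is this not obvious? No.'
print(manipulation_features(sample))
```

Counts like these only become diagnostic once they are compared against a labelled corpus, which is the step where the machine learning mentioned in the abstract would come in.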
Hollywood film remakes, as old as the cinema itself, have attracted much professional, critical, and academic attention. They have been viewed by art critics as products of cultural derivativity and imperialism and commended by financial experts as low-risk business investments, closely linked to other forms of brand extension, such as sequels and bestseller adaptations. In this article, we adopt a film-historical quantitative approach to Hollywood film remakes by analysing metadata obtained from the Internet Movie Database (IMDb) and verified against reliable print and web sources. We analyse 986 Hollywood remakes produced between 1915 and 2020 in terms of raw and relative frequencies of annual releases, genre (in)stability, and patterns of transnational reproduction. We contrast our findings with those outlined by Henderson (2014a) in his statistical survey of Hollywood sequels, series films, prequels, and spin-offs, presented in his monograph The Hollywood Sequel: History and Form, 1911–2010. Having completed his list with recent sequential productions released between 2011 and 2020, we investigate the potential parallels between Hollywood remaking and sequelization practices. Our findings demonstrate historical discrepancies in various ‘content recycling’ trends, which help better characterize the cultural and commercial significance of remakes and serial forms in the American film industry.
A statistical approach to Hollywood remake and sequel metadata. Agata Hołobut, Jan Rybicki, Miłosz Stelmach. Digital Scholarship in the Humanities, 2024-05-02. https://doi.org/10.1093/llc/fqae012
Contemporary Japanese given names exhibit great variety and have minimal formal restrictions in their formation. It is often possible, however, to determine the gender of the name's bearer from its phonological and/or graphic form. In this article, various features, including name length, syllables, and characters at particular positions within a name and the choice of script, are statistically analyzed to determine whether they are significantly associated with male or female names and which of them contribute the most to the expression of gender. The findings of this study verify the empirical knowledge of the gender-markedness of some of the features and establish a solid foundation for future feature-based gender prediction algorithms. The expression of gender in currently bestowed names is discussed in the context of major changes in naming practices and name choices toward the end of the 20th century.
Gender-specific features in contemporary Japanese names. Ivona Barešová, Tereza Nakaya, Vladimír Matlach. Digital Scholarship in the Humanities, 2024-05-01. https://doi.org/10.1093/llc/fqae022
An early industrial town’s spatial segregation is studied using empirical data concerning the Russian population of the town of Vyborg. Several hypotheses for explaining segregation are considered using spatial analysis. The spatial data are derived from historical maps and demographic data from various tax records. Socioeconomic segregation is studied as a possible cause of ethnic segregation. The main drivers of spatial segregation were the explicit policies of segregation enforced by both the Russian military administration and the town’s civilian administration. While the effects of segregation gradually diminished due to social diffusion, the impact of policy decisions driving segregation in the 18th and early 19th centuries was still visible in the population’s later 19th-century segregation. Yet neither the different preferences of Russians and others nor the income differences between areas explains the distribution of Russians. Segregation based on the membership of a guild was insignificant, with a few exceptions. Other factors such as discrimination, prejudice, and differences in housing market information probably contributed to segregation, but they cannot be studied with the data used.
Explaining the spatial segregation of ethnic groups in an early industrial city: the case of Vyborg. Antti Härkönen. Digital Scholarship in the Humanities, 2024-04-27. https://doi.org/10.1093/llc/fqae017
Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. Working with images from other historical periods requires expert-annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating legacy data, is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a “bridge” in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces give experts an overview of relationships in the data that goes beyond the sum total of the metadata. Word and image embeddings, as well as co-occurrences of labels across the datasets, enable batch re-annotation of images and recommendation of label candidates, and support composing a hierarchical classification of labels.
Is medieval distant viewing possible? Extending and enriching annotation of legacy image collections using visual analytics. Christofer Meinecke, Estelle Guéville, David Joseph Wrisley, Stefan Jänicke. Digital Scholarship in the Humanities, 2024-04-24. https://doi.org/10.1093/llc/fqae020