Proceedings of the 2nd International Conference on Digital Tools & Uses Congress: Latest Publications

Towards big religious data: RESILIENCE research infrastructure for data on religion in the digital age
Marco Büchler, S. Riegert, Federico Alpi, Francesca Cadeddu
Data in and for religion is arguably as old as humanity. Over the past millennia, religious significance has been attached to an immense variety of artifacts and documents, often in written form, in nearly all spoken and written languages. The digital age gives scholars of religious studies the opportunity to base their research on a far wider array of data than ever before; institutions with data repositories (such as libraries, museums and universities) likewise have the chance to make their collections available to a larger community. On the other hand, there is a serious risk that a considerable amount of data will be lost during the "digital transition". This paper presents the approach of the RESILIENCE Research Infrastructure to the issues of big data and data loss in the field of religious studies.
DOI: https://doi.org/10.1145/3423603.3424007 | Published: 2020-10-15
Citations: 2
Data lakes for digital humanities
J. Darmont, Cécile Favre, Sabine Loudcher, C. Noûs
Traditional data in Digital Humanities projects come in various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose using data lakes as a solution to data siloing and to the variety problem of big data. We describe the data lake projects we currently run in close collaboration with researchers in the humanities and social sciences, and discuss the lessons learned from running them.
DOI: https://doi.org/10.1145/3423603.3424004 | Published: 2020-10-15
Citations: 4
Interoperability and discursive process about categories
Orélie Desfriches-Doria, J. Debaz, Waldir Lisboa Rocha Filho
Given interdisciplinary contexts and varied epistemic points of view, Digital Humanities implies taking social dimensions into account in the production of tools, methods and archival processes. From a pragmatic point of view, Digital Humanities activities involve complex interpretative processes that link human spaces (individual and collective understanding) with algorithmic computation spaces. This articulation is emphasized in the Prospéro software, which nevertheless still has to support community dialogue processes; these can be understood through either a vocabulary-alignment paradigm or a socio-anthropological paradigm of language use. To that end, we give exploratory guidelines for expressing and historicizing taxonomic processes, and we define the conditions under which algorithms could support such developments.
DOI: https://doi.org/10.1145/3423603.3424005 | Published: 2020-10-15
Citations: 0
From register to digital: a 100-years study of witchhunts around Ac 29
Gwendolin Ortega
The Ac 29 register of the Cantonal Archives of Vaud is one of the most famous documents on the repression of witchcraft. It covers about thirty witchcraft prosecutions, which have been studied for more than a hundred years. Having been edited, translated and commented on, the collection has now entered a new stage in its life: it has been fully tagged following the recommendations of the TEI and indexed in an online database, which is now accessible via the Sources du droit suisse website.
DOI: https://doi.org/10.1145/3423603.3424006 | Published: 2020-10-15
Citations: 0
Decisional architectures from business intelligence to big data: challenges and opportunities
M. Aissa, Lilia Sfaxi, R. Robbana
Information is one of the most important factors in business success, hence the importance of Business Intelligence (BI) in simplifying decision-making and making it more relevant. Decisional systems have been used for several years to help decision-makers access, analyze and extract value from the data their organisations have accumulated over the years. The success of these systems led to a well-known architecture and development chain, and to a proliferation of tools and methodologies that have proven their value. Nonetheless, in some use cases, the classical decisional architecture shows shortcomings: the traditional storage and processing models of BI systems are no longer sufficient for data that is increasingly massive, varied and fast-moving. This is where Big Data solutions are of great use, as they have proven efficient at handling enormous, constantly growing volumes of data whose schema changes over time. Our goal in this article is to present the various ways of integrating big data solutions into decisional systems, and to help architects choose the architecture that best fits their needs, taking into account the environmental, technical and functional constraints they face.
DOI: https://doi.org/10.1145/3423603.3424049 | Published: 2020-10-15
Citations: 0
AdRobot
K. Hafaiedh, Mouhib Ben Rhouma, Fahd Chargui, Yassine Haouas, A. Kerkeni
Digital advertising and promotional e-campaigns have been a basic pillar of marketing. One of the main challenges marketers face nowadays is associating the right promotion with the right customer, and making this product-customer assignment accurate is crucial to satisfying customer needs. However, manually analyzing qualitative data to define the right target audience is exhausting and time-consuming, especially when the number of customers is high. In this paper, our aim is to automatically assign personalized campaigns that match each customer's specific wishes, so that promotional campaigns are consistent with their interests. Automating the assignment of the right promotion to the right customer according to their specific needs is appealing, as customers often show little to no interest in random ads. Our solution, referred to as "AdRobot", aims to overcome these challenges by gathering complex data on, and insights into, the target audience from conversations held with a purpose-built chatbot. Our strategy performs fine-grained audience classification by segmenting profiles according to profiling and conversational constraints, so that each audience segment is matched with the right promotional campaign. To achieve this, we propose an algorithm that analyzes the collected profiling and conversational data, together with the customers' intents, using artificial intelligence heuristics. Results show that "AdRobot" accurately matches promotional campaigns with the right customers according to their needs.
DOI: https://doi.org/10.1145/3423603.3424052 | Published: 2020-10-15
Citations: 1
Towards a recommendation system for children's animated movies based on chromatic features
M. Frikha, Tarek Zlitni, N. Bouassida
The production of audiovisual content continues to increase, particularly content intended for children, such as animated videos. However, children are unable to judge the quality, relevance and, above all, the age-appropriateness of videos. The problem is that many videos can have a negative impact on children's psychological and emotional state. Our aim in this work is to determine whether or not an animated video may have a negative impact, in order to develop a recommendation system that helps children avoid animated movies with harmful effects. To this end, we use a chromatic approach, proposing multi-level descriptors based on color features for the judging process. The experimental results demonstrate the performance of the proposed approach and, consequently, the relevance of chromatic features for a recommendation system that aims to avoid, or reduce, the emotional and psychological impact of children's movies.
DOI: https://doi.org/10.1145/3423603.3424058 | Published: 2020-10-15
Citations: 2
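The abstract does not describe the color descriptors themselves. As a hedged sketch only, assuming that "multi-level" means a frame level followed by a video level, the Python below computes a simple color descriptor: a normalised hue histogram plus mean saturation and brightness for each frame, averaged over the whole video. The function names frame_descriptor and video_descriptor, the six hue bins and the plain RGB-tuple input are illustrative assumptions, not the authors' method.

```python
import colorsys
from typing import List, Sequence, Tuple

# Hypothetical illustration only: not the descriptors used in the paper.
RGB = Tuple[int, int, int]  # one pixel, each channel in 0..255


def frame_descriptor(pixels: Sequence[RGB], hue_bins: int = 6) -> List[float]:
    """Frame level: normalised hue histogram, then mean saturation and mean value."""
    if not pixels:
        raise ValueError("a frame needs at least one pixel")
    hist = [0.0] * hue_bins
    sat_total = 0.0
    val_total = 0.0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1]
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # min() keeps a hue that rounds up to 1.0 inside the last bin
        hist[min(int(h * hue_bins), hue_bins - 1)] += 1
        sat_total += s
        val_total += v
    n = len(pixels)
    return [count / n for count in hist] + [sat_total / n, val_total / n]


def video_descriptor(frames: Sequence[Sequence[RGB]], hue_bins: int = 6) -> List[float]:
    """Video level: element-wise mean of the frame-level descriptors."""
    if not frames:
        raise ValueError("a video needs at least one frame")
    descriptors = [frame_descriptor(frame, hue_bins) for frame in frames]
    return [sum(column) / len(descriptors) for column in zip(*descriptors)]


red_frame = [(255, 0, 0)] * 4
black_frame = [(0, 0, 0)] * 4
print(video_descriptor([red_frame, black_frame]))
# prints [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
```

A fixed-length vector of this kind could then be passed to a classifier that flags potentially harmful videos; which classifier to use is left open here, as it is in the abstract.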
A distributed intelligent agent based intrusion detection system using deep learning algorithms
Sami Mezghani, F. Ktata
Intrusion detection systems (IDS) are among the best solutions for protecting against attacks and anomalies. Behaviour-based IDS are the type that attracts the most interest in cyber-security research. The main problems addressed are accuracy and the false-positive rate, and distributing the detection process is one solution to them. This distribution consists of implementing an agent-based IDS in which each agent has its own goal to achieve, so that the intrusion detection process yields the best results by exploiting multi-agent system features such as communication and negotiation. The solution proposed in our model is a distributed, intelligent-agent-based intrusion detection system that uses convolutional and recurrent neural networks for training and attack prevention.
DOI: https://doi.org/10.1145/3423603.3424054 | Published: 2020-10-15
Citations: 1
Standardizing linguistic data: method and tools for annotating (pre-orthographic) French
Simon Gabay, Thibault Clérice, Jean-Baptiste Camps, Jean-Baptiste Tanguy, Matthias Gille Levenson
With the development of large corpora from various periods, it has become crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) in order to increase the interoperability of the data produced, despite diachronic variation. In the present paper, we describe, both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models), the production of a linguistic tagger for (early) modern French (16th to 18th c.), taking into account as far as possible existing standards for contemporary and, especially, medieval French.
DOI: https://doi.org/10.1145/3423603.3423996 | Published: 2020-10-15
Citations: 8
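To make the annotation layers named in the abstract concrete (lemma, POS tag, morphological features), here is a minimal Python sketch of a token record and a check of tags against an agreed tagset. The tab-separated layout, the UD-style "Key=Value|Key=Value" feature syntax, the tags PRON and VERB, and the helper names parse_tsv and invalid_tags are all assumptions for illustration; the paper's actual file format and tagset may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical format: one token per line, tab-separated as
# form <TAB> lemma <TAB> POS <TAB> features ("Key=Value|Key=Value", or "_" for none).


@dataclass
class Token:
    form: str  # spelling as found in the source, kept unmodernised
    lemma: str  # normalised headword shared by all spellings
    pos: str  # part-of-speech tag from the agreed tagset
    morph: Dict[str, str] = field(default_factory=dict)  # morphological features


def parse_morph(value: str) -> Dict[str, str]:
    """Turn 'Mood=Ind|Tense=Imp' into a dict; '_' or '' means no features."""
    if value in ("", "_"):
        return {}
    return dict(item.split("=", 1) for item in value.split("|"))


def parse_tsv(text: str) -> List[Token]:
    """Parse annotated lines; raises ValueError if a line lacks exactly four columns."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (e.g. sentence separators)
        form, lemma, pos, morph = line.split("\t")
        tokens.append(Token(form, lemma, pos, parse_morph(morph)))
    return tokens


def invalid_tags(tokens: List[Token], tagset: Set[str]) -> List[str]:
    """List the forms whose POS tag falls outside the agreed tagset."""
    return [token.form for token in tokens if token.pos not in tagset]


sample = "Il\til\tPRON\t_\nestoit\têtre\tVERB\tMood=Ind|Tense=Imp\n"
tokens = parse_tsv(sample)
print(tokens[1].lemma, tokens[1].morph)  # prints: être {'Mood': 'Ind', 'Tense': 'Imp'}
print(invalid_tags(tokens, {"PRON"}))  # prints: ['estoit']
```

Keeping the original form next to a normalised lemma is what lets a pre-orthographic spelling such as "estoit" be retrieved together with modern "était", since both are lemmatised as "être".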
A cooperative agents-based workflow-level distributed data placement strategy for scientific cloud workflows
Rihab Derouiche, Zaki Brahmi
In the cloud computing context, placing the massive data used by scientific workflows can be costly in terms of energy consumption and data transfer time. Because the datasets consumed and generated by scientific workflow tasks are large, data placement becomes an even more challenging task; the problem is considered NP-hard. An optimal mapping of data to cloud storage services must therefore be found in reasonable time. Many task-level approaches have been proposed that consider datasets shared within individual workflows to reduce data transfer cost, but they are not efficient enough when multiple workflows are involved. Likewise, taking fixed datasets into account remains an important issue. This paper proposes a workflow-level data placement strategy for data-intensive workflows based on cooperative agents and Formal Concept Analysis (FCA). The proposed approach addresses three main concerns: (i) handling multiple workflows simultaneously; (ii) considering all types of data, in particular fixed datasets; and (iii) reducing the execution time of the data placement algorithm by relying on a set of cooperative intelligent agents. Experimental results show that distributing the proposed data placement strategy among a set of cooperative agents, combined with the FCA approach, can be more cost-effective than its task-level counterpart. Finally, the proposed strategy is shown to reduce execution time when running the tasks of multiple workflows.
DOI: https://doi.org/10.1145/3423603.3424009 | Published: 2020-10-15
Citations: 1
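The abstract states that Formal Concept Analysis is used but not how. FCA itself is standard: given a binary relation between objects and attributes, a formal concept is a pair (extent, intent) in which the extent is exactly the set of objects having every attribute of the intent, and the intent is exactly the set of attributes shared by every object of the extent. As a hedged sketch, assuming datasets as objects and the workflows that read them as attributes, the Python below enumerates all concepts of a small context; datasets that share an extent are used by the same workflows and would be natural candidates to co-locate. The function name formal_concepts and the dataset and workflow names are hypothetical, and this naive enumeration suits small contexts only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate every formal concept of a small object -> attributes context.

    Concept intents are exactly the intersections of object rows (plus the
    full attribute set), so the family of intents is closed under intersection
    one row at a time. Exponential in the worst case: small contexts only.
    """
    all_attributes: FrozenSet[str] = frozenset().union(*context.values())
    intents: Set[FrozenSet[str]] = {all_attributes}
    for attributes in context.values():
        row = frozenset(attributes)
        # the comprehension is built from the old family before |= updates it
        intents |= {row & intent for intent in intents}
    concepts = []
    for intent in intents:
        # extent = all objects having every attribute of the intent
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # largest extents first; names break ties so the order is deterministic
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]), sorted(c[1])))


# Hypothetical context: which workflows read each dataset.
usage = {
    "d1": {"w1", "w2"},
    "d2": {"w1", "w2"},
    "d3": {"w2"},
}
for extent, intent in formal_concepts(usage):
    print(sorted(extent), sorted(intent))
# prints:
# ['d1', 'd2', 'd3'] ['w2']
# ['d1', 'd2'] ['w1', 'w2']
```

How the authors turn such concepts into an actual placement on cloud storage services, and how their agents cooperate, is not covered by this sketch.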