Exploratory study on the syntactic and semantic consistency of terms in project management glossaries to provide recommendations for a project management ontology
Pub Date: 2024-02-20 | DOI: 10.1016/j.scico.2024.103094
Pablo Becker, María Fernanda Papa, Luis Olsina
This paper presents an exploratory study of the syntactic and semantic similarities and discrepancies among the terms of four selected project management glossaries. The main purpose of the study is to provide recommendations on adopting and adapting labels and/or definitions of project management glossary terms for inclusion in a new or existing project management ontology. As a result, a list of terms recommended for a project management ontology to be built is analyzed. The recommendations are limited to generic terms that can be located at the core level, rather than the domain level, of an ontological architecture. In particular, the list of terms is discussed in light of a previously developed project management ontology that will be updated in future work. Another goal of this work is to evaluate the level of syntactic and semantic consistency and harmonization that currently exists across these glossaries. This early research makes apparent that many opportunities exist to improve these terminologies for greater consistency, harmonization, and standardization in the field.
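The study's term comparison is expert-driven; purely as an illustration of what an automated syntactic (definition-level) similarity check between glossaries could look like, here is a minimal Python sketch using the standard-library difflib. The glossary fragments are invented placeholders, not the paper's data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical glossary fragments (label -> definition); placeholders only.
glossaries = {
    "PMBOK-like": {"project": "a temporary endeavor undertaken to create a result"},
    "ISO-like":   {"project": "a unique process with coordinated, controlled activities"},
}

def syntactic_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] based on longest matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Compare the definitions that each pair of glossaries gives for shared labels.
for (g1, terms1), (g2, terms2) in combinations(glossaries.items(), 2):
    for label in terms1.keys() & terms2.keys():
        score = syntactic_similarity(terms1[label], terms2[label])
        print(f"{label!r}: {g1} vs {g2} -> definition similarity {score:.2f}")
```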
{"title":"Exploratory study on the syntactic and semantic consistency of terms in project management glossaries to provide recommendations for a project management ontology","authors":"Pablo Becker, María Fernanda Papa, Luis Olsina","doi":"10.1016/j.scico.2024.103094","DOIUrl":"10.1016/j.scico.2024.103094","url":null,"abstract":"<div><p>This paper shows an exploratory study of the syntactic and semantic similarities and discrepancies of the terms of four selected project management glossaries. The main purpose of this study is to provide recommendations on adoptions and adaptations of labels and/or definitions of project management glossary terms to be included in a new or existing project management ontology. As a result, a list of recommended terms for a project management ontology to be built is analyzed. The recommendation of terms will be limited to generic terms that can be located at the core level instead of the domain level in the context of an ontological architecture. In particular, the list of terms will be discussed in light of a previously developed project management ontology that will be updated in future work. Another goal of this work is to evaluate the level of syntactic and semantic consistency and harmonization that currently exists in these glossaries. As a result, it becomes apparent from this early research that many opportunities exist to improve these terminologies for greater consistency, harmonization, and standardization in the field.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"235 ","pages":"Article 103094"},"PeriodicalIF":1.3,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139919355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to the TASE 2022 Special issue
Pub Date: 2024-02-10 | DOI: 10.1016/j.scico.2024.103092
Yamine Ait-Ameur, Florin Craciun
{"title":"Introduction to the TASE 2022 Special issue","authors":"Yamine Ait-Ameur , Florin Craciun","doi":"10.1016/j.scico.2024.103092","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103092","url":null,"abstract":"","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103092"},"PeriodicalIF":1.3,"publicationDate":"2024-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139748903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preface for the Special Issue on Tools and Demonstrations in Model-Driven Engineering
Pub Date: 2024-02-09 | DOI: 10.1016/j.scico.2024.103091
Davide Di Ruscio, Jessie Galasso, Richard Paige
{"title":"Preface for the Special Issue on Tools and Demonstrations in Model-Driven Engineering","authors":"Davide Di Ruscio , Jessie Galasso , Richard Paige","doi":"10.1016/j.scico.2024.103091","DOIUrl":"10.1016/j.scico.2024.103091","url":null,"abstract":"","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103091"},"PeriodicalIF":1.3,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139816334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Modeling Assistant Recommender: A UML class diagram recommender system
Pub Date: 2024-02-09 | DOI: 10.1016/j.scico.2024.103090
Maxime Savary-Leblanc, Xavier Le Pallec, Sébastien Gérard
IDEs are undergoing a profound transformation thanks to increasingly intelligent code completion and validation features. These intelligent features are also enabling new ways of programming, such as low-code and no-code development. Meanwhile, modeling environments still suffer, for the most part, from outdated interfaces, overly basic features, and poor usability. Our research on modeling assistance has produced a first building block to initiate, in the modeling landscape, a revolution similar to the one development environments have undergone. In this paper, we present the Modeling Assistant Recommender, a score-based multi-criteria recommendation system for UML class diagrams.
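The abstract does not spell out the scoring function, so the following is only a sketch of what a score-based multi-criteria recommender for class-diagram elements might look like. The criteria, weights, and candidate names are invented for illustration and are not taken from the paper or its tool.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str               # e.g., a class name to suggest to the modeler
    # Per-criterion scores in [0, 1]; the criteria here are hypothetical.
    name_similarity: float  # lexical closeness to the element being edited
    domain_frequency: float # how common the concept is in similar diagrams
    structural_fit: float   # compatibility with existing associations

# Weighted sum over criteria; the weights are arbitrary for this sketch.
WEIGHTS = {"name_similarity": 0.5, "domain_frequency": 0.3, "structural_fit": 0.2}

def score(c: Candidate) -> float:
    return (WEIGHTS["name_similarity"] * c.name_similarity
            + WEIGHTS["domain_frequency"] * c.domain_frequency
            + WEIGHTS["structural_fit"] * c.structural_fit)

candidates = [
    Candidate("Invoice", 0.8, 0.9, 0.6),
    Candidate("Receipt", 0.7, 0.4, 0.5),
]
# Recommend the top-scoring candidates first.
for c in sorted(candidates, key=score, reverse=True):
    print(f"{c.name}: {score(c):.2f}")
```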
{"title":"The Modeling Assistant Recommender: A UML class diagram recommender system","authors":"Maxime Savary-Leblanc , Xavier Le Pallec , Sébastien Gérard","doi":"10.1016/j.scico.2024.103090","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103090","url":null,"abstract":"<div><p>IDEs are in full mutation thanks to more and more intelligent code completion and validation functionalities. These intelligent features also facilitate the creation of new ways of programming such as low-code or no-code. In the meantime, modeling environments still suffer, for the most part, from their outdated interfaces, their too basic features, or their lack of usability. Our research work on the topic of modeling assistance has led to the creation of a first building block to initiate a revolution similar to development environments in the landscape of modeling. In this paper, we present the Modeling Assistant Recommender, a score-based multi-criteria recommendation system for class diagrams.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103090"},"PeriodicalIF":1.3,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139727215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A model-checker exploiting structural reductions even with stutter sensitive LTL
Pub Date: 2024-02-06 | DOI: 10.1016/j.scico.2024.103089
Yann Thierry-Mieg, Etienne Renault, Emmanuel Paviot-Adet, Denis Poitrenaud
In [1] we proposed to verify LTL properties using a fine-grained analysis that classifies formulae into four classes (stutter insensitive, shortening insensitive, lengthening insensitive, or none of these). With this classification we extend the applicability of structural reductions to two new classes of formulae, whereas classical techniques are only applicable to stutter-insensitive formulae. This comes at the price of a semi-decision procedure in which only some verdicts are reliable.
In this paper, we present an implementation of this approach, built as an extension to the ITS-Tools model-checker, which relies on the Spot library to analyze automata. The new approach significantly improves ITS-Tools when verifying properties that are not stutter insensitive. It can also be used as a front-end simplification step for any other model-checker.
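Plain stutter invariance can already be decided from Spot's Python bindings; the finer shortening/lengthening classification is the paper's contribution inside ITS-Tools and is not part of stock Spot. A minimal sketch of the basic Spot check, assuming the bindings are installed:

```python
import spot  # Spot's Python bindings, https://spot.lre.epita.fr/

# G(a -> F b) is stutter invariant; X a is the classic counterexample.
for text in ["G(a -> F b)", "X a"]:
    f = spot.formula(text)
    # is_stutter_invariant decides the semantic property (not merely the
    # syntactic next-free test), so it may build automata internally.
    verdict = "stutter invariant" if spot.is_stutter_invariant(f) \
              else "stutter sensitive"
    print(text, "->", verdict)
```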
{"title":"A model-checker exploiting structural reductions even with stutter sensitive LTL","authors":"Yann Thierry-Mieg , Etienne Renault , Emmanuel Paviot-Adet , Denis Poitrenaud","doi":"10.1016/j.scico.2024.103089","DOIUrl":"10.1016/j.scico.2024.103089","url":null,"abstract":"<div><p>In <span>[1]</span> we proposed to verify LTL properties using a fine grain analysis classifying formulae into four classes (stutter, shortening, lengthening insensitive or none of these). With this classification we extend the applicability of structural reduction to two new classes of formulas, when classical techniques are only applicable for stutter insensitive formulas. This comes at the price of a semi-decision procedure where only some verdicts are reliable.</p><p>In this paper, we present an implementation of this approach, built as an extension to the ITS-Tools model-checker that relies on the Spot library to analyze automata. This new approach significantly improves the ITS-tools model-checker when verifying properties that are not stutter insensitive. It can also be used as a front-end simplification step for any other model-checker.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"235 ","pages":"Article 103089"},"PeriodicalIF":1.3,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139821985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Operationally proving memory access violations in Isabelle/HOL
Pub Date: 2024-01-29 | DOI: 10.1016/j.scico.2024.103088
Sharar Ahmadi, Brijesh Dongol, Matt Griffin
Security-critical applications often rely on memory isolation mechanisms to ensure the integrity of critical data (e.g., keys) and program instructions (e.g., those implementing an attestation protocol). These mechanisms include software-based techniques, such as the security microvisor SμV, and hardware-based ones (e.g., TrustLite or SMART). Here, we must guarantee that, during an execution of a program, none of the assembly-level instructions corresponding to the program violates the imposed memory access restrictions. We focus on two security architectures (SμV and TrustLite). We use the Binary Analysis Platform (BAP) to generate assembly-level code in an intermediate language (BIL) for a compiled C program. This is then translated to Isabelle/HOL theories. We develop an operational semantics by defining a collection of transition rules for a subset of BIL (called AIRv2) that is sufficient for our work. We develop an adversary model and define conformance predicates for each assembly-level instruction. A conformance predicate holds iff the associated memory access restriction imposed by the underlying security architecture is satisfied. We generate a set of programs covering all possible cases in which an assembly-level instruction attempts to violate at least one of the conformance predicates. For SμV, we capture all such violations not only by checking specific lines of the program but also by applying the operational semantics for every machine-state transition. This shows that the memory access restrictions of SμV are operationally maintained. For TrustLite, we capture all such violations by checking specific lines of the program. We also provide an example showing how the operational semantics can be used to capture such violations.
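The actual development is carried out in Isabelle/HOL over BIL; purely as a toy illustration of what a conformance predicate checks, here is a Python sketch. The memory regions, access rights, and instruction shape are invented placeholders rather than either architecture's real layout.

```python
from dataclasses import dataclass

# Hypothetical memory regions with the access rights a security
# architecture might impose on untrusted code (addresses are made up).
REGIONS = [
    # (start, end, operations allowed from untrusted code)
    (0x0000, 0x0FFF, {"read", "execute"}),   # e.g., application ROM
    (0x1000, 0x1FFF, {"read", "write"}),     # e.g., application RAM
    (0x2000, 0x20FF, set()),                 # e.g., protected key storage
]

@dataclass
class Instr:
    op: str    # "read", "write", or "execute"
    addr: int  # target address of the memory access

def conforms(instr: Instr) -> bool:
    """Conformance predicate: the access respects its region's rights."""
    for start, end, allowed in REGIONS:
        if start <= instr.addr <= end:
            return instr.op in allowed
    return False  # accesses outside any known region are rejected

assert conforms(Instr("read", 0x0010))       # reading ROM is fine
assert not conforms(Instr("write", 0x0010))  # writing ROM is a violation
assert not conforms(Instr("read", 0x2000))   # key storage is off-limits
```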
{"title":"Operationally proving memory access violations in Isabelle/HOL","authors":"Sharar Ahmadi, Brijesh Dongol, Matt Griffin","doi":"10.1016/j.scico.2024.103088","DOIUrl":"10.1016/j.scico.2024.103088","url":null,"abstract":"<div><p>Security-critical applications often rely on memory isolation mechanisms to ensure integrity of critical data (e.g., keys) and program instructions (e.g., implementing an attestation protocol). These include software-based security microvisor S μV or hardware-based (e.g., TrustLite or SMART) techniques. Here, we must guarantee that during an execution of a program, none of the assembly-level instructions corresponding to the program violate the imposed memory access restrictions. We focus on two security architectures (S μV and TrustLite). We use Binary Analysis Platform (BAP) to generate assembly-level code in an intermediate language (BIL) for a compiled C program. This is then translated to Isabelle/HOL theories. We develop an operational semantics by defining a collection of transition rules for a subset of BIL (called AIRv2) that is sufficient for our work. We develop an adversary model and define <em>conformance predicates</em> for each assembly-level instruction. A conformance predicate holds iff the associated memory access restriction imposed by the underlying security architecture is satisfied. We generate a set of programs covering all possible cases in which an assembly-level instruction attempts to violate at least one of the conformance predicates. For S μV, we capture all such violations not only by checking specific lines of the program but also by applying the operational semantics for every machine-state transition. This shows that the memory access restrictions of S μV is operationally maintained. For TrustLite, we capture all such violations by checking specific lines of the program. Also, we provide an example to show how we can use the operational semantics to capture such violations.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103088"},"PeriodicalIF":1.3,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016764232400011X/pdfft?md5=48f5dc7ae3a5319fc966384bc9f832e2&pid=1-s2.0-S016764232400011X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139648196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An industrial experience of using reference architectures for mapping features to code
Pub Date: 2024-01-23 | DOI: 10.1016/j.scico.2024.103087
Karam Ignaim, João M. Fernandes, André L. Ferreira
Context
Software Product Lines (SPLs) constitute a popular method for encouraging the methodical reuse of software artefacts. Like any other piece of software, SPLs require management throughout their evolution, in particular to preserve consistency between the requirements and the code.
Problem
Over time, many change requests are made for a given SPL, and all of them need to be integrated in a consistent and coordinated way. The evolution of an SPL is facilitated if there are links between its artefacts, in particular between each feature and the pieces of code that implement it.
Method
This paper proposes FMap, a systematic feature mapping approach for SPLs. FMap traces a Feature Model (FM) to the other artefacts of an SPL, namely the reference architecture and the code, and establishes connections between each feature of the FM and its locations in the code base. Additionally, we have created a tool called friendlyMapper to provide automatic support for the approach. FMap and friendlyMapper are evaluated using two case studies from two different companies.
Results
The evaluation of the case studies indicates that the FMap approach outperforms the baseline approach (i.e., the branching approach).
Contribution
This work contributes FMap, a novel tool-supported approach to feature-architecture-code mapping based on a reference architecture. FMap assists software engineers in accommodating new features and change requests as SPLs evolve. The case studies at both companies demonstrate that the approach is applicable to real-world products and is able to support feature traceability and maintain consistency among features, architecture, and code.
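The abstract leaves the concrete shape of the trace links open; as a minimal sketch with invented feature and file names, a feature-architecture-code mapping can be represented as a simple table whose gaps a tool like friendlyMapper could flag:

```python
# Hypothetical trace links: feature -> architectural component -> code units.
trace_links = {
    "CruiseControl": {
        "component": "SpeedRegulation",
        "code": ["src/speed/regulator.c", "src/speed/pid.c"],
    },
    "LaneAssist": {
        "component": "SteeringSupport",
        "code": ["src/steer/lane_model.c"],
    },
}

def untraced_features(feature_model: set[str]) -> set[str]:
    """Features in the FM with no mapping to architecture/code yet."""
    return feature_model - trace_links.keys()

print(untraced_features({"CruiseControl", "LaneAssist", "ParkAssist"}))
# -> {'ParkAssist'}: a consistency gap that feature traceability would flag
```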
{"title":"An industrial experience of using reference architectures for mapping features to code","authors":"Karam Ignaim , João M. Fernandes , André L. Ferreira","doi":"10.1016/j.scico.2024.103087","DOIUrl":"10.1016/j.scico.2024.103087","url":null,"abstract":"<div><h3>Context</h3><p>Software Product Lines (SPLs) constitute a popular method for encouraging the methodical reuse of software artefacts. Just like any other piece of software, SPLs require management throughout their evolution, namely to preserve the consistency between requirements and the code.</p></div><div><h3>Problem</h3><p>Over time, for a given SPL, many change requests are made and all of them need to be integrated in a consistent and coordinated way. The evolution of an SPL is facilitated if there exist links between its artefacts, namely between each feature and its respective pieces of implementation code.</p></div><div><h3>Method</h3><p>This paper proposes FMap, a systematic feature mapping approach to be used within SPLs. FMap traces a Feature Model (FM) to other artefacts of an SPL, the reference architecture, and the code, and it establishes connections between each feature of the FM and its locations in the code-base. Additionally, we have created a tool called friendlyMapper to provide some automatic support for the approach. Using two case studies from two different companies, FMap and friendlyMapper are evaluated.</p></div><div><h3>Results</h3><p>The evaluation of the case studies indicates that the FMap approach outperforms the baseline approach (i.e., the branching approach).</p></div><div><h3>Contribution</h3><p>This work contributes with FMap, a novel tool-based approach that supports feature-architecture-code mappings based on reference architecture. FMap assists software engineers in adapting the evolution of the SPLs to accommodate new features and change requests as the SPLs evolve. The case studies for both companies demonstrate that the approach is applicable to real-world products and is able to support feature traceability and maintain consistency among features, architecture, and code.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103087"},"PeriodicalIF":1.3,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139560822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for higher-order effects & handlers
Pub Date: 2024-01-17 | DOI: 10.1016/j.scico.2024.103086
Birthe van den Berg, Tom Schrijvers
Algebraic effects & handlers are a modular approach to modeling side effects in functional programming. Their syntax is defined in terms of a signature of effectful operations, encoded as a functor, that is plugged into the free monad; their denotational semantics is defined by fold-style handlers that interpret only their part of the syntax and forward the rest. However, not all effects are algebraic: some need access to an internal computation. For example, scoped effects distinguish between a computation in scope and out of scope, parallel effects parallelize over a computation, and latent effects defer a computation. Separate definitions have been proposed for these higher-order effects and their corresponding handlers, often leading to expedient and complex monad definitions. In this work we propose a generic framework for higher-order effects that generalizes algebraic effects & handlers: a generic free monad with higher-order effect signatures and a corresponding interpreter. Specializing this higher-order syntax yields various previously defined (scoped, parallel, latent) and novel (writer, bracketing) effects. Furthermore, we formally prove our framework correct, also putting the different effect instances on a formal footing; this is a significant contribution for the parallel, latent, writer, and bracketing effects.
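The paper's contribution is the higher-order generalization; as background, here is a minimal Python sketch of the first-order setting the abstract starts from: a free monad over a signature of operations, with a fold-style handler that interprets the operations it knows (state, in this toy) and rejects the rest. All names are illustrative, and a real framework would forward unknown operations rather than raising.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Free monad over a signature of operations: a computation is either a
# pure value or an operation paired with a continuation for its result.
@dataclass
class Pure:
    value: Any

@dataclass
class Op:
    name: str                         # which effectful operation
    arg: Any                          # its argument
    k: Callable[[Any], "Pure | Op"]   # continuation

def bind(m, f):
    """Sequence computation m with continuation f (monadic bind)."""
    if isinstance(m, Pure):
        return f(m.value)
    return Op(m.name, m.arg, lambda x: bind(m.k(x), f))

def handle_state(m, s):
    """Fold-style handler: interprets get/put, rejects unknown operations."""
    if isinstance(m, Pure):
        return m.value, s
    if m.name == "get":
        return handle_state(m.k(s), s)
    if m.name == "put":
        return handle_state(m.k(None), m.arg)
    raise NotImplementedError(m.name)  # a full framework would forward this

# Example: read the counter, increment it, read it back.
prog = bind(Op("get", None, Pure), lambda n: Op("put", n + 1, Pure))
prog = bind(prog, lambda _: Op("get", None, Pure))
print(handle_state(prog, 0))  # -> (1, 1)
```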
{"title":"A framework for higher-order effects & handlers","authors":"Birthe van den Berg, Tom Schrijvers","doi":"10.1016/j.scico.2024.103086","DOIUrl":"10.1016/j.scico.2024.103086","url":null,"abstract":"<div><p>Algebraic effects & handlers are a modular approach for modeling side-effects in functional programming. Their syntax is defined in terms of a signature of effectful operations, encoded as a functor, that are plugged into the free monad; their denotational semantics is defined by fold-style handlers that only interpret their part of the syntax and forward the rest. However, not all effects are algebraic: some need to access an <em>internal computation</em>. For example, scoped effects distinguish between a computation in scope and out of scope; parallel effects parallelize over a computation, latent effects defer a computation. Separate definitions have been proposed for these <em>higher-order effects</em> and their corresponding handlers, often leading to expedient and complex monad definitions. In this work we propose a generic framework for higher-order effects, generalizing algebraic effects & handlers: a generic free monad with higher-order effect signatures and a corresponding interpreter. Specializing this higher-order syntax leads to various definitions of previously defined (scoped, parallel, latent) and novel (writer, bracketing) effects. Furthermore, we formally show our framework theoretically correct, also putting different effect instances on formal footing; a significant contribution for parallel, latent, writer and bracketing effects.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103086"},"PeriodicalIF":1.3,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167642324000091/pdfft?md5=2f32f586d39373add129303303d8a760&pid=1-s2.0-S0167642324000091-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139499049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISTA+: Test case generation and optimization for intelligent systems based on coverage analysis
Pub Date: 2024-01-12 | DOI: 10.1016/j.scico.2024.103078
Xiaoxue Wu, Yizeng Gu, Lidan Lin, Wei Zheng, Xiang Chen
With the increasing use of intelligent systems in domains such as self-driving cars, robotics, and smart cities, it is crucial to ensure their quality so that they can be used reliably and effectively. However, testing intelligent systems poses unique challenges due to their complex structure, the low efficiency of testing, and the high cost of manually collecting large numbers of test cases. Hence, it is crucial to design tools that can adequately test intelligent systems while overcoming these obstacles.
We propose an intelligent system testing tool called ISTA+. The tool automatically generates and optimizes test cases based on coverage analysis, resulting in improved test adequacy for intelligent systems. To evaluate the effectiveness of ISTA+, we applied it to two different models (a fully-connected DNN and the Rambo model) and two datasets of different data types (image and text). The evaluation results demonstrate that ISTA+ successfully improves test dataset quality and ensures comprehensive testing for both text and image data.
• Link to source code: https://github.com/wuxiaoxue/ISTAplus
• Link to video demonstration: https://youtu.be/6CkzMJ0ghq8
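The abstract does not detail ISTA+'s coverage criteria; as an illustration of the general idea behind coverage-guided DNN testing, here is a minimal neuron-coverage sketch in the style of criteria popularized by DeepXplore. The activations are invented, and this is not ISTA+'s implementation.

```python
import numpy as np

def neuron_coverage(activations: list[np.ndarray], threshold: float = 0.5) -> float:
    """Fraction of neurons activated above `threshold` by at least one input.

    `activations[i]` holds one layer's activations with shape
    (num_inputs, num_neurons); values are assumed scaled to [0, 1].
    """
    covered = total = 0
    for layer in activations:
        fired = (layer > threshold).any(axis=0)  # per neuron: ever fired?
        covered += int(fired.sum())
        total += fired.size
    return covered / total

# Toy two-layer "network" activations for 3 inputs (values invented).
rng = np.random.default_rng(0)
acts = [rng.random((3, 8)), rng.random((3, 4))]
print(f"neuron coverage: {neuron_coverage(acts):.2%}")
# A coverage-guided generator would keep and mutate the inputs
# that raise this metric, improving test adequacy over time.
```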
{"title":"ISTA+: Test case generation and optimization for intelligent systems based on coverage analysis","authors":"Xiaoxue Wu , Yizeng Gu , Lidan Lin , Wei Zheng , Xiang Chen","doi":"10.1016/j.scico.2024.103078","DOIUrl":"10.1016/j.scico.2024.103078","url":null,"abstract":"<div><p>With the increasing use of intelligent systems in various domains such as self-driving cars, robotics, and smart cities, it is crucial to ensure the quality of intelligent systems for their reliable and effective use in various domains. However, testing intelligent systems poses unique challenges due to their complex structure, low efficiency, and the high cost associated with manually collecting a large number of test cases. Hence, it is crucial to design tools that can adequately test intelligent systems while overcoming these obstacles.</p><p>We propose an intelligent system test tool called ISTA+. This tool implements automatic generation and optimization of test cases based on coverage analysis, resulting in improved test adequacy for intelligent systems. To evaluate the effectiveness of ISTA+, we applied it to two different models (fully-connected DNN and the Rambo model) and two datasets of different data types (i.e., image and text). The evaluation results demonstrate that ISTA+ successfully improves the test dataset quality and ensures comprehensive testing for both text and image data types.</p><ul><li><span>•</span><span><p>Link to source code: <span>https://github.com/wuxiaoxue/ISTAplus</span><svg><path></path></svg></p></span></li><li><span>•</span><span><p>Link to video demonstration: <span>https://youtu.be/6CkzMJ0ghq8</span><svg><path></path></svg></p></span></li></ul></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103078"},"PeriodicalIF":1.3,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139470269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HFCommunity: An extraction process and relational database to analyze Hugging Face Hub data
Pub Date: 2024-01-10 | DOI: 10.1016/j.scico.2024.103079
Adem Ait, Javier Luis Cánovas Izquierdo, Jordi Cabot
Social coding platforms such as GitHub or GitLab have become the de facto standard for developing Open-Source Software (OSS) projects. With the emergence of Machine Learning (ML), platforms specifically designed for hosting and developing ML-based projects have appeared, Hugging Face Hub (HFH) being one of the most popular. HFH aims at sharing datasets, pre-trained ML models, and the applications built with them. With over 400K repositories, and growing fast, HFH is becoming a promising source of empirical data on all aspects of ML project development. However, apart from the API provided by the platform, there are no easy-to-use solutions for collecting the data, nor prepackaged datasets for exploring the different facets of HFH. We present HFCommunity, an extraction process for HFH data and a relational database that facilitate empirical analysis of the growing number of ML projects.
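HFCommunity's schema and pipeline are not given in the abstract; the following sketch only shows the general pattern of pulling repository metadata through the official huggingface_hub client into a small relational table. The table layout is invented; HfApi.list_models is the library's real entry point, while the downloads/likes fields are fetched defensively since their availability can vary across versions.

```python
import sqlite3
from huggingface_hub import HfApi  # official Hugging Face Hub client

api = HfApi()
conn = sqlite3.connect("hfh.db")
conn.execute("""CREATE TABLE IF NOT EXISTS model_repo (
                    id TEXT PRIMARY KEY,
                    downloads INTEGER,
                    likes INTEGER)""")

# Fetch a small sample of model repositories (limit keeps the demo quick).
for m in api.list_models(limit=50):
    conn.execute(
        "INSERT OR REPLACE INTO model_repo VALUES (?, ?, ?)",
        (m.id, getattr(m, "downloads", None), getattr(m, "likes", None)),
    )
conn.commit()

# Once the data is relational, empirical queries become one-liners.
print(conn.execute("SELECT COUNT(*), AVG(likes) FROM model_repo").fetchone())
```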
{"title":"HFCommunity: An extraction process and relational database to analyze Hugging Face Hub data","authors":"Adem Ait , Javier Luis Cánovas Izquierdo , Jordi Cabot","doi":"10.1016/j.scico.2024.103079","DOIUrl":"10.1016/j.scico.2024.103079","url":null,"abstract":"<div><p>Social coding platforms such as <span>GitHub</span> or <span>GitLab</span> have become the <em>de facto</em> standard for developing Open-Source Software (OSS) projects. With the emergence of Machine Learning (ML), platforms specifically designed for hosting and developing ML-based projects have appeared, being <span>Hugging Face Hub</span> (HFH) one of the most popular ones. HFH aims at sharing datasets, pre-trained ML models and the applications built with them. With over 400 K repositories, and growing fast, HFH is becoming a promising source of empirical data on all aspects of ML project development. However, apart from the API provided by the platform, there are no easy-to-use solutions to collect the data, nor prepackaged datasets to explore the different facets of HFH. We present <span>HFCommunity</span>, an extraction process for HFH data and a relational database to facilitate an empirical analysis on the growing number of ML projects.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"234 ","pages":"Article 103079"},"PeriodicalIF":1.3,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167642324000029/pdfft?md5=bb0c43422124d50d91f987a6ab598504&pid=1-s2.0-S0167642324000029-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139420808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}