
Data & Knowledge Engineering: Latest Articles

Data analytics and knowledge discovery on big data: Algorithms, architectures, and applications
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-05 | DOI: 10.1016/j.datak.2024.102279
Robert Wrembel , Johann Gamper
Data & Knowledge Engineering, Volume 150, Article 102279
Citations: 0
A deep learning model for predicting the number of stores and average sales in commercial district
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-01-04 | DOI: 10.1016/j.datak.2024.102277
Suan Lee , Sangkeun Ko , Arousha Haghighian Roudsari , Wookey Lee

This paper presents a plan for preparing for changes in the business environment by analyzing and predicting business district data in Seoul. The COVID-19 pandemic and the economic crisis caused by inflation have led to an increase in store closures and a decrease in sales, which has had a significant impact on commercial districts. The number of stores and sales are critical factors that directly affect the business environment and can help prepare for changes. This study conducted correlation analysis to extract factors related to the commercial district’s environment in Seoul and estimated the number of stores and sales based on these factors. Using the Kendall tau correlation coefficient, the study found that existing population and working population were the most influential factors. Linear regression, tensor decomposition, Factorization Machine, and deep neural network models were used to estimate the number of stores and sales, with the deep neural network model showing the best performance in RMSE and evaluation indicators. This study also predicted the number of stores and sales of the service industry in a specific area using the population prediction results of the NeuralProphet model. The study’s findings can help identify commercial district information and predict the number of stores and sales based on location, industry, and influencing factors, contributing to the revitalization of commercial districts.
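
The correlation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which variant of Kendall's tau was used or give any data, so the code below assumes the simplest form (tau-a, where tied pairs count as neither concordant nor discordant), and the district figures are hypothetical values chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical figures for five districts (not taken from the paper).
working_population = [1200, 3400, 2100, 5600, 4300]
store_count = [80, 150, 190, 310, 260]
tau = kendall_tau_a(working_population, store_count)
print(round(tau, 3))
```

A value close to 1 would suggest that districts with a larger working population tend to have more stores; a library routine that handles ties (tau-b) would normally be preferred on real data.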

Data & Knowledge Engineering, Volume 150, Article 102277
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X24000016/pdfft?md5=399d90f81e8f5fbe38aeaa5e86a26560&pid=1-s2.0-S0169023X24000016-main.pdf
Citations: 0
A transformer-based neural network framework for full names prediction with abbreviations and contexts
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-30 | DOI: 10.1016/j.datak.2023.102275
Ziming Ye , Shuangyin Li

With the rapid spread of information, abbreviations are used more and more commonly because they are convenient. However, the duplication of abbreviations can lead to confusion in many cases, such as information management and information retrieval. The resultant confusion annoys users. Thus, inferring a full name from an abbreviation has practical and significant advantages. The bulk of studies in the literature inferred full names mainly with rule-based methods, statistical models, the similarity of representations, etc. However, these methods are unable to use contexts of various granularities properly. In this paper, we propose a flexible framework, the Multi-attention mask Abbreviation Context and Full name language model, named MACF, to address the problem. With the abbreviation and contexts as the inputs, the MACF can automatically predict a full name by generation, where the contexts can be of various granularities. That is, contexts ranging from coarse to fine can be selected to perform such complicated tasks, in which contexts include paragraphs, several sentences, or even just a few keywords. A novel multi-attention mask mechanism is also proposed, which allows the model to learn the relationships among abbreviations, contexts, and full names, a process that makes the most of contexts of various granularities. Three corpora of different languages and fields were analyzed and measured with seven metrics in various aspects to evaluate the proposed framework. According to the experimental results, the MACF yielded more significant and consistent outputs than other baseline methods. Moreover, we discuss the significance and findings, and present case studies to show the performance in real applications.

Data & Knowledge Engineering, Volume 150, Article 102275
Citations: 0
A bitwise approach on influence overload problem
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-30 | DOI: 10.1016/j.datak.2023.102276
Charles Cheolgi Lee , Jafar Afshar , Arousha Haghighian Roudsari , Woong-Kee Loh , Wookey Lee

The rapid development of online social networks has enabled users to send and receive information very fast. However, due to the availability of an excessive amount of data in today’s society, managing the information has become very cumbersome, which may lead to the problem of information overload. This prominent problem, where the existence of too much relevant information becomes a hindrance rather than a help, may cause losses, delays, and hardships in making decisions. Thus, in this paper, by defining information overload from a different aspect, we aim to maximize the information propagation while minimizing the information overload (duplication). To do so, we theoretically present the lower and upper bounds for the information overload, using a bitwise-based approach as the leverage to mitigate the computation complexities, and obtain an approximation ratio of 1 - 1/e. We propose two main algorithms, B-square and C-square, and compare them with the existing algorithms. Experiments on two types of datasets, synthetic and real-world networks, verify the effectiveness and efficiency of the proposed approach in addressing the problem.
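
To give a feel for how bitwise operations can help here, the following is a minimal sketch and not the paper's B-square or C-square algorithm, whose details the abstract does not give. It assumes each node's reach is stored as an integer bitmask (a hypothetical encoding), picks seeds greedily by marginal coverage, and counts duplicated deliveries with a bitwise AND. Greedy selection by marginal gain is the classic route to a 1 - 1/e guarantee for maximum coverage; whether the paper's ratio is derived in the same way is an assumption.

```python
def popcount(mask):
    """Number of set bits in a non-negative integer bitmask."""
    return bin(mask).count("1")


def greedy_seeds(reach, k):
    """Pick up to k seeds greedily by marginal coverage.

    reach[i] is a bitmask whose set bits are the nodes that seed i informs.
    Returns (seeds, covered_mask, duplicates), where duplicates counts nodes
    that were already informed when a later seed reached them again.
    """
    covered = 0
    duplicates = 0
    seeds = []
    remaining = list(range(len(reach)))
    for _ in range(min(k, len(reach))):
        # Marginal gain: bits of reach[i] that are not yet covered.
        best = max(remaining, key=lambda i: popcount(reach[i] & ~covered))
        # Overlap with already-informed nodes is the duplication cost.
        duplicates += popcount(reach[best] & covered)
        covered |= reach[best]
        seeds.append(best)
        remaining.remove(best)
    return seeds, covered, duplicates


# Hypothetical reach sets over six nodes (bit 0 is node 0).
reach = [0b000111, 0b001110, 0b110000, 0b100001]
seeds, covered, duplicates = greedy_seeds(reach, 2)
print(seeds, bin(covered), duplicates)
```

Storing sets as machine words lets union, intersection, and difference run as single AND/OR/NOT operations, which is the kind of saving a bitwise formulation aims for.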

Data & Knowledge Engineering, Volume 150, Article 102276
Citations: 0
Mining Keys for Graphs
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-27 | DOI: 10.1016/j.datak.2023.102274
Morteza Alipourlangouri, Fei Chiang

Keys for graphs are a class of data quality rules that use topological and value constraints to uniquely identify entities in a data graph. They have been studied to support object identification, knowledge fusion, data deduplication, and social network reconciliation. Manual specification and discovery of graph keys is tedious and infeasible over large-scale graphs. To make GKeys useful in practice, we study the GKey discovery problem, and present GKMiner, an algorithm that mines keys over graphs. Our algorithm discovers keys in a graph via frequent subgraph expansion, and notably, identifies recursive keys, i.e., where the unique identification of an entity type is dependent upon the identification of another entity type. We introduce the key properties, minimality and support, which effectively help to reduce the space of candidate keys. GKMiner uses a set of auxiliary structures to summarize an input graph, and to identify likely candidate keys for greater pruning efficiency and evaluation of the search space. Our evaluation shows that identifying and using recursive keys in entity linking leads to improved accuracy over keys found using existing graph key mining techniques.
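
The role of minimality in pruning candidate keys can be sketched on a much simpler setting than the paper's. The code below is not GKMiner: it ignores graph topology and recursive keys and only searches flat attribute sets for one entity type, skipping any candidate that contains an already-found key. The album records and attribute names are hypothetical.

```python
from itertools import combinations


def is_key(entities, attrs):
    """True if no two entities agree on every attribute in attrs."""
    seen = set()
    for entity in entities:
        signature = tuple(entity[a] for a in attrs)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(entities, attributes):
    """Enumerate candidates by increasing size, pruning supersets of known keys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            # A superset of a key is also a key, but it is not minimal.
            if any(set(k) <= set(candidate) for k in keys):
                continue
            if is_key(entities, candidate):
                keys.append(candidate)
    return keys


# Hypothetical album entities (not from the paper).
albums = [
    {"title": "Greatest Hits", "artist": "A", "year": 1990},
    {"title": "Greatest Hits", "artist": "B", "year": 1990},
    {"title": "Live", "artist": "A", "year": 1990},
]
found = minimal_keys(albums, ["title", "artist", "year"])
print(found)
```

In a recursive key, the `artist` value would itself be an entity that must first be identified by its own key, which is the case the abstract highlights.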

Data & Knowledge Engineering, Volume 150, Article 102274
Citations: 0
An approach to on-demand extension of multidimensional cubes in multi-model settings: Application to IoT-based agro-ecology
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-23 | DOI: 10.1016/j.datak.2023.102267
Sandro Bimonte , Fagnine Alassane Coulibaly , Stefano Rizzi

Managing unstructured and heterogeneous data, integrating them, and enabling their analysis are among the key challenges in data ecosystems, together with the need to accommodate a progressive growth in these systems by seamlessly supporting extensibility. This is particularly relevant for OLAP analyses on multidimensional cubes stored in data warehouses (DWs), which naturally span large portions of heterogeneous data, possibly relying on different data models (relational, document-based, graph-based). While the management of model heterogeneity in DWs, using for instance multi-model databases, has already been investigated, not much has been done to support extensibility. In a previous paper we have investigated a schema-on-read scenario aimed at granting the extensibility of multidimensional cubes by proposing an architecture to support it and discussing the main open issues associated. This paper takes a step further by presenting xCube, an approach to provide on-demand extensibility of multidimensional cubes in a supply-driven fashion. xCube lets users choose a multidimensional element to be extended, using additional data, possibly uploaded from a data lake. Then, the multidimensional schema is extended by considering the functional dependencies implied by these additional data, and the extended multidimensional schema is made available to users for OLAP analyses. After explaining our approach with reference to a motivating case study in agro-ecology, we propose a proof-of-concept implementation using AgensGraph and Mondrian.
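
The abstract says the schema is extended by considering the functional dependencies implied by the additional data. A minimal sketch of what checking one such dependency might look like is given below; it is not xCube's procedure, and the sensor, plot, and farm attributes are hypothetical. The idea is that an attribute can be added as a coarser level above an existing one only if each value of the finer level maps to exactly one value of the coarser level.

```python
def holds_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows."""
    mapping = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if mapping.setdefault(left, right) != right:
            return False
    return True


# Hypothetical additional data, e.g. uploaded from a data lake.
rows = [
    {"sensor": "s1", "plot": "p1", "farm": "f1"},
    {"sensor": "s2", "plot": "p1", "farm": "f1"},
    {"sensor": "s3", "plot": "p2", "farm": "f1"},
]
sensor_to_plot = holds_fd(rows, ["sensor"], ["plot"])
plot_to_farm = holds_fd(rows, ["plot"], ["farm"])
plot_to_sensor = holds_fd(rows, ["plot"], ["sensor"])
print(sensor_to_plot, plot_to_farm, plot_to_sensor)
```

Here sensor -> plot -> farm would support a roll-up hierarchy, while plot -> sensor fails because plot p1 hosts two sensors.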

Data & Knowledge Engineering, Volume 150, Article 102267
Citations: 0
Increase development productivity by domain-specific conceptual modeling
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-15 | DOI: 10.1016/j.datak.2023.102263
Martin Paczona , Heinrich C. Mayr , Guenter Prochart

This paper addresses the question of whether and how the development and use of a domain-specific modeling method (DSMM) can increase productivity in the development of technical systems in an industrial setting. This is because an essential prerequisite for DSMMs to become established in operational practice is that productivity increases can be achieved with them and qualitative benefits such as quality assurance, innovation potential, and the like can be exploited. After all, managers’ decisions are ultimately based on whether or not the use of a new method pays off. We illustrate our findings using the example of a DSMM development for the design and realization of electric vehicle testbeds, which we carried out as part of a cooperation project. This work sets the base for possible generalization into other automotive, mechatronic, and technical areas.

Data & Knowledge Engineering, Volume 150, Article 102263
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X23001234/pdfft?md5=04e4fde34990bf78c3bd54b41b8496e0&pid=1-s2.0-S0169023X23001234-main.pdf
Citations: 0
Improving speech emotion recognition by fusing self-supervised learning and spectral features via mixture of experts
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-13 | DOI: 10.1016/j.datak.2023.102262
Jonghwan Hyeon, Yung-Hwan Oh, Young-Jun Lee, Ho-Jin Choi

Speech Emotion Recognition (SER) is an important area of research in speech processing that aims to identify and classify emotional states conveyed through speech signals. Recent studies have shown considerable performance in SER by exploiting deep contextualized speech representations from self-supervised learning (SSL) models. However, SSL models pre-trained on clean speech data may not perform well on emotional speech data due to the domain shift problem. To address this problem, this paper proposes a novel approach that simultaneously exploits an SSL model and a domain-agnostic spectral feature (SF) through the Mixture of Experts (MoE) technique. The proposed approach achieves state-of-the-art performance on weighted accuracy compared to other methods on the IEMOCAP dataset. Moreover, this paper demonstrates the existence of the domain shift problem of SSL models in the SER task.
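
The general Mixture of Experts idea, a gate that weights the outputs of several experts, can be sketched in a few lines. The code below is an illustration under stated assumptions and not the paper's architecture: it assumes two experts (one standing in for the SSL branch, one for the spectral branch) that each emit class logits, and a gate whose scores are turned into weights with a softmax. All numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(expert_logits, gate_scores):
    """Weighted sum of per-expert logits, with weights from the gate."""
    weights = softmax(gate_scores)
    n_classes = len(expert_logits[0])
    fused = [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical logits for two emotion classes from two experts.
ssl_logits = [2.0, 0.0]       # stand-in for the SSL expert
spectral_logits = [0.0, 4.0]  # stand-in for the spectral-feature expert
fused, weights = moe_fuse([ssl_logits, spectral_logits], gate_scores=[0.0, 0.0])
print(fused, weights)
```

With equal gate scores the fusion is a plain average; in a trained model the gate scores would depend on the input, so the spectral expert could receive more weight when the SSL features are less reliable.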

Data & Knowledge Engineering, Volume 150, Article 102262
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X23001222/pdfft?md5=48b44d06659bb1ef2a62c484d7369d5b&pid=1-s2.0-S0169023X23001222-main.pdf
Citations: 0
Recognition algorithm for cross-texting in text chat conversations
IF 2.5 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-10 | DOI: 10.1016/j.datak.2023.102261
Da-Young Lee, Hwan-Gue Cho

With the development of the Internet and information technology, short-text-based communication has become more popular than voice-based communication. Chat-based communication enables rapid, brief, high-volume exchange of messages with many people, but it also creates new social problems. ‘Cross-texting’ is one of them. It refers to accidentally sending a text to an unintended recipient while holding concurrent, separate conversations with multiple people. Cross-texting can be a serious problem in languages where respectful expressions are required. As text-based communication becomes more popular, it is crucial to prevent cross-texting by detecting it in advance in languages with honorific expressions, such as Korean. In this paper, we propose two methods for detecting cross-texts using deep learning models. The first model is the formal feature vector, which models dialog by explicitly defining politeness and completeness features. The second is the graph2vec-based ChatGram-net model, which models the dialog based on syllable occurrence relationships. To evaluate detection performance, we suggest a method for generating cross-text datasets from an actual messenger corpus. In experiments, we show that both proposed models detect cross-texts effectively and exceed the performance of the baseline models.

随着互联网和 IT 技术的发展,与语音通信相比,以短文为基础的通信非常流行。以聊天为基础的通信方式可以快速、简短、大量地与许多人交换信息,但也带来了新的社会问题。交叉短信 "就是其中之一。它指的是在与分开的多人同时聊天时,不小心将短信发送给了不想要的人。在需要表达尊重的语言中,交叉发短信将是一个严重的问题。随着基于文本的通信日益普及,在韩语等使用敬语表达的语言中,通过提前检测来防止交叉发文是一项至关重要的工作。本文提出了两种使用深度学习模型检测交叉文本的方法。第一个模型是形式特征向量,它通过明确定义礼貌性和完整性特征对对话进行建模。第二个模型是基于 grpah2vec 的 ChatGram-net 模型,该模型基于音节出现关系对对话进行建模。为了评估检测性能,我们提出了一种从实际信使语料库中生成跨文本数据集的方法。实验结果表明,这两种检测模型都能有效地检测到交叉文本,而且性能超过了基线模型。
Citations: 0
Towards deep understanding of graph convolutional networks for relation extraction
IF 2.5 Zone 3 Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-12-07 DOI: 10.1016/j.datak.2023.102265
Tao Wu, Xiaolin You, Xingping Xian, Xiao Pu, Shaojie Qiao, Chao Wang

Relation extraction aims at identifying semantic relations between pairs of named entities in unstructured texts and is considered an essential prerequisite for many downstream tasks in natural language processing (NLP). Owing to their ability to express complex relationships and interdependencies, graph neural networks (GNNs) have gradually been applied to the relation extraction problem and have achieved state-of-the-art results. However, the designs of GNN-based relation extraction methods rest mostly on empirical intuition, heuristics, and experimental trial and error. A clear understanding of why and how GNNs perform well in relation extraction tasks is lacking. In this study, we investigate three well-known GNN-based relation extraction models, CGCN, AGGCN, and SGCN, and aim to understand the underlying mechanisms of the extractions. In particular, we provide a visual analysis to reveal the dynamics of the models and provide insight into the function of the intermediate convolutional layers. We determine that entities, particularly the subjects and objects among them, are more important features than other words for relation extraction tasks. Using various masking strategies, we also establish the significance of entity type to relation extraction. Then, from the perspective of the model architecture, we find that graph structure modeling and aggregation mechanisms in GCN do not significantly affect the performance improvement of GCN-based relation extraction models. These findings are of great significance in promoting the development of GNNs. Based on them, an engineering-oriented, MLP-based GNN relation extraction model is proposed that achieves comparable performance with greater efficiency.

Citations: 0