
Big Data Research: Latest Articles

A Large Comparison of Normalization Methods on Time Series
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-08-22; DOI: 10.1016/j.bdr.2023.100407
Felipe Tomazelli Lima, Vinicius M.A. Souza

Normalization is a mandatory preprocessing step in time series problems: it makes similarity comparisons invariant to unexpected distortions in amplitude and offset. Such distortions are common in most time series data. A typical example is gait recognition from motion data collected on subjects of varying body height and width. To rescale the data to the same range of values, the vast majority of researchers treat z-normalization as the default method for any application domain, dataset, or task. This choice is made without a search process of the kind used to set an algorithm's parameters, and without experimental evidence in the literature that covers a variety of scenarios. To address this gap, we evaluate the impact of different normalization methods on time series data. Our analysis is based on an extensive experimental comparison on classification problems involving 10 normalization methods, 3 state-of-the-art classifiers, and 38 benchmark datasets. We consider the classification task because of its simple experimental settings and well-defined metrics. However, our findings can be extrapolated to other time series mining tasks, such as forecasting or clustering. Based on our results, we suggest evaluating maximum absolute scaling as an alternative to z-normalization. Besides being time efficient, this alternative shows promising results for similarity-based methods using Euclidean distance. For deep learning, mean normalization could be considered.

Citations: 1
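The three rescalings named in the abstract above can be sketched as follows. This is a minimal illustration using common textbook definitions, which are an assumption here and may differ from the exact formulas the paper evaluates; the function names are hypothetical.

```python
import numpy as np

def z_normalize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def max_abs_scale(x):
    """Divide by the largest absolute value, so values fall in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    peak = np.abs(x).max()
    return x / peak if peak > 0 else x

def mean_normalize(x):
    """Subtract the mean and divide by the range (max - min)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.mean()) / span if span > 0 else x - x.mean()

# Toy series and a copy distorted in amplitude (x3) and offset (+10).
series = np.array([2.0, 4.0, 6.0, 8.0])
distorted = 3.0 * series + 10.0

# z-normalization and mean normalization remove both distortions.
same_z = np.allclose(z_normalize(series), z_normalize(distorted))
same_mean = np.allclose(mean_normalize(series), mean_normalize(distorted))

# Maximum absolute scaling removes amplitude changes only; it does not center.
same_maxabs_amplitude = np.allclose(max_abs_scale(series), max_abs_scale(3.0 * series))
```

Maximum absolute scaling needs a single pass to find the peak and no mean or variance, which is consistent with the abstract's remark that it is time efficient.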
Parallel Framework for Memory-Efficient Computation of Image Descriptors for Megapixel Images
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-08-01; DOI: 10.1016/j.bdr.2023.100398
Amr M. Abdeltif, K. Hosny, M. M. Darwish, Ahmad Salah, KenLi Li
Citations: 1
Heterogeneous Graph Convolutional Network Based on Correlation Matrix
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100379
Liqing Qiu, Jingcheng Zhou, Caixia Jing, Yuying Liu

Heterogeneous graph embedding maps a high-dimensional graph with different types of nodes and edges to a low-dimensional space, so that it performs well in downstream tasks. Existing models mainly use two approaches to explore and embed heterogeneous graph information. One is to use meta-paths to mine heterogeneous information; the other is to use special modules designed by researchers to explore it. These models show excellent performance in heterogeneous graph embedding tasks. However, none of them considers using the number of meta-path instances between nodes to improve the embedding. This paper proposes a Heterogeneous Graph Convolutional Network based on Correlation Matrix (CMHGCN), which makes full use of the number of meta-path instances between nodes to discover interactive information between nodes in heterogeneous graphs. CMHGCN contains two core components: the node-level correlation component and the semantic-level correlation component. The node-level correlation component uses the number of meta-path instances between nodes to calculate the correlation between nodes under each meta-path. The semantic-level correlation component then integrates this information across the different meta-paths. According to experiments on three benchmark heterogeneous datasets, CMHGCN outperforms the baselines in node classification and clustering on heterogeneous graphs with a large number of meta-path instances.

Citations: 0
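Counting meta-path instances between nodes, the quantity the CMHGCN abstract above builds on, reduces to a matrix product for a simple two-hop meta-path. The sketch below uses a hypothetical author-paper toy graph; the row normalization at the end is an illustrative assumption and not the paper's correlation formula, which the abstract does not state.

```python
import numpy as np

# Hypothetical toy graph: 3 authors x 4 papers, 1 means "author wrote paper".
author_paper = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# Entry (i, j) counts the Author-Paper-Author path instances from i to j,
# i.e. the number of papers the two authors share.
apa_counts = author_paper @ author_paper.T
np.fill_diagonal(apa_counts, 0)  # ignore paths that return to the start node

# Illustrative normalization (an assumption): the share of each author's
# APA instances that lead to each other author. Rows with no instances stay 0.
row_sums = apa_counts.sum(axis=1, keepdims=True)
correlation = np.divide(
    apa_counts,
    row_sums,
    out=np.zeros(apa_counts.shape, dtype=float),
    where=row_sums > 0,
)
```

Longer meta-paths follow the same pattern by chaining the adjacency matrices of each hop, so pairs connected by many instances receive larger weights than pairs connected by one.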
What Is a Multi-Modal Knowledge Graph: A Survey
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100380
Jinghui Peng, Xinyu Hu, Wenbo Huang, Jian Yang

With the explosive growth of multi-modal information on the Internet, the multi-modal knowledge graph (MMKG) has become an important research topic in knowledge graphs, driven by the needs of data management and applications. Most research on MMKG has taken image-text data as its object and used multi-modal deep learning to process it. By contrast, there is no uniform account of the structure of the MMKG. This paper focuses on MMKG, introduces the related theories of multi-modal knowledge, and analyzes several common ideas about its construction. The survey also explains the structural evolution, proposes mirror node alignment to represent cross-modal knowledge in an MMKG, lists the difficulties of some tasks, and finally gives a sample MMKG for the news scene.

Citations: 0
Spatio-Temporal Characteristics of Influenza Burden and Its Influence Factors in Japan in the Past Three Decades: An Influenza Disease Burden Data-Based Modeling Study
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100384
Junru Wang, Shixin Zhang, Anbang Dai

Introduction: Influenza still poses a great threat to humans. Knowledge of the systematic disease burden of influenza in Japan is limited. This study aimed to investigate the spatio-temporal characteristics of the influenza burden and its influencing factors over the past three decades.

Methods: Data on annual deaths, years lived with disability (YLDs), years of life lost (YLLs), and disability-adjusted life years (DALYs) for influenza in Japan from 1990 to 2019 were obtained from the Global Health Data Exchange (GHDx), and annual data on social household factors were obtained from e-Stat in Japan. A joinpoint regression model was used to assess influenza trends from 1990 to 2019, a discrete Poisson model to analyze the spatial and temporal clustering of influenza, and a generalized linear model to assess the association of influenza deaths and DALYs with social household factors.

Results: From 1990 to 2019, the mortality rate in Japan increased from 9.95 per 100000 to 19.49 per 100000, with an average annual percent change (AAPC) of 2.2% (95% CI: 1.5, 3.0, P<0.05). The DALYs rate increased from 153.86 per 100000 to 209.22 per 100000, with an AAPC of 1.0% (95% CI: 0.1, 1.9, P<0.05). The mortality rate ranged from 1.98 per 100000 (Chiba) to 16.9 per 100000 (Kochi) in 1990, and from 5.10 per 100000 (Chiba) to 35.74 per 100000 (Akita) in 2019. The population aged 60+ had the highest mortality rates, from 53.79 per 100000 in 1990 to 55.74 per 100000 in 2019 (AAPC: 0.0%, 95% CI: -0.5, 0.6, P=0.944), and the highest DALYs rates, from 713.43 per 100000 to 565.22 per 100000 (AAPC: -0.9%, 95% CI: -1.5, -0.3, P<0.05). YLLs and DALYs rates among the population aged 1-4 were also high from 1990 to 2019, second only to those among the population aged 60+. The mortality rate showed two stages of spatio-temporal aggregation across Japan: northern Japan in 2005-2019 (RR = 1.36, P < 0.001) and southern Japan over the same period (RR = 1.36, P < 0.001). The generalized linear model (GLM) indicated that year was positively correlated with the influenza mortality rate (β = 0.18, p<0.01), while the ratio of households ordering via the internet and population were negatively correlated with it (β = -4.41, p<0.05 and β = -0.17, p<0.01, respectively).

Conclusions: The disease burden of influenza in Japan increased over the past three decades, especially among the population aged 60+ years, followed by the population aged 1-4 years. It showed two stages of spatio-temporal aggregation across Japan. The lifestyle of households that order via the internet contributed to the low mortality rate of influenza.

Citations: 0
A Parallel Fusion Graph Convolutional Network for Aspect-Level Sentiment Analysis
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100378
Yuxin Wu, Guofeng Deng

Sentiment analysis has always been an important basic task in NLP. Recently, graph convolutional networks (GCNs) have been widely used in aspect-level sentiment analysis. Because GCNs aggregate well, every node can carry information from its neighboring nodes. However, most previous models used only a single GCN to learn contextual information. A GCN depends on how its graph is constructed, so a single GCN makes the model focus on the one node relationship that the construction method encodes and ignore other information. In addition, when a GCN aggregates node information, it cannot determine whether the aggregated information is useful, so it inevitably introduces noise. We propose a model that fuses two parallel GCNs to learn different relational features between sentences at the same time, and we add a gate mechanism to the GCN to filter the noise introduced during aggregation. Finally, we validate our model on public datasets; the experiments show that it outperforms state-of-the-art models.

Citations: 0
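The idea in the abstract above, two GCNs run in parallel over different graphs with a gate that filters the aggregated message, can be sketched in NumPy. Everything specific here is an assumption made for illustration: the sigmoid gate computed from the node's own features, the residual mix, the random graphs and weights, and the fusion by concatenation. The abstract does not describe the paper's actual architecture at this level of detail.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    adj = adj + np.eye(adj.shape[0])
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def gated_gcn_layer(adj, h, w, w_gate):
    """One GCN step whose aggregated message passes through a sigmoid gate."""
    aggregated = np.maximum(normalize_adjacency(adj) @ h @ w, 0.0)  # ReLU
    gate = 1.0 / (1.0 + np.exp(-(h @ w_gate)))  # per-feature values in (0, 1)
    # The gate decides how much neighbor information replaces the node's own.
    return gate * aggregated + (1.0 - gate) * h

def random_graph(rng, n):
    """Hypothetical undirected graph without self-loops."""
    upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    return (upper + upper.T).astype(float)

rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
h = rng.normal(size=(n_tokens, dim))

# Two graph views over the same tokens (for example syntactic and semantic).
adj_a = random_graph(rng, n_tokens)
adj_b = random_graph(rng, n_tokens)

out_a = gated_gcn_layer(adj_a, h, rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)))
out_b = gated_gcn_layer(adj_b, h, rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)))

# Parallel fusion by concatenating the two branch outputs.
fused = np.concatenate([out_a, out_b], axis=1)
```

Because each branch sees a different adjacency matrix, the fused representation is not tied to a single graph construction method, which is the limitation of single-GCN models that the abstract points out.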
MLPQ: A Dataset for Path Question Answering over Multilingual Knowledge Graphs
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100381
Yiming Tan, Yongrui Chen, Guilin Qi, Weizhuo Li, Meng Wang

Knowledge Graph-based Multilingual Question Answering (KG-MLQA), one of the essential subtasks of Knowledge Graph-based Question Answering (KGQA), emphasizes that questions in the KGQA task can be expressed in different languages, and aims to bridge the lexical gap between questions and knowledge graph(s). However, existing KG-MLQA work mainly focuses on the semantic parsing of multilingual questions and ignores questions that require integrating information from cross-lingual knowledge graphs (CLKG). This paper extends KG-MLQA to Cross-lingual KG-based multilingual Question Answering (CLKGQA) and constructs MLPQ, the first CLKGQA dataset over multilingual DBpedia, which contains 300K questions in English, Chinese, and French. We further propose a novel KG sampling algorithm for KG construction, so that MLPQ supports research on different types of methods. To evaluate the dataset, we put forward a general question answering workflow whose core idea is to transform CLKGQA into KG-MLQA. We first use an Entity Alignment (EA) model to merge the CLKG into a single KG, and then obtain the answer with a multi-hop QA model combined with a multilingual pre-trained model. By instantiating this workflow, we establish two baseline models for MLPQ: one uses Google Translate to obtain aligned entities, and the other adopts a recent EA model. Experiments show that the baseline models fall short of ideal performance on CLKGQA. Moreover, the availability of our benchmark contributes to the question answering and entity alignment communities.

Citations: 2
Spatiotemporal Prediction Based on Feature Classification for Multivariate Floating-Point Time Series Lossy Compression
IF 3.3; Zone 3, Computer Science; Q1 Business, Management and Accounting; Pub Date: 2023-05-28; DOI: 10.1016/j.bdr.2023.100377
Huimin Feng, Ruizhe Ma, Li Yan, Zongmin Ma

Large volumes of time series are produced by the frequent use of IoT devices and sensors. Time series compression is widely adopted to reduce storage overhead and transport costs. At present, most state-of-the-art approaches focus on univariate time series, so compressing multivariate time series (MTS) remains an important but challenging problem. Traditional MTS compression methods treat each variable individually, ignoring the correlations across variables. This paper proposes a novel MTS prediction method that can be applied to MTS compression to achieve a higher compression ratio. The method extracts the spatial and temporal correlations across multiple variables, which yields more accurate predictions and improves the lossy compression performance of MTS within the prediction-quantization-entropy framework. We use a convolutional neural network (CNN) to extract the temporal features of all variables within the window length. The features generated by the CNN are then transformed, and an image classification algorithm extracts the spatial features of the transformed data. Predictions are made from these spatiotemporal features. To enhance the robustness of our model, we integrate an autoregressive (AR) linear model in parallel with the proposed network. Experimental results demonstrate that our work improves the prediction accuracy and the compression performance for MTS in most cases.

Citations: 0
Meta-Learning Based Dynamic Adaptive Relation Learning for Few-Shot Knowledge Graph Completion
IF 3.3 Zone 3 (Computer Science) Q1 Business, Management and Accounting Pub Date : 2023-05-01 DOI: 10.1016/j.bdr.2023.100394
Linqin Cai, Ling-Yong Wang, Rongdi Yuan, Tingjie Lai
Citations: 1
Botnet DGA Domain Name Classification Using Transformer Network with Hybrid Embedding
IF 3.3 Zone 3 (Computer Science) Q1 Business, Management and Accounting Pub Date : 2023-05-01 DOI: 10.1016/j.bdr.2023.100395
Ling Ding, Peng Du, Hai-wei Hou, Jian Zhang, Di Jin, Shifei Ding
Citations: 0