
Latest publications in Big Data Mining and Analytics

A comparison of computational approaches for intron retention detection
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020014
Jiantao Zheng;Cuixiang Lin;Zhenpeng Wu;Hong-Dong Li
Intron Retention (IR) is an alternative splicing mode through which introns are retained in mature RNAs rather than being spliced out, as they are in most cases. IR has been gaining increasing attention in recent years because of its recognized association with gene expression regulation and complex diseases. Continuous efforts have been dedicated to the development of IR detection methods. These methods differ in the metrics they use to quantify retention propensity, their performance in detecting IR events, the functional enrichment of detected IRs, and their computational speed. A systematic experimental comparison would be valuable for the selection and use of existing methods. In this work, we conduct an experimental comparison of existing IR detection methods. Considering the unavailability of a gold standard dataset of intron retention, we compare IR detection performance on simulation datasets. Then, we compare the IR detection results on real RNA-Seq data. We also describe the use of differential analysis methods to identify disease-associated IRs and compare differential IRs along with their Gene Ontology enrichment, which is illustrated on an Alzheimer's disease RNA-Seq dataset. We discuss key principles and features of existing approaches and outline their differences. This systematic analysis provides helpful guidance for interrogating transcriptomic data from the point of view of IR.
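The abstract above notes that IR detection methods differ in the metrics they use to quantify retention propensity. A minimal, hypothetical sketch of one such metric is an "IR ratio": the fraction of reads supporting retention of an intron versus reads supporting its splicing. The function and variable names here are illustrative and do not come from any specific tool compared in the paper.

```python
def ir_ratio(intronic_reads: int, spliced_junction_reads: int) -> float:
    """Return a toy retention propensity for a single intron.

    intronic_reads: reads mapping within the intron body
    spliced_junction_reads: reads spanning the spliced exon-exon junction
    """
    total = intronic_reads + spliced_junction_reads
    if total == 0:
        return 0.0  # no coverage: treat the intron as not retained
    return intronic_reads / total

# A fully spliced intron scores 0; a fully retained one scores 1.
print(ir_ratio(30, 70))  # 0.3
```

Real tools refine this idea with coverage normalization, read-length correction, and statistical testing, which is one reason their results can differ on the same data.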
Big Data Mining and Analytics, vol. 5, no. 1, pp. 15-31. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663257.pdf
Citations: 2
Toward intelligent financial advisors for identifying potential clients: A multitask perspective
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020021
Qixiang Shao;Runlong Yu;Hongke Zhao;Chunli Liu;Mengyi Zhang;Hongmei Song;Qi Liu
Intelligent Financial Advisors (IFAs) in online financial applications (apps) have brought new life to personal investment by providing appropriate and high-quality portfolios for users. In real-world scenarios, identifying potential clients is a crucial issue for IFAs, i.e., identifying users who are willing to purchase the portfolios. Thus, it is urgent to extract useful information from various user characteristics and to predict users' purchase inclination. However, two critical problems encountered in real practice make this prediction task challenging, i.e., sample selection bias and data sparsity. In this study, we formalize a potential conversion relationship, i.e., user → activated user → client, and decompose this relationship into three related tasks. Then, we propose a Multitask Feature Extraction Model (MFEM), which can leverage useful information contained in these related tasks and learn them jointly, thereby solving the two problems simultaneously. In addition, we design a two-stage feature selection algorithm to select highly relevant user features efficiently and accurately from an extremely large number of user feature fields. Finally, we conduct extensive experiments on a real-world dataset provided by a well-known fintech bank. Experimental results clearly demonstrate the effectiveness of MFEM.
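The user → activated user → client funnel above decomposes into related prediction tasks that can share one feature extractor. The sketch below illustrates that multitask structure with a toy shared representation and per-task heads; all names, shapes, and weights are illustrative assumptions, not the authors' MFEM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Shared representation jointly learned by all tasks
    return np.tanh(x @ w)

def task_head(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Per-task logistic head producing a probability
    return 1.0 / (1.0 + np.exp(-(h @ v)))

x = rng.normal(size=(4, 8))   # 4 users, 8 raw feature fields
w = rng.normal(size=(8, 5))   # shared extractor weights
heads = {t: rng.normal(size=5) for t in ("activate", "convert", "purchase")}

h = shared_features(x, w)
probs = {t: task_head(h, v) for t, v in heads.items()}

# Chaining the funnel: P(client) ≈ P(activated) * P(client | activated)
p_client = probs["activate"] * probs["convert"]
print(p_client.shape)  # (4,)
```

Because the three heads share the extractor, labels from the plentiful early-funnel tasks help regularize the sparse end-of-funnel task, which is the intuition behind jointly learning the related tasks.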
Big Data Mining and Analytics, vol. 5, no. 1, pp. 64-78. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663261.pdf
Citations: 7
Exploiting more associations between slots for multi-domain dialog state tracking
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020013
Hui Bai;Yan Yang;Jie Wang
Dialog State Tracking (DST) aims to extract the current state from the conversation and plays an important role in dialog systems. Existing methods usually predict the value of each slot independently and do not consider the correlations among slots, which exacerbates the data sparsity problem because of the increased number of candidate values. In this paper, we propose a multi-domain DST model that integrates slot-relevant information. In particular, certain connections may exist among slots in different domains, and their corresponding values can be obtained through explicit or implicit reasoning. Therefore, we use the graph adjacency matrix to determine the correlations between slots, so that each slot can incorporate more slot-value transfer information from related slots. Experimental results show that our approach performs well on the Multi-domain Wizard-of-Oz (MultiWOZ) 2.0 and MultiWOZ 2.1 datasets, demonstrating the effectiveness and necessity of incorporating slot-relevant information.
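The adjacency-matrix idea above can be sketched in a few lines: encode which slots are correlated, row-normalize the matrix, and multiply it with the slot embeddings so each slot mixes in information from its correlated slots. The slot names, adjacency entries, and embeddings below are hypothetical toys, not the paper's model.

```python
import numpy as np

slots = ["hotel-area", "restaurant-area", "taxi-destination"]

# Assumed correlations: the two area slots inform each other, and
# taxi-destination may copy its value from either of them.
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

# Row-normalize so each slot averages over itself and its neighbors
A_norm = A / A.sum(axis=1, keepdims=True)

H = np.eye(3)        # toy one-hot slot embeddings
H_agg = A_norm @ H   # each row now blends correlated slots' information

print(H_agg[2])      # taxi-destination blends all three slots equally
```

In a real tracker `H` would be learned slot representations and the aggregation would feed the per-slot value predictors, letting a value resolved in one domain (e.g. an area) support prediction in another.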
Big Data Mining and Analytics, vol. 5, no. 1, pp. 41-52. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663259.pdf
Citations: 0
Big data with cloud computing: Discussions and challenges
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020016
Amanpreet Kaur Sandhu
With the recent advancements in computer technologies, the amount of data available is increasing day by day. However, excessive amounts of data create great challenges for users. Meanwhile, cloud computing services provide a powerful environment for storing large volumes of data. They eliminate various requirements, such as dedicated space and the maintenance of expensive computer hardware and software. Handling big data is a time-consuming task that requires large computational clusters to ensure successful data storage and processing. In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, International Business Machines (IBM) Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed. Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and data visualization.
Big Data Mining and Analytics, vol. 5, no. 1, pp. 32-40. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663258.pdf
Citations: 47
BCSE: Blockchain-based trusted service evaluation model over big data
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2020.9020028
Fengyin Li;Xinying Yu;Rui Ge;Yanli Wang;Yang Cui;Huiyu Zhou
The blockchain, with its key characteristics of decentralization, persistence, anonymity, and auditability, has become a solution to the overdependence of a traditional public key infrastructure on third-party institutions and the lack of trust in them. Because of these characteristics, the blockchain is suitable for solving certain open problems in the service-oriented social network, where the unreliability of submitted reviews of service vendors can cause serious security problems. To solve the unreliability problems of submitted reviews, this paper first proposes a blockchain-based identity authentication scheme and a new trusted service evaluation model obtained by introducing the scheme into a service evaluation model. The new trusted service evaluation model consists of the blockchain-based identity authentication scheme, an evaluation submission module, and an evaluation publicity module. In the proposed evaluation model, only users who have successfully been authenticated can submit reviews of service vendors. The registration and authentication records of users' identities and the reviews of service vendors are all stored in the blockchain network. The security analysis shows that this model can ensure the credibility of users' reviews of service vendors, and other users can obtain credible reviews of service vendors via the review publicity module. The experimental results also show that the proposed model has a lower review submission delay than other models.
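The tamper-evidence that blockchain storage gives submitted reviews can be illustrated with a minimal hash chain: each stored review commits to the previous block's hash, so silently altering any review breaks verification. This is an illustrative toy in the spirit of the model above, not the paper's BCSE protocol (which additionally includes identity authentication).

```python
import hashlib
import json

def make_block(prev_hash: str, review: dict) -> dict:
    # Canonical serialization so the same review always hashes identically
    body = {"prev": prev_hash, "review": review}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(chain: list) -> bool:
    for i, block in enumerate(chain):
        body = {"prev": block["prev"], "review": block["review"]}
        expect = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["hash"] != expect:
            return False                       # block contents were altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False                       # chain linkage was broken
    return True

chain = [make_block("0" * 64, {"user": "alice", "vendor": "v1", "score": 5})]
chain.append(make_block(chain[-1]["hash"], {"user": "bob", "vendor": "v1", "score": 4}))
print(verify_chain(chain))  # True

chain[0]["review"]["score"] = 1  # tamper with a stored review
print(verify_chain(chain))       # False
```

A real deployment would add signatures tied to the authentication scheme and distribute the chain across nodes, but the hash-linking shown here is what makes stored reviews auditable.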
Big Data Mining and Analytics, vol. 5, no. 1, pp. 1-14. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663256.pdf
Citations: 18
Sampling with prior knowledge for high-dimensional gravitational wave data analysis
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020018
He Wang;Zhoujian Cao;Yue Zhou;Zong-Kuan Guo;Zhixiang Ren
Extracting knowledge from high-dimensional data has been notoriously difficult, primarily due to the so-called "curse of dimensionality" and the complex joint distributions of these dimensions. This is a particularly profound issue for high-dimensional gravitational wave data analysis, where one must conduct Bayesian inference and estimate joint posterior distributions. In this study, we incorporate prior physical knowledge by sampling from desired interim distributions to develop the training dataset. Accordingly, the more relevant regions of the high-dimensional feature space are covered by additional data points, so that the model can learn the subtle but important details. We adapt the normalizing flow method to be more expressive and trainable, so that the information can be effectively extracted and represented by the transformation between the prior and target distributions. Once trained, our model takes only approximately 1 s on one V100 GPU to generate thousands of samples for probabilistic inference purposes. The evaluation of our approach confirms the efficacy and efficiency of gravitational wave data inference and points to a promising direction for similar research. The source code, specifications, and detailed procedures are publicly accessible on GitHub.
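The benefit of sampling training data from a prior-informed interim distribution, rather than uniformly, can be shown with a toy one-dimensional example: far more samples land in the physically relevant region. The parameter range, the Gaussian interim distribution, and the "relevant region" bounds below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Suppose prior knowledge says the interesting values of some source
# parameter concentrate near 30, while a naive prior is uniform on [5, 100].
uniform_samples = rng.uniform(5, 100, size=10_000)
interim_samples = rng.normal(loc=30, scale=5, size=10_000)

def in_relevant_region(x, lo=25, hi=35):
    # Hypothetical region where the model needs fine-grained coverage
    return (x >= lo) & (x <= hi)

print(in_relevant_region(uniform_samples).mean())  # roughly 0.10
print(in_relevant_region(interim_samples).mean())  # roughly 0.68
```

The interim distribution concentrates training points where the posterior mass is expected, so the downstream normalizing flow sees many more examples of the "subtle but important details" mentioned above.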
Big Data Mining and Analytics, vol. 5, no. 1, pp. 53-63. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663260.pdf
Citations: 3
Call for papers: Special issue on deep learning and evolutionary computation for satellite imagery
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020025
Satellite images are enormous sources of data that require efficient methods for knowledge discovery. The increased availability of earth data from satellite images presents immense opportunities in various fields. However, the volume and heterogeneity of the data pose serious computational challenges. The development of efficient techniques has the potential to uncover hidden information in these images. This knowledge can be used in various activities related to planning, monitoring, and managing earth resources. Deep learning is being widely used for image analysis and processing, and deep-learning-based models can be effectively applied to mining and knowledge discovery from satellite images.
Big Data Mining and Analytics, vol. 5, no. 1, p. 79. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663262.pdf
Citations: 0
Call for papers: Special issue on privacy-preserving data mining for artificial intelligence of things
IF 13.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2021-12-27 · DOI: 10.26599/BDMA.2021.9020026
Artificial Intelligence of Things (AIoT) is experiencing unimaginably fast growth with the popularization of end devices and advanced machine learning and data processing techniques. An increasing volume of data is being collected every single second to enable Artificial Intelligence (AI) on the Internet of Things (IoT). This explosion of data brings significant benefits to intelligent industries that provide predictive services and to research institutes that advance human knowledge in data-intensive fields. To make the best use of the collected data, various data mining techniques have been deployed to extract data patterns. In classic scenarios, the data collected from IoT devices is sent directly to cloud servers for processing using diverse methods, such as training machine learning models. However, the network between cloud servers and massive numbers of end devices may not be stable due to irregular bursts of traffic, weather, etc. Therefore, autonomous data mining, self-organized by a group of local devices to maintain ongoing and robust AI services, plays an increasingly important role for critical IoT infrastructures. Privacy issues become more concerning in this scenario. The data transmitted via autonomous networks is publicly accessible to all internal participants, which increases the risk of exposure. Besides, data mining techniques may reveal sensitive information from the collected data. Various attacks, such as inference attacks, are emerging and evolving to breach sensitive data because of the great financial benefits involved. Motivated by this, it is essential to devise novel privacy-preserving autonomous data mining solutions for AIoT. In this Special Issue, we aim to gather state-of-the-art advances in privacy-preserving data mining and autonomous data processing solutions for AIoT.
Topics include, but are not limited to, the following:
• Privacy-preserving federated learning for AIoT
• Differentially private machine learning for AIoT
• Personalized privacy-preserving data mining
• Decentralized machine learning paradigms for autonomous data mining using blockchain
• AI-enhanced edge data mining for AIoT
• AI and blockchain empowered privacy-preserving big data analytics for AIoT
• Anomaly detection and inference attack defense for AIoT
• Privacy protection measurement metrics
• Zero trust architectures for privacy protection management
• Privacy protection data mining and analysis via blockchain-enabled digital twin
Big Data Mining and Analytics, vol. 5, no. 1, p. 80. Open-access PDF: https://ieeexplore.ieee.org/iel7/8254253/9663253/09663263.pdf
引用次数: 0
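The first listed topic, privacy-preserving federated learning, can be illustrated with a minimal FedAvg-style sketch in which simulated IoT clients train locally and share only model parameters, never raw data. This is a generic illustration of the technique named in the call, not any submission's method; the linear least-squares model, client sizes, learning rate, and round counts are all illustrative assumptions.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    # One client's local gradient steps on a least-squares model;
    # only the updated parameters (not the raw data) leave the device.
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fed_avg(clients, w, rounds=10):
    # The server averages client models, weighted by local sample count.
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in clients]
        sizes = [len(y) for _, y in clients]
        w = np.average(updates, axis=0, weights=sizes)
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for n in (30, 50, 20):  # three simulated IoT devices with local datasets
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

w = fed_avg(clients, np.zeros(2))
print(np.round(w, 2))  # recovers a value close to true_w
```

The weighted average is the standard FedAvg aggregation rule; real deployments add secure aggregation or differential privacy on top of this exchange.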
Attention-aware heterogeneous graph neural network
IF 13.6 CAS Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020008
Jintao Zhang;Quan Xu
As a powerful tool for elucidating the embedding representation of graph-structured data, Graph Neural Networks (GNNs), a family of models built on homogeneous networks, have been widely used in various data mining tasks. Applying a GNN to embed a Heterogeneous Information Network (HIN) is a major challenge, mainly because an HIN contains many different types of nodes and many different types of relationships between them. An HIN carries rich semantic and structural information, which calls for a specially designed graph neural network. However, existing HIN-based graph neural network models rarely consider the interaction information hidden between the meta-paths of the HIN, which leads to poor node embeddings. In this paper, we propose an Attention-aware Heterogeneous graph Neural Network (AHNN) model to effectively extract useful information from an HIN and use it to learn the embedding representation of nodes. Specifically, we first use node-level attention to aggregate and update the embedding representations of nodes, and then concatenate the embedding representations of the nodes on different meta-paths. Finally, a semantic-level neural network is proposed to extract the feature interaction relationships on different meta-paths and learn the final embedding of nodes. Experimental results on three widely used datasets show that the AHNN model significantly outperforms state-of-the-art models.
Citations: 13
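The two-level attention described in the abstract — node-level attention over each node's neighbors under one meta-path, followed by semantic-level attention that fuses the per-meta-path embeddings — can be sketched in plain NumPy. This is a minimal sketch of the general idea, not the authors' AHNN implementation; the GAT-style attention parameterization, toy adjacency matrices, and dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def node_level_attention(H, A, w):
    # Attend over each node's neighbors under one meta-path.
    # H: (n, d) node embeddings; A: (n, n) meta-path adjacency;
    # w: (2d,) attention vector (an assumed GAT-style parameterization).
    n, d = H.shape
    logits = (H @ w[:d])[:, None] + (H @ w[d:])[None, :]
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU
    logits = np.where(A > 0, logits, -1e9)                # mask non-neighbors
    return softmax(logits, axis=1) @ H                    # aggregated embeddings

def semantic_level_attention(Z_list, q):
    # Fuse the per-meta-path embeddings with learned semantic weights.
    scores = np.array([np.tanh(Z).mean(axis=0) @ q for Z in Z_list])
    beta = softmax(scores)
    return sum(b * Z for b, Z in zip(beta, Z_list)), beta

rng = np.random.default_rng(0)
n, d = 5, 4
H = rng.normal(size=(n, d))
A1 = np.eye(n); A1[0, 1] = A1[1, 0] = 1   # toy meta-path adjacencies
A2 = np.eye(n); A2[2, 3] = A2[3, 2] = 1   # (self-loops included)
Z = [node_level_attention(H, A, rng.normal(size=2 * d)) for A in (A1, A2)]
fused, beta = semantic_level_attention(Z, rng.normal(size=d))
print(fused.shape, beta.round(3))
```

The semantic weights `beta` sum to one, so the fused embedding is a convex combination of the per-meta-path views — the mechanism by which such models weigh one relationship type against another.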
Multimodal adaptive identity-recognition algorithm fused with gait perception
IF 13.6 CAS Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020006
Changjie Wang;Zhihua Li;Benjamin Sarpong
Existing identity-recognition technologies require assistive equipment, are expensive, and offer poor recognition accuracy. To overcome these deficiencies, this paper proposes several gait-feature identification algorithms. First, gait information collected from the triaxial accelerometers of individuals' smartphones is preprocessed, and multimodal fusion with existing standard datasets yields a multimodal synthetic dataset. Then, based on the multimodal characteristics of the collected biological gait information, a Convolutional Neural Network based Gait Recognition (CNN-GR) model and a related scheme for the multimodal features are developed. Finally, building on the proposed CNN-GR model and scheme, a single-gait-feature identification algorithm using unimodal gait features and an identification algorithm fusing multimodal gait information are proposed. Experimental results show that the proposed algorithms perform well in terms of recognition accuracy, the confusion matrix, and the kappa statistic, and that they achieve better recognition scores and robustness than the compared algorithms; thus, the proposed approach holds considerable promise in practice.
Citations: 9
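The pipeline described in the abstract — windowing a triaxial accelerometer stream, then classifying each window with a small CNN — can be sketched as follows. This is a hedged NumPy illustration of the general approach, not the paper's CNN-GR model; the window size, stride, filter shapes, random weights, and the four-subject head are all assumptions.

```python
import numpy as np

def sliding_windows(signal, width, stride):
    # Segment a (T, 3) triaxial accelerometer stream into fixed windows.
    return np.stack([signal[i:i + width]
                     for i in range(0, len(signal) - width + 1, stride)])

def conv1d_relu(x, kernels):
    # Valid 1-D convolution over time with ReLU:
    # x (w, c_in), kernels (k, c_in, c_out) -> (w - k + 1, c_out).
    k = kernels.shape[0]
    out = np.stack([np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0)

rng = np.random.default_rng(1)
stream = rng.normal(size=(200, 3))                   # stand-in accelerometer trace
stream = (stream - stream.mean(0)) / stream.std(0)   # per-axis standardization
windows = sliding_windows(stream, width=64, stride=32)

K = 0.1 * rng.normal(size=(5, 3, 8))   # one conv layer, 8 temporal filters
W = 0.1 * rng.normal(size=(8, 4))      # linear head over 4 assumed subjects
feats = np.stack([conv1d_relu(win, K).mean(axis=0) for win in windows])  # global avg pool
logits = feats @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)       # softmax
pred = probs.argmax(axis=1)            # per-window subject prediction
print(windows.shape, pred)
```

In a trained system the kernels and head would be learned from labeled gait windows; the sketch only shows the data flow from raw signal to per-window identity scores.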