Big Data Mining and Analytics: Latest Articles
Identification of Proteins and Genes Associated with Hedgehog Signaling Pathway Involved in Neoplasm Formation Using Text-Mining Approach
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020007
Analysis of the molecular mechanisms that lead to the development of various types of tumors is essential for biology and medicine, because it may help to find new therapeutic opportunities for cancer treatment and cure, including personalized treatment approaches. One of the pathways known to be important for the development of neoplastic diseases and pathological processes is the Hedgehog signaling pathway, which normally controls human embryonic development. Systematic accumulation of various types of biological data, including interactions between proteins, regulation of gene transcription, and the results of proteomics and metabolomics experiments, allows computational analysis of these big data to identify key molecular mechanisms of certain diseases and pathologies as well as promising therapeutic targets. The aim of this study is to develop a computational approach for revealing associations between human proteins and genes interacting with the Hedgehog pathway components, as well as for identifying their roles in the development of various types of tumors. We automatically collect sets of abstract texts from the NCBI PubMed bibliographic database. For recognition of the Hedgehog pathway proteins and genes and of neoplastic diseases, we use a dictionary-based named entity recognition approach, while for all other proteins and genes a machine learning method is used. For association extraction, we develop a set of semantic rules. We complement the results of the text analysis with gene set enrichment analysis. The identified key pathways that may influence the Hedgehog pathway and their roles in tumor development are then verified using information in the literature.
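The dictionary-based recognition step described in this abstract can be illustrated with a minimal sketch. The mini-dictionaries, the function names, and the sentence-level co-occurrence rule below are assumptions made for illustration only; the abstract does not list the authors' dictionaries or their semantic rules, and co-occurrence is a much cruder stand-in for rule-based association extraction.

```python
import re

# Hypothetical mini-dictionaries (canonical name -> synonyms).
# The dictionaries actually used in the paper are not given in the abstract.
PATHWAY_TERMS = {
    "SHH": ["SHH", "sonic hedgehog"],
    "PTCH1": ["PTCH1", "patched 1"],
    "SMO": ["SMO", "smoothened"],
    "GLI1": ["GLI1"],
}
DISEASE_TERMS = {
    "medulloblastoma": ["medulloblastoma"],
    "basal cell carcinoma": ["basal cell carcinoma", "BCC"],
}


def compile_dictionary(terms):
    """Build one case-insensitive, whole-word regex per canonical name."""
    patterns = {}
    for canonical, synonyms in terms.items():
        # Longest synonym first so multi-word names are tried before short ones.
        ordered = sorted(synonyms, key=len, reverse=True)
        alternatives = "|".join(re.escape(s) for s in ordered)
        patterns[canonical] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns


def find_entities(sentence, patterns):
    """Return the canonical names whose synonyms occur in the sentence."""
    return sorted(name for name, pattern in patterns.items() if pattern.search(sentence))


def cooccurrences(text):
    """Pair every pathway entity with every disease found in the same sentence."""
    gene_patterns = compile_dictionary(PATHWAY_TERMS)
    disease_patterns = compile_dictionary(DISEASE_TERMS)
    pairs = set()
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = find_entities(sentence, gene_patterns)
        diseases = find_entities(sentence, disease_patterns)
        pairs.update((gene, disease) for gene in genes for disease in diseases)
    return sorted(pairs)
```

Calling cooccurrences() on a few abstract sentences returns (gene, disease) pairs that share a sentence. A real pipeline would need normalization of gene symbols and some filtering of ambiguous short synonyms, which this sketch does not attempt.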
Big Data Mining and Analytics, vol. 7, no. 1, pp. 25-93. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373000
Citations: 0
Limits of Depth: Over-Smoothing and Over-Squashing in GNNs
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020019
Aafaq Mohi ud din;Shaima Qureshi
Graph Neural Networks (GNNs) have become a widely used tool for learning and analyzing data on graph structures, largely due to their ability to preserve graph structure and properties via graph representation learning. However, the effect of depth on the performance of GNNs, particularly isotropic and anisotropic models, remains an active area of research. This study presents a comprehensive exploration of the impact of depth on GNNs, with a focus on the phenomena of over-smoothing and the bottleneck effect in deep graph neural networks. Our research investigates the tradeoff between depth and performance, revealing that increasing depth can lead to over-smoothing and a decrease in performance due to the bottleneck effect. We also examine the impact of node degrees on classification accuracy, finding that nodes with low degrees can pose challenges for accurate classification. Our experiments use several benchmark datasets and a range of evaluation metrics to compare isotropic and anisotropic GNNs of varying depths, and also explore the scalability of these models. Our findings provide valuable insights into the design of deep GNNs and offer potential avenues for future research to improve their performance.
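Over-smoothing can be seen in a toy setting without any training. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: it applies a plain isotropic mean aggregator (each node averages itself and its neighbours, with no learned weights or nonlinearity) to scalar node features and tracks how far apart the features remain as depth grows. The function names and the example graph are hypothetical.

```python
def mean_aggregate(features, adjacency):
    """One isotropic message-passing step: average over the node and its neighbours."""
    updated = []
    for node, neighbours in enumerate(adjacency):
        group = [node] + list(neighbours)
        updated.append(sum(features[j] for j in group) / len(group))
    return updated


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


def spread_by_depth(features, adjacency, depth):
    """Return the feature spread after 0, 1, ..., depth aggregation steps."""
    spreads = [spread(features)]
    for _ in range(depth):
        features = mean_aggregate(features, adjacency)
        spreads.append(spread(features))
    return spreads
```

On a connected graph such as the path adjacency = [[1], [0, 2], [1, 3], [2]], the spread never grows from one layer to the next and shrinks toward zero with depth, so nodes that started with different features become nearly indistinguishable. Anisotropic models weight neighbours unequally, which this sketch deliberately leaves out.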
Big Data Mining and Analytics, vol. 7, no. 1, pp. 205-216. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372997
Citations: 0
PURP: A Scalable System for Predicting Short-Term Urban Traffic Flow Based on License Plate Recognition Data
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020017
Shan Zhang;Qinkai Jiang;Hao Li;Bin Cao;Jing Fan
Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, we propose a scalable system for Predicting short-term URban traffic flow in real time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (KNN) as the prediction framework, because KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming online model retraining. The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.
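The reason a nonparametric KNN predictor avoids online retraining can be shown with a minimal sketch. Everything here is an assumption for illustration: the class name, the use of Euclidean distance, the plain average over neighbours, and the idea that a context is a short tuple of recent flow counts. The abstract does not specify how PURP encodes spatio-temporal contexts or measures their similarity.

```python
import math


class KnnFlowPredictor:
    """Stores (context, next_flow) pairs and predicts by averaging the k nearest."""

    def __init__(self, k=3):
        self.k = k
        self.contexts = []  # stored context vectors
        self.targets = []   # flow observed right after each context

    def add(self, context, next_flow):
        """Integrate a new observation; this is an append, not a retraining step."""
        self.contexts.append(tuple(context))
        self.targets.append(float(next_flow))

    def predict(self, context):
        """Average the flows that followed the k most similar stored contexts."""
        if not self.contexts:
            raise ValueError("no history stored yet")
        ranked = sorted(
            zip(self.contexts, self.targets),
            key=lambda pair: math.dist(pair[0], context),
        )
        nearest = ranked[: self.k]
        return sum(flow for _, flow in nearest) / len(nearest)
```

Because the "model" is just the stored history, the newest observations influence the next prediction as soon as add() is called. The linear scan in predict() is the simplest possible choice; a production system would likely use an index structure to keep lookups fast as the history grows.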
Big Data Mining and Analytics, vol. 7, no. 1, pp. 171-187. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372996
Citations: 0
Gender-Based Analysis of User Reactions to Facebook Posts
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020005
Yassine El Moudene;Jaafar Idrais;Rida El Abassi;Abderrahim Sabour
Online Social Networks (OSNs) are based on the sharing of different types of information and on various interactions (comments, reactions, and sharing). One of these important actions is the emotional reaction to the content. The diversity of reaction types available on Facebook (FB) enables users to express their feelings, and its traceability creates and enriches the users' emotional identity in the virtual world. This paper is based on the analysis of 119875012 FB reactions (Like, Love, Haha, Wow, Sad, Angry, Thankful, and Pride) made at multiple levels (publications, comments, and sub-comments) to study and classify the users' emotional behavior, visualize the distribution of different types of reactions, and analyze the gender impact on emotion generation. All of these can be achieved by addressing these research questions: Who reacts the most? Which emotion is the most expressed?
Big Data Mining and Analytics, vol. 7, no. 1, pp. 75-86. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372951
Citations: 0
A Survey on Event Tracking in Social Media Data Streams
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020021
Zixuan Han;Leilei Shi;Lu Liu;Liang Jiang;Jiawei Fang;Fanyuan Lin;Jinjuan Zhang;John Panneerselvam;Nick Antonopoulos
Social networks are an indispensable part of our daily life, where an unprecedented amount of complex data corresponding to a diverse range of applications is generated. As such, it is imperative to conduct research on social events and patterns from the perspectives of conventional sociology to optimize services that originate from social networks. Event tracking in social networks finds various applications, such as network security and societal governance, which involve analyzing data generated by user groups on social networks in real time. Moreover, as deep learning techniques continue to advance and make important breakthroughs in various fields, researchers are using this technology to progressively optimize the effectiveness of Event Detection (ED) and tracking algorithms. In this regard, this paper presents an in-depth, comprehensive review of the concepts and methods involved in ED and tracking in social networks. We introduce mainstream event tracking methods, which involve three primary technical steps: ED, event propagation, and event evolution. We then introduce benchmark datasets and evaluation metrics for ED and tracking, which allow comparative analysis of the performance of mainstream methods. Finally, we present a comprehensive analysis of the main research findings and existing limitations in this field, as well as future research prospects and challenges.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 217-243. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372955
Citations: 0
Multi-Scale Feature Fusion Model for Bridge Appearance Defect Detection
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2022.9020048
Rong Pang;Yan Yang;Aiguo Huang;Yan Liu;Peng Zhang;Guangwu Tang
Although the Faster Region-based Convolutional Neural Network (Faster R-CNN) model has obvious advantages in defect recognition, it still cannot overcome challenging problems in bridge defect detection, such as long processing time, small targets, irregular shapes, and strong noise interference. To deal with these issues, this paper proposes a novel Multi-scale Feature Fusion (MFF) model for bridge appearance defect detection. First, the Faster R-CNN model adopts Region Of Interest (ROI) pooling, which omits the edge information of the target area, resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects. Therefore, this paper proposes an MFF based on regional feature Aggregation (MFF-A), which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area. Second, the Faster R-CNN model is insensitive to small targets, irregular shapes, and strong noise in bridge defect detection, which results in a long training time and low recognition accuracy. Accordingly, a novel Lightweight MFF (MFF-L) model for bridge appearance defect detection using the lightweight network EfficientNetV2 and a feature pyramid network is proposed, which fuses multi-scale features to shorten the training time and improve recognition accuracy. Finally, the effectiveness of the proposed method is evaluated on the bridge defect dataset and public computational fluid dynamic dataset.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 1-11. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372954
Citations: 0
Cell Consistency Evaluation Method Based on Multiple Unsupervised Learning Algorithms
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9010003
Jiang Chang;Xianglong Gu;Jieyun Wu;Debu Zhang
Unsupervised learning algorithms can effectively handle sample imbalance. To address battery consistency anomalies in new energy vehicles, we adopt a variety of unsupervised learning algorithms to evaluate and predict the battery consistency of three vehicles using charging fragment data from actual operating conditions. We extract battery-related features, such as the mean of maximum difference, standard deviation, and entropy of batteries, and then apply principal component analysis to reduce the dimensionality and record the amount of preserved information. We then build models through a collection of unsupervised learning algorithms for the anomaly detection of cell consistency faults. We also determine whether unsupervised and supervised learning algorithms can address the battery consistency problem and document the parameter tuning process. In addition, we compare the prediction effectiveness of charging and discharging features modeled individually and in combination, determine the choice of charging and discharging features to be modeled in combination, and visualize the multidimensional data for fault detection. Experimental results show that the unsupervised learning algorithm is effective in visualizing and predicting vehicle cell consistency faults, and can accurately predict faults in real time. The “distance-boxplot” algorithm shows the best performance with a prediction accuracy of 80%, a recall rate of 100%, and an F1 of 0.89. The proposed approach can be applied to monitor battery consistency faults in real time and reduce the possibility of disasters arising from consistency faults.
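The three reported figures are mutually consistent if the "prediction accuracy of 80%" is read as precision, which is an assumption on our part since the abstract does not define the term. F1 is the harmonic mean of precision and recall, and a short check under that reading reproduces the reported 0.89:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed reading of the abstract: precision = 0.80, recall = 1.00.
reported = f1_score(0.80, 1.00)  # 1.6 / 1.8, which rounds to 0.89
```

A recall of 100% with a precision of 80% would mean every faulty case is flagged while roughly one flag in five is a false alarm, a trade-off that tends to be acceptable for safety monitoring.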
Big Data Mining and Analytics, vol. 7, no. 1, pp. 42-54. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372956
Citations: 0
Building a High-Performance Graph Storage on Top of Tree-Structured Key-Value Stores
Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-12-25. DOI: 10.26599/BDMA.2023.9020015
Heng Lin;Zhiyong Wang;Shipeng Qi;Xiaowei Zhu;Chuntao Hong;Wenguang Chen;Yingwei Luo
Graph databases have gained widespread adoption in various industries and have been utilized in a range of applications, including financial risk assessment, commodity recommendation, and data lineage tracking. While the principles and design of these databases have been the subject of some investigation, there remains a lack of comprehensive examination of aspects such as storage layout, query language, and deployment. The present study focuses on the design and implementation of graph storage layout, with a particular emphasis on tree-structured key-value stores. We also examine different design choices in the graph storage layer and present our findings through the development of TuGraph, a highly efficient single-machine graph database that significantly outperforms well-known Graph DataBase Management Systems (GDBMSs). Additionally, TuGraph demonstrates superior performance in the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) interactive benchmark.
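The general idea of layering a graph on an ordered key-value store can be sketched as follows. This is not TuGraph's actual storage layout, which the abstract does not describe; the class names, the (source, destination) key encoding, and the use of a sorted Python list as a stand-in for a tree-structured store are all assumptions for illustration.

```python
import bisect


class SortedKV:
    """Tiny ordered key-value store; a sorted list stands in for a B+ tree."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def scan(self, low, high):
        """Yield (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        for key in self._keys[start:end]:
            yield key, self._values[key]


class GraphOnKV:
    """Stores each edge under the key (src, dst), with integer vertex ids."""

    def __init__(self):
        self.kv = SortedKV()

    def add_edge(self, src, dst, props=None):
        self.kv.put((src, dst), props or {})

    def out_neighbours(self, src):
        # All keys starting with src sort between (src,) and (src + 1,),
        # so one range scan returns the whole adjacency list.
        return [dst for (_, dst), _ in self.kv.scan((src,), (src + 1,))]
```

Because keys sharing a source vertex are adjacent in sort order, fetching a vertex's out-edges becomes a single range scan rather than many point lookups, which is the property that makes tree-structured stores attractive for adjacency-heavy workloads.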
Big Data Mining and Analytics, vol. 7, no. 1, pp. 156-170. Published 2023-12-25. Journal Article. https://doi.org/10.26599/BDMA.2023.9020015 (PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372995)
Citations: 0
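The abstract above mentions laying out graph storage on a tree-structured key-value store but gives no details of the layout. As a rough illustration of the general idea only (not TuGraph's actual format), the sketch below stores each edge under a key that sorts by source vertex, so that a vertex's outgoing edges form one contiguous key range. The `SortedKV` class, `edge_key`, `add_edge`, and `out_neighbors` are hypothetical names, and a sorted in-memory list stands in for a real B+-tree or LSM-tree.

```python
import bisect
import struct


class SortedKV:
    """Toy ordered key-value store (hypothetical stand-in for a tree-structured store)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, lo: bytes, hi: bytes):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            k = self._keys[i]
            yield k, self._vals[k]
            i += 1


def edge_key(src: int, dst: int) -> bytes:
    # Fixed-width big-endian integers compare bytewise in numeric order,
    # so all edges of one source vertex are adjacent in the key space.
    return struct.pack(">QQ", src, dst)


def add_edge(kv: SortedKV, src: int, dst: int, props: bytes = b"") -> None:
    kv.put(edge_key(src, dst), props)


def out_neighbors(kv: SortedKV, src: int) -> list:
    # One range scan over [ (src, 0), (src + 1, 0) ) returns every outgoing edge.
    # Assumes src is below the maximum 64-bit vertex id.
    lo, hi = edge_key(src, 0), edge_key(src + 1, 0)
    return [struct.unpack(">QQ", k)[1] for k, _ in kv.scan(lo, hi)]


kv = SortedKV()
for s, d in [(1, 5), (1, 2), (2, 9), (1, 300)]:
    add_edge(kv, s, d)
print(out_neighbors(kv, 1))
```

The design choice being illustrated is that neighbor lookup becomes a single ordered range scan rather than many point lookups, which is the property ordered stores are good at.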
Attention-Based CNN Fusion Model for Emotion Recognition During Walking Using Discrete Wavelet Transform on EEG and Inertial Signals
Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020018
Yan Zhao;Ming Guo;Xiangyong Chen;Jianqiang Sun;Jianlong Qiu
Walking, as a unique biometric tool, conveys important information for emotion recognition: individuals in different emotional states exhibit distinct walking patterns. Building on this observation, this paper proposes a novel approach to recognizing emotion during walking using electroencephalogram (EEG) and inertial signals. Accurate emotion recognition is achieved by training in an end-to-end deep learning fashion and taking multi-modal fusion into account. Subjects wear virtual reality head-mounted display (VR-HMD) equipment to immerse themselves in strong emotions while walking. The VR environment offers excellent imitation and experience ability, which plays an important role in awakening and changing emotions. In addition, the multi-modal signals acquired from EEG and inertial sensors are separately represented as virtual emotion images by discrete wavelet transform (DWT). These images serve as input to the attention-based convolutional neural network (CNN) fusion model. The designed network structure is simple and lightweight, and it integrates a channel attention mechanism to extract and enhance features. To improve the performance of the recognition system, the proposed decision fusion algorithm combines the Critic method and a majority voting strategy to determine the weight values that affect the final decision results. An investigation of the effect of different mother wavelet types and wavelet decomposition levels on model performance indicates that the 2.2-order reverse biorthogonal (rbio2.2) wavelet with two-level decomposition gives the best recognition performance. Comparative experimental results show that the proposed method outperforms other existing state-of-the-art works, with an accuracy of 98.73%.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 188-204. Published 2023-12-25. Journal Article. https://doi.org/10.26599/BDMA.2023.9020018 (PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372999)
Citations: 0
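The abstract above describes a decision fusion step in which weights influence a majority vote, without spelling out the procedure. The sketch below shows plain weighted majority voting as one possible reading of that step. It is not the paper's algorithm: the function name `weighted_vote` is hypothetical, and the weights are supplied by hand here rather than derived by the Critic method.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (assumed given, not computed here).
    Ties go to the label that appears first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    # max() returns the first maximal key in insertion order.
    return max(totals, key=totals.get)


# Illustrative values only: three hypothetical classifiers and made-up weights.
print(weighted_vote(["happy", "sad", "sad"], [0.6, 0.25, 0.15]))
print(weighted_vote(["happy", "sad", "sad"], [1.0, 1.0, 1.0]))
```

With equal weights this reduces to ordinary majority voting; unequal weights let a more reliable classifier outvote several weaker ones.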
Call for Papers: Special Issue on Big Data Computing for Cyber Physical Social Intelligence
Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020031
Big Data Mining and Analytics, vol. 7, no. 1, p. 245. Published 2023-12-25. Journal Article. https://doi.org/10.26599/BDMA.2023.9020031 (PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372960)
Citations: 0