Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020007
Analysis of molecular mechanisms that lead to the development of various types of tumors is essential for biology and medicine, because it may help to find new therapeutic opportunities for cancer treatment and cure including personalized treatment approaches. One of the pathways known to be important for the development of neoplastic diseases and pathological processes is the Hedgehog signaling pathway that normally controls human embryonic development. Systematic accumulation of various types of biological data, including interactions between proteins, regulation of genes transcription, proteomics, and metabolomics experiments results, allows the application of computational analysis of these big data for identification of key molecular mechanisms of certain diseases and pathologies and promising therapeutic targets. The aim of this study is to develop a computational approach for revealing associations between human proteins and genes interacting with the Hedgehog pathway components, as well as for identifying their roles in the development of various types of tumors. We automatically collect sets of abstract texts from the NCBI PubMed bibliographic database. For recognition of the Hedgehog pathway proteins and genes and neoplastic diseases we use a dictionary-based named entity recognition approach, while for all other proteins and genes machine learning method is used. For association extraction, we develop a set of semantic rules. We complete the results of the text analysis with the gene set enrichment analysis. The identified key pathways that may influence the Hedgehog pathway and their roles in tumor development are then verified using the information in the literature.
{"title":"Identification of Proteins and Genes Associated with Hedgehog Signaling Pathway Involved in Neoplasm Formation Using Text-Mining Approach","authors":"","doi":"10.26599/BDMA.2023.9020007","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020007","url":null,"abstract":"Analysis of molecular mechanisms that lead to the development of various types of tumors is essential for biology and medicine, because it may help to find new therapeutic opportunities for cancer treatment and cure including personalized treatment approaches. One of the pathways known to be important for the development of neoplastic diseases and pathological processes is the Hedgehog signaling pathway that normally controls human embryonic development. Systematic accumulation of various types of biological data, including interactions between proteins, regulation of genes transcription, proteomics, and metabolomics experiments results, allows the application of computational analysis of these big data for identification of key molecular mechanisms of certain diseases and pathologies and promising therapeutic targets. The aim of this study is to develop a computational approach for revealing associations between human proteins and genes interacting with the Hedgehog pathway components, as well as for identifying their roles in the development of various types of tumors. We automatically collect sets of abstract texts from the NCBI PubMed bibliographic database. For recognition of the Hedgehog pathway proteins and genes and neoplastic diseases we use a dictionary-based named entity recognition approach, while for all other proteins and genes machine learning method is used. For association extraction, we develop a set of semantic rules. We complete the results of the text analysis with the gene set enrichment analysis. The identified key pathways that may influence the Hedgehog pathway and their roles in tumor development are then verified using the information in the literature.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"25-93"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020019
Aafaq Mohi ud din;Shaima Qureshi
Graph Neural Networks (GNNs) have become a widely used tool for learning and analyzing data on graph structures, largely due to their ability to preserve graph structure and properties via graph representation learning. However, the effect of depth on the performance of GNNs, particularly isotropic and anisotropic models, remains an active area of research. This study presents a comprehensive exploration of the impact of depth on GNNs, with a focus on the phenomena of over-smoothing and the bottleneck effect in deep graph neural networks. Our research investigates the tradeoff between depth and performance, revealing that increasing depth can lead to over-smoothing and a decrease in performance due to the bottleneck effect. We also examine the impact of node degrees on classification accuracy, finding that nodes with low degrees can pose challenges for accurate classification. Our experiments use several benchmark datasets and a range of evaluation metrics to compare isotropic and anisotropic GNNs of varying depths, also explore the scalability of these models. Our findings provide valuable insights into the design of deep GNNs and offer potential avenues for future research to improve their performance.
{"title":"Limits of Depth: Over-Smoothing and Over-Squashing in GNNs","authors":"Aafaq Mohi ud din;Shaima Qureshi","doi":"10.26599/BDMA.2023.9020019","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020019","url":null,"abstract":"Graph Neural Networks (GNNs) have become a widely used tool for learning and analyzing data on graph structures, largely due to their ability to preserve graph structure and properties via graph representation learning. However, the effect of depth on the performance of GNNs, particularly isotropic and anisotropic models, remains an active area of research. This study presents a comprehensive exploration of the impact of depth on GNNs, with a focus on the phenomena of over-smoothing and the bottleneck effect in deep graph neural networks. Our research investigates the tradeoff between depth and performance, revealing that increasing depth can lead to over-smoothing and a decrease in performance due to the bottleneck effect. We also examine the impact of node degrees on classification accuracy, finding that nodes with low degrees can pose challenges for accurate classification. Our experiments use several benchmark datasets and a range of evaluation metrics to compare isotropic and anisotropic GNNs of varying depths, also explore the scalability of these models. Our findings provide valuable insights into the design of deep GNNs and offer potential avenues for future research to improve their performance.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"205-216"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372997","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020017
Shan Zhang;Qinkai Jiang;Hao Li;Bin Cao;Jing Fan
Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, herein, we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (namely KNN) as the prediction framework because the KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming model retraining online. The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.
准确高效的城市交通流量预测可以帮助驾驶员实时识别道路交通状况,从而在一定程度上帮助驾驶员避免拥堵和事故。然而,现有的城市交通流量实时预测方法只注重提高模型预测精度或效率,而忽视了训练效率,导致预测系统缺乏可扩展性,无法将实时交通流量纳入训练过程。为了在考虑最新历史数据的同时进行准确、实时的城市交通流量预测,避免耗时的在线再训练,我们在本文中提出了一种可扩展的基于车牌识别数据的短期URban交通流量实时预测系统(PURP)。首先,为确保预测的准确性,PURP 从车牌识别(LPR)数据中构建了交通流量预测的时空背景,作为有效特征。随后,为了利用近期数据而无需在线重新训练模型,PURP 使用非参数方法 k-近邻(即 KNN)作为预测框架,因为 KNN 可以有效识别前 k 个最相似的时空背景,并基于这些背景进行预测,而无需耗时的在线模型重新训练。实验结果表明,随着预测周期的延长,PURP 仍能保持较高的预测效率。
{"title":"PURP: A Scalable System for Predicting Short-Term Urban Traffic Flow Based on License Plate Recognition Data","authors":"Shan Zhang;Qinkai Jiang;Hao Li;Bin Cao;Jing Fan","doi":"10.26599/BDMA.2023.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020017","url":null,"abstract":"Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, herein, we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (namely KNN) as the prediction framework because the KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming model retraining online. The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"171-187"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372996","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020005
Yassine El Moudene;Jaafar Idrais;Rida El Abassi;Abderrahim Sabour
Online Social Networks (OSNs) are based on the sharing of different types of information and on various interactions (comments, reactions, and sharing). One of these important actions is the emotional reaction to the content. The diversity of reaction types available on Facebook (namely FB) enables users to express their feelings, and its traceability creates and enriches the users' emotional identity in the virtual world. This paper is based on the analysis of 119875012 FB reactions (Like, Love, Haha, Wow, Sad, Angry, Thankful, and Pride) made at multiple levels (publications, comments, and sub-comments) to study and classify the users' emotional behavior, visualize the distribution of different types of reactions, and analyze the gender impact on emotion generation. All of these can be achieved by addressing these research questions: who reacts the most? Which emotion is the most expressed?
{"title":"Gender-Based Analysis of User Reactions to Facebook Posts","authors":"Yassine El Moudene;Jaafar Idrais;Rida El Abassi;Abderrahim Sabour","doi":"10.26599/BDMA.2023.9020005","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020005","url":null,"abstract":"Online Social Networks (OSNs) are based on the sharing of different types of information and on various interactions (comments, reactions, and sharing). One of these important actions is the emotional reaction to the content. The diversity of reaction types available on Facebook (namely FB) enables users to express their feelings, and its traceability creates and enriches the users' emotional identity in the virtual world. This paper is based on the analysis of 119875012 FB reactions (Like, Love, Haha, Wow, Sad, Angry, Thankful, and Pride) made at multiple levels (publications, comments, and sub-comments) to study and classify the users' emotional behavior, visualize the distribution of different types of reactions, and analyze the gender impact on emotion generation. All of these can be achieved by addressing these research questions: who reacts the most? Which emotion is the most expressed?","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"75-86"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372951","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social networks are inevitable parts of our daily life, where an unprecedented amount of complex data corresponding to a diverse range of applications are generated. As such, it is imperative to conduct research on social events and patterns from the perspectives of conventional sociology to optimize services that originate from social networks. Event tracking in social networks finds various applications, such as network security and societal governance, which involves analyzing data generated by user groups on social networks in real time. Moreover, as deep learning techniques continue to advance and make important breakthroughs in various fields, researchers are using this technology to progressively optimize the effectiveness of Event Detection (ED) and tracking algorithms. In this regard, this paper presents an in-depth comprehensive review of the concept and methods involved in ED and tracking in social networks. We introduce mainstream event tracking methods, which involve three primary technical steps: ED, event propagation, and event evolution. Finally, we introduce benchmark datasets and evaluation metrics for ED and tracking, which allow comparative analysis on the performance of mainstream methods. Finally, we present a comprehensive analysis of the main research findings and existing limitations in this field, as well as future research prospects and challenges.
社交网络是我们日常生活中不可避免的一部分,在社交网络中产生了前所未有的大量复杂数据,这些数据与各种不同的应用相对应。因此,必须从传统社会学的角度对社交事件和模式进行研究,以优化源自社交网络的服务。社交网络中的事件追踪有多种应用,如网络安全和社会治理,其中涉及对社交网络上用户群体产生的数据进行实时分析。此外,随着深度学习技术的不断进步和在各个领域的重要突破,研究人员正在利用这一技术逐步优化事件检测(ED)和跟踪算法的有效性。为此,本文对社交网络中事件检测和跟踪的概念和方法进行了深入全面的综述。我们介绍了主流的事件跟踪方法,其中涉及三个主要技术步骤:ED、事件传播和事件演化。最后,我们介绍了 ED 和跟踪的基准数据集和评估指标,以便对主流方法的性能进行比较分析。最后,我们全面分析了该领域的主要研究成果和现有局限性,以及未来的研究前景和挑战。
{"title":"A Survey on Event Tracking in Social Media Data Streams","authors":"Zixuan Han;Leilei Shi;Lu Liu;Liang Jiang;Jiawei Fang;Fanyuan Lin;Jinjuan Zhang;John Panneerselvam;Nick Antonopoulos","doi":"10.26599/BDMA.2023.9020021","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020021","url":null,"abstract":"Social networks are inevitable parts of our daily life, where an unprecedented amount of complex data corresponding to a diverse range of applications are generated. As such, it is imperative to conduct research on social events and patterns from the perspectives of conventional sociology to optimize services that originate from social networks. Event tracking in social networks finds various applications, such as network security and societal governance, which involves analyzing data generated by user groups on social networks in real time. Moreover, as deep learning techniques continue to advance and make important breakthroughs in various fields, researchers are using this technology to progressively optimize the effectiveness of Event Detection (ED) and tracking algorithms. In this regard, this paper presents an in-depth comprehensive review of the concept and methods involved in ED and tracking in social networks. We introduce mainstream event tracking methods, which involve three primary technical steps: ED, event propagation, and event evolution. Finally, we introduce benchmark datasets and evaluation metrics for ED and tracking, which allow comparative analysis on the performance of mainstream methods. Finally, we present a comprehensive analysis of the main research findings and existing limitations in this field, as well as future research prospects and challenges.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"217-243"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2022.9020048
Rong Pang;Yan Yang;Aiguo Huang;Yan Liu;Peng Zhang;Guangwu Tang
Although the Faster Region-based Convolutional Neural Network (Faster R-CNN) model has obvious advantages in defect recognition, it still cannot overcome challenging problems, such as time-consuming, small targets, irregular shapes, and strong noise interference in bridge defect detection. To deal with these issues, this paper proposes a novel Multi-scale Feature Fusion (MFF) model for bridge appearance disease detection. First, the Faster R-CNN model adopts Region Of Interest (ROI) pooling, which omits the edge information of the target area, resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects. Therefore, this paper proposes an MFF based on regional feature Aggregation (MFF-A), which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area. Second, the Faster R-CNN model is insensitive to small targets, irregular shapes, and strong noises in bridge defect detection, which results in a long training time and low recognition accuracy. Accordingly, a novel Lightweight MFF (namely MFF-L) model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed, which fuses multi-scale features to shorten the training speed and improve recognition accuracy. Finally, the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.
{"title":"Multi-Scale Feature Fusion Model for Bridge Appearance Defect Detection","authors":"Rong Pang;Yan Yang;Aiguo Huang;Yan Liu;Peng Zhang;Guangwu Tang","doi":"10.26599/BDMA.2022.9020048","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020048","url":null,"abstract":"Although the Faster Region-based Convolutional Neural Network (Faster R-CNN) model has obvious advantages in defect recognition, it still cannot overcome challenging problems, such as time-consuming, small targets, irregular shapes, and strong noise interference in bridge defect detection. To deal with these issues, this paper proposes a novel Multi-scale Feature Fusion (MFF) model for bridge appearance disease detection. First, the Faster R-CNN model adopts Region Of Interest (ROI) pooling, which omits the edge information of the target area, resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects. Therefore, this paper proposes an MFF based on regional feature Aggregation (MFF-A), which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area. Second, the Faster R-CNN model is insensitive to small targets, irregular shapes, and strong noises in bridge defect detection, which results in a long training time and low recognition accuracy. Accordingly, a novel Lightweight MFF (namely MFF-L) model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed, which fuses multi-scale features to shorten the training speed and improve recognition accuracy. Finally, the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"1-11"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372954","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9010003
Jiang Chang;Xianglong Gu;Jieyun Wu;Debu Zhang
Unsupervised learning algorithms can effectively solve sample imbalance. To address battery consistency anomalies in new energy vehicles, we adopt a variety of unsupervised learning algorithms to evaluate and predict the battery consistency of three vehicles using charging fragment data from actual operating conditions. We extract battery-related features, such as the mean of maximum difference, standard deviation, and entropy of batteries and then apply principal component analysis to reduce the dimensionality and record the amount of preserved information. We then build models through a collection of unsupervised learning algorithms for the anomaly detection of cell consistency faults. We also determine whether unsupervised and supervised learning algorithms can address the battery consistency problem and document the parameter tuning process. In addition, we compare the prediction effectiveness of charging and discharging features modeled individually and in combination, determine the choice of charging and discharging features to be modeled in combination, and visualize the multidimensional data for fault detection. Experimental results show that the unsupervised learning algorithm is effective in visualizing and predicting vehicle core conformance faults, and can accurately predict faults in real time. The “distance-boxplot” algorithm shows the best performance with a prediction accuracy of 80%, a recall rate of 100%, and an F1 of 0.89. The proposed approach can be applied to monitor battery consistency faults in real time and reduce the possibility of disasters arising from consistency faults.
{"title":"Cell Consistency Evaluation Method Based on Multiple Unsupervised Learning Algorithms","authors":"Jiang Chang;Xianglong Gu;Jieyun Wu;Debu Zhang","doi":"10.26599/BDMA.2023.9010003","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9010003","url":null,"abstract":"Unsupervised learning algorithms can effectively solve sample imbalance. To address battery consistency anomalies in new energy vehicles, we adopt a variety of unsupervised learning algorithms to evaluate and predict the battery consistency of three vehicles using charging fragment data from actual operating conditions. We extract battery-related features, such as the mean of maximum difference, standard deviation, and entropy of batteries and then apply principal component analysis to reduce the dimensionality and record the amount of preserved information. We then build models through a collection of unsupervised learning algorithms for the anomaly detection of cell consistency faults. We also determine whether unsupervised and supervised learning algorithms can address the battery consistency problem and document the parameter tuning process. In addition, we compare the prediction effectiveness of charging and discharging features modeled individually and in combination, determine the choice of charging and discharging features to be modeled in combination, and visualize the multidimensional data for fault detection. Experimental results show that the unsupervised learning algorithm is effective in visualizing and predicting vehicle core conformance faults, and can accurately predict faults in real time. The “distance-boxplot” algorithm shows the best performance with a prediction accuracy of 80%, a recall rate of 100%, and an F1 of 0.89. The proposed approach can be applied to monitor battery consistency faults in real time and reduce the possibility of disasters arising from consistency faults.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"42-54"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372956","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020015
Heng Lin;Zhiyong Wang;Shipeng Qi;Xiaowei Zhu;Chuntao Hong;Wenguang Chen;Yingwei Luo
Graph databases have gained widespread adoption in various industries and have been utilized in a range of applications, including financial risk assessment, commodity recommendation, and data lineage tracking. While the principles and design of these databases have been the subject of some investigation, there remains a lack of comprehensive examination of aspects such as storage layout, query language, and deployment. The present study focuses on the design and implementation of graph storage layout, with a particular emphasis on tree-structured key-value stores. We also examine different design choices in the graph storage layer and present our findings through the development of TuGraph, a highly efficient single-machine graph database that significantly outperforms well-known Graph DataBase Management System (GDBMS). Additionally, TuGraph demonstrates superior performance in the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) interactive benchmark.
{"title":"Building a High-Performance Graph Storage on Top of Tree-Structured Key-Value Stores","authors":"Heng Lin;Zhiyong Wang;Shipeng Qi;Xiaowei Zhu;Chuntao Hong;Wenguang Chen;Yingwei Luo","doi":"10.26599/BDMA.2023.9020015","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020015","url":null,"abstract":"Graph databases have gained widespread adoption in various industries and have been utilized in a range of applications, including financial risk assessment, commodity recommendation, and data lineage tracking. While the principles and design of these databases have been the subject of some investigation, there remains a lack of comprehensive examination of aspects such as storage layout, query language, and deployment. The present study focuses on the design and implementation of graph storage layout, with a particular emphasis on tree-structured key-value stores. We also examine different design choices in the graph storage layer and present our findings through the development of TuGraph, a highly efficient single-machine graph database that significantly outperforms well-known Graph DataBase Management System (GDBMS). Additionally, TuGraph demonstrates superior performance in the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) interactive benchmark.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"156-170"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372995","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020018
Yan Zhao;Ming Guo;Xiangyong Chen;Jianqiang Sun;Jianlong Qiu
Walking as a unique biometric tool conveys important information for emotion recognition. Individuals in different emotional states exhibit distinct walking patterns. For this purpose, this paper proposes a novel approach to recognizing emotion during walking using electroencephalogram (EEG) and inertial signals. Accurate recognition of emotion is achieved by training in an end-to-end deep learning fashion and taking into account multi-modal fusion. Subjects wear virtual reality head-mounted display (VR-HMD) equipment to immerse in strong emotions during walking. VR environment shows excellent imitation and experience ability, which plays an important role in awakening and changing emotions. In addition, the multi-modal signals acquired from EEG and inertial sensors are separately represented as virtual emotion images by discrete wavelet transform (DWT). These serve as input to the attention-based convolutional neural network (CNN) fusion model. The designed network structure is simple and lightweight while integrating the channel attention mechanism to extract and enhance features. To effectively improve the performance of the recognition system, the proposed decision fusion algorithm combines Critic method and majority voting strategy to determine the weight values that affect the final decision results. An investigation is made on the effect of diverse mother wavelet types and wavelet decomposition levels on model performance which indicates that the 2.2-order reverse biorthogonal (rbio2.2) wavelet with two-level decomposition has the best recognition performance. Comparative experiment results show that the proposed method outperforms other existing state-of-the-art works with an accuracy of 98.73%.
{"title":"Attention-Based CNN Fusion Model for Emotion Recognition During Walking Using Discrete Wavelet Transform on EEG and Inertial Signals","authors":"Yan Zhao;Ming Guo;Xiangyong Chen;Jianqiang Sun;Jianlong Qiu","doi":"10.26599/BDMA.2023.9020018","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020018","url":null,"abstract":"Walking as a unique biometric tool conveys important information for emotion recognition. Individuals in different emotional states exhibit distinct walking patterns. For this purpose, this paper proposes a novel approach to recognizing emotion during walking using electroencephalogram (EEG) and inertial signals. Accurate recognition of emotion is achieved by training in an end-to-end deep learning fashion and taking into account multi-modal fusion. Subjects wear virtual reality head-mounted display (VR-HMD) equipment to immerse in strong emotions during walking. VR environment shows excellent imitation and experience ability, which plays an important role in awakening and changing emotions. In addition, the multi-modal signals acquired from EEG and inertial sensors are separately represented as virtual emotion images by discrete wavelet transform (DWT). These serve as input to the attention-based convolutional neural network (CNN) fusion model. The designed network structure is simple and lightweight while integrating the channel attention mechanism to extract and enhance features. To effectively improve the performance of the recognition system, the proposed decision fusion algorithm combines Critic method and majority voting strategy to determine the weight values that affect the final decision results. An investigation is made on the effect of diverse mother wavelet types and wavelet decomposition levels on model performance which indicates that the 2.2-order reverse biorthogonal (rbio2.2) wavelet with two-level decomposition has the best recognition performance. Comparative experiment results show that the proposed method outperforms other existing state-of-the-art works with an accuracy of 98.73%.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"188-204"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372999","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-25DOI: 10.26599/BDMA.2023.9020031
{"title":"Call for Papers: Special Issue on Big Data Computing for Cyber Physical Social Intelligence","authors":"","doi":"10.26599/BDMA.2023.9020031","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020031","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"245-245"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372960","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}