Pub Date: 2025-12-01. DOI: 10.1177/2167647X251406607
Yuping Yan, Hanyang Xie, Liang Chen, You Wen, Huaquan Su
Data in power grid digital operation exhibit multisource heterogeneous characteristics, resulting in low integration efficiency and slow anomaly detection response. To address this, this paper proposes a method for power grid digital operation data integration based on K-medoids clustering. The basic service layer utilizes a Field Programmable Gate Array (FPGA) parallel architecture. This enables millisecond-level synchronous acquisition and dynamic preprocessing of multisource data, such as mechanical vibration, partial discharge signals, and temperature. The implementation is based on an analysis of the power grid digital operation structure. The data are then fed back to the cloud service layer, which, through business integration services, data analysis, and data access services, performs data filtering and analysis. Subsequently, the data are input to the application layer via the database server. The application layer employs a K-medoids clustering method that introduces a density-weighted Euclidean distance metric and an adaptive centroid selection strategy, significantly enhancing the clustering performance of multisource data. In particular, the proposed architecture supports real-time data processing and can be extended to cross-modal scenarios, including integration with speech-to-text systems in power grid monitoring. By aligning with low-latency neural network principles, this method facilitates timely decision-making in intelligent operation environments. Experiments confirm the method's efficacy. It acquires and integrates multisource heterogeneous power grid digital operation data effectively. The data throughput of each power grid digital operation data source exceeds 110 MB/s.
The silhouette coefficient of the integrated data sets is greater than 0.91, indicating that the integration of power grid digital operation data using this method exhibits good separability and reliability, enabling rapid detection of data anomalies within the power grid, thus laying a solid foundation for the operation and maintenance management of power grid digital operation.
Title: Method for Power Grid Digital Operation Data Integration Based on K-Medoids Clustering with Support for Real-Time Cross-Modal Applications. Big Data, 13(6): 453-470.
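The abstract does not spell out the density-weighted metric or the adaptive medoid seeding, so the sketch below is illustrative only: it assumes a Gaussian-kernel density estimate, divides the Euclidean distance by the candidate medoid's density so that dense regions attract medoids, and seeds from the k densest points. None of these choices are confirmed as the authors' method.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_density(points, i, bandwidth=1.0):
    # Gaussian-kernel density estimate at point i over all other points.
    return sum(math.exp(-euclidean(points[i], p) ** 2 / (2 * bandwidth ** 2))
               for j, p in enumerate(points) if j != i)

def weighted_kmedoids(points, k, iters=20):
    n = len(points)
    dens = [local_density(points, i) for i in range(n)]

    def wdist(i, m):
        # Density-weighted distance: dividing by the candidate medoid's
        # density pulls medoids toward dense regions (illustrative choice).
        return euclidean(points[i], points[m]) / (1e-9 + dens[m])

    # Stand-in for "adaptive" seeding: start from the k densest points.
    medoids = sorted(range(n), key=lambda i: -dens[i])[:k]
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: wdist(i, m))].append(i)
        # Each new medoid minimizes total in-cluster weighted distance.
        new_medoids = [min(members,
                           key=lambda c: sum(wdist(i, c) for i in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, clusters
```

On two well-separated point clouds this converges in one pass, with one medoid per cloud.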
Soybeans are a high-quality vegetable protein resource and a fundamental strategic material integral to the national economy and public livelihood. To investigate the research status of soybean quality evaluation, this study analyzes relevant literature from Web of Science and China Knowledge Network (2000-2024). Using bibliometric methods with Excel and VOSviewer, we examined publication years, keywords, authors, sources, countries/regions, and institutions, generating visualizations to intuitively illustrate the field's developmental status. Results indicate that over the past 25 years, soybean quality evaluation research has emerged as a focal point in crop science, with institutions predominantly located in China and the United States. Key journals in this domain include Food Chemistry, Frontiers in Plant Science, and Soybean Science, among others. Research primarily focuses on soybean physical characteristics and the component-quality relationship. Interdisciplinary advancements have positioned spectral analysis, intelligent systems, and multitechnology fusion as innovative frontiers in this field. These findings enhance researchers' understanding of current trends and support evidence-based decision-making in soybean quality evaluation.
Title: Analysis on Research Situation of Soybean Quality Evaluation Based on Bibliometrics. Big Data, pp. 487-496. DOI: 10.1177/2167647X251399053. Pub Date: 2025-12-01.
Pub Date: 2025-12-01. DOI: 10.1177/2167647X251405797
Qiong He, Xueqing Guo
This study aims to enhance the prediction precision of aircraft engine remaining useful life (RUL) by overcoming common challenges in current models, such as ineffective feature extraction and insufficient modeling of long-term temporal dependencies. We propose a novel multilayer hybrid architecture that combines bidirectional long short-term memory (BiLSTM) and gated recurrent unit (GRU) networks, augmented with an attention mechanism to enhance the model's focus on informative temporal patterns. In this framework, raw time series data are initially processed by the BiLSTM to extract bidirectional features associated with engine health conditions. The GRU network is subsequently used to effectively model long-range dependencies, thereby enriching the temporal representation. An adaptive attention module is included to assign varying importance to different features, allowing the model to focus on key indicators of engine condition. Evaluation results on the FD001 and FD003 datasets show that the model achieves root mean squared error reductions ranging from 8.81% to 30.60% and from 7.48% to 37.96%, validating its performance and robustness in RUL forecasting. In comparison with conventional BiLSTM and GRU models, the proposed BiLSTM-GRU-Attention architecture integrates attention-based feature weighting with a hybrid recurrent framework, thereby offering a concise and effective approach to RUL prediction for aircraft engines.
Title: Prediction of Remaining Life of Aircraft Engines Based on BiLSTM-GRU-Attention Model. Big Data, 13(6): 471-486.
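As a toy illustration of the attention step only (not the authors' full BiLSTM-GRU architecture), the sketch below scores each recurrent hidden state against a query vector and returns the softmax-weighted context vector a downstream RUL regressor would consume; the dot-product scoring form is an assumption.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, query):
    # hidden_states: list of T vectors (e.g., GRU outputs over timesteps);
    # query: a (learned) vector. Score each timestep by dot product, then
    # form the softmax-weighted sum as the context vector.
    scores = [sum(h * q for h, q in zip(state, query))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights
```

A timestep whose hidden state aligns strongly with the query dominates the context vector, which is how attention "focuses on key indicators of engine condition."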
Pub Date: 2025-12-01. Epub Date: 2025-11-18. DOI: 10.1177/2167647X251392796
Yang Wang, Tianchun Xiang, Shuai Luo, Yi Gao, Xiangyu Kong
Human activities that generate greenhouse gas emissions pose a significant threat to urban green and sustainable development. Production activities in key industrial sectors are a primary contributor to high urban carbon emissions. Therefore, effectively reducing carbon emissions in these sectors is crucial for achieving urban carbon peak and neutrality goals. Carbon emission monitoring is a critical approach that aids governmental bodies in understanding changes in industrial carbon emissions, thereby supporting decision-making and carbon reduction efforts. However, current industry-oriented carbon monitoring methods suffer from issues such as low frequency, poor accuracy, and inadequate privacy security. To address these challenges, this article proposes a novel privacy-protected "electricity-carbon" nexus model, long short-term memory with the vertical federated framework (VF-LSTM), to monitor carbon emissions in key urban industries. The vertical federated framework ensures "usable but invisible" privacy protection for multisource data from various participants. The embedded long short-term memory model accurately captures industry-specific carbon emissions. Using data from key industries (steel, petrochemical, chemical, and nonferrous industries), this article constructs and validates the performance of the proposed industry-level carbon emission monitoring model. The results demonstrate that the model has high accuracy and robustness, effectively monitoring industry carbon emissions while protecting data privacy.
Title: Monitoring Carbon Emission from Key Industries Based on VF-LSTM Model. Big Data, pp. 441-452.
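The vertical-federated idea, where parties hold different feature columns for the same samples and exchange only intermediate results, can be sketched with a toy linear scorer. The article embeds an LSTM at each party instead; the class and function names here are illustrative, not the paper's API.

```python
class Party:
    # Holds a private feature block and local weights; shares only
    # per-sample partial scores, never the raw features.
    def __init__(self, features, weights):
        self.features = features  # rows: samples, cols: this party's features
        self.weights = weights

    def partial_scores(self):
        return [sum(x * w for x, w in zip(row, self.weights))
                for row in self.features]

def federated_predict(parties, bias=0.0):
    # Coordinator aggregates partial scores sample-wise; raw data never
    # leaves a party ("usable but invisible" in the article's phrasing).
    n = len(parties[0].features)
    totals = [bias] * n
    for p in parties:
        for i, s in enumerate(p.partial_scores()):
            totals[i] += s
    return totals
```

Each party's block is additively combined, so the coordinator sees only the aggregate, which is the essence of the vertical split.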
Pub Date: 2025-10-01. Epub Date: 2025-02-28. DOI: 10.1089/big.2024.0128
Ikpe Justice Akpan, Rouzbeh Razavi, Asuama A Akpan
Decision sciences (DSC) involves studying complex dynamic systems and processes to aid informed choices subject to constraints in uncertain conditions. It integrates multidisciplinary methods and strategies to evaluate decision engineering processes, identifying alternatives and providing insights toward enhancing prudent decision-making. This study analyzes evolutionary trends and innovation in DSC education and research over the past 25 years. Using metadata from bibliographic records and employing the science mapping method and text analytics, we map and evaluate the thematic, intellectual, and social structures of DSC research. The results identify "knowledge management," "decision support systems," "data envelopment analysis," "simulation," and "artificial intelligence" (AI) as some of the prominent critical skills and knowledge requirements for problem-solving in DSC before and during the period (2000-2024). However, these technologies are evolving significantly in the recent wave of digital transformation, with data analytics frameworks (including techniques such as big data analytics, machine learning, business intelligence, data mining, and information visualization) becoming crucial. DSC education and research continue to mirror developments in practice, with sustainable education through virtual/online learning becoming prominent. Innovative pedagogical approaches/strategies also include computer simulation and games ("play and learn" or "role-playing"). The current era witnesses AI adoption in different forms, such as conversational chatbot agents and generative AI (GenAI) tools like the chat generative pretrained transformer, in teaching, learning, and scholarly activities, amidst challenges (academic integrity, plagiarism, intellectual property violations, and other ethical and legal issues). Future DSC education must innovatively integrate GenAI and address the resulting challenges.
Title: Evolutionary Trends in Decision Sciences Education Research from Simulation and Games to Big Data Analytics and Generative Artificial Intelligence. Big Data, pp. 416-437.
Pub Date: 2025-10-01. Epub Date: 2025-01-10. DOI: 10.1089/big.2024.0036
Sofie Goethals, Sandra Matz, Foster Provost, David Martens, Yanou Ramon
Our online lives generate a wealth of behavioral records, or digital footprints, which are stored and leveraged by technology platforms. These data can be used to create value for users by personalizing services. At the same time, however, they also pose a threat to people's privacy by offering a highly intimate window into their private traits (e.g., their personality, political ideology, sexual orientation). We explore the concept of cloaking: allowing users to hide parts of their digital footprints from predictive algorithms, to prevent unwanted inferences. This article addresses two open questions: (i) can cloaking be effective in the longer term, as users continue to generate new digital footprints? And (ii) what is the potential impact of cloaking on the accuracy of desirable inferences? We introduce a novel strategy focused on cloaking "metafeatures" and compare its efficacy against just cloaking the raw footprints. The main findings are (i) while cloaking effectiveness does indeed diminish over time, using metafeatures slows the degradation; (ii) there is a tradeoff between privacy and personalization: cloaking undesired inferences also can inhibit desirable inferences.
Title: The Impact of Cloaking Digital Footprints on User Privacy and Personalization. Big Data, pp. 345-363.
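A minimal sketch of raw-footprint cloaking under a linear model (the paper's models and its metafeature strategy are richer; the footprint names and weights below are made up): remove the k footprints that contribute most to the unwanted inference, then compare scores to see the privacy-personalization tradeoff.

```python
def cloak(footprints, undesired_weights, k):
    # Drop the k footprints that contribute most to the unwanted inference.
    ranked = sorted(footprints,
                    key=lambda f: undesired_weights.get(f, 0.0),
                    reverse=True)
    removed = set(ranked[:k])
    return [f for f in footprints if f not in removed]

def score(footprints, weights):
    # Linear relevance score of a footprint set under a model's weights.
    return sum(weights.get(f, 0.0) for f in footprints)
```

When a footprint is predictive for both the undesired and the desired inference, cloaking it lowers both scores, which is exactly the tradeoff the article reports.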
Existing approaches to the influence maximization problem suffer from several issues, including low infection rates and high time complexity. Many proposed methods are not suitable for large-scale networks due to their time complexity or free parameter usage. To address these challenges, this article proposes a local heuristic called Embedding Technique for Influence Maximization (ETIM) that uses shell decomposition, graph embedding, and reduction, as well as combined local structural features. The algorithm selects candidate nodes based on their connections among network shells and topological features, reducing the search space and computational overhead. It uses a deep learning-based node embedding technique to create a multidimensional vector of candidate nodes and calculates the dependency on spreading for each node based on local topological features. Finally, influential nodes are identified using the results of the previous phases and newly defined local features. The proposed algorithm is evaluated using the independent cascade model, showing its competitiveness and ability to achieve the best performance in terms of solution quality. Compared with the collective influence global algorithm, ETIM is significantly faster and improves the infection rate by an average of 12%.
Title: Maximizing Influence in Social Networks Using Combined Local Features and Deep Learning-Based Node Embedding. Authors: Asgarali Bouyer, Hamid Ahmadi Beni, Amin Golzari Oskouei, Alireza Rouhi, Bahman Arasteh, Xiaoyang Liu. DOI: 10.1089/big.2023.0117. Big Data, pp. 379-397.
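The independent cascade model used for evaluation above is standard: each newly activated node gets one chance to activate each inactive neighbor with probability p. A self-contained Monte Carlo sketch (graph and parameters illustrative):

```python
import random

def independent_cascade(adj, seeds, p=0.1, rng=None):
    # One simulation run of the independent cascade model on a directed
    # adjacency dict {node: [neighbors]}.
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def expected_spread(adj, seeds, p=0.1, runs=1000, seed=0):
    # Monte Carlo estimate of the expected number of infected nodes,
    # the quantity an influence maximization heuristic tries to maximize.
    rng = random.Random(seed)
    return sum(len(independent_cascade(adj, seeds, p, rng))
               for _ in range(runs)) / runs
```

Comparing `expected_spread` for seed sets chosen by different heuristics is how "infection rate" improvements like the 12% figure are measured.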
Pub Date: 2025-10-01. Epub Date: 2024-10-23. DOI: 10.1089/big.2023.0131
Qi Ouyang, Hongchang Chen, Shuxin Liu, Liming Pu, Dongdong Ge, Ke Fan
Predicting propagation cascades is crucial for understanding information propagation in social networks. Existing methods always focus on the structure or order of infected users in a single cascade sequence, ignoring the global dependencies of cascades and users, which is insufficient to characterize their dynamic interaction preferences. Moreover, existing methods are poor at addressing the problem of model robustness. To address these issues, we propose a prediction model named DropMessage Hypergraph Attention Network (DMHANT), which constructs a hypergraph based on the cascade sequence. Specifically, to dynamically obtain user preferences, we divide the diffusion hypergraph into multiple subgraphs according to the time stamps, develop hypergraph attention networks to explicitly learn complete interactions, and adopt a gated fusion strategy to connect them for user cascade prediction. In addition, a random message-dropping method, DropMessage, is added to increase the robustness of the model. Experimental results on three real-world datasets indicate that the proposed model significantly outperforms the most advanced information propagation prediction models in both MAP@K and Hits@K metrics, and the experiments also show that the model achieves stronger prediction performance than existing models under data perturbation.
Title: DMHANT: DropMessage Hypergraph Attention Network for Information Propagation Prediction. Big Data, pp. 364-378.
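The gated fusion step, which connects representations from the time-split subgraphs, is commonly formulated as a sigmoid-gated convex combination; the sketch below uses that common form, and the exact gating in the paper may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(a, b, gate_weights, bias=0.0):
    # Gate computed from the concatenation [a; b]; the output mixes the two
    # subgraph representations element-wise: z * a + (1 - z) * b.
    concat = a + b
    z = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + bias)
    return [z * x + (1 - z) * y for x, y in zip(a, b)]
```

With a strongly positive gate the output follows the first representation; with zero weights and bias it averages the two.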
Pub Date : 2025-10-01; Epub Date: 2025-07-08; DOI: 10.1089/big.2024.0094
Kenan Menguc, Alper Yilmaz
This research highlights the importance of accurately analyzing real-world multilayer network problems and introduces effective solutions. Whether one is simulating a protein-protein interaction network, a transportation network, or a social network, representation and analysis over these networks are crucial. Multilayer networks, which contain additional layers, may undergo dynamic transformations over time, much as single-layer networks do. These dynamic networks, which expand and contract, can be optimized with guidance from human operators if the transient changes are known and can be controlled. For network expansion and contraction, this study introduces two distinct algorithms designed to make optimal decisions across the dynamic changes of a multilayer network. The main strategy is to minimize the standard deviation of the betweenness centrality of the edges in a complex network. The approaches we introduce incorporate diverse constraints into a multilayer weighted network, probing the network's expansion or contraction under various conditions represented as objective functions. Allowing the objective function to change enhances the model's adaptability to a wide array of problem types. In this way, complex network structures representing real-world problems can be modeled mathematically, making it easier to make informed decisions.
{"title":"Optimizing Multilayer Networks Through Time-Dependent Decision-Making: A Comparative Study.","authors":"Kenan Menguc, Alper Yilmaz","doi":"10.1089/big.2024.0094","DOIUrl":"10.1089/big.2024.0094","url":null,"abstract":"<p><p>This research highlights the importance of accurately analyzing real-world multilayer network problems and introduces effective solutions. Whether simulating protein-protein network, transportation network, or a social network, representation and analysis over these networks are crucial. Multilayer networks, that contain added layers, may undergo dynamic transformations over time akin to single-layer networks that experience changes over time. These dynamic networks, that expand and contract, can be optimized by guidance from human operators if the transient changes are known and can be controlled. For the expansion and contraction of networks, this study introduces two distinct algorithms designed to make optimal decisions across dynamic changes of a multilayer network. The main strategy is to minimize the standard deviation across betweenness centrality of the edges in a complex network. The approaches we introduce incorporate diverse constraints into a multilayer weighted network, probing the network's expansion or contraction under various conditions represented as objective functions. The addition of changing of objective function enhances the model's adaptability to solve a wide array of problem types. 
In this way, complex network structures representing real-world problems can be mathematically modeled which makes it easier to make informed decisions.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"398-415"},"PeriodicalIF":2.6,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144585578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
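The main strategy, minimizing the standard deviation of edge betweenness centrality, can be sketched in a few lines with `networkx`. This is only an illustration of the objective plus one greedy expansion step under assumed names (`betweenness_spread`, `best_expansion_edge`); it is not the paper's two algorithms.

```python
import statistics
import networkx as nx

def betweenness_spread(G: nx.Graph) -> float:
    """Objective: standard deviation of edge betweenness centrality.
    Lower values mean shortest-path load is spread more evenly."""
    ebc = nx.edge_betweenness_centrality(G)
    return statistics.pstdev(ebc.values())

def best_expansion_edge(G: nx.Graph, candidates):
    """One greedy expansion step: among candidate new edges, pick the
    one whose addition minimizes the betweenness spread."""
    def spread_after_adding(edge):
        H = G.copy()
        H.add_edge(*edge)
        return betweenness_spread(H)
    return min(candidates, key=spread_after_adding)

# A cycle routes shortest paths symmetrically over its edges, so its
# spread is ~0; a path concentrates traffic on middle edges, so its
# spread is strictly larger.
spread_path = betweenness_spread(nx.path_graph(6))
spread_cycle = betweenness_spread(nx.cycle_graph(6))
```

On a six-node path, the greedy step prefers closing the cycle (edge `(0, 5)`) over adding a short chord, since the cycle equalizes edge betweenness exactly. A multilayer version would apply the same objective to the supra-graph formed by the layers and their interlayer couplings.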
Pub Date : 2025-08-01; Epub Date: 2024-07-23; DOI: 10.1089/big.2023.0033
Hong Wang, Ling Hong
Survival models have recently found increasingly wide application in credit scoring due to their ability to estimate the dynamics of risk over time. In this research, we propose a Buckley-James safe sample screening support vector regression (BJS4VR) algorithm to model large-scale survival data by combining the Buckley-James transformation and support vector regression. Unlike previous support vector regression survival models, censored samples here are imputed using a censoring-unbiased Buckley-James estimator. Safe sample screening is then applied to discard from the original data those samples guaranteed to be inactive at the final optimal solution, improving efficiency. Experimental results on large-scale real Lending Club loan data show that the proposed BJS4VR model outperforms existing popular survival models such as RSFM, CoxRidge, and CoxBoost in both prediction accuracy and time efficiency. Important variables highly correlated with credit risk are also identified with the proposed method.
{"title":"A Fast Survival Support Vector Regression Approach to Large Scale Credit Scoring via Safe Screening.","authors":"Hong Wang, Ling Hong","doi":"10.1089/big.2023.0033","DOIUrl":"10.1089/big.2023.0033","url":null,"abstract":"<p><p>Survival models have found wider and wider applications in credit scoring recently due to their ability to estimate the dynamics of risk over time. In this research, we propose a Buckley-James safe sample screening support vector regression (BJS4VR) algorithm to model large-scale survival data by combing the Buckley-James transformation and support vector regression. Different from previous support vector regression survival models, censored samples here are imputed using a censoring unbiased Buckley-James estimator. Safe sample screening is then applied to discard samples that guaranteed to be non-active at the final optimal solution from the original data to improve efficiency. Experimental results on the large-scale real lending club loan data have shown that the proposed BJS4VR model outperforms existing popular survival models such as RSFM, CoxRidge and CoxBoost in terms of both prediction accuracy and time efficiency. Important variables highly correlated with credit risk are also identified with the proposed method.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"304-318"},"PeriodicalIF":2.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141753329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}