Big Data Research: Latest Articles

Time-synchronized sentiment labeling via autonomous online comments data mining: A multimodal information fusion on large-scale multimedia data
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-08; DOI: 10.1016/j.bdr.2025.100552
Jiachen Ma, Nazmus Sakib, Fahim Islam Anik, Sheikh Iqbal Ahamed
While temporal sentiment labels prove invaluable for video tagging, segmentation, and labeling tasks in multimedia studies, large-scale manual annotation remains cost- and time-prohibitive. Emerging Online Time-Sync Comment (TSC) datasets offer promising alternatives for generating sentiment maps. However, limitations in existing TSC scope and a lack of resource-constrained data creation guidelines hinder broader use. This study addresses these challenges by proposing a novel system for automated TSC generation utilizing recent YouTube comments as a readily accessible source of time-synchronized data. The efficacy of our multi-platform data mining system is evaluated through extensive long-term trials, leading to the development and analysis of two large-scale TSC datasets. Benchmarking against original temporal Automatic Speech Recognition (ASR) sentiment annotations validates the accuracy of our generated data. This work establishes a promising method for automatic TSC generation, laying the groundwork for further advancements in multimedia research and paving the way for novel sentiment analysis applications.
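As a rough illustration of how time-synchronized comments could be turned into a sentiment map, the sketch below buckets timestamped comments into fixed windows and averages a per-comment score. The toy lexicon, the `score_comment` helper, and the 10-second window are hypothetical placeholders and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a trained sentiment model.
LEXICON = {"great": 1.0, "love": 1.0, "boring": -1.0, "bad": -1.0}


def score_comment(text):
    """Average lexicon score of the words in a comment (0.0 if none match)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_map(comments, window=10):
    """Map each window start (in seconds) to the mean score of its comments.

    comments: iterable of (timestamp_in_seconds, text) pairs.
    """
    buckets = defaultdict(list)
    for ts, text in comments:
        start = int(ts // window) * window
        buckets[start].append(score_comment(text))
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Made-up comments, purely for illustration.
comments = [(3, "great intro"), (7, "love this"), (12, "boring part"), (18, "so bad")]
result = sentiment_map(comments)
```

Here the first window collects the two positive comments and the second the two negative ones, so `result` holds one averaged score per 10-second window.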
Big Data Research, Volume 41, Article 100552.
Citations: 0
Development of an integrated data system for regional tourism analysis in Italy: A microdata perspective
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-07; DOI: 10.1016/j.bdr.2025.100550
Samuele Cesarini, Fabrizio Antolini, Ivan Terraglia
This paper presents the development of an integrated data system tailored for the Italian regions, combining microdata from the Bank of Italy's and ISTAT's surveys. These datasets offer an in-depth analysis of both domestic and international aspects of tourism, framed within the theoretical context of the tourism determinants. By merging this integrated dataset with additional data from other statistical sources, this study offers a queryable relational database enabling granular regional analysis. Currently, tourism statistics in Italy are fragmented and do not provide a unified picture of tourism in its many aspects. The relational model's interoperability addresses Italy's fragmented tourism data landscape, and its data definition language represents an important step towards the creation of a unified tourism archive. Micro-data allows for different statistical analyses than those usually carried out with aggregated data, increasing knowledge of the dynamics of the sector.
Big Data Research, Volume 41, Article 100550.
Citations: 0
BETM: A new pre-trained BERT-guided embedding-based topic model
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-06; DOI: 10.1016/j.bdr.2025.100551
Yang Liu, Xiaotang Zhou, Zhenwei Zhang, Xiran Yang
The application of topic models and pre-trained BERT is becoming increasingly widespread in Natural Language Processing (NLP), but there is no standard method for combining them. In this paper, we propose a new pre-trained BERT-guided Embedding-based Topic Model (BETM). Through constraints on the topic-word and document-topic distributions, BETM can learn semantic, syntactic, and topic information from BERT embeddings. In addition, we design two solutions to address the insufficient contextual information caused by short inputs and the semantic truncation caused by long inputs in BETM. We find that the word embeddings of BETM are more suitable for topic modeling than pre-trained GloVe word embeddings, and that BETM can flexibly select different variants of pre-trained BERT for specific datasets to obtain better topic quality. We also find that BETM handles large, heavy-tailed vocabularies well, even when they contain stop words. BETM achieved state-of-the-art (SOTA) results on several benchmark datasets: Yelp Review Polarity (106,586 samples), Wiki Text 103 (71,533 samples), Open-Web-Text (35,713 samples), 20Newsgroups (10,899 samples), and AG-news (127,588 samples).
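Embedding-based topic models commonly define a topic's word distribution as a softmax over inner products between word embeddings and a topic embedding. The sketch below shows only that generic construction with made-up 2-dimensional vectors; it is not BETM's actual architecture, and every name and number in it is an illustrative assumption.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """Return {word: probability} for one topic.

    Each logit is the inner product of a word vector with the topic vector.
    """
    words = list(word_embeddings)
    logits = [
        sum(w * t for w, t in zip(word_embeddings[word], topic_embedding))
        for word in words
    ]
    return dict(zip(words, softmax(logits)))


# Hypothetical 2-d embeddings: axis 0 is roughly "food", axis 1 roughly "sport".
word_embeddings = {"pizza": [1.0, 0.0], "pasta": [0.9, 0.1], "goal": [0.0, 1.0]}
food_topic = [2.0, 0.0]
beta = topic_word_distribution(word_embeddings, food_topic)
```

Words whose vectors align with the topic vector receive more probability mass, which is the sense in which embeddings guide the topic-word distribution in this family of models.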
Big Data Research, Volume 41, Article 100551.
Citations: 0
Bankruptcy risk prediction: A new approach based on compositional analysis of financial statements
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-23; DOI: 10.1016/j.bdr.2025.100537
Alessandro Magrini
The development of models for bankruptcy risk prediction has gained much attention in recent years due to the great availability of financial statement data. Most existing predictive models rely on financial ratios, which are performance-based measures expressing the relative magnitude of two accounting items. Despite the popularity of financial ratios, their use is notoriously accompanied by serious practical drawbacks, like the occurrence of outliers and redundancy, making data preprocessing necessary to avoid computational problems and obtain a good predictive accuracy. Isometric log ratios can potentially overcome these problems because they are designed to represent compositional data efficiently and have a logarithmic form that limits the occurrence of outliers. However, although they are not novel in the analysis of financial statements, no study has ever employed them to predict bankruptcy. In this article, we show the effectiveness of isometric log ratios to detect bankruptcy events in a sample of 138,720 Italian firms (127,420 active and 11,300 bankrupted) belonging to different industries and with different size and age. For this purpose, we use logistic regression with adaptive LASSO regularization and random forests to construct several predictive models featuring either financial ratios or isometric log ratios, and combining different horizons and lag structures. The results show that a set of 8 isometric log ratios provides, without preprocessing, almost the same predictive accuracy as a selection of 16 financial ratios that requires dropping 3.6% of the data. Also, the adaptive LASSO regularization reveals that redundancy for isometric log ratios is always below 20%, and in some cases near 0%, while it ranges from 12.5% to 46.9% for financial ratios. 
The predictive accuracy of models based on logistic regression is in line with and even higher than the one reported by recent studies, and random forests achieve a gain in the area under the Receiver Operating Characteristic (ROC) curve ranging between two and three percentage points.
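To make the idea of isometric log ratios concrete, the sketch below computes pivot-type ILR coordinates for a composition of positive parts, alongside the centered log-ratio (CLR) for comparison. The three-part example (equity, long-term debt, short-term debt) is a hypothetical illustration; the accounting items and the specific ILR basis used in the article are not given in the abstract.

```python
import math


def clr(x):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def ilr_pivot(x):
    """Pivot ILR coordinates of a composition with D strictly positive parts.

    Coordinate i contrasts part i with the geometric mean of the parts after
    it, scaled so that the D - 1 coordinates form an orthonormal system.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        r = len(rest)
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords


# Hypothetical balance-sheet composition: equity, long-term debt, short-term debt.
amounts = [50.0, 30.0, 20.0]
shares = [0.5, 0.3, 0.2]
z_amounts = ilr_pivot(amounts)
z_shares = ilr_pivot(shares)
```

Because every coordinate is a log of a ratio, rescaling the whole statement (amounts versus shares) leaves the coordinates unchanged, and the logarithm compresses extreme ratios that would otherwise appear as outliers.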
Big Data Research, Volume 41, Article 100537.
Citations: 0
Mortality and risk of cardiovascular diseases by age at retirement in three Italian cohorts
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-23; DOI: 10.1016/j.bdr.2025.100543
Chiara Ardito, Roberto Leombruni, Giuseppe Costa, Angelo d’Errico
The relationship between age at retirement and subsequent physical health still appears contradictory in the literature, with more recent studies suggesting possible adverse health effects linked to employment at later ages. The aim of this study was to assess the long-term risk of overall mortality and incidence of cardiovascular diseases (CVDs) associated with age at retirement in three large Italian cohorts using both survey and administrative data.
The risk of mortality and CVDs associated with age at retirement, kept continuous, was assessed separately by gender using age-adjusted Cox models, further controlled for chronic morbidity, education, socioeconomic and previous working characteristics. In a second set of analyses, age at retirement was treated as a dichotomous variable: for each cutoff age from 52 to 65 years, the incidence of the health outcomes among subjects who retired after that age was compared with the incidence among those who retired up to that age.
Higher age at retirement was associated with significantly higher mortality among men in the three cohorts, while among women the association was not significant, although in the same direction as for men. The risk of CVDs was also significantly associated with higher age at retirement in all the datasets among men, and in two of them among women. The analyses with dichotomized age at retirement confirmed the results based on continuous age at retirement for both genders. Several robustness analyses, including an instrumental variable (IV) Poisson approach, confirm the validity of the results for men, whereas the results for women were less stable and robust.
Policy makers should be aware of the risk for public health of policies that increase retirement age.
Big Data Research, Volume 41, Article 100543.
Citations: 0
The influence of China's exchange rate market on the Belt and Road trade market: Based on temporal two-layer networks
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-19; DOI: 10.1016/j.bdr.2025.100540
Xiaoyu Zhang, Ye Pan, Lilan Tu
This research uses daily closing exchange rate data from 2010 to 2023 for countries participating in the Belt and Road Initiative (BRI), together with China’s import and export volumes with these countries. Taking the renminbi (RMB) as the base currency and the other BRI currencies as quote currencies, we employ the Autoregressive Distributed Lag (ARDL) model to propose an algorithm for constructing a temporal two-layer network, resulting in an exchange-rate-trade network composed of 14 subnetworks. Through an analysis of the network’s topological structure, we observe that 2013 marks a significant turning point, after which the network transitions from a decentralized to a more centralized form. To assess the annual impact of China’s exchange rate and trade from 2010 to 2023, we introduce a comprehensive index for identifying key nodes within the network. Our findings based on this index indicate that: (1) Lebanon, Kyrgyzstan, and other diverse countries and regions emerge as key nodes, demonstrating China’s close economic ties with these countries and reflecting the substantial influence of RMB internationalization; and (2) compared with other years, China’s exchange rate market exerts notably stronger influence on the trade market in 2018, 2021, 2022, and 2023.
Big Data Research, Volume 41, Article 100540.
Citations: 0
A multiple-group hidden Markov model for multi-source data. Cross-country differences in employment mobility in the presence of measurement error
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-19; DOI: 10.1016/j.bdr.2025.100527
Roberta Varriale, Mauricio Garnier-Villarreal, Dimitris Pavlopoulos, Danila Filipponi
In this paper, we develop a multigroup hidden Markov model to tackle the issue of measurement error in multi-source data from different countries. We focus, in particular, on the measurement of employment mobility in the Netherlands and Italy using linked data from the Labour Force Survey and administrative sources. The measurement-error correction we apply reconciles differences between data sources and shows that cross-country differences in employment mobility are smaller than originally thought. Error-corrected estimates indicate that mobility from temporary to permanent employment has become, over time, larger in Italy than in the Netherlands, while mobility from non-employment to temporary employment has steadily been higher in the Netherlands than in Italy.
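The general mechanism by which a hidden Markov model handles measurement error in linked sources can be sketched with the forward algorithm: the true employment state is latent, and each source reports it with its own error probabilities. The two-state setup (temporary, permanent), the two sources, and all probabilities below are illustrative assumptions; the article's multigroup specification is not reproduced here.

```python
def forward_likelihood(init, trans, emissions, observations):
    """Likelihood of the observed reports under a hidden Markov model.

    init[s]: probability of starting in true state s.
    trans[s][t]: probability of moving from true state s to true state t.
    emissions[k][s][o]: probability that source k reports o when the true
        state is s (sources are assumed conditionally independent).
    observations: list of tuples, one reported value per source per period.
    """
    n = len(init)

    def emit(state, obs):
        p = 1.0
        for k, reported in enumerate(obs):
            p *= emissions[k][state][reported]
        return p

    alpha = [init[s] * emit(s, observations[0]) for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(n)) * emit(t, obs)
            for t in range(n)
        ]
    return sum(alpha)


# States: 0 = temporary contract, 1 = permanent contract (made-up numbers).
init = [0.4, 0.6]
trans = [[0.8, 0.2], [0.05, 0.95]]
survey = [[0.9, 0.1], [0.05, 0.95]]      # hypothetical survey error rates
register = [[0.95, 0.05], [0.02, 0.98]]  # hypothetical register error rates
lik = forward_likelihood(init, trans, [survey, register], [(0, 0), (1, 1)])
```

When both sources are error-free, the likelihood of an observed temporary-to-permanent move reduces to the plain Markov chain probability, which is why disagreement between sources carries information about the error rates rather than about true mobility.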
Big Data Research, Volume 41, Article 100527.
Citations: 0
A multimodal deep learning framework for constructing a market sentiment index from stock news
IF 3.5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-05-17; DOI: 10.1016/j.bdr.2025.100535
Yunting Liu, Yirong Huang
Unimodal sentiment analysis often fails to capture the complexity of financial sentiment. This paper proposes a multimodal deep learning framework that integrates text, audio, and image data from CCTV news videos on TikTok to construct a multimodal sentiment indicator for the Chinese stock market. Empirical results show that multimodal fusion enhances sentiment analysis, with text outperforming audio and image modalities. The indicator correlates weakly with stock returns but significantly with market volatility, aligns with seasonal sentiment patterns, and reflects significant events like COVID-19. Additionally, weekly sentiment trends indicate the lowest sentiment on Thursdays and the highest on Fridays. This study advances financial sentiment analysis by demonstrating the efficacy of multimodal indicators in capturing market sentiment and informing volatility forecasts.
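One simple way to picture a multimodal sentiment indicator is late fusion: score each modality separately, combine the scores with weights, then aggregate by weekday. The sketch below assumes this weighted-average scheme purely for illustration; the abstract does not describe the fusion method, and the weights, dates, and scores are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical modality weights (text weighted most heavily).
WEIGHTS = {"text": 0.6, "audio": 0.2, "image": 0.2}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def fuse(scores):
    """Weighted average of per-modality sentiment scores in [-1, 1]."""
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)


def mean_by_weekday(daily_scores):
    """Average fused sentiment per weekday from {date: modality scores}."""
    groups = defaultdict(list)
    for day, scores in daily_scores.items():
        groups[WEEKDAYS[day.weekday()]].append(fuse(scores))
    return {name: sum(v) / len(v) for name, v in groups.items()}


# Made-up scores for one Thursday and one Friday.
daily = {
    date(2025, 5, 15): {"text": -0.5, "audio": 0.0, "image": 0.0},
    date(2025, 5, 16): {"text": 0.5, "audio": 0.5, "image": 0.0},
}
by_day = mean_by_weekday(daily)
```

A weekday profile like `by_day` is the kind of summary behind statements such as sentiment being lowest on Thursdays and highest on Fridays.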
单模态情绪分析往往无法捕捉到金融情绪的复杂性。本文提出了一个多模态深度学习框架,该框架整合了TikTok上CCTV新闻视频的文本、音频和图像数据,构建了中国股市的多模态情绪指标。实证结果表明,多模态融合增强了情感分析,文本模式优于音频和图像模式。该指标与股票回报相关性较弱,但与市场波动性相关性显著,与季节性情绪模式一致,并反映了COVID-19等重大事件。此外,每周情绪趋势显示周四情绪最低,周五情绪最高。本研究通过展示多模态指标在捕捉市场情绪和为波动率预测提供信息方面的有效性,推进了金融情绪分析。
{"title":"A multimodal deep learning framework for constructing a market sentiment index from stock news","authors":"Yunting Liu,&nbsp;Yirong Huang","doi":"10.1016/j.bdr.2025.100535","DOIUrl":"10.1016/j.bdr.2025.100535","url":null,"abstract":"<div><div>Unimodal sentiment analysis often fails to capture the complexity of financial sentiment. This paper proposes a multimodal deep learning framework that integrates text, audio, and image data from CCTV news videos on TikTok to construct a multimodal sentiment indicator for the Chinese stock market. Empirical results show that multimodal fusion enhances sentiment analysis, with text outperforming audio and image modalities. The indicator correlates weakly with stock returns but significantly with market volatility, aligns with seasonal sentiment patterns, and reflects significant events like COVID-19. Additionally, weekly sentiment trends indicate the lowest sentiment on Thursdays and the highest on Fridays. This study advances financial sentiment analysis by demonstrating the efficacy of multimodal indicators in capturing market sentiment and informing volatility forecasts.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100535"},"PeriodicalIF":3.5,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The narrative on tourism sustainability in Italian news: A text mining approach
IF 3.5 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-16 DOI: 10.1016/j.bdr.2025.100541
Carla Galluccio , Paola Beccherle , Alessandra Petrucci
Tourism sustainability is a complex and multidimensional construct, for which there is no shared definition in the literature. Consequently, there is no standard method for its measurement, and the adoption of sustainable practices often falls short of reached goals. Therefore, contributing to the definition of the concept of sustainable tourism is essential, both for policymakers and academics. In this vein, news media data can represent a key element through which to understand the debate about tourism sustainability. This research aims to exploit the potential of news texts to explore how sustainable tourism is conceived within specific cultural contexts. Focusing on the case study of Italy, we analysed how the concept of tourism sustainability is represented in Italian newspapers, extracting the topics discussed in relation to this theme. From a methodological point of view, we employed a network-based approach for topic extraction. Our study contributes to the literature on tourism sustainability by proposing an innovative method for extracting information from unstructured data sources, such as textual data, providing policymakers with insights about the narrative around this topic.
Big Data Research, volume 41, Article 100541. Published 2025-05-16.
Citations: 0
Bipartite graph partitioning and spatial bootstrapping methods: A case study of innovative startups
IF 3.5 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-16 DOI: 10.1016/j.bdr.2025.100533
Alessio Bumbea , Andrea Mazzitelli , Giuseppe Espa , Alessandro Rinaldi
Innovative startups are the source of innovation and technological development; therefore, understanding their behavior can help better recognize the business organization's direction. This paper introduces a new method for clustering innovative startups using bipartite graph partitioning combined with spatial bootstrapping, improving clusters' accuracy and interpretability. Recent advancements in clustering techniques have introduced ensemble or consensus clustering methods, which aim to merge multiple clustering results into a superior outcome. A key challenge in this field is effectively integrating diverse clusters, and one promising solution involves utilizing graph formalism and partitioning strategies. By leveraging advanced graph partitioning techniques, we transform the task of partitioning the ensemble graph into a community detection problem. Our methodological approach improves the traditional method of bipartite graphs used in cluster ensembles by implementing the state of the art biLouvain algorithm. We also focused on techniques that could be used to increase the interpretability of the clusters themselves and how they can be used to obtain insightful information from the data. The proposed methodology was applied to a dataset of technologically advanced new businesses, located in the Lombardy region and recorded as innovative startups in the special section of the Italian Chambers of Commerce's Business Register.
Big Data Research, volume 41, Article 100533. Published 2025-05-16.
Citations: 0