
Big Data: Latest Articles

Acknowledgment of Reviewers 2023.
IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-02-01; Epub Date: 2023-12-19; DOI: 10.1089/big.2023.29063.ack
Big Data, vol. 12, no. 1, pp. 81-82.
Citations: 0
Automated Natural Language Processing-Based Supplier Discovery for Financial Services.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-02-01; Epub Date: 2023-07-07; DOI: 10.1089/big.2022.0215
Mauro Papa, Ioannis Chatzigiannakis, Aris Anagnostopoulos

Public procurement is viewed as a major market force that can be used to promote innovation and drive the growth of small and medium-sized enterprises (SMEs). In such cases, procurement system design relies on intermediaries that provide vertical linkages between suppliers and providers of innovative services and products. In this work, we propose a new methodology for decision support in supplier discovery, the stage that precedes final supplier selection. To identify small and medium-sized suppliers of innovative products and services that hold very small market shares, we rely on data gathered from community-based sources such as Reddit and Wikidata and deliberately avoid historical open procurement datasets. We examine a real-world procurement case study from the financial sector, focusing on the Financial and Market Data offering, and develop an interactive web-based support tool that addresses specific requirements of the Italian central bank. We demonstrate how a suitable selection of natural language processing models, such as a part-of-speech tagger and a word-embedding model, combined with a novel named-entity-disambiguation algorithm, can efficiently analyze large volumes of textual data and increase the likelihood of full market coverage.
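The abstract pairs a word-embedding model with a named-entity-disambiguation step. The authors' algorithm is not described here, so the following is only a generic sketch of embedding-based disambiguation, assuming each candidate entity comes with a few description words: average the word vectors of the mention's context and of each candidate's description, then keep the candidate with the highest cosine similarity. The 2-d vectors and the IDs ENT_company and ENT_fruit are hypothetical toy values, not real Wikidata entries.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def disambiguate(context_words, candidates, embeddings):
    """Pick the candidate entity whose description is closest to the mention's context."""
    ctx = mean_vector(context_words, embeddings)
    if ctx is None:
        return None
    best_id, best_score = None, float("-inf")
    for cand_id, desc_words in candidates.items():
        desc = mean_vector(desc_words, embeddings)
        if desc is None:
            continue
        score = cosine(ctx, desc)
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id

# Toy 2-d "embeddings" and hypothetical entity IDs, for illustration only.
EMBEDDINGS = {
    "market": [1.0, 0.1], "data": [0.9, 0.2], "terminal": [0.8, 0.0],
    "fruit": [0.0, 1.0], "tree": [0.1, 0.9],
}
CANDIDATES = {
    "ENT_company": ["market", "data"],  # e.g. a financial-data vendor sense
    "ENT_fruit": ["fruit", "tree"],     # e.g. an unrelated homonym
}

print(disambiguate(["terminal", "data"], CANDIDATES, EMBEDDINGS))  # ENT_company
```

A real pipeline would use pretrained embeddings and candidate descriptions pulled from a knowledge base; the ranking step stays the same.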

Big Data, pp. 30-48.
Citations: 0
Impact of Cooperative Innovation on the Technological Innovation Performance of High-Tech Firms: A Dual Moderating Effect Model of Big Data Capabilities and Policy Support.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-02-01; Epub Date: 2023-09-14; DOI: 10.1089/big.2022.0301
Xianglong Li, Qingjin Wang, Renbo Shi, Xueling Wang, Kaiyun Zhang, Xiao Liu

The mechanism of cooperative innovation (CI) in high-tech firms aims to improve their technological innovation performance. CI effectively integrates the firms' internal and external innovation resources while reducing the uncertainty of technological innovation and preserving the firms' comparative advantage in competition. Our sample consisted of 322 high-tech firms located in 33 national innovation demonstration bases designated by the Chinese government. We used multiple linear regression to test the impact of these firms' CI on their technological innovation performance. In addition, we examined the moderating effect of two boundary conditions, big data capabilities and policy support (PS), on the main hypotheses. We found that CI effectively improves the technological innovation performance of high-tech firms, and that big data capabilities and PS significantly strengthen this effect. The study reveals the intrinsic mechanism through which CI affects the technological innovation performance of high-tech firms; to a certain extent, it expands the application context of CI and enriches the research perspective on how CI affects firms' innovation performance. The findings also offer insight into how high-tech firms in the digital era can make reasonable use of data empowerment during CI to improve their technological innovation performance.
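A dual moderating effect is usually tested by adding interaction terms to the multiple linear regression: a positive coefficient on CI x moderator means the moderator strengthens the effect of CI. The sketch below is a generic moderated regression, not the authors' exact specification. It assumes mean-centered predictors, uses hypothetical variable names (ci, bdc, ps), and fits noise-free synthetic data (322 rows, echoing the sample size) so the known coefficients are recovered exactly; no controls or significance tests are included.

```python
import random

def center(xs):
    """Subtract the mean so main effects stay interpretable once interactions are added."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def design_matrix(ci, bdc, ps):
    """Intercept, CI, two moderators, and the CI x moderator interaction terms."""
    ci, bdc, ps = center(ci), center(bdc), center(ps)
    return [[1.0, c, b, p, c * b, c * p] for c, b, p in zip(ci, bdc, ps)]

# Synthetic, noise-free data with known coefficients (not the study's data).
rng = random.Random(0)
n = 322
ci = [rng.gauss(0, 1) for _ in range(n)]
bdc = [rng.gauss(0, 1) for _ in range(n)]
ps = [rng.gauss(0, 1) for _ in range(n)]
X = design_matrix(ci, bdc, ps)
true_beta = [1.0, 0.5, 0.3, 0.2, 0.4, 0.25]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = ols(X, y)
print([round(b, 3) for b in beta])
```

With real survey data one would add noise, control variables, and standard errors; a statistics library is the practical choice there.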

Big Data, pp. 63-80.
Citations: 0
Large-Scale Estimation and Analysis of Web Users' Mood from Web Search Query and Mobile Sensor Data.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-01-01; Epub Date: 2023-06-02; DOI: 10.1089/big.2022.0211
Wataru Sasaki, Satoki Hamanaka, Satoko Miyahara, Kota Tsubouchi, Jin Nakazawa, Tadashi Okoshi

The ability to estimate the current mood of web users has considerable potential for enabling user-centric, opportune services in pervasive computing. However, it is difficult both to choose the type of data used for such estimation and to collect ground truth for mood states. We therefore built a model that estimates mood states from search-query data, which is easy to collect and non-invasive. We then built a second model that estimates mood states from mobile sensor data and used its output to supplement the ground-truth labels for the search-query model. This two-step model-building approach improved the performance of mood estimation for web users. Our system was also deployed in a commercial stack, and we conducted a large-scale data analysis covering more than 11 million users. We proposed a nationwide mood score that aggregates the mood values of users across the country. The score shows daily and weekly rhythms in people's moods and explains the ups and downs of mood during the COVID-19 pandemic, which moved inversely with the number of new COVID-19 cases. It also detects major news events that simultaneously affect the moods of many users, even at a fine time resolution on the order of hours. In addition, we identified a class of advertisements whose clickers showed a clear tendency in mood.
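The abstract describes a nationwide score that "bundles" per-user mood values at hourly resolution but does not give the aggregation rule. A minimal sketch, assuming a two-stage mean: each user's estimates are first averaged within a time bucket (so heavy users do not dominate), then those per-user means are averaged across users. The record layout and the [-1, 1] mood range are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def nationwide_mood(records, bucket="hour"):
    """Aggregate (user_id, timestamp, mood) records into one score per time bucket.

    bucket="hour" truncates timestamps to the hour; any other value groups by day.
    """
    per_user = defaultdict(list)
    for user, ts, mood in records:
        key = ts.replace(minute=0, second=0, microsecond=0) if bucket == "hour" else ts.date()
        per_user[(key, user)].append(mood)
    # Stage 1: one mean per (bucket, user). Stage 2: mean over users in the bucket.
    per_bucket = defaultdict(list)
    for (key, _user), moods in per_user.items():
        per_bucket[key].append(sum(moods) / len(moods))
    return {key: sum(v) / len(v) for key, v in sorted(per_bucket.items())}

records = [
    ("u1", datetime(2020, 4, 1, 9, 5), 0.5),
    ("u1", datetime(2020, 4, 1, 9, 40), -0.1),
    ("u2", datetime(2020, 4, 1, 9, 30), -0.4),
    ("u1", datetime(2020, 4, 1, 10, 0), 0.2),
]
for hour, score in nationwide_mood(records).items():
    print(hour, round(score, 3))
```

A sudden drop in such an hourly series across many users is the kind of signal the abstract associates with major news events.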

Big Data, pp. 191-209. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304759/pdf/
Citations: 0
Computational Efficient Approximations of the Concordance Probability in a Big Data Setting.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-01-01; Epub Date: 2023-06-07; DOI: 10.1089/big.2022.0107
Robin Van Oirbeek, Jolien Ponnet, Bart Baesens, Tim Verdonck

Performance measurement is an essential task once a statistical model is built. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. For a binary classifier, the AUC equals the concordance probability, a frequently used measure of a model's discriminatory power. Unlike the AUC, the concordance probability can also be extended to a continuous response variable. Given the staggering size of today's data sets, computing this measure requires a tremendous amount of costly computation and is hence extremely time-consuming, especially for a continuous response variable. We therefore propose two estimation methods that compute the concordance probability quickly and accurately and that apply to both discrete and continuous settings. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the simulation studies.
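The concordance probability is the share of comparable pairs (y_i > y_j) in which the model also ranks i above j, with tied predictions counting one half; for a 0/1 response this is exactly the AUC. The exact pairwise count is O(n^2), which is what makes it expensive at scale. The sketch below shows the exact definition next to a generic binning approximation that runs in O(n + B_y * B_p) time. The binning scheme (equal-width bins, pairs in the same response bin treated as ties) is an assumption for illustration and is not claimed to be either of the paper's two estimators.

```python
def concordance_probability(y, pred):
    """Exact C: fraction of pairs with y[i] > y[j] where pred[i] > pred[j] (ties = 1/2)."""
    concordant, comparable = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                comparable += 1
                if pred[i] > pred[j]:
                    concordant += 1
                elif pred[i] == pred[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def _bin_index(v, lo, hi, bins):
    """Equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)

def binned_concordance_probability(y, pred, y_bins=100, pred_bins=100):
    """Approximate C from a y-bin x pred-bin count table.

    Pairs in the same y-bin are treated as tied (not comparable) and pairs in the
    same pred-bin count as half-concordant; both are sources of approximation error.
    """
    ylo, yhi, plo, phi = min(y), max(y), min(pred), max(pred)
    counts = [[0] * pred_bins for _ in range(y_bins)]
    for yi, pi in zip(y, pred):
        counts[_bin_index(yi, ylo, yhi, y_bins)][_bin_index(pi, plo, phi, pred_bins)] += 1
    below = [0] * pred_bins  # observations in lower y-bins, per pred-bin
    below_total, concordant, comparable = 0, 0.0, 0
    for row in counts:  # rows in increasing order of y
        row_total = sum(row)
        if row_total and below_total:
            running = 0  # lower-y observations in strictly lower pred-bins
            for b in range(pred_bins):
                if row[b]:
                    concordant += row[b] * (running + 0.5 * below[b])
                running += below[b]
            comparable += row_total * below_total
        for b in range(pred_bins):
            below[b] += row[b]
        below_total += row_total
    return concordant / comparable if comparable else float("nan")

# Binary response: C equals the AUC (3 of 4 positive/negative pairs ranked correctly).
y, pred = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(concordance_probability(y, pred), binned_concordance_probability(y, pred))
```

More bins reduce the approximation error at the cost of a larger count table; the exact function is only practical for small samples.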

Big Data, pp. 243-268.
Citations: 0
Small Files Problem Resolution via Hierarchical Clustering Algorithm.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-01-01; Epub Date: 2023-05-16; DOI: 10.1089/big.2022.0181
Oded Koren, Aviel Shamalov, Nir Perel

The Small Files Problem in the Hadoop Distributed File System (HDFS) is an ongoing challenge that has not yet been solved, although various approaches have been developed to tackle the obstacles it creates. Properly managing block sizes in a file system is essential, as it saves memory and computing time and may reduce bottlenecks. In this article, we propose a new approach to handling small files based on a hierarchical clustering algorithm. The method groups files by their structure through a special dendrogram analysis and then recommends which files can be merged. In a simulation, the algorithm was applied to 100 CSV files with different structures, each containing 2-4 columns of different data types (integer, decimal, and text). In addition, 20 non-CSV files were created to demonstrate that the algorithm operates only on CSV files. All data were analyzed with a machine learning hierarchical clustering method, and a dendrogram was created. Based on the dendrogram analysis, seven files were selected as suitable for merging, which reduced memory usage in HDFS. The results further showed that the proposed algorithm led to efficient file management.
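The merge recommendation can be pictured as cutting a dendrogram built from file schemas. The sketch below is a simplified, pure-Python version under stated assumptions: the schema is the tuple of column types inferred from the first data row, the distance between two schemas (column-count difference plus mismatched column types) is hypothetical, files are recognized as CSV by their .csv extension, and clustering is naive single linkage. Cutting at threshold 0 groups only files with identical schemas, which are the safe merge candidates. This is not the paper's exact procedure.

```python
import csv
import io

def infer_type(value):
    """Classify a cell as 'integer', 'decimal', or 'text' (the types named in the abstract)."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"

def schema(text):
    """Column types inferred from the first data row of a CSV that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ()
    return tuple(infer_type(cell.strip()) for cell in rows[1])

def schema_distance(a, b):
    """Hypothetical distance: column-count difference plus mismatched column types."""
    return abs(len(a) - len(b)) + sum(1 for x, y in zip(a, b) if x != y)

def cluster(files, threshold=0):
    """Single-linkage agglomerative clustering of .csv files, cut at `threshold`.

    files maps a file name to its text; non-.csv names are skipped.
    """
    schemas = {name: schema(text) for name, text in files.items() if name.endswith(".csv")}
    clusters = [[name] for name in sorted(schemas)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(schema_distance(schemas[a], schemas[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:  # the next merge would be above the dendrogram cut
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]

files = {
    "a.csv": "id,price\n1,2.5\n",
    "b.csv": "n,cost\n7,3.0\n",
    "c.csv": "name,age,city\nAnn,30,Rome\n",
    "d.txt": "not a csv",
}
print(cluster(files))  # [['a.csv', 'b.csv'], ['c.csv']]
```

Clusters with more than one member are the merge candidates; raising the threshold corresponds to cutting the dendrogram higher and merging less similar files.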

Big Data, pp. 229-242.
Citations: 0
Predicting Sociodemographic Attributes from Mobile Usage Patterns: Applications and Privacy Implications.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-01-01; Epub Date: 2023-08-14; DOI: 10.1089/big.2022.0182
Rouzbeh Razavi, Guisen Xue, Ikpe Justice Akpan

When users interact with their mobile devices, they leave behind unique digital footprints that can serve as predictive proxies revealing an array of user characteristics, including demographics. Predicting users' demographics from mobile usage can provide significant benefits for service providers and users, including improved customer targeting, service personalization, and market research. This study uses machine learning algorithms and mobile usage data from 235 demographically diverse users to examine how accurately their sociodemographic attributes (age, gender, income, and education) can be predicted from mobile usage metadata. It fills a gap in the current literature by quantifying the predictive power for each attribute and discussing the practical applications and privacy implications. According to the results, gender can be predicted most accurately from mobile usage footprints (balanced accuracy = 0.862), whereas predicting users' education level is more challenging (balanced accuracy = 0.719). Moreover, the classification models could classify users, with acceptable accuracy, according to whether their age or income was above or below a given threshold. The study also presents practical applications of inferring demographic attributes from mobile usage data and discusses the implications of the findings, such as privacy and discrimination risks, from the perspectives of different stakeholders.
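The results are reported as balanced accuracy, which is the mean of the per-class recalls. Unlike plain accuracy, it does not reward a model for always predicting the majority class, which matters for a sample of 235 users with uneven class sizes. A minimal sketch of the metric (the labels below are made up for illustration):

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / support[c] for c in support) / len(support)

# Always predicting the majority class: plain accuracy 0.75, balanced accuracy 0.5.
y_true = ["f", "f", "f", "m"]
y_pred = ["f", "f", "f", "f"]
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

For the age and income tasks described above, the labels would first be binarized at the chosen threshold (above or below), and the same metric applies.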

Big Data, pp. 213-228.
Citations: 0
An Improved Influence Maximization Method for Online Advertising in Social Internet of Things.
IF 2.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-01-01; Epub Date: 2023-08-02; DOI: 10.1089/big.2023.0042
Reza Molaei, Kheirollah Rahsepar Fard, Asgarali Bouyer

Recently, a new subject known as the Social Internet of Things (SIoT) has emerged from the integration of Internet of Things and social network concepts. SIoT is increasingly common in modern life, with applications such as smart transportation, online health care systems, and viral marketing. In SIoT-based advertising, identifying the most effective diffuser nodes to maximize reach is a critical challenge. This article proposes an efficient heuristic algorithm inspired by real-world advertising, named Influence Maximization of advertisement for Social Internet of Things (IMSoT). The IMSoT algorithm consists of two steps: selecting candidate objects and identifying the final seed set. In the first step, influential candidate objects are selected based on factors such as degree, local importance value, and the set of weak and sensitive neighbors. In the second step, effective influence is calculated from the overlap between candidate objects to identify an appropriate final seed set. The IMSoT algorithm aims for maximum influence with minimum overlap, reducing redundant spread from the seed set. A distinctive feature of IMSoT is its focus on preventing duplicate advertising, which reduces extra costs, and on considering weak objects to reach the largest possible target audience. Experimental evaluations on both real-world and synthetic networks show that our algorithm outperforms other state-of-the-art algorithms by 38%-193% in attention to weak objects and by 26%-77% in preventing duplicate advertising (reducing extra cost). Additionally, the running time of IMSoT is shorter than that of other state-of-the-art algorithms.
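The two-step structure (shortlist candidates, then pick seeds while penalizing overlap) can be illustrated with a generic greedy coverage heuristic. This sketch is not IMSoT: it ranks candidates by degree only (no local importance value or weak and sensitive neighbor sets), and it scores each candidate by how many not-yet-reached nodes its closed neighborhood adds, so audiences already covered are not paid for twice. The candidate_pool default of 3k is an arbitrary assumption.

```python
def select_seeds(graph, k, candidate_pool=None):
    """Two-step greedy seed selection on an undirected graph {node: set(neighbours)}.

    Step 1: shortlist the highest-degree nodes (ties broken by node name).
    Step 2: repeatedly take the candidate whose closed neighbourhood adds the most
    nodes not reached by earlier seeds, so overlapping audiences are avoided.
    """
    if candidate_pool is None:
        candidate_pool = 3 * k
    remaining = sorted(graph, key=lambda v: (-len(graph[v]), v))[:candidate_pool]
    reached, seeds = set(), []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda v: len(({v} | graph[v]) - reached))
        seeds.append(best)
        reached |= {best} | graph[best]
        remaining.remove(best)
    return seeds, reached

graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"},
    "c": {"a", "b"}, "d": {"a", "b"},
    "e": {"f", "g"}, "f": {"e"}, "g": {"e"},
}
seeds, reached = select_seeds(graph, k=2)
print(seeds, len(reached))  # ['a', 'e'] 7
```

Picking the top two nodes by degree alone would choose a and b, whose audiences overlap completely; the overlap-aware step picks e instead and reaches every node.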

Big Data, pp. 173-190.
Citations: 0
Acknowledgment of Reviewers 2023.
IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-12-19; DOI: 10.1089/big.2023.29063.ack
Citations: 0
Secure Biomedical Document Protection Framework to Ensure Privacy Through Blockchain.
IF 4.6 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2023-12-01 Epub Date: 2023-05-23 DOI: 10.1089/big.2022.0170
Ramkumar Jayaraman, Mohammed Alshehri, Manoj Kumar, Ahed Abugabah, Surender Singh Samant, Ahmed A Mohamed

In modern health care, biomedical documents play a crucial role. They contain extensive evidence-based documentation tied to data from many stakeholders. Protecting these confidential research documents is difficult yet important in medical research. Such health care documentation, together with other community-relevant data, is recommended and processed by medical professionals. Traditional security mechanisms such as akteonline and the Health Insurance Portability and Accountability Act (HIPAA) are used to protect biomedical documents because they address non-repudiation and data integrity in document retrieval and storage. There is therefore a need for a comprehensive framework that improves biomedical document protection in terms of cost and response time. This work proposes a blockchain-based biomedical document protection framework (BBDPF). It comprises two algorithms: blockchain-based biomedical data protection (BBDP) and blockchain-based biomedical data retrieval (BBDR). Through proper data validation, BBDP and BBDR keep the data consistent, preventing data modification and the interception of confidential data. Both algorithms use strong cryptographic mechanisms intended to withstand post-quantum security risks. These mechanisms ensure the integrity of biomedical document retrieval and the non-repudiation of data retrieval transactions. For the performance analysis, BBDPF is deployed on Ethereum blockchain infrastructure, with smart contracts written in Solidity. Request time and search time are measured as the number of requests gradually increases, to assess data integrity, non-repudiation, and smart-contract support in the proposed hybrid model. A modified prototype with a web-based interface is built to prove the concept and evaluate the framework. The experimental results show that the proposed framework provides data integrity, non-repudiation, and smart-contract support compared with Query Notary Service, MedRec, MedShare, and Medlock.

Big Data, pp. 437-451 (2023-12-01). DOI: 10.1089/big.2022.0170
Citations: 0
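The BBDPF abstract does not specify how BBDP and BBDR work internally. As a hedged illustration of the integrity property they target, the sketch below does two things. It records a SHA-256 digest of each document in a toy append-only hash chain, and it checks a retrieved copy against that digest. This is a stand-in built on several assumptions:

- `ToyLedger`, `block_digest` and the other names are hypothetical.
- There is no Ethereum node, consensus, or Solidity smart contract.
- Non-repudiation, which requires digital signatures, is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_digest(body: dict) -> str:
    """Hash a block body deterministically (sorted-key JSON)."""
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


class ToyLedger:
    """Hypothetical append-only hash chain standing in for a blockchain.
    No consensus, smart contracts, encryption, or signatures."""

    FIELDS = ("doc_id", "doc_hash", "prev_hash")

    def __init__(self):
        self.blocks = []

    def append(self, doc_id: str, document: bytes) -> dict:
        """Protection side: record the document digest, linked to the previous block."""
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else GENESIS
        body = {"doc_id": doc_id, "doc_hash": sha256_hex(document), "prev_hash": prev_hash}
        block = {**body, "block_hash": block_digest(body)}
        self.blocks.append(block)
        return block

    def verify_document(self, doc_id: str, document: bytes) -> bool:
        """Retrieval side: does a retrieved copy match its latest recorded digest?"""
        for block in reversed(self.blocks):
            if block["doc_id"] == doc_id:
                return block["doc_hash"] == sha256_hex(document)
        return False  # unknown document

    def chain_is_valid(self) -> bool:
        """Recompute every block hash and link; any edited record breaks the chain."""
        prev_hash = GENESIS
        for block in self.blocks:
            body = {field: block[field] for field in self.FIELDS}
            if block["prev_hash"] != prev_hash or block_digest(body) != block["block_hash"]:
                return False
            prev_hash = block["block_hash"]
        return True


ledger = ToyLedger()
ledger.append("report-001", b"cohort summary v1")
ledger.append("report-002", b"trial protocol v3")
ok = ledger.verify_document("report-001", b"cohort summary v1")        # True
tampered = ledger.verify_document("report-001", b"cohort summary v2")  # False
chain_ok = ledger.chain_is_valid()                                      # True
ledger.blocks[0]["doc_hash"] = sha256_hex(b"forged")                    # simulate an edit
chain_after_edit = ledger.chain_is_valid()                              # False
```

Editing any stored record breaks the chain check, because each block hash covers that block's fields, including the previous block's hash. A real deployment on Ethereum would rely on the chain's own Keccak-256 hashing and ECDSA-signed transactions rather than this standard-library SHA-256 toy.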