Big Data: Latest Articles

Sharing Medical Big Data While Preserving Patient Confidentiality in Innovative Medicines Initiative: A Summary and Case Report from BigData@Heart.

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-12-01 Epub Date: 2023-10-27 DOI: 10.1089/big.2022.0178

Megan Schröder, Sam H A Muller, Eleni Vradi, Johanna Mielke, Yvonne M F Lim, Fabrice Couvelard, Menno Mostert, Stefan Koudstaal, Marinus J C Eijkemans, Christoph Gerlinger

Sharing individual patient data (IPD) is a simple concept but complex to achieve due to data privacy and data security concerns, underdeveloped guidelines, and legal barriers. Sharing IPD is additionally difficult in big data-driven collaborations such as BigData@Heart in the Innovative Medicines Initiative, due to competing interests between diverse consortium members. One project within BigData@Heart, case study 1, needed to pool data from seven heterogeneous data sets: five randomized controlled trials from three different industry partners, and two disease registries. Sharing IPD was not considered feasible due to legal requirements and the sensitive medical nature of these data. In addition, harmonizing the data sets for a federated data analysis was difficult due to capacity constraints and the heterogeneity of the data sets. An alternative option was to share summary statistics through contingency tables. Here it is demonstrated that this method, combined with anonymization methods to ensure patient anonymity, resulted in minimal loss of information. Although sharing IPD should continue to be encouraged and strived for, our approach achieved a good balance between data transparency and patient privacy. It also allowed a successful collaboration between industry and academia.
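The contingency-table approach can be illustrated with a short sketch. It is not the consortium's pipeline: it assumes each data holder counts patients per combination of categorical variables, blanks out any cell below a hypothetical minimum size (`MIN_CELL = 5`, not stated in the abstract) as a simple stand-in for the anonymization step, and a coordinator sums the shared tables. The function names and the example variables `sex` and `outcome` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Hypothetical minimum cell size; the abstract does not state the threshold used.
MIN_CELL = 5

Cell = Tuple[str, ...]

def contingency_table(records: Iterable[Mapping[str, str]], variables: Tuple[str, ...]) -> Counter:
    """Count records (patients) per combination of the given categorical variables."""
    return Counter(tuple(r[v] for v in variables) for r in records)

def suppress_small_cells(table: Counter, min_cell: int = MIN_CELL) -> Dict[Cell, Optional[int]]:
    """Blank out (None) any non-empty cell smaller than min_cell before the table is shared."""
    return {cell: (n if n >= min_cell else None) for cell, n in table.items()}

def pool_tables(tables: Iterable[Dict[Cell, Optional[int]]]) -> Dict[Cell, Optional[int]]:
    """Sum shared tables across data holders; a cell suppressed by any holder stays unknown."""
    pooled: Dict[Cell, Optional[int]] = {}
    for table in tables:
        for cell, n in table.items():
            if n is None or pooled.get(cell, 0) is None:
                pooled[cell] = None
            else:
                pooled[cell] = pooled.get(cell, 0) + n
    return pooled

# Usage with two hypothetical data holders and hypothetical variables.
site_a = [{"sex": "F", "outcome": "HF"}] * 7 + [{"sex": "M", "outcome": "HF"}] * 2
site_b = [{"sex": "F", "outcome": "HF"}] * 6 + [{"sex": "M", "outcome": "HF"}] * 9
variables = ("sex", "outcome")
shared = [suppress_small_cells(contingency_table(s, variables)) for s in (site_a, site_b)]
print(pool_tables(shared))  # {('F', 'HF'): 13, ('M', 'HF'): None}
```

Suppression is where information is lost in this sketch: a pooled cell remains unknown if any single holder had to blank it out, even when the other holders' counts are large.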


Pages: 399-407. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10733752/pdf/

Citations: 0

The incidence and prevalence of coeliac disease in the United Kingdom

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.5051

Yvonne Nartey, C. Crooks, Joe West, Timothy R. Card, Laila J. Tata

Citations: 0

Machine Learning Analysis of Serious Illness Conversations Predicts Patient Reports of Feeling Heard & Understood

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.5279

Bob Gramling, Donna Rizzo, Margaret Eppstein, Bradford Demarest

Citations: 0

Changes in Reasons for Visits to Primary Care as a Result of the COVID-19 Pandemic: by INTRePID

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.5425

Karen Tu, M. Lapadula

Citations: 0

Breast cancer screening during the COVID-19 Pandemic in the United States: Results from real-world health records data

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.4885

William Curry, Wen-Jan Tuan, Qiushi Chen, Andrew Chung

Citations: 0

A Novel Method for Utilizing Electronic Health Record Data in Condition-specific Research

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.4955

Tarin Clay, Melissa Filippi, Elise Robertson, Cory B. Lutgen, Elisabeth F. Callen

Citations: 0

Harmonized Healthcare Database across Family Medicine Institutions

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.5404

Chance R. Strenth, David Schneider, U. Sambamoorthi, Sravan Mattevada, Kimberly Fulda, Bhaskar Thakur, Anna Espinoza

Citations: 0

Identifying the Factors Associated with the Accumulation of Diabetes Complications to Inform a Prediction Tool

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-11-01 DOI: 10.1370/afm.22.s1.5071

Winston R. Liaw, Ben King, Omolola E. Adepoju, Jiangtao Luo, Ioannis Kakadiaris, Todd Prewitt, Jessica Dobbins, Pete Womack

Citations: 0

Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System.

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-10-31 DOI: 10.1089/big.2022.0201

Georgios Vranopoulos, Nathan Clarke, Shirley Atkinson

Organizations have been investing in analytics relying on internal and external data to gain a competitive advantage. However, the legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation due to violations of data governance. The era of big data has further intensified the challenge of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, thus increasing the risk of data exposure and data loss. To mitigate the risk of data exposure and data loss, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach to data evaluation across organizations. A rule-based system implementing the corporate data classification policy will minimize the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception handling process with appropriate approval for extenuating circumstances. The system was implemented as a proof-of-concept working prototype to showcase its capabilities and provide a hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average of ∼25 years of experience and a combined experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety is the most troubling, according to 90% of the commentators. In addition, with an approximate average of 60%, it was confirmed that appropriate policies, procedures, and prerequisites for classification are in place while implementation tools are lagging.
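The idea of a rule-based system that applies a data classification policy, with an approved-exception path, can be sketched as follows. Everything in it is hypothetical: the four classification levels, the regular-expression rules, and the exception check (an override takes effect only when it names a known level and carries an approver) stand in for a real corporate policy and approval workflow, which the abstract does not specify.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical classification levels, ordered from least to most restrictive.
LEVELS = ["public", "internal", "confidential", "restricted"]

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    level: str

# Hypothetical rules standing in for a corporate data classification policy.
RULES = [
    Rule("email_address", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "confidential"),
    Rule("iban", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), "restricted"),
    Rule("internal_marker", re.compile(r"\binternal use only\b", re.IGNORECASE), "internal"),
]

def classify(text: str, rules: List[Rule] = RULES) -> Tuple[str, List[str]]:
    """Return the most restrictive level triggered by any rule and the names of the rules that fired."""
    fired = [rule for rule in rules if rule.pattern.search(text)]
    if not fired:
        return "public", []
    level = max((rule.level for rule in fired), key=LEVELS.index)
    return level, [rule.name for rule in fired]

def effective_level(text: str, exception_level: Optional[str] = None,
                    approved_by: Optional[str] = None) -> str:
    """Apply an exception only when it names a known level and carries an approver."""
    level, _ = classify(text)
    if exception_level in LEVELS and approved_by:
        return exception_level
    return level

print(classify("Refund to DE89370400440532013000"))  # ('restricted', ['iban'])
```

Taking the maximum level over all rules that fire means a document is always treated as strictly as its most sensitive content, while exceptions stay explicit and attributable to an approver.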



Citations: 0

Consumer Segmentation Based on Location and Timing Dimensions Using Big Data from Business-to-Customer Retailing Marketplaces.

IF 4.6; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Big Data

Pub Date: 2023-10-30 DOI: 10.1089/big.2022.0307

Fatemeh Ehsani, Monireh Hosseini

Consumer segmentation is an electronic marketing practice that involves dividing consumers into groups with similar features to discover their preferences. In the business-to-customer (B2C) retailing industry, marketers explore big data to segment consumers based on various dimensions. However, among these dimensions, the motives of location and time of shopping have received relatively less attention. In this study, we use the recency, frequency, monetary, and tenure (RFMT) method to segment consumers into 10 groups based on their time and geographical features. To explore location, we investigate market distribution, revenue distribution, and consumer distribution. Geographical coordinates and peculiarities are estimated based on consumer density. Regarding time exploration, we evaluate the accuracy of product delivery and the timing of promotions. To pinpoint the target consumers, we display the main hotspots on the distribution heatmap. Furthermore, we identify the optimal time for purchase and the most densely populated locations of beneficial consumers. In addition, we evaluate product distribution to determine the most popular product categories. Based on the RFMT segmentation and product popularity, we have developed a product recommender system to assist marketers in attracting and engaging potential consumers. Through a case study using data from massive B2C retailing, we conclude that the proposed segmentation provides superior insights into consumer behavior and improves product recommendation performance.
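The recency, frequency, monetary, and tenure (RFMT) features can be computed from raw transactions as sketched below, assuming each transaction is a `(customer_id, order_date, amount)` tuple. The rank-based 1-to-5 scoring is an illustrative assumption; the abstract does not say how the scores were binned, and the step that groups customers into 10 segments (for example, clustering on these scores) is not reproduced here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

def rfmt(transactions: Iterable[Tuple[str, date, float]], as_of: date) -> Dict[str, Dict[str, float]]:
    """Compute recency, frequency, monetary value, and tenure per customer."""
    first: Dict[str, date] = {}
    last: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for customer, day, amount in transactions:
        first[customer] = min(day, first.get(customer, day))
        last[customer] = max(day, last.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: {
            "recency": (as_of - last[c]).days,   # days since the latest order
            "frequency": frequency[c],           # number of orders
            "monetary": monetary[c],             # total spend
            "tenure": (as_of - first[c]).days,   # days since the first order
        }
        for c in frequency
    }

def rank_scores(values: List[float], bins: int = 5) -> List[int]:
    """Score each value from 1 to bins by the rank of its distinct value; ties share a score."""
    distinct = sorted(set(values))
    rank = {v: i for i, v in enumerate(distinct)}
    return [1 + rank[v] * bins // len(distinct) for v in values]

def rfmt_scores(features: Dict[str, Dict[str, float]], bins: int = 5) -> Dict[str, Tuple[int, ...]]:
    """Turn RFMT features into per-customer (R, F, M, T) scores."""
    ids = list(features)
    columns = {
        # Negate recency so that more recent buyers receive higher scores.
        "R": rank_scores([-features[c]["recency"] for c in ids], bins),
        "F": rank_scores([features[c]["frequency"] for c in ids], bins),
        "M": rank_scores([features[c]["monetary"] for c in ids], bins),
        "T": rank_scores([features[c]["tenure"] for c in ids], bins),
    }
    return {c: tuple(columns[k][i] for k in "RFMT") for i, c in enumerate(ids)}

transactions = [
    ("a", date(2023, 1, 1), 50.0),
    ("a", date(2023, 3, 1), 30.0),
    ("b", date(2023, 2, 15), 200.0),
]
print(rfmt_scores(rfmt(transactions, as_of=date(2023, 3, 31))))  # {'a': (3, 3, 1, 3), 'b': (1, 1, 3, 1)}
```

Location and timing dimensions, which the study adds on top of RFMT, would enter as further per-customer columns (for example, region or preferred purchase hour) alongside these four scores.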



Citations: 0
