Data perturbation is a technique for generating synthetic data by adding "noise" to raw data; it has a wide range of applications in science and engineering, primarily in data security and privacy. One challenge of data perturbation is that the synthetic data it produces typically suffers information loss as the price of privacy protection. This information loss, in turn, degrades the accuracy of any statistical or machine learning method applied to the synthetic data, weakening downstream analysis and deteriorating predictive performance. In this article, we introduce and advocate a fundamental principle of data perturbation: preservation of the distribution of the raw data. To achieve this, we propose a new scheme, named data flush, which ensures the validity of downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating strict privacy-protection requirements such as differential privacy. We highlight multiple facets of data flush through examples.
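To make the noise-addition idea concrete, here is a minimal sketch of classical additive perturbation using the Laplace mechanism for differential privacy. This is a generic illustration of the approach the abstract critiques, not the article's data flush scheme; the column, bounds, and epsilon value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_perturb(values, epsilon, sensitivity):
    """Add Laplace noise with scale sensitivity/epsilon to each record:
    the classical additive-noise mechanism for differential privacy.
    Illustrative of generic data perturbation, not the data flush scheme."""
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=len(values))

# Hypothetical raw data: ages assumed bounded in [0, 100], so sensitivity = 100.
ages = rng.integers(18, 90, size=1000).astype(float)
synthetic_ages = laplace_perturb(ages, epsilon=1.0, sensitivity=100.0)

# The synthetic column's distribution is visibly distorted -- the kind of
# information loss that motivates distribution-preserving perturbation.
print(f"raw:       mean={ages.mean():.1f}, sd={ages.std():.1f}")
print(f"perturbed: mean={synthetic_ages.mean():.1f}, sd={synthetic_ages.std():.1f}")
```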
Single-case experimental designs (SCEDs) represent a family of research designs that use experimental methods to study the effects of treatments on outcomes. The fundamental unit of analysis is the single case, which can be an individual, clinic, or community, ideally with replication of effects within and/or between cases. These designs are flexible and cost-effective and can be used for treatment development, translational research, personalized interventions, and the study of rare diseases and disorders. This article provides a broad overview of the family of single-case experimental designs, with corresponding examples, including reversal designs, multiple baseline designs, combined multiple baseline/reversal designs, and the integration of single-case designs, used to identify optimal treatments for individuals, into larger randomized controlled trials (RCTs). Personalized N-of-1 trials can be considered a subcategory of SCEDs that overlaps with reversal designs. Relevant issues for each type of design are also discussed, including comparisons of treatments, design issues such as randomization and blinding, standards for designs, and statistical approaches that complement visual inspection of single-case data.
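As a brief illustration of the reversal (ABAB) design structure, the sketch below simulates a single case with alternating baseline (A) and treatment (B) phases and reports phase means as a simple statistical complement to visual inspection. The phase lengths, baseline level, effect size, and noise level are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ABAB reversal design: alternating baseline (A) and treatment (B)
# phases for one case, 10 sessions per phase (all values assumed).
phases = ["A1", "B1", "A2", "B2"]
sessions_per_phase = 10
baseline_level, treatment_effect, noise_sd = 20.0, -8.0, 2.0

data = {}
for phase in phases:
    level = baseline_level + (treatment_effect if phase.startswith("B") else 0.0)
    data[phase] = level + rng.normal(0.0, noise_sd, size=sessions_per_phase)

# Phase-mean summary: the treatment effect is considered replicated if the
# outcome shifts at both A-to-B transitions and reverses on return to baseline.
for phase in phases:
    print(f"{phase}: mean outcome = {data[phase].mean():.1f}")
```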
Treatment of patients who suffer from concurrent health conditions is not well served by (1) evidence-based clinical guidelines that mainly specify treatment of single conditions and (2) conventional randomized controlled trials (RCTs) that identify treatments as safe and effective on average. Clinical decision-making based on the average patient effect may be inappropriate for treating those with multimorbidity, who experience burdens and obstacles that may be unique to their personal situation. We describe how personalized (N-of-1) trials can be integrated with an automated platform and virtual/remote technologies to improve patient-centered care for those living with multimorbidity. To illustrate, we present a hypothetical clinical scenario: survivors of both coronavirus disease 2019 (COVID-19) and cancer who chronically suffer from sleeplessness and fatigue. We then describe how the four standard phases of conventional RCT development can be modified for personalized trials and applied to the multimorbidity clinical scenario, outline how personalized trials can be adapted and extended to compare their benefits against those of between-subject trial designs, and explain how personalized trials can address special problems associated with multimorbidity for which conventional trials are poorly suited.
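A personalized trial of this kind typically assigns the candidate treatments to the single patient in randomized, counterbalanced blocks. The sketch below generates one such schedule; the treatment labels, number of blocks, and period length are hypothetical and are not taken from the article.

```python
import random

# Hypothetical N-of-1 schedule for the sleeplessness/fatigue scenario:
# two candidate treatments compared within one patient across randomized blocks.
treatments = ["sleep hygiene coaching", "light therapy"]  # assumed labels
n_blocks = 4          # number of counterbalanced blocks (assumed)
period_days = 14      # length of each treatment period (assumed)

random.seed(7)
schedule = []
for block in range(1, n_blocks + 1):
    order = random.sample(treatments, k=len(treatments))  # randomize order within block
    for treatment in order:
        schedule.append((block, treatment, period_days))

for block, treatment, days in schedule:
    print(f"block {block}: {treatment} for {days} days")
```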
The broad sharing of research data is widely viewed as critical for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data and the frequency of data reuse remain stubbornly low. We argue here that a significant reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project, when publications are being prepared, by which time essential details are no longer accessible. Thus, we propose an approach to research informatics in which FAIR principles are applied continuously, from the inception of a research project, and ubiquitously, to every data asset produced by experiment or computation. We suggest that this seemingly challenging task can be made feasible by the adoption of simple tools, such as lightweight identifiers (to ensure that every data asset is findable), packaging methods (to facilitate understanding of data contents), data access methods, and metadata organization and structuring tools (to support schema development and evolution). We use an example from experimental neuroscience to illustrate how these methods can work in practice.
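As a minimal sketch of applying such tools at the moment a data asset is produced, the snippet below mints a lightweight content-derived identifier and writes a small metadata manifest next to the file. The file name, manifest fields, and identifier scheme are assumptions for illustration and are not the specific tools discussed in the article.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def register_asset(path, description, creator):
    """Mint a lightweight, content-derived identifier and write a minimal
    metadata manifest alongside the data asset, so FAIR bookkeeping happens
    at creation time rather than at publication."""
    data = Path(path).read_bytes()
    checksum = hashlib.sha256(data).hexdigest()
    manifest = {
        "identifier": f"sha256:{checksum}",  # stable identifier for findability
        "path": str(path),
        "description": description,
        "creator": creator,
        "created": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
    }
    Path(str(path) + ".manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

# Example (hypothetical file produced by an experimental pipeline):
# register_asset("session01_spikes.csv",
#                description="spike times, probe A",
#                creator="lab-pipeline-v0.3")
```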