Data perturbation is a technique for generating synthetic data by adding "noise" to raw data; it has a wide range of applications in science and engineering, primarily in data security and privacy. One challenge of data perturbation is that the synthetic data it produces typically suffers information loss as the price of privacy protection. This information loss, in turn, degrades the accuracy of any statistical or machine learning method applied to the synthetic data, weakening downstream analysis and deteriorating predictive performance. In this article, we introduce and advocate a fundamental principle of data perturbation: preservation of the distribution of the raw data. To achieve this, we propose a new scheme, named data flush, which ensures the validity of downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating strict privacy-protection requirements such as differential privacy. We highlight multiple facets of data flush through examples.
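To make the noise-addition idea concrete, here is a minimal sketch of classical additive perturbation using the Laplace mechanism for differential privacy. This is a generic illustration of the approach the abstract critiques, not the article's data flush scheme; the column, bounds, and epsilon value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_perturb(values, epsilon, sensitivity):
    """Add Laplace noise with scale sensitivity/epsilon to each record:
    the classical additive-noise mechanism for differential privacy.
    Illustrative of generic data perturbation, not the data flush scheme."""
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=len(values))

# Hypothetical raw data: ages assumed bounded in [0, 100], so sensitivity = 100.
ages = rng.integers(18, 90, size=1000).astype(float)
synthetic_ages = laplace_perturb(ages, epsilon=1.0, sensitivity=100.0)

# The synthetic column's distribution is visibly distorted -- the kind of
# information loss that motivates distribution-preserving perturbation.
print(f"raw:       mean={ages.mean():.1f}, sd={ages.std():.1f}")
print(f"perturbed: mean={synthetic_ages.mean():.1f}, sd={synthetic_ages.std():.1f}")
```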
Single-case experimental designs (SCEDs) represent a family of research designs that use experimental methods to study the effects of treatments on outcomes. The fundamental unit of analysis is the single case, which can be an individual, clinic, or community, ideally with replication of effects within and/or between cases. These designs are flexible and cost-effective and can be used for treatment development, translational research, personalized interventions, and the study of rare diseases and disorders. This article provides a broad overview of the family of single-case experimental designs, with corresponding examples, including reversal designs, multiple baseline designs, combined multiple baseline/reversal designs, and the integration of single-case designs, used to identify optimal treatments for individuals, into larger randomized controlled trials (RCTs). Personalized N-of-1 trials can be considered a subcategory of SCEDs that overlaps with reversal designs. Relevant issues for each type of design are also discussed, including comparisons of treatments, design issues such as randomization and blinding, standards for designs, and statistical approaches that complement visual inspection of single-case data.
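As a brief illustration of the reversal (ABAB) design structure, the sketch below simulates a single case with alternating baseline (A) and treatment (B) phases and reports phase means as a simple statistical complement to visual inspection. The phase lengths, baseline level, effect size, and noise level are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ABAB reversal design: alternating baseline (A) and treatment (B)
# phases for one case, 10 sessions per phase (all values assumed).
phases = ["A1", "B1", "A2", "B2"]
sessions_per_phase = 10
baseline_level, treatment_effect, noise_sd = 20.0, -8.0, 2.0

data = {}
for phase in phases:
    level = baseline_level + (treatment_effect if phase.startswith("B") else 0.0)
    data[phase] = level + rng.normal(0.0, noise_sd, size=sessions_per_phase)

# Phase-mean summary: the treatment effect is considered replicated if the
# outcome shifts at both A-to-B transitions and reverses on return to baseline.
for phase in phases:
    print(f"{phase}: mean outcome = {data[phase].mean():.1f}")
```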
Treatment of patients who suffer from concurrent health conditions is not well served by (1) evidence-based clinical guidelines that mainly specify treatment of single conditions and (2) conventional randomized controlled trials (RCTs) that identify treatments as safe and effective on average. Clinical decision-making based on the average patient effect may be inappropriate for treating those with multimorbidity, who experience burdens and obstacles that may be unique to their personal situation. We describe how personalized (N-of-1) trials can be integrated with an automated platform and virtual/remote technologies to improve patient-centered care for those living with multimorbidity. To illustrate, we present a hypothetical clinical scenario: survivors of both coronavirus disease 2019 (COVID-19) and cancer who chronically suffer from sleeplessness and fatigue. We then describe how the four standard phases of conventional RCT development can be modified for personalized trials and applied to the multimorbidity clinical scenario, outline how personalized trials can be adapted and extended to compare their benefits against those of between-subject trial designs, and explain how personalized trials can address special problems associated with multimorbidity for which conventional trials are poorly suited.
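A personalized trial of this kind typically assigns the candidate treatments to the single patient in randomized, counterbalanced blocks. The sketch below generates one such schedule; the treatment labels, number of blocks, and period length are hypothetical and are not taken from the article.

```python
import random

# Hypothetical N-of-1 schedule for the sleeplessness/fatigue scenario:
# two candidate treatments compared within one patient across randomized blocks.
treatments = ["sleep hygiene coaching", "light therapy"]  # assumed labels
n_blocks = 4          # number of counterbalanced blocks (assumed)
period_days = 14      # length of each treatment period (assumed)

random.seed(7)
schedule = []
for block in range(1, n_blocks + 1):
    order = random.sample(treatments, k=len(treatments))  # randomize order within block
    for treatment in order:
        schedule.append((block, treatment, period_days))

for block, treatment, days in schedule:
    print(f"block {block}: {treatment} for {days} days")
```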
The broad sharing of research data is widely viewed as critical for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data and the frequency of data reuse remain stubbornly low. We argue here that a significant reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project, when publications are being prepared, by which time essential details are no longer accessible. Thus, we propose an approach to research informatics in which FAIR principles are applied continuously, from the inception of a research project, and ubiquitously, to every data asset produced by experiment or computation. We suggest that this seemingly challenging task can be made feasible by the adoption of simple tools, such as lightweight identifiers (to ensure that every data asset is findable), packaging methods (to facilitate understanding of data contents), data access methods, and metadata organization and structuring tools (to support schema development and evolution). We use an example from experimental neuroscience to illustrate how these methods can work in practice.
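As a minimal sketch of applying such tools at the moment a data asset is produced, the snippet below mints a lightweight content-derived identifier and writes a small metadata manifest next to the file. The file name, manifest fields, and identifier scheme are assumptions for illustration and are not the specific tools discussed in the article.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def register_asset(path, description, creator):
    """Mint a lightweight, content-derived identifier and write a minimal
    metadata manifest alongside the data asset, so FAIR bookkeeping happens
    at creation time rather than at publication."""
    data = Path(path).read_bytes()
    checksum = hashlib.sha256(data).hexdigest()
    manifest = {
        "identifier": f"sha256:{checksum}",  # stable identifier for findability
        "path": str(path),
        "description": description,
        "creator": creator,
        "created": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
    }
    Path(str(path) + ".manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

# Example (hypothetical file produced by an experimental pipeline):
# register_asset("session01_spikes.csv",
#                description="spike times, probe A",
#                creator="lab-pipeline-v0.3")
```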