In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but varying SVG definitions and methodologies lead to incomparable results. We review 33 state-of-the-art methods, categorizing SVGs into three types: overall, cell-type-specific, and spatial-domain-marker SVGs. Our review explains the intuitions underlying these methods, summarizes their applications, and categorizes the hypothesis tests they use in the trade-off between generality and specificity for SVG detection. We discuss challenges in SVG detection and propose future directions for improvement. Our review offers insights for method developers and users, advocating for category-specific benchmarking.
Title: Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data. Authors: Guanao Yan, Shuo Harper Hua, Jingyi Jessica Li. ArXiv, published 2024-10-03. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11160866/pdf/
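To make the idea of a test for "overall" SVGs concrete, the sketch below scores a single gene with Moran's I and a permutation test. It is only an illustration: the abstract does not say which statistics the reviewed methods use, and the function names, the k-nearest-neighbour weights, and the synthetic grid are assumptions made here.

```python
import numpy as np


def knn_weights(coords, k=4):
    """Binary spatial weights: w[i, j] = 1 if spot j is among the k nearest spots to i."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a spot is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return weights


def morans_i(values, weights):
    """Moran's I of one gene's expression over the spots."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    denom = np.sum(centred ** 2)
    if denom == 0:  # constant expression carries no spatial signal
        return 0.0
    n = len(values)
    return float((n / weights.sum()) * (centred @ weights @ centred) / denom)


def permutation_pvalue(values, weights, n_perm=199, seed=0):
    """One-sided p-value: how often shuffled spots give an I at least as large."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    null = np.array([morans_i(rng.permutation(values), weights) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid of spots with a left-to-right expression gradient.
    xs, ys = np.meshgrid(np.arange(10), np.arange(10))
    coords = np.column_stack([xs.ravel(), ys.ravel()])
    rng = np.random.default_rng(42)
    gene = xs.ravel() + rng.normal(scale=0.5, size=100)
    print(permutation_pvalue(gene, knn_weights(coords)))
```

A null hypothesis of "no spatial autocorrelation" like this one is general; tests for cell-type-specific or spatial-domain-marker SVGs would condition on extra annotations that this sketch does not model.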
Daphna Raz, Varun Joshi, Brian R Umberger, Necmiye Ozay
Humans rely on ankle torque to maintain standing balance, particularly in the presence of small to moderate perturbations. Reductions in maximum torque (MT) production and maximum rate of torque development (MRTD) occur at the ankle with age, diminishing stability. Ankle exoskeletons are powered orthotic devices that may assist older adults by compensating for reduced muscle force and power production capabilities. They may also be able to assist with ankle strategies used for balance. However, no studies have investigated the effect of such devices on balance in older adults. Here, we model the effect ankle exoskeletons have on stability in physics-based models of healthy young and old adults, focusing on the mitigation of age-related deficits such as reduced MT and MRTD. We show that an ankle exoskeleton moderately reduces feasible stability boundaries in users who have full ankle strength. For individuals with age-related deficits, there is a trade-off. While exoskeletons augment stability in low velocity conditions, they reduce stability in some high velocity conditions. Our results suggest that well-established control strategies must still be experimentally validated in older adults.
Title: Ankle Exoskeletons May Hinder Standing Balance in Simple Models of Older and Younger Adults. ArXiv, published 2024-10-02. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11343240/pdf/
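As a rough illustration of how limits on maximum torque (MT) and maximum rate of torque development (MRTD) can decide whether a perturbation is recoverable, the sketch below simulates a single-link inverted pendulum with a saturated, rate-limited ankle torque. It is not the authors' model: the PD controller, the body parameters, the fall threshold, and every numeric value are hypothetical choices made for this example.

```python
import math


def recovers(theta0, omega0, max_torque, max_torque_rate,
             mass=70.0, com_height=1.0, kp=1200.0, kd=300.0,
             dt=0.001, duration=5.0, fall_angle=0.3):
    """Return True if the pendulum is brought back to upright within `duration` seconds.

    theta0, omega0  : initial lean angle (rad) and angular velocity (rad/s)
    max_torque      : MT, bound on the ankle torque magnitude (N*m)
    max_torque_rate : MRTD, bound on how fast the torque can change (N*m/s)
    """
    g = 9.81
    inertia = mass * com_height ** 2
    theta, omega, torque = theta0, omega0, 0.0
    max_step = max_torque_rate * dt
    for _ in range(int(duration / dt)):
        # PD controller asks for a restoring torque, clipped to the strength limit.
        desired = -kp * theta - kd * omega
        desired = max(-max_torque, min(max_torque, desired))
        # The actual torque can only move toward the request at the MRTD limit.
        torque += max(-max_step, min(max_step, desired - torque))
        # Gravity tips the body over; the ankle torque opposes it.
        alpha = (mass * g * com_height * math.sin(theta) + torque) / inertia
        omega += alpha * dt
        theta += omega * dt
        if abs(theta) > fall_angle:
            return False
    return abs(theta) < 0.01 and abs(omega) < 0.01


if __name__ == "__main__":
    for mt, mrtd in [(300.0, 2000.0), (20.0, 2000.0)]:
        print(mt, mrtd, recovers(0.1, 0.0, mt, mrtd))
```

Sweeping `theta0` and `omega0` over a grid and recording where `recovers` flips from True to False gives a crude picture of a stability boundary for a given pair of limits.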
Nikolas H Claussen, Fridtjof Brauns, Boris I Shraiman
Convergent extension of epithelial tissue is a key motif of animal morphogenesis. On a coarse scale, cell motion resembles laminar fluid flow; yet in contrast to a fluid, epithelial cells adhere to each other and maintain the tissue layer under actively generated internal tension. To resolve this apparent paradox, we formulate a model in which tissue flow in the tension-dominated regime occurs through adiabatic remodeling of force balance in the network of adherens junctions. We propose that the slow dynamics within the manifold of force-balanced configurations is driven by positive feedback on myosin-generated cytoskeletal tension. Shifting force balance within a tension network causes active cell rearrangements (T1 transitions) resulting in net tissue deformation oriented by initial tension anisotropy. Strikingly, we find that the total extent of tissue deformation depends on the initial cellular packing order. T1s degrade this order so that tissue flow is self-limiting. We explain these findings by showing that coordination of T1s depends on coherence in local tension configurations, quantified by a geometric order parameter in tension space. Our model reproduces the salient tissue- and cell-scale features of germ band elongation during Drosophila gastrulation, in particular the slowdown of tissue flow after approximately twofold elongation, concomitant with a loss of order in tension configurations. This suggests that local cell geometry contains morphogenetic information and yields experimentally testable predictions. Defining biologically controlled active tension dynamics on the manifold of force-balanced states may provide a general approach to the description of morphogenetic flow.
Title: A Geometric Tension Dynamics Model of Epithelial Convergent Extension. ArXiv, published 2024-10-02. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10705598/pdf/
Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E Carpenter, Meng Jiang, Shantanu Singh
Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream applications: molecular property prediction against up to 27 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
Title: Learning Molecular Representation in a Cell. ArXiv, published 2024-10-02. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213146/pdf/
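For orientation, the minimality and sufficiency objectives named in the abstract above echo the two terms of the generic information bottleneck objective, written below in its textbook form. Reading X as the molecule, Z as the encoder's latent representation, and Y as the features drawn from the molecule's neighborhood in the context graph is an interpretation made here; the abstract does not give InfoAlign's exact loss or the bounds it optimizes.

```latex
\min_{p(z \mid x)} \; \underbrace{I(X; Z)}_{\text{minimality}} \;-\; \beta \, \underbrace{I(Z; Y)}_{\text{sufficiency}}, \qquad \beta > 0
```

Here I(.;.) denotes mutual information and the weight beta sets how much compression of X is traded for information retained about Y.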
Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. For successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine, large amounts of data are necessary for model building and optimization. To help overcome such limitations in the context of brain MRI, we present GenMIND: a collection of generative models of normative regional volumetric features derived from structural brain imaging. GenMIND models are trained on real brain imaging regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Leveraging GenMIND, we produce and offer 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), alongside the model's capability to generate unlimited data. Experimental results indicate that samples generated from GenMIND agree with the distributions obtained from real data. Most importantly, the generated normative data significantly enhance the accuracy of downstream machine learning models on tasks such as disease classification. Data and models are available at: https://huggingface.co/spaces/rongguangw/GenMIND.
Title: Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples. Authors: Sai Spandana Chintapalli, Rongguang Wang, Zhijian Yang, Vasiliki Tassopoulou, Fanyang Yu, Vishnu Bashyam, Guray Erus, Pratik Chaudhari, Haochang Shou, Christos Davatzikos. ArXiv, published 2024-10-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11275685/pdf/
Gabriel Loewinger, Alexander W Levis, Francisco Pereira
Optogenetics is a powerful neuroscience technique for studying how neural circuit manipulation affects behavior. Standard analysis conventions discard information and severely limit the scope of the causal questions that can be probed. To address this gap, we 1) draw connections to the causal inference literature on sequentially randomized experiments, 2) propose a non-parametric framework for analyzing "open-loop" (static regime) optogenetics behavioral experiments, 3) derive extensions of history-restricted marginal structural models for dynamic treatment regimes with positivity violations for "closed-loop" designs, and 4) propose a taxonomy of identifiable causal effects that encompass a far richer collection of scientific questions compared to standard methods. From another view, our work extends "excursion effect" methods, popularized recently in the mobile health literature, to enable estimation of causal contrasts for treatment sequences in the presence of positivity violations. We describe sufficient conditions for identifiability of the proposed causal estimands, and provide asymptotic statistical guarantees for a proposed inverse probability-weighted estimator, a multiply-robust estimator (for two intervention timepoints), a framework for hypothesis testing, and a computationally scalable implementation. Finally, we apply our framework to data from a recent neuroscience study and show how it provides insight into causal effects of optogenetics on behavior that are obscured by standard analyses.
Title: Nonparametric causal inference for optogenetics: sequential excursion effects for dynamic regimes. ArXiv, published 2024-10-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11188134/pdf/
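The inverse probability-weighted estimator mentioned in the abstract above builds on a standard idea that a small example can show. The sketch below is the basic normalized IPW estimate of the mean outcome under a fixed treatment sequence in a sequentially randomized experiment with known treatment probabilities. It is not the authors' excursion-effect estimator and does not handle positivity violations; the function name and the simulated data are assumptions made for illustration.

```python
import numpy as np


def ipw_mean(outcome, treatments, propensities, regime):
    """Normalized IPW estimate of E[Y] had every unit followed `regime`.

    outcome      : (n,) observed outcomes
    treatments   : (n, T) observed binary treatments
    propensities : (n, T) P(A_t = 1 | history), known from the design
    regime       : (T,) treatment sequence of interest
    """
    outcome = np.asarray(outcome, dtype=float)
    treatments = np.asarray(treatments)
    propensities = np.asarray(propensities, dtype=float)
    regime = np.asarray(regime)
    # Probability of the treatment each unit actually received at each timepoint.
    prob_observed = np.where(treatments == 1, propensities, 1.0 - propensities)
    # Only units whose whole sequence matches the regime get a non-zero weight.
    follows = np.all(treatments == regime, axis=1)
    weights = follows / prob_observed.prod(axis=1)
    return float(np.sum(weights * outcome) / np.sum(weights))


if __name__ == "__main__":
    # Simulated two-timepoint experiment; the second probability depends on history.
    rng = np.random.default_rng(0)
    n = 20000
    a1 = rng.binomial(1, 0.5, size=n)
    p2 = np.where(a1 == 1, 0.3, 0.6)
    a2 = rng.binomial(1, p2)
    y = 1.0 + 2.0 * a1 + 1.0 * a2 + rng.normal(size=n)
    A = np.column_stack([a1, a2])
    P = np.column_stack([np.full(n, 0.5), p2])
    # Contrast between always stimulating and never stimulating (true value 3 here).
    print(ipw_mean(y, A, P, [1, 1]) - ipw_mean(y, A, P, [0, 0]))
```

When a regime's sequence has probability zero for some histories, the weights above are undefined, which is the kind of positivity violation the abstract says the proposed framework is designed to accommodate.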
Alex Morehead, Nabin Giri, Jian Liu, Jianlin Cheng
The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to unknown structures); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for unknown pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL methods consistently outperform conventional docking algorithms; (2) most recent DL docking methods fail to generalize to multi-ligand protein targets; and (3) training DL methods with physics-informed loss functions on diverse clusters of protein-ligand complexes is a promising direction for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.
Title: Deep Learning for Protein-Ligand Docking: Are We There Yet? ArXiv, published 2024-09-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11142318/pdf/
Fakrul Islam Tushar, Liesbeth Vancoillie, Cindy McCabe, Amareswararao Kavuri, Lavsen Dahal, Brian Harrawood, Milo Fryling, Mojtaba Zarei, Saman Sotoudeh-Paima, Fong Chi Ho, Dhrubajyoti Ghosh, Sheng Luo, W Paul Segars, Ehsan Abadi, Kyle J Lafata, Ehsan Samei, Joseph Y Lo
Importance: Clinical imaging trials are crucial for definitive evaluation of medical innovations, but the process is inefficient, expensive, and ethically constrained. The virtual imaging trial (VIT) approach addresses these limitations by emulating the components of a clinical trial. An in silico rendition of the National Lung Screening Trial (NLST) via the Virtual Lung Screening Trial (VLST) demonstrates the promise of VITs to expedite clinical trials, reduce risks to subjects, and facilitate the optimal use of imaging technologies in clinical settings.
Design, setting, and participants: A diverse virtual patient population of 294 subjects was created from human models (XCAT) emulating the characteristics of cases in the NLST, with two types of simulated lung nodules. The cohort was assessed using simulated CT and CXR systems to generate images that reflect the NLST imaging technologies. Deep learning models trained for lesion detection in CXR and CT served as virtual readers.
Results: The study analyzed simulated CT and CXR images from 294 virtual patients, with a lesion-level AUC of 0.81 (95% CI: 0.79-0.84) for CT and 0.56 (95% CI: 0.54-0.58) for CXR. At the patient level, CT demonstrated an AUC of 0.84 (95% CI: 0.80-0.89), compared to 0.52 (95% CI: 0.45-0.58) for CXR. Subgroup analyses on CT results indicated better detection of homogeneous lesions (lesion-level AUC 0.97) than of heterogeneous lesions (lesion-level AUC 0.72). Performance was particularly high for identifying larger nodules (AUC of 0.98 for nodules > 8 mm). The VLST results closely mirrored the NLST, particularly in size-based detection trends, with CT achieving high AUCs for nodules > 8 mm and similar challenges in detecting smaller nodules.
Conclusion and relevance: The VIT results closely replicated those of the earlier NLST, underscoring its potential to replicate real clinical imaging trials.
Title: Virtual Lung Screening Trial (VLST): An In Silico Replica of the National Lung Screening Trial for Lung Cancer Detection. ArXiv, published 2024-09-24. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11065052/pdf/
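The results above are reported as AUCs with 95% confidence intervals. The sketch below shows one common way such a figure can be produced: the AUC as the fraction of correctly ranked positive/negative pairs, with a percentile bootstrap for the interval. The abstract does not state how the study computed its intervals, so the bootstrap, the function names, and the synthetic scores are assumptions; none of the numbers here are study data.

```python
import numpy as np


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def bootstrap_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC, resampling cases."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        if labels[idx].min() == labels[idx].max():
            continue  # a resample needs both classes for the AUC to be defined
        stats.append(auc(labels[idx], scores[idx]))
    lower, upper = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lower), float(upper)


if __name__ == "__main__":
    # Synthetic reader scores: 100 cases without and 100 cases with a lesion.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 100)
    s = rng.normal(loc=y * 1.5, scale=1.0)
    print(auc(y, s), bootstrap_ci(y, s))
```

For lesion-level analyses, where several lesions come from the same patient, resampling whole patients rather than individual lesions would respect that clustering; the sketch resamples independent cases only.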
Ayesha Vermani, Matthew Dowling, Hyungju Jeon, Ian Jordan, Josue Nassar, Yves Bernaerts, Yuan Zhao, Steven Van Vaerenbergh, Il Memming Park
Function and dysfunction of neural systems are tied to the temporal evolution of neural states. The current limitations in showing their causal role stem largely from the absence of tools capable of probing the brain's internal state in real time. This gap restricts the scope of experiments vital for advancing both fundamental and clinical neuroscience. Recent advances in real-time machine learning technologies, particularly in analyzing neural time series as nonlinear stochastic dynamical systems, are beginning to bridge this gap. These technologies enable immediate interpretation of and interaction with neural systems, offering new insights into neural computation. However, several significant challenges remain. Issues such as slow convergence rates, high-dimensional data complexities, structured noise, non-identifiability, and a general lack of inductive biases tailored for neural dynamics are key hurdles. Overcoming these challenges is crucial for the full realization of real-time neural data analysis for the causal investigation of neural computation and advanced perturbation-based brain-machine interfaces. In this paper, we provide a comprehensive perspective on the current state of the field, focusing on these persistent issues and outlining potential paths forward. We emphasize the importance of large-scale integrative neuroscience initiatives and the role of meta-learning in overcoming these challenges. These approaches represent promising research directions that could redefine the landscape of neuroscience experiments and brain-machine interfaces, facilitating breakthroughs in understanding brain function and treating neurological disorders.
{"title":"Real-Time Machine Learning Strategies for a New Kind of Neuroscience Experiments.","authors":"Ayesha Vermani, Matthew Dowling, Hyungju Jeon, Ian Jordan, Josue Nassar, Yves Bernaerts, Yuan Zhao, Steven Van Vaerenbergh, Il Memming Park","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Function and dysfunctions of neural systems are tied to the temporal evolution of neural states. The current limitations in showing their causal role stem largely from the absence of tools capable of probing the brain's internal state in real-time. This gap restricts the scope of experiments vital for advancing both fundamental and clinical neuroscience. Recent advances in real-time machine learning technologies, particularly in analyzing neural time series as nonlinear stochastic dynamical systems, are beginning to bridge this gap. These technologies enable immediate interpretation of and interaction with neural systems, offering new insights into neural computation. However, several significant challenges remain. Issues such as slow convergence rates, high-dimensional data complexities, structured noise, non-identifiability, and a general lack of inductive biases tailored for neural dynamics are key hurdles. Overcoming these challenges is crucial for the full realization of real-time neural data analysis for the causal investigation of neural computation and advanced perturbation based brain machine interfaces. In this paper, we provide a comprehensive perspective on the current state of the field, focusing on these persistent issues and outlining potential paths forward. We emphasize the importance of large-scale integrative neuroscience initiatives and the role of meta-learning in overcoming these challenges. These approaches represent promising research directions that could redefine the landscape of neuroscience experiments and brain-machine interfaces, facilitating breakthroughs in understanding brain function, and treatment of neurological disorders.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398541/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142303262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S Song
Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.
{"title":"Genomic Language Models: Opportunities and Challenges.","authors":"Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S Song","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11275703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141790310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}