Praphulla MS Bhawsar, Cody Ramin, Petra Lenz, Máire A Duggan, Alexandra R Harris, Brittany Jenkins, Renata Cora, Mustapha Abubakar, Gretchen Gierach, Joel Saltz, Jonas S Almeida
Crown-like structures (CLS) in breast adipose tissue are formed as a result of macrophages clustering around necrotic adipocytes in specific patterns. As a histologic marker of local inflammation, CLS could have potential diagnostic utility as a biomarker for breast cancer risk. However, given the scale of whole slide images and the rarity of CLS (a few cells in an entire tissue sample), microscope-based manual identification is a challenge for the pathologist. In this report, we describe an artificial intelligence pipeline to solve this needle-in-a-haystack problem. We developed a zero-cost, zero-footprint web platform to enable remote operation on digital whole slide imaging data directly in the web browser, supporting collaborative annotation of the data by multiple experts. The annotated images then allow for incremental training and fine-tuning of deep neural networks via active learning. The platform is reusable and requires no backend or installations, thus ensuring the data remains secure and private under the governance of the end user. Using this platform, we iteratively trained a CLS identification model, evaluating the performance after each round and adding examples to the training data to overcome failure cases. The resulting model, with an AUC of 0.90, shows promise as a first-pass screening tool to detect CLS in breast adipose tissue, considerably reducing the workload of the pathologist. The platform is available at: https://episphere.github.io/path
Title: Crown-Like Structures in Breast Adipose Tissue: Finding a 'Needle-in-a-Haystack' using Artificial Intelligence and Collaborative Active Learning on the Web. DOI: https://doi.org/arxiv-2409.08275. Published 2024-09-12 in arXiv - QuanBio - Quantitative Methods.
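The AUC of 0.90 reported in the crown-like structures abstract above can be read as the probability that a randomly chosen CLS-positive tile receives a higher model score than a randomly chosen negative tile. A minimal sketch of that rank-based computation follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' evaluation code or data.

```python
import numpy as np

def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: P(score_pos > score_neg), ties count 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Hypothetical scores for 3 CLS tiles and 5 background tiles.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
auc = roc_auc(labels, scores)  # 13 of the 15 positive/negative pairs are ordered correctly
```

The pairwise comparison is quadratic in the number of tiles, which is acceptable for a sketch. For whole slide images with many tiles, a sort-based implementation would be the more practical choice.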
Kyu-Jin Jung, Chuanjiang Cui, Soo-Hyung Lee, Chan-Hee Park, Ji-Won Chun, Dong-Hyun Kim
Blood oxygenation level-dependent (BOLD) functional magnetic resonance imaging (fMRI) is widely used to visualize brain activation regions by detecting hemodynamic responses associated with increased metabolic demand. While alternative MRI methods have been employed to monitor functional activities, the investigation of in-vivo electrical property changes during brain function remains limited. In this study, we explored the relationship between fMRI signals and electrical conductivity (measured at the Larmor frequency) changes using phase-based electrical properties tomography (EPT). Our results revealed consistent patterns: conductivity changes showed negative correlations, with conductivity decreasing in the functionally active regions, whereas B1 phase mapping exhibited positive correlations around activation regions. These observations were consistent across both motor and visual cortex activations. To further substantiate these findings, we conducted electromagnetic radio-frequency simulations that modeled activation states with varying conductivity, which demonstrated trends similar to our in-vivo results for both B1 phase and conductivity. These findings suggest that in-vivo electrical conductivity changes can indeed be measured during brain activity. However, further investigation is needed to fully understand the underlying mechanisms driving these measurements.
Title: Investigation of Electrical Conductivity Changes during Brain Functional Activity in 3T MRI. DOI: https://doi.org/arxiv-2409.07806. Published 2024-09-12 in arXiv - QuanBio - Quantitative Methods.
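For readers unfamiliar with the phase-based EPT mentioned in the conductivity abstract above, the commonly used approximation estimates conductivity from the Laplacian of the B1 phase: sigma ≈ ∇²φ / (μ0 ω) for the transmit phase, or with an extra factor of 1/2 when the measured transceive phase is used. The sketch below applies that textbook approximation with finite differences. It assumes a slowly varying B1 magnitude, the sign depends on the phase convention, and the function name, voxel size, and synthetic phase map are all illustrative assumptions rather than the authors' reconstruction pipeline.

```python
import numpy as np

MU0 = 4e-7 * np.pi    # vacuum permeability in H/m
GAMMA_BAR = 42.577e6  # proton gyromagnetic ratio / (2*pi) in Hz/T

def conductivity_from_phase(phase, voxel_size, b0_tesla=3.0, transceive=True):
    """Estimate conductivity (S/m) from a 3D B1 phase map (rad).

    Uses sigma ~ laplacian(phase) / (mu0 * omega), halved for transceive phase.
    Assumes isotropic voxels of edge length `voxel_size` in metres.
    """
    omega = 2 * np.pi * GAMMA_BAR * b0_tesla  # Larmor angular frequency
    lap = np.zeros_like(phase, dtype=float)
    for axis in range(phase.ndim):
        first = np.gradient(phase, voxel_size, axis=axis)
        lap += np.gradient(first, voxel_size, axis=axis)
    scale = 2.0 if transceive else 1.0
    return lap / (scale * MU0 * omega)

# Synthetic quadratic phase whose Laplacian corresponds to 0.5 S/m at 3 T.
voxel = 2e-3
coords = np.arange(12) * voxel
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
omega_3t = 2 * np.pi * GAMMA_BAR * 3.0
a = 0.5 * 2 * MU0 * omega_3t / 6  # laplacian of a*(x^2+y^2+z^2) is 6a
phase = a * (x**2 + y**2 + z**2)
sigma = conductivity_from_phase(phase, voxel)
```

Only interior voxels are reliable here, because the one-sided differences at the volume edges are less accurate. Real phase maps are also noisy, so practical reconstructions usually smooth or fit the phase before differentiating twice.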
Automated cell segmentation is crucial for various biological and medical applications, facilitating tasks like cell counting, morphology analysis, and drug discovery. However, manual segmentation is time-consuming and prone to subjectivity, necessitating robust automated methods. This paper presents open-source infrastructure, utilizing the UNet model, a deep-learning architecture noted for its effectiveness in image segmentation tasks. This implementation is integrated into the open-source DeepChem package, enhancing accessibility and usability for researchers and practitioners. The resulting tool offers a convenient and user-friendly interface, reducing the barrier to entry for cell segmentation while maintaining high accuracy. Additionally, we benchmark this model against various datasets, demonstrating its robustness and versatility across different imaging conditions and cell types.
Title: Open Source Infrastructure for Automatic Cell Segmentation. Authors: Aaron Rock Menezes, Bharath Ramsundar. DOI: https://doi.org/arxiv-2409.08163. Published 2024-09-12 in arXiv - QuanBio - Quantitative Methods.
Establishing reasonable standards for edible chrysanthemum seedlings helps promote seedling development, thereby improving plant quality. However, current grading methods have several issues. They support only a few indicators, which causes information loss, and the indicators selected to evaluate seedling level have narrow applicability. Meanwhile, some methods misuse mathematical formulas. Therefore, we propose a simple, efficient, and generic framework, SQCSEF, for establishing seedling quality classification standards with flexible clustering modules, applicable to most plant species. In this study, we introduce the state-of-the-art deep clustering algorithm CVCL, using factor analysis to divide indicators into several perspectives as inputs for the CVCL method, resulting in more reasonable clusters and ultimately a grading standard $S_{cvcl}$ for edible chrysanthemum seedlings. Through extensive experiments, we validate the correctness and efficiency of the proposed SQCSEF framework.
Title: Establish seedling quality classification standard for Chrysanthemum efficiently with help of deep clustering algorithm. Authors: Yanzhi Jing, Hongguang Zhao, Shujun Yu. DOI: https://doi.org/arxiv-2409.08867. Published 2024-09-12 in arXiv - QuanBio - Quantitative Methods.
Linear waste management systems are unsustainable and contribute to environmental degradation, economic inequity, and health disparities. Among the array of environmental challenges stemming from anthropogenic impacts, the management of human excrement (human feces and urine) stands as a significant concern. Over two billion people do not have access to adequate sanitation, resulting in a global public health crisis. Composting is a microbial biotechnology aimed at cycling organic waste, including human excrement, for improved public health, agricultural productivity and safety, and environmental sustainability. Applications of modern microbiome-omics and related technologies have vast capacity to support continued advances in composting science and praxis. In this article, we review literature focused on applications of microbiome technologies to study composting systems and reactions. The studies we survey generally fall into the categories of animal manure composting, food and landscaping waste composting, biosolids composting, and human excrement composting. We review experiments utilizing microbiome technologies to investigate strategies for enhancing pathogen suppression and accelerating the biodegradation of organic matter. Additionally, we explore studies focused on the bioengineering potential of microbes as inoculants to facilitate degradation of toxins such as pharmaceuticals or per- and polyfluoroalkyl substances (PFAS). The findings from these studies underscore the importance of advancing our understanding of composting processes through the integration of emerging microbiome-omics technologies. We conclude that work to date has demonstrated exciting basic and applied science potential from studying compost microbiomes, with promising implications for enhancing global environmental sustainability and public health.
Title: The microbiome science of composting and human excrement composting: a review. Authors: Jeff Meilander, J. Gregory Caporaso. DOI: https://doi.org/arxiv-2409.07376. Published 2024-09-11 in arXiv - QuanBio - Quantitative Methods.
Abu Noman Faruq Ahmmed, MD. Zahidul Islam, Raihan Ferdous
This study aimed to evaluate effective management strategies for Albugo candida, the pathogen that causes white rust disease in red amaranth (Amaranthus tricolor L.) and reduces seed production. The study was performed during the Rabi season of 2018 and the Kharif season of 2019 at Sher-e-Bangla Agricultural University in Bangladesh. Eight treatments, including chemical, botanical, and biopesticide treatments such as Ridomil Gold 68 WG, Autostin 50 WP, Dithane M 45, Goldton 50 WP, the Bordeaux mixture, G-Derma, Garlic bulb extract, and Allamanda leaf extract, were evaluated. Four foliar sprays were applied at seven-day intervals after disease symptom onset. The field experiments followed a randomized complete block design with three replications. A microscopic study confirmed that Albugo candida was the causal organism. In both seasons, Ridomil Gold demonstrated superior efficacy in reducing disease incidence in plants, disease incidence in leaves, and disease severity, which were 63.07%, 62.78.5, and 84.31%, respectively, in Rabi and 69.73%, 65.71%, and 88.41%, respectively, in the Kharif season. Allamanda leaf extract produced statistically similar results, while Autostin exhibited promising effectiveness. Furthermore, compared with the other treatments, the combination of Ridomil Gold and Allamanda leaf extract significantly enhanced the growth parameters and seed yield in both seasons. Assessing the collective effectiveness of the treatments, Ridomil Gold demonstrated the most efficient control of white rust disease. Consequently, Ridomil Gold holds promise for application in red amaranth seed production. Additionally, the use of Allamanda leaf extract is an environmentally friendly approach to white rust disease management and promotes safer crop production practices. This study offers significant guidance to practitioners seeking optimal disease management strategies.
Title: Effective management of white rust disease in red amaranth: a field study in Dhaka, Bangladesh. DOI: https://doi.org/arxiv-2409.07579. Published 2024-09-11 in arXiv - QuanBio - Quantitative Methods.
Raj Magesh Gauthaman, Brice Ménard, Michael F. Bonner
How does the human visual cortex encode sensory information? To address this question, we explore the covariance structure of neural representations. We perform a cross-decomposition analysis of fMRI responses to natural images in multiple individuals from the Natural Scenes Dataset and find that neural representations systematically exhibit a power-law covariance spectrum over four orders of magnitude in ranks. This scale-free structure is found in multiple regions along the visual hierarchy, pointing to the existence of a generic encoding strategy in visual cortex. We also show that, up to a rotation, a large ensemble of principal axes of these population codes are shared across subjects, showing the existence of a universal high-dimensional representation. This suggests a high level of convergence in how the human brain learns to represent natural scenes despite individual differences in neuroanatomy and experience. We further demonstrate that a spectral approach is critical for characterizing population codes in their full extent, and in doing so, we reveal a vast space of uncharted dimensions that have been out of reach for conventional variance-weighted methods. A global view of neural representations thus requires embracing their high-dimensional nature and understanding them statistically rather than through visual or semantic interpretation of individual dimensions.
Title: Universal scale-free representations in human visual cortex. DOI: https://doi.org/arxiv-2409.06843. Published 2024-09-10 in arXiv - QuanBio - Quantitative Methods.
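The power-law covariance spectrum described in the visual cortex abstract above means that the variance along the n-th principal axis falls off roughly as n^(-alpha). The sketch below shows one simple way to estimate such an exponent: compute the covariance eigenvalues of a stimuli-by-voxels response matrix and fit a line in log-log space. It uses a plain within-dataset spectrum, not the cross-decomposition analysis the authors describe, and the function names and synthetic data are illustrative assumptions.

```python
import numpy as np

def covariance_spectrum(responses):
    """Eigenvalues of the covariance of a (stimuli x voxels) matrix, descending."""
    centered = responses - responses.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values**2 / (responses.shape[0] - 1)

def power_law_exponent(spectrum, min_rank=1, max_rank=None):
    """Fit spectrum[rank] ~ rank**(-alpha) by least squares in log-log space."""
    max_rank = max_rank or len(spectrum)
    ranks = np.arange(min_rank, max_rank + 1)
    values = spectrum[min_rank - 1:max_rank]
    slope, _ = np.polyfit(np.log(ranks), np.log(values), deg=1)
    return -slope

# Synthetic responses with a planted 1/rank variance profile (hypothetical data).
rng = np.random.default_rng(0)
true_eigs = np.arange(1, 51, dtype=float) ** -1.0
responses = rng.standard_normal((5000, 50)) * np.sqrt(true_eigs)
alpha_hat = power_law_exponent(covariance_spectrum(responses))
```

On this synthetic input, `alpha_hat` should land near the planted exponent of 1, up to sampling noise. With real fMRI data the tail of a within-dataset spectrum is dominated by noise, which is one motivation for cross-validated estimators of the kind the abstract refers to.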
Philip Fradkin, Puria Azadi, Karush Suri, Frederik Wenkel, Ali Bashashati, Maciej Sypetkowski, Dominique Beaini
Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, utilize microscopy-based techniques and demonstrate a high-throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomics and molecular modalities such as experimental batch effect, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1x improvement in zero-shot molecular retrieval of active molecules over the previous state-of-the-art, reaching 77.33% in top-1% accuracy. These results open the door for machine learning to be applied in virtual phenomics screening, which can significantly benefit drug discovery applications.
Title: How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval. DOI: https://doi.org/arxiv-2409.08302. Published 2024-09-10 in arXiv - QuanBio - Quantitative Methods.
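The contrastive alignment of paired phenomic and molecular samples described in the MolPhenix abstract above can be illustrated with a generic CLIP-style objective. The sketch below implements a symmetric InfoNCE loss and a top-k retrieval score on toy embeddings. It is not the inter-sample similarity-aware loss the authors propose, and the function names, temperature, and embeddings are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(pheno_emb, mol_emb, temperature=0.1):
    """CLIP-style loss: row i of each matrix is assumed to be a matched pair."""
    logits = l2_normalize(pheno_emb) @ l2_normalize(mol_emb).T / temperature
    diag = np.arange(logits.shape[0])
    loss_p2m = -log_softmax(logits, axis=1)[diag, diag].mean()  # phenomics -> molecule
    loss_m2p = -log_softmax(logits, axis=0)[diag, diag].mean()  # molecule -> phenomics
    return 0.5 * (loss_p2m + loss_m2p)

def top_k_retrieval_accuracy(pheno_emb, mol_emb, k=1):
    """Fraction of phenomic queries whose paired molecule ranks in the top k."""
    sims = l2_normalize(pheno_emb) @ l2_normalize(mol_emb).T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(sims.shape[0])[:, None]
    return (top_k == targets).any(axis=1).mean()

# Toy embeddings: perfectly aligned pairs versus deliberately mismatched pairs.
pheno = np.eye(4)
mol_aligned = np.eye(4)
mol_shuffled = np.eye(4)[[1, 2, 3, 0]]
loss_aligned = symmetric_info_nce(pheno, mol_aligned)
loss_shuffled = symmetric_info_nce(pheno, mol_shuffled)
```

Here `k` is an absolute count, whereas the abstract reports top-1% accuracy, which would correspond to setting `k` to one percent of the candidate molecule pool.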
Daniel Rose, Oliver Wieder, Thomas Seidel, Thierry Langer
The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost associated with matching query pharmacophores to database ligands. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive evaluations of the learned representations and benchmark our method on virtual screening datasets in a zero-shot setting. Our findings demonstrate significantly shorter runtimes for pharmacophore matching, offering a promising speed-up for screening very large datasets.
{"title":"PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching","authors":"Daniel Rose, Oliver Wieder, Thomas Seidel, Thierry Langer","doi":"arxiv-2409.06316","DOIUrl":"https://doi.org/arxiv-2409.06316","url":null,"abstract":"The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost associated with matching query pharmacophores to database ligands. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive evaluations of the learned representations and benchmark our method on virtual screening datasets in a zero-shot setting. Our findings demonstrate significantly shorter runtimes for pharmacophore matching, offering a promising speed-up for screening very large datasets.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taslim Murad, Prakash Chourasia, Sarwan Ali, Murray Patterson
Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advances in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immunotherapies. However, analyzing these intricate biomolecules requires efficient representations that capture their structural and functional information. T-cell protein sequences pose unique challenges because they are relatively short compared to other biomolecules. An image-based representation is therefore a preferred choice for efficient embeddings, since it preserves essential details and enables comprehensive analysis of T-cell protein sequences. In this paper, we propose to generate images from protein sequences by combining the idea of Chaos Game Representation (CGR) with a kaleidoscopic images approach. This method, Deep Learning Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images (DANCE), provides a unique way to visualize protein sequences by recursively applying chaos game rules around a central seed point. We classify TCR protein sequences by their respective target cancer cells, as TCRs are known for their immune response against cancer. The TCR sequences are converted into images using the DANCE method. We then employ deep-learning vision models to perform the classification and to gain insight into the relationship between the visual patterns observed in the generated kaleidoscopic images and the underlying protein properties. By combining CGR-based image generation with deep learning classification, this study opens novel possibilities in the protein analysis domain.
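To make the chaos game step concrete, here is a minimal sketch of a classical Chaos Game Representation for amino acid sequences. It is not the DANCE method itself: the kaleidoscopic variant is not specified in the abstract, so the polygon layout (20 residues on a regular 20-gon), the contraction ratio of 0.5, the 32x32 grid, and the example sequence are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Assumed layout: one vertex per residue on a regular polygon inscribed in the unit circle.
VERTICES = {
    aa: (
        math.cos(2 * math.pi * i / len(AMINO_ACIDS)),
        math.sin(2 * math.pi * i / len(AMINO_ACIDS)),
    )
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(sequence: str, ratio: float = 0.5) -> List[Tuple[float, float]]:
    """Walk from the central seed point, moving a fixed fraction toward each residue's vertex."""
    x, y = 0.0, 0.0  # central seed point
    points = []
    for residue in sequence.upper():
        if residue not in VERTICES:
            continue  # skip ambiguous or non-standard symbols
        vx, vy = VERTICES[residue]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def rasterize(points: Sequence[Tuple[float, float]], size: int = 32) -> List[List[int]]:
    """Count how many points fall in each cell of a size x size grid covering [-1, 1]^2."""
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = max(0, min(size - 1, int((x + 1.0) / 2.0 * size)))
        row = max(0, min(size - 1, int((1.0 - y) / 2.0 * size)))
        image[row][col] += 1
    return image


# Illustrative sequence only; not taken from the paper's dataset.
example_sequence = "CASSLGQGAYEQYF"
points = cgr_points(example_sequence)
image = rasterize(points)
```

The resulting grid of counts can be fed to any image classifier. Each point depends on the entire prefix of the sequence, so short sequences still map to distinct spatial patterns.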
{"title":"DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images","authors":"Taslim Murad, Prakash Chourasia, Sarwan Ali, Murray Patterson","doi":"arxiv-2409.06694","DOIUrl":"https://doi.org/arxiv-2409.06694","url":null,"abstract":"Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advancements in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immunotherapies. However, analyzing these intricate biomolecules necessitates efficient representations that capture their structural and functional information. T-cell protein sequences pose unique challenges due to their relatively smaller lengths compared to other biomolecules. An image-based representation approach becomes a preferred choice for efficient embeddings, allowing for the preservation of essential details and enabling comprehensive analysis of T-cell protein sequences. In this paper, we propose to generate images from the protein sequences using the idea of Chaos Game Representation (CGR) using the Kaleidoscopic images approach. This Deep Learning Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images (called DANCE) provides a unique way to visualize protein sequences by recursively applying chaos game rules around a central seed point. We perform the classification of the T cell receptors (TCRs) protein sequences in terms of their respective target cancer cells, as TCRs are known for their immune response against cancer disease. The TCR sequences are converted into images using the DANCE method. We employ deep-learning vision models to perform the classification to obtain insights into the relationship between the visual patterns observed in the generated kaleidoscopic images and the underlying protein properties. By combining CGR-based image generation with deep learning classification, this study opens novel possibilities in the protein analysis domain.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"275 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}