Maximum entropy models for patterns of gene expression
Camilla Sarra, Leopoldo Sarra, Luca Di Carlo, Trevor GrandPre, Yaojun Zhang, Curtis G. Callan Jr., William Bialek
arXiv - QuanBio - Molecular Networks, published 2024-08-15
https://doi.org/arxiv-2408.08037
Abstract
New experimental methods make it possible to measure the expression levels of
many genes, simultaneously, in snapshots from thousands or even millions of
individual cells. Current approaches to analyzing these experiments involve
clustering or low-dimensional projections. Here we use the principle of maximum
entropy to obtain a probabilistic description that captures the observed
presence or absence of mRNAs from hundreds of genes in cells from the mammalian
brain. We construct the Ising model compatible with experimental means and
pairwise correlations, and validate it by showing that it gives good
predictions for higher-order statistics. We notice that the probability
distribution of cell states has many local maxima. By labeling cell states
according to the associated maximum, we obtain a cell classification that
agrees well with previous results that use traditional clustering techniques.
Our results provide quantitative descriptions of gene expression statistics and
interpretable criteria for defining cell classes, supporting the hypothesis
that cell classes emerge from the collective interaction of gene expression
levels.
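
As an illustration of the classification idea described in the abstract, the sketch below assigns a binary expression pattern to a local maximum of a pairwise (Ising-type) model by greedy single-gene flips. It is a minimal sketch under stated assumptions, not the authors' implementation: the 0/1 encoding of mRNA absence/presence, the energy convention P(s) ∝ exp(-E(s)), the parameter names `h` and `J`, the function names, and the toy parameter values are all hypothetical. In practice `h` and `J` would first be fitted so that the model reproduces the measured means and pairwise correlations, a step not shown here.

```python
import numpy as np

def energy(s, h, J):
    """E(s) = -h.s - 0.5 * s.J.s for s in {0,1}^N, with P(s) proportional to exp(-E).

    Assumes J is symmetric with zero diagonal (an assumption of this sketch).
    """
    return -(h @ s) - 0.5 * (s @ J @ s)

def descend_to_local_maximum(s, h, J):
    """Flip one gene at a time, always taking the flip that lowers the energy
    most, until no single flip lowers it. The returned state is a local
    maximum of P(s) with respect to single-gene flips."""
    s = np.array(s, dtype=int)
    while True:
        field = h + J @ s                # effective field on each gene
        delta = -(1 - 2 * s) * field     # energy change if gene i is flipped
        i = int(np.argmin(delta))
        if delta[i] >= 0:                # no flip lowers the energy: stop
            return s
        s[i] = 1 - s[i]

# Toy parameters (hypothetical): genes 0 and 1 favor being expressed together.
h = np.array([-1.0, -1.0, -1.0])
J = np.array([[0.0, 3.0, 0.0],
              [3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

label = descend_to_local_maximum([1, 0, 0], h, J)
print(label)
```

Cells whose observed patterns descend to the same local maximum would share a label. Greedy steepest descent is only one possible choice of descent rule, and a different rule could assign some states to different maxima.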