
Foundations of data science (Springfield, Mo.): Latest articles

Unsupervised learning of observation functions in state space models by nonparametric moment methods
Q2 MATHEMATICS, APPLIED Pub Date : 2023-01-01 DOI: 10.3934/fods.2023002
Qingci An, Yannis Kevrekidis, Fei Lu, Mauro Maggioni
We investigate the unsupervised learning of non-invertible observation functions in nonlinear state space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of an RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed.
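As a toy illustration of the regression idea only, the sketch below assumes a piecewise-constant observation function with unknown bin values `c_k`, a known state distribution `p_k(t)` that changes over time, and observed means `m(t) = sum_k c_k p_k(t)`. It fits `c` by least squares from first moments alone; the paper's estimator also uses second moments, temporal correlations and bounds, and every name and number here is a hypothetical choice for illustration. The matrix rank hints at why stationarity hurts: if `p_k(t)` does not change with `t`, all rows coincide and only one linear combination of the `c_k` is determined.

```python
import numpy as np


def fit_bin_values_from_means(state_probs, obs_means):
    """Least-squares estimate of bin values c from m(t) = sum_k c_k * p_k(t).

    state_probs: (T, K) array; row t is the state distribution over K bins.
    obs_means:   (T,) array of observed means of the observation process.
    Returns the estimate and the rank of state_probs (rank K means identifiable).
    """
    state_probs = np.asarray(state_probs, dtype=float)
    obs_means = np.asarray(obs_means, dtype=float)
    coeffs, *_ = np.linalg.lstsq(state_probs, obs_means, rcond=None)
    return coeffs, np.linalg.matrix_rank(state_probs)


# Hypothetical time-varying state distributions over 3 bins (rows sum to 1).
P = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.3, 0.4],
    [0.5, 0.1, 0.4],
])
true_c = np.array([1.0, -2.0, 0.5])
c_hat, rank = fit_bin_values_from_means(P, P @ true_c)
print(np.round(c_hat, 6), rank)

# Stationary case: identical rows, so only one combination of c is pinned down.
P_stat = np.tile(P[0], (5, 1))
_, rank_stat = fit_bin_values_from_means(P_stat, P_stat @ true_c)
print(rank_stat)
```

With the time-varying rows the three bin values are recovered exactly; with stationary rows the rank drops to 1.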
Citations: 1
Noise calibration for SPDEs: A case study for the rotating shallow water model
Q2 MATHEMATICS, APPLIED Pub Date : 2023-01-01 DOI: 10.3934/fods.2023012
Dan Crisan, Oana Lang, Alexander Lobbe, Peter-Jan van Leeuwen, Roland Potthast
Citations: 0
Weight set decomposition for weighted rank and rating aggregation: An interpretable and visual decision support tool
Q2 MATHEMATICS, APPLIED Pub Date : 2023-01-01 DOI: 10.3934/fods.2023001
Tyler A. Perini, A. Langville, Glenn Kramer, Jeff Shrager, Mark Shapiro
Citations: 1
Geometric structure guided model and algorithms for complete deconvolution of gene expression data
Q2 MATHEMATICS, APPLIED Pub Date : 2022-09-01 DOI: 10.3934/fods.2022013
Duan Chen, Shaoyu Li, Xue Wang

Complete deconvolution analysis of bulk RNA-seq data is important for distinguishing whether differences in disease-associated GEPs (gene expression profiles) between tissues of patients and normal controls are due to changes in the cellular composition of tissue samples or to GEP changes in specific cells. One of the major techniques for complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is a well-known strongly ill-posed problem, so applying NMF directly to RNA-seq data leads to severe difficulties in interpreting the solutions. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve the identifiability of solutions when deconvolving bulk RNA-seq data. In our approach, we combine the biological concept of marker genes with the solvability conditions of NMF theory and develop a geometric-structure-guided optimization model. In this strategy, the geometric structure of bulk tissue data is first explored by spectral clustering. Then, the identified marker-gene information is integrated as solvability constraints, while the overall correlation graph is used as manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, showing significantly improved solution interpretability and accuracy.
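The starting point of this abstract, plain NMF, can be sketched with the classical Lee-Seung multiplicative updates for the Frobenius loss. This is generic NMF, not the authors' geometric-structure-guided model: it has no marker-gene constraints, spectral clustering or manifold regularization, and the synthetic data and function names are assumptions for illustration. The final check shows one source of the ill-posedness the abstract mentions: rescaling the columns of `W` and the rows of `H` by a positive diagonal matrix leaves `W @ H` unchanged.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF, V ~= W @ H, via Lee-Seung multiplicative updates (Frobenius loss).

    V: (genes, samples) nonnegative matrix, e.g. bulk expression.
    W: (genes, rank) nonnegative signatures; H: (rank, samples) nonnegative weights.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Updates keep entries nonnegative and (up to the small eps guard)
        # do not increase ||V - W H||_F.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Hypothetical synthetic "bulk" data: 50 genes, 3 cell types, 20 samples.
rng = np.random.default_rng(1)
V = rng.random((50, 3)) @ rng.random((3, 20))
W0, H0 = nmf_multiplicative(V, rank=3, n_iter=0)  # random initialization only
W, H = nmf_multiplicative(V, rank=3)
print(relative_error(V, W0, H0), relative_error(V, W, H))

# Ill-posedness: any positive diagonal rescaling gives the same product.
D = np.diag([2.0, 0.5, 3.0])
print(np.allclose(W @ H, (W @ D) @ (np.linalg.inv(D) @ H)))
```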

Pages 441-466. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10798655/pdf/
Citations: 0
Aspects of topological approaches for data science
Q2 MATHEMATICS, APPLIED Pub Date : 2022-06-01 DOI: 10.3934/fods.2022002
Jelena Grbić, Jie Wu, Kelin Xia, Guo-Wei Wei

We establish a new theory which unifies various aspects of topological approaches for data science, by being applicable both to point cloud data and to graph data, including networks beyond pairwise interactions. We generalize simplicial complexes and hypergraphs to super-hypergraphs and establish super-hypergraph homology as an extension of simplicial homology. Driven by applications, we also introduce super-persistent homology.
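Super-hypergraph homology is defined in the paper as an extension of simplicial homology. For orientation only, the sketch below computes ordinary simplicial Betti numbers over the rationals using `b_k = dim C_k - rank ∂_k - rank ∂_{k+1}`; it does not implement the super-hypergraph construction, and the function names are illustrative. Ranks are computed in floating point, which is adequate for small examples like these.

```python
import numpy as np


def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary map from k-simplices to (k-1)-simplices.

    Simplices are sorted vertex tuples; the face with vertex i removed gets sign (-1)**i.
    """
    row = {face: i for i, face in enumerate(faces)}
    D = np.zeros((len(faces), len(k_simplices)))
    for j, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            D[row[simplex[:i] + simplex[i + 1:]], j] = (-1) ** i
    return D


def betti_numbers(simplices, max_dim):
    """Betti numbers b_0..b_max_dim of a simplicial complex closed under taking faces."""
    by_dim = [sorted(s for s in simplices if len(s) == d + 1) for d in range(max_dim + 2)]
    # rank[d] is the rank of the boundary map out of d-simplices; rank[0] = 0.
    rank = [0] * (max_dim + 2)
    for d in range(1, max_dim + 2):
        if by_dim[d]:
            rank[d] = int(np.linalg.matrix_rank(boundary_matrix(by_dim[d], by_dim[d - 1])))
    return [len(by_dim[d]) - rank[d] - rank[d + 1] for d in range(max_dim + 1)]


hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]  # a cycle: one loop
filled = hollow + [(0, 1, 2)]                        # the loop is filled in
print(betti_numbers(hollow, 1), betti_numbers(filled, 1))
```

The hollow triangle has one connected component and one loop; filling it with a 2-simplex removes the loop.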

Volume 4, issue 2, pages 165-216. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881677/pdf/nihms-1825620.pdf
Citations: 0
A log-Gaussian Cox process with sequential Monte Carlo for line narrowing in spectroscopy
Q2 MATHEMATICS, APPLIED Pub Date : 2022-02-26 DOI: 10.3934/fods.2023008
T. Harkonen, Emma Hannula, M. Moores, E. Vartiainen, L. Roininen
We propose a statistical model for narrowing line shapes in spectroscopy that are well approximated as linear combinations of Lorentzian or Voigt functions. We introduce a log-Gaussian Cox process to represent the peak locations, thereby providing uncertainty quantification for the line narrowing. The Bayesian formulation of the method allows robust and explicit inclusion of prior information as probability distributions for the model parameters. The signal and its parameters are estimated using a sequential Monte Carlo algorithm, followed by an optimization step that determines the peak locations. Our method is validated in a simulation study and applied to a mineralogical Raman spectrum.
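The generative side of such a model can be sketched as follows, under assumptions that are ours rather than the paper's: a one-dimensional log-Gaussian Cox process discretized on a grid with a squared-exponential covariance, and unit-area Lorentzian peaks of a common width placed at the sampled points. The sequential Monte Carlo inference and the peak-location optimization step are not shown, and all names and parameter values are illustrative.

```python
import numpy as np


def lorentzian(x, center, hwhm):
    """Area-normalized Lorentzian line shape with half-width at half-maximum hwhm."""
    return (hwhm / np.pi) / ((x - center) ** 2 + hwhm ** 2)


def sample_lgcp_1d(lo, hi, n_cells, mean, variance, length_scale, rng):
    """Draw points from a log-Gaussian Cox process on [lo, hi], discretized into cells.

    The log-intensity is a Gaussian process with squared-exponential covariance,
    evaluated at cell centers; counts per cell are Poisson(exp(log-intensity) * width).
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    diff = centers[:, None] - centers[None, :]
    cov = variance * np.exp(-0.5 * (diff / length_scale) ** 2)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * variance * np.eye(n_cells))
    log_intensity = mean + chol @ rng.standard_normal(n_cells)
    counts = rng.poisson(np.exp(log_intensity) * width)
    # Given the count in a cell, points are placed uniformly within that cell.
    points = [rng.uniform(edges[i], edges[i + 1], size=c) for i, c in enumerate(counts)]
    return np.sort(np.concatenate(points))


# Hypothetical forward model: unit-area Lorentzian peaks at LGCP-sampled locations.
rng = np.random.default_rng(0)
peaks = sample_lgcp_1d(0.0, 100.0, 200, mean=-2.0, variance=1.0, length_scale=10.0, rng=rng)
x = np.linspace(0.0, 100.0, 2001)
spectrum = lorentzian(x[:, None], peaks[None, :], 1.5).sum(axis=1)
print(len(peaks), spectrum.max())
```

Summing along the peak axis keeps `spectrum` a zero array of the right shape even if no peaks are drawn.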
Citations: 0
Data based quantification of synchronization
Q2 MATHEMATICS, APPLIED Pub Date : 2022-01-01 DOI: 10.3934/fods.2022020
Citations: 1
Addressing confirmation bias in middle school data science education
Q2 MATHEMATICS, APPLIED Pub Date : 2022-01-01 DOI: 10.3934/fods.2021035
S. Hedges, Kim Given
More research is needed on middle school students' engagement in the statistical problem-solving process, particularly the initial steps: formulating a question and making a plan to collect or consider data. Further, the increased availability of large-scale, electronically accessible data sets is an untapped area of study. This interpretive study examined middle school students' understanding of the statistical concepts involved in planning data collection to answer a statistical question in a social-issue context, using data available on the internet. Student artifacts, researcher notes, and audio and video recordings from nine groups of 20 seventh-grade students in two gifted-education pull-out classes at a suburban middle school were used to answer the research questions. Data were analyzed using a priori codes from previously developed frameworks and an inductive approach to find themes. Three themes related to confirmation bias emerged from the data. Some students held preconceptions about the social issues they chose to study that biased their statistical questions. This in turn influenced the data sources students used to answer their questions. Confirmation bias is a serious issue that is exacerbated by the endless sources of electronically available data. We argue that this type of bias should be addressed early in students' educational experiences. Based on the findings of this study, we offer recommendations for future research and implications for statistics and data science education.
Citations: 1
Statistical inference for persistent homology applied to simulated fMRI time series data
Q2 MATHEMATICS, APPLIED Pub Date : 2022-01-01 DOI: 10.3934/fods.2022014
H. Abdallah, Adam J. Regalski, Mohammad Behzad Kang, Maria Berishaj, N. Nnadi, Asadur Chowdury, V. Diwadkar, A. Salch
Time-series data are among the most widely used data in the biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time-series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the most commonly used data-analytic tool in TDA and can effectively summarize complex high-dimensional data in an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference on the persistent homology of data depend on an independence assumption. While persistent homology can be computed for each time index in a time series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block-sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing the persistent homology of fMRI data, and of non-independent data in general.
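The motivation for block sampling can be illustrated with a much simpler test than the paper's. The sketch below is a generic, single-level block permutation test under assumptions of ours: a scalar statistic (difference in means between task and rest time points) stands in for statistics computed on persistence diagrams, and labels are permuted as contiguous blocks so that within-block autocorrelation is preserved under the null. It is not the multi-level block-sampled Monte Carlo test the paper develops, and the function name and simulated data are hypothetical.

```python
import numpy as np


def block_permutation_test(values, labels, block_len, n_perm=2000, seed=0):
    """Two-sided p-value for mean(values | label) - mean(values | not label).

    The series is cut into contiguous blocks of block_len samples; the null
    distribution is built by shuffling the label blocks against the value blocks,
    which keeps the dependence within each block intact.
    """
    n_blocks = len(values) // block_len
    vals = np.asarray(values, dtype=float)[: n_blocks * block_len].reshape(n_blocks, block_len)
    labs = np.asarray(labels, dtype=bool)[: n_blocks * block_len].reshape(n_blocks, block_len)

    def statistic(lab):
        return vals[lab].mean() - vals[~lab].mean()

    observed = statistic(labs)
    rng = np.random.default_rng(seed)
    null = np.array([statistic(labs[rng.permutation(n_blocks)]) for _ in range(n_perm)])
    # Add-one correction so the Monte Carlo p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(null) >= np.abs(observed))) / (1 + n_perm)
    return observed, p_value


# Hypothetical design: alternating 10-sample task/rest blocks with AR(1) noise.
rng = np.random.default_rng(42)
labels = np.repeat(np.arange(20) % 2 == 0, 10)
noise = np.zeros(labels.size)
for t in range(1, labels.size):
    noise[t] = 0.5 * noise[t - 1] + rng.standard_normal()
values = 3.0 * labels + noise
stat, p_value = block_permutation_test(values, labels, block_len=10)
print(stat, p_value)
```

Permuting individual time points instead of whole blocks would break the autocorrelation of the noise and typically make the null distribution too narrow.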
Citations: 3
Teaching data science to students in biology using R, RStudio and Learnr: Analysis of three years data
Q2 MATHEMATICS, APPLIED Pub Date : 2022-01-01 DOI: 10.3934/fods.2022022
G. Engels, P. Grosjean, Frédérique Artus
We examine the impact of implementing active pedagogical methodologies in three successive data science courses for a biology curriculum at the University of Mons, Belgium. Blended learning and flipped classroom approaches were adopted, with an emphasis on project-based biological data analysis. Four successive types of exercises of increasing difficulty were given to the students. Tutorials written with the R package learnr were identified as a critical step in the transition between theory and the application of the concepts. The cognitive workload needed to complete the learnr tutorials was measured for the three courses and was lower only for the last course, suggesting that students needed a long time to get used to their software environment (R, RStudio and git). Data on students' activity, collected primarily through ongoing assessment, were also used to establish student profiles according to their learning strategies. Several suboptimal strategies were observed and discussed. Finally, the timing of students' contributions and the intensity of the related teacher-learner interactions were analyzed before, during and after the mandatory distance learning imposed by the COVID-19 lockdown. A lag phase was visible at the beginning of the first lockdown, but students' work was not markedly affected during the second, much longer lockdown.
Citations: 0