The impressive conversational and programming abilities of ChatGPT make it an attractive tool for teaching bioinformatics data analysis to beginners. In this study, we proposed an iterative model to fine-tune instructions for guiding a chatbot in generating code for bioinformatics data analysis tasks. We demonstrated the feasibility of the model by applying it to various bioinformatics topics. Additionally, we discussed practical considerations and limitations regarding the use of the model in chatbot-aided bioinformatics education.
Background: Mass cytometry (CyTOF) offers an unprecedented opportunity to simultaneously measure up to 40 proteins in single cells, with a theoretical potential to reach 100 proteins. This high-dimensional single-cell information can be very useful in dissecting mechanisms of cellular activity. In particular, measuring abundances of signaling proteins such as phospho-proteins can provide detailed information on the dynamics of single-cell signaling processes. However, computational analysis is required to reconstruct the underlying signaling networks with a mechanistic model.
Methods: We propose our Mass cytometry Signaling Network Analysis Code (McSNAC), a new software tool capable of reconstructing signaling networks and estimating their kinetic parameters from CyTOF data. McSNAC approximates a signaling network as a set of first-order reactions between proteins. This assumption often breaks down, as signaling reactions can involve binding and unbinding, enzymatic reactions, and other nonlinear mechanisms. Furthermore, McSNAC may be limited to approximating indirect interactions between protein species, as cytometry experiments are only able to assay a small fraction of the protein species involved in signaling.
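To make the first-order approximation concrete: if every reaction converts one species into another at a rate proportional to the source abundance, the abundances x obey the linear system dx/dt = Kx, where K is a matrix of rate constants. The following is a minimal sketch of that idea only, not McSNAC's implementation. The function name `simulate_first_order`, the rate-dictionary format, and the toy chain A -> B -> C with its rate constants are all assumptions introduced for illustration.

```python
import math


def simulate_first_order(rates, x0, t_end, n_steps=1000):
    """Integrate dx/dt = K x with the classical fourth-order Runge-Kutta scheme.

    rates: dict mapping (source, target) species indices to a rate constant
           (a hypothetical input format chosen for this sketch).
    x0:    list of initial abundances, one per species.
    """
    n = len(x0)
    # Build the rate matrix K: a flux k * x[src] leaves src and enters dst.
    K = [[0.0] * n for _ in range(n)]
    for (src, dst), k in rates.items():
        K[src][src] -= k
        K[dst][src] += k

    def deriv(x):
        return [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]

    def step_from(x, d, h):
        return [xi + h * di for xi, di in zip(x, d)]

    h = t_end / n_steps
    x = list(x0)
    for _ in range(n_steps):
        k1 = deriv(x)
        k2 = deriv(step_from(x, k1, h / 2))
        k3 = deriv(step_from(x, k2, h / 2))
        k4 = deriv(step_from(x, k3, h))
        x = [
            xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


# Toy chain A -> B -> C with made-up rate constants (illustrative only).
rates = {(0, 1): 0.5, (1, 2): 0.2}
final = simulate_first_order(rates, [100.0, 0.0, 0.0], t_end=4.0)

# For this chain, species A decays exponentially, which gives a sanity check.
expected_a = 100.0 * math.exp(-0.5 * 4.0)
```

Because every reaction only moves abundance from one species to another, the columns of K sum to zero and total abundance is conserved. Binding, unbinding, and enzymatic steps introduce products of abundances and so fall outside this linear form.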
Results: We carry out a series of in silico experiments to show that (1) McSNAC is capable of accurately estimating the ground-truth model in a scalable manner when given data originating from a first-order system; and (2) McSNAC is capable of qualitatively predicting the outcomes of perturbations to species abundances in simple second-order reaction models and in a complex in silico nonlinear signaling network in which some proteins are unmeasured.
Conclusions: These findings demonstrate that McSNAC can be a valuable screening tool for generating models of signaling networks from time-stamped CyTOF data.
Background: Mutational signatures computed from somatic mutations allow an in-depth understanding of tumorigenesis and may illuminate early prevention strategies. Many studies have shown regulatory relationships between somatic mutations and gene expression dysregulation.
Methods: We hypothesized that there are potential associations between mutational signatures and gene expression. We used RNA-seq data to model 49 established mutational signatures in 33 cancer types. Both accuracy and area under the curve were used as performance measures in five-fold cross-validation.
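The evaluation scheme described above, five-fold cross-validation scored by accuracy and area under the curve (AUC), can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the source does not name the classifier, so a nearest-centroid rule stands in for it. The sketch also assumes each signature is treated as a binary label, and the data are synthetic. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Requires both classes to be present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Stand-in model (an assumption): mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Positive when x is closer to the class-1 centroid than to class 0."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    return d0 - d1


def cross_validate(X, y, k=5, seed=0):
    """Return one (accuracy, AUC) pair per held-out fold."""
    results = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [score(model, X[i]) for i in test_idx]
        y_test = [y[i] for i in test_idx]
        preds = [1 if s > 0 else 0 for s in scores]
        results.append((accuracy(y_test, preds), auc(y_test, scores)))
    return results


# Synthetic "expression" data: 100 samples, 5 features, class means 0 and 2.
rng = random.Random(1)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(2.0 * label, 1.0) for _ in range(5)] for label in y]
results = cross_validate(X, y)
```

Each sample is held out exactly once, so the five (accuracy, AUC) pairs are computed on data the model did not see during fitting. A stratified split would be a safer choice when one class is rare, since the plain shuffle used here could leave a fold with a single class.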
Results: A total of 475 models using unconstrained genes and 112 models using protein-coding genes were selected for future inference purposes. An independent gene expression dataset on lung cancer smoking status was used for validation, achieving over 80% for both accuracy and area under the curve.
Conclusion: These results demonstrate that the associations between gene expression and somatic mutations can translate into associations between gene expression and mutational signatures.