Missing data are present in most real-world problems and need careful handling to preserve prediction accuracy and statistical consistency in downstream analysis. As the gold standard for handling missing data, multiple imputation (MI) methods have been proposed to account for imputation uncertainty and provide proper statistical inference. In this work, we propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method that can work under the missing at random (MAR) mechanism with theoretical support. MI-GAN leverages recent progress in conditional generative adversarial networks and shows strong performance, matching existing state-of-the-art imputation methods on high-dimensional datasets in terms of imputation error. In particular, MI-GAN significantly outperforms other imputation methods in terms of statistical inference and computational speed.
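To make concrete how multiple imputation carries imputation uncertainty into inference, the sketch below pools per-imputation results with Rubin's rules, the standard combining rule for MI. The abstract does not state which pooling procedure MI-GAN uses, so this is an assumption for illustration only; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules (illustrative sketch).

    estimates: point estimate of the same quantity from each imputed dataset
    variances: squared standard error of that estimate in each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = statistics.mean(estimates)         # average point estimate
    within = statistics.mean(variances)         # average within-imputation variance
    between = statistics.variance(estimates)    # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between  # total variance of the pooled estimate
    return pooled, total


# Hypothetical numbers: three imputed datasets, one coefficient of interest.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what a single imputation cannot provide: when the imputed datasets disagree, the total variance grows and the resulting confidence intervals widen accordingly.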
In this work, we developed an efficient approach to compute ensemble averages in systems with pairwise-additive energetic interactions between the entities. Methods involving full enumeration of the configuration space have exponential complexity. Sampling methods such as Markov chain Monte Carlo (MCMC) algorithms have been proposed to tackle the exponential complexity of these problems; however, in scenarios where significant energetic coupling exists between the entities, the efficiency of such algorithms can be diminished. We improved the efficiency of MCMC by exploiting the cluster structure in the interaction energy matrix to bias the sampling. We pursued two different schemes for the biased MCMC runs and show that they are valid MCMC schemes. We used both synthesized and real-world systems to show the improved performance of our biased MCMC methods compared with the regular MCMC method. In particular, we applied these algorithms to the problem of estimating protonation ensemble averages and titration curves of residues in a protein.
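The contrast between full enumeration and cluster-aware sampling can be sketched on a toy system. The code below is not the authors' scheme, since the abstract does not specify either of the two biased schemes. It assumes binary site states, energies in units of kT, and a hypothetical move set that mixes single-site flips with simultaneous flips of a whole predefined cluster. Both proposals are symmetric, so the plain Metropolis acceptance rule applies. All names and parameter values (`H`, `W`, `CLUSTERS`, `p_cluster`) are illustrative.

```python
import itertools
import math
import random


def energy(x, h, w):
    """Pairwise-additive energy (units of kT) of a binary state vector x."""
    n = len(x)
    e = sum(h[i] * x[i] for i in range(n))
    e += sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return e


def exact_averages(h, w):
    """Ensemble average of each site by full enumeration (cost grows as 2**n)."""
    n = len(h)
    z = 0.0
    acc = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        weight = math.exp(-energy(x, h, w))
        z += weight
        for i in range(n):
            acc[i] += weight * x[i]
    return [a / z for a in acc]


def mcmc_averages(h, w, clusters, n_steps=40000, p_cluster=0.3, seed=0):
    """Metropolis sampling with a mix of single-site and whole-cluster flips."""
    rng = random.Random(seed)
    n = len(h)
    x = [0] * n
    e = energy(x, h, w)
    acc = [0] * n
    for _ in range(n_steps):
        # Choose the move type independently of the current state.
        if clusters and rng.random() < p_cluster:
            sites = rng.choice(clusters)   # flip every site in one cluster
        else:
            sites = [rng.randrange(n)]     # flip a single site
        y = list(x)
        for i in sites:
            y[i] = 1 - y[i]
        e_new = energy(y, h, w)
        # Symmetric proposal, so the Metropolis ratio is exp(-(e_new - e)).
        if e_new <= e or rng.random() < math.exp(e - e_new):
            x, e = y, e_new
        for i in range(n):
            acc[i] += x[i]
    return [a / n_steps for a in acc]


# Hypothetical 4-site system with two strongly coupled pairs (0,1) and (2,3).
H = [-0.5, 0.3, 0.2, -0.1]
W = [
    [0.0, 2.0, 0.2, 0.2],
    [2.0, 0.0, 0.2, 0.2],
    [0.2, 0.2, 0.0, 1.5],
    [0.2, 0.2, 1.5, 0.0],
]
CLUSTERS = [[0, 1], [2, 3]]

exact = exact_averages(H, W)
estimate = mcmc_averages(H, W, CLUSTERS)
```

In this toy system the coupling inside each pair penalizes double occupancy, so a cluster flip can move directly between the states (1, 0) and (0, 1) of a pair, a transition that takes two steps through an intermediate state when only single-site moves are available.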