The discovery of circulating fetal and tumor cell-free DNA (cfDNA) molecules in plasma has opened up tremendous opportunities in noninvasive diagnostics, such as the detection of fetal chromosomal aneuploidies and cancers, and in posttransplantation monitoring. The advent of high-throughput sequencing technologies makes it possible to scrutinize the characteristics of cfDNA molecules, opening up the fields of cfDNA genetics, epigenetics, transcriptomics, and fragmentomics and providing a plethora of biomarkers. Machine learning (ML) and artificial intelligence (AI) technologies, which are known for their ability to integrate high-dimensional features, have recently been applied to the field of liquid biopsy. In this review, we highlight various AI and ML approaches in cfDNA-based diagnostics. We first introduce the biology of cfDNA and the basic concepts of ML and AI technologies. We then discuss selected examples of ML- or AI-based applications in noninvasive prenatal testing and cancer liquid biopsy. These applications include the deduction of fetal DNA fraction, plasma DNA tissue mapping, and cancer detection and localization. Finally, we offer perspectives on the future direction of using ML and AI technologies to leverage cfDNA fragmentation patterns for methylomic and transcriptional investigations.
The concentration of circulating cell-free DNA (cfDNA) in plasma is an important determinant of the robustness of liquid biopsies. However, the biological mechanisms that lead to inter-individual differences in cfDNA concentrations remain unexplored. The concentration of plasma cfDNA is governed by an interplay between its release and clearance. We hypothesized that cfDNA clearance by nucleases might be one mechanism that contributes toward inter-individual variations in cfDNA concentrations. We performed fragmentomic analysis of the plasma cfDNA from 862 healthy individuals, with a cfDNA concentration range of 1.61-41.01 ng/mL. With increasing cfDNA concentrations, we observed an increased frequency of large DNA fragments (231-600 bp), a decreased frequency of shorter DNA fragments (20-160 bp), and an increased frequency of G-end motifs. End motif deconvolution analysis revealed a decreased contribution of DNASE1L3 and DFFB in subjects with higher cfDNA concentrations. The five subjects with the highest plasma DNA concentrations (top 0.58%) had aberrantly decreased levels of DNASE1L3 protein in plasma. The cfDNA concentration could be inferred from the fragmentomic profile through machine learning, and the inferred values correlated well with the measured cfDNA concentrations. Such an approach could also infer the fractional concentration of DNA from particular tissue types, such as the fetal and tumor fractions. This work shows that individuals with different cfDNA concentrations are associated with characteristic fragmentomic patterns of the cfDNA pool and that nuclease-mediated clearance of DNA is a key parameter that affects cfDNA concentration. Understanding these mechanisms has facilitated the enhanced measurement of cfDNA species of clinical interest, including circulating fetal and tumor DNA.
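The fragmentomic features named above (the frequencies of short and long fragments, and of G-end motifs) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the toy data, and the reading of "G-end" as a fragment whose 5' terminal base is G are assumptions made for illustration. Only the size windows (20-160 bp and 231-600 bp) come from the text. Features of this kind would then serve as inputs to a regression model, which is not shown here.

```python
from collections import Counter


def size_fractions(lengths, short=(20, 160), long=(231, 600)):
    """Return the fractions of fragments in the short and long size windows.

    The window bounds are inclusive and default to the ranges quoted in the text.
    """
    n = len(lengths)
    if n == 0:
        raise ValueError("no fragments supplied")
    short_n = sum(short[0] <= x <= short[1] for x in lengths)
    long_n = sum(long[0] <= x <= long[1] for x in lengths)
    return short_n / n, long_n / n


def end_base_frequencies(end_motifs):
    """Return the frequency of each 5' terminal base among the end motifs.

    Assumption: an "X-end" fragment is one whose end motif starts with base X.
    """
    counts = Counter(m[0].upper() for m in end_motifs if m)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no end motifs supplied")
    return {base: counts.get(base, 0) / total for base in "ACGT"}


# Hypothetical toy data: fragment lengths in bp and 4-mer 5' end motifs.
toy_lengths = [50, 150, 166, 170, 250, 400, 166, 120]
toy_motifs = ["GCTA", "CCCA", "GGTT", "TACG"]

short_frac, long_frac = size_fractions(toy_lengths)
g_end_freq = end_base_frequencies(toy_motifs)["G"]
```

On real data, the lengths and end motifs would be derived from aligned sequencing reads, and the per-sample feature vector would normally include the full size profile and all end motifs rather than three summary values.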