In the U.S., Salmonella Kentucky is frequently isolated from food animals, but human cases are often linked to international travel. The objectives of this study were to use machine learning models to predict the animal host (bovine or poultry) and geographic origin (North America or not North America) of S. Kentucky isolates, and to identify the genomic features associated with host specificity. Core-genome single nucleotide polymorphisms (SNPs), gene presence, and intergenic regions were used to train multiple machine learning algorithms. The best-performing models were XGBoost classifiers trained on core-genome SNPs. The top models accurately predicted animal host (F1 scores: 0.943 poultry, 0.891 bovine) and geographic origin (F1 scores: 0.981 North America, 0.982 not North America). Feature importance analyses identified SNPs and genes that likely contribute to host specificity. In bovine-associated lineages, top features included SNPs or gene variants linked to drug efflux and pathogenesis in ST152, and the virulence factor rhuM in ST198. In poultry-associated lineages, many top features were plasmids or other mobile genetic elements, some of which carried resistance genes, as well as proteins of unknown function. When applied to U.S. human clinical isolates, the models predicted that the most prevalent sequence type, ST198, was primarily acquired from poultry outside North America (76.6%), whereas ST152 was mainly acquired from domestic poultry (92.4%). A notable number of U.S. human clinical cases, as well as some produce and surface water isolates, were predicted to originate from bovine sources. These findings demonstrate that machine learning models using core-genome SNPs are highly effective for differentiating the animal hosts of S. Kentucky isolates. These tools facilitate the study of foodborne pathogen ecology and help identify host-associated genomic features that are potential targets for mitigation strategies in food animals.
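The host and origin results are reported as per-class F1 scores. These are the harmonic mean of precision and recall, computed one-vs-rest for each label. The sketch below shows that calculation in pure Python. It is an illustration only: the function name `per_class_f1` is hypothetical, and the labels and predictions are made-up toy values, not data or code from this study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1  # correct prediction for this label
        else:
            fp[pred] += 1   # pred was assigned to an isolate of another label
            fn[truth] += 1  # an isolate of label `truth` was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        total = precision + recall
        # F1 = 2PR / (P + R); defined as 0 when both are 0
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical toy example: 4 poultry and 2 bovine isolates, one poultry misclassified.
y_true = ["poultry", "poultry", "poultry", "poultry", "bovine", "bovine"]
y_pred = ["poultry", "poultry", "poultry", "bovine", "bovine", "bovine"]
print(per_class_f1(y_true, y_pred))
```

In this toy case, poultry has precision 1.0 and recall 0.75 (F1 of about 0.857). Bovine has precision 2/3 and recall 1.0 (F1 of 0.8). A single misclassification therefore lowers the two classes' scores differently, which is why F1 is reported per class.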