Background: Soil metagenomics is a cultivation-independent molecular strategy for investigating and exploiting the diversity of soil microbial communities. Soil microbial diversity is critical to sustaining soil health, which underpins agricultural productivity and protection against harmful organisms. This study aimed to perform a metagenomic analysis of the soybean endosphere (all microbial communities found in plant leaves) to reveal microbial signatures of health and disease.
Results: The dataset is based on the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) release "microbial diversity in soybean". Quality control (QC) rejected 21 of the evaluated sequences (0.03% of the total). Dereplication identified 68,994 sequences as artificial duplicate reads and removed them from consideration. Ribosomal RNA (rRNA) genes were present in 72,747 of the sequences that passed QC. Hierarchical classification for taxonomic assignment was conducted using MG-RAST. At the domain level, Bacteria (99.68%) dominated the dataset, followed by Eukaryotes (0.31%) and unclassified sequences (2 sequences, 0.00%). At the genus level within Bacteria, Streptomyces, Chryseobacterium, Paenibacillus, Bacillus, and Mitsuaria were found. We also found several biological pathways, including CMP-KDO biosynthesis II (from D-arabinose 5-phosphate), the tricarboxylic acid (TCA) cycle (plant), the citrate cycle (TCA cycle), fatty acid biosynthesis, and glyoxylate and dicarboxylate metabolism. Gene prediction uncovered 1,180 sequences, 15,172 of which included gene products, with the shortest being 131 bp and the longest 3,829 bp. The gene list was additionally annotated using Integrated Microbial Genomes and Microbiomes (IMG/M). The annotation process yielded a total of 240 genes found in 177 bacterial strains. These gene products were found in the genome of strain 7598. Modern sequencing technology generates large volumes of data by sampling all genes in all species present in a given complex sample.
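The dereplication and domain-level summary steps reported above can be illustrated with a minimal sketch. It assumes exact-sequence matching as the duplicate criterion, which is a simplification: the criterion MG-RAST applies internally may differ. The function names `dereplicate` and `domain_percentages` and the toy reads are hypothetical and are not part of the study's pipeline.

```python
from collections import Counter


def dereplicate(reads):
    """Keep the first occurrence of each sequence (case-insensitive exact match).

    `reads` is a list of (read_id, sequence) tuples. Returns the retained
    reads and the number of duplicates removed. Exact matching is an
    assumption made for illustration only.
    """
    seen = set()
    unique = []
    for read_id, seq in reads:
        key = seq.upper()
        if key in seen:
            continue  # duplicate of an earlier read: drop it
        seen.add(key)
        unique.append((read_id, seq))
    return unique, len(reads) - len(unique)


def domain_percentages(assignments):
    """Turn a list of per-read domain labels into percentages of the total."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {domain: 100.0 * n / total for domain, n in counts.items()}


# Toy input (hypothetical reads, not data from the study)
reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "GGCC")]
unique_reads, n_removed = dereplicate(reads)
shares = domain_percentages(["Bacteria", "Bacteria", "Bacteria", "Eukaryota"])
```

Percentages such as the 99.68% reported for Bacteria are of this form: the count of reads assigned to a domain divided by the total number of assigned reads.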
Conclusions: These data reveal that the soybean endosphere microbiome is a rich source of potential biomarkers for soybean plants. The results of this study will help us to understand the role of the endosphere microbiome in plant health and to identify the microbial signatures of health and disease. MG-RAST is a public resource for the automated phylogenetic and functional analysis of metagenomes, and it is a powerful tool for investigating the diversity and function of microbial communities.