Dataset

Sampling Strategies for Genotyping Common Bean (P. vulgaris) Genebank Accessions with DArTseq: A Comparison of Single Plants, Multiple Plants, and DNA Pools

Genotyping large-scale gene bank collections requires an appropriate sampling strategy to represent the diversity within and between accessions. A panel of 44 common bean (Phaseolus vulgaris L.) landraces from the Alliance Bioversity and CIAT gene bank was genotyped with DArTseq using three sampling strategies: a single plant per accession (random individual), 25 individual plants per accession jointly analyzed after genotyping (in silico-pool), and pooled tissue from 25 individual plants per accession (seq-pool). The sampling strategies were compared to assess technical aspects of the samples, marker information content, and the genetic composition of the panel. The seq-pool strategy resulted in more consistent DNA libraries for quality and call rate, although with fewer polymorphic markers (6,142 SNPs) than the in silico-pool (14,074) or the random individual sets (6,555). Estimates of allele frequencies by seq-pool and in silico-pool genotyping were consistent, but the results suggest that the difference between pools depends on population heterogeneity. Principal Coordinate Analysis (PCoA), hierarchical clustering, and the estimation of admixture coefficients derived from random individuals, in silico-pools, and seq-pools successfully identified the well-known structure of the Andean and Mesoamerican genepools of P. vulgaris across all datasets. In conclusion, seq-pool proved to be a viable approach for characterizing common bean germplasm compared to genotyping individual plants separately, balancing genotyping effort and costs. This study provides insights and serves as a guide for gene bank researchers embarking on genotyping initiatives to characterize their collections. It aids curators in effectively managing the collections and facilitates marker-trait association studies, enabling the identification of candidate markers for key traits.

Methodology: Data were collected from 44 accessions of Phaseolus vulgaris. Briefly, 25 plants per accession were sown in the greenhouse, and DNA was obtained using two sampling methods: DNA was extracted from each individual plant, and leaf tissue from all 25 plants was pooled for a single extraction. The DNA was sent to DArT P/L for genotyping with DArTseq. Using the resulting SNP data, the sampling methods were compared through allele frequency estimates, genetic distances, Principal Coordinate Analysis (PCoA), and admixture coefficients estimated with snmf. (2023-12)
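
The allele frequency comparison between in silico-pools and seq-pools can be illustrated with a minimal Python sketch. It is not the analysis code used for this dataset. It assumes that individual genotypes have already been recoded as alternate-allele counts (0, 1, or 2, with None for missing calls), which may differ from the encoding in the raw DArT report, and that seq-pool allele frequencies are already available as numbers between 0 and 1. The function names in_silico_pool_frequency and mean_abs_difference are hypothetical.

```python
from typing import List, Optional, Sequence


def in_silico_pool_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Alternate-allele frequency for one SNP in one accession.

    genotypes: per-plant alternate-allele counts (0, 1 or 2); None marks a
    missing call. This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no plant was called at this SNP
    # Each diploid plant contributes two alleles.
    return sum(called) / (2 * len(called))


def mean_abs_difference(
    freq_a: Sequence[Optional[float]], freq_b: Sequence[Optional[float]]
) -> Optional[float]:
    """Mean absolute difference between two frequency vectors over shared SNPs."""
    pairs = [(a, b) for a, b in zip(freq_a, freq_b) if a is not None and b is not None]
    if not pairs:
        return None
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Illustrative values only: 25 plants of one accession at one SNP.
plants: List[Optional[int]] = [0] * 15 + [1] * 5 + [2] * 5
pool_freq = in_silico_pool_frequency(plants)  # 15 alternate alleles out of 50
```

A per-accession value from mean_abs_difference, computed between the in silico-pool and seq-pool frequency vectors, gives one simple way to check whether the two estimates diverge more in heterogeneous accessions.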