SNP-Based Genetic Linkage Map of Soybean Using the SoySNP6K Illumina Infinium BeadChip Genotyping Array

This study reports a high-density genetic linkage map based on the ‘Maryland 96-5722’ by ‘Spencer’ recombinant inbred line (RIL) population of soybean [Glycine max (L.) Merr.], constructed exclusively with single nucleotide polymorphism (SNP) markers. The Illumina Infinium SoySNP6K BeadChip genotyping array produced 5,376 SNPs in the mapping population, with a 96.75% success rate. Goodness-of-fit for each locus was tested against the expected 1:1 segregation ratio. Out of 5,376 markers, 1,465 SNPs fit the 1:1 segregation ratio and had ≤20% missing data plus heterozygosity among the RILs. Among these 1,465, just 657 were polymorphic between the parental DNAs tested. These 657 SNPs were mapped using JoinMap 4.0 software, and 550 SNPs were distributed on 16 linkage groups (LGs) among the 20 chromosomes of the soybean genome. The total map length was just 201.57 centiMorgans (cM), with an average marker density of 0.37 cM. This is one of the high-density SNP-based genetic linkage maps of soybean that can be used by the scientific community to map quantitative trait loci (QTL) and identify candidate genes for important agronomic traits in soybean.


Introduction
Soybean [Glycine max (L.) Merr.] has 20 chromosomes (2n = 40) and 20 linkage groups (LGs) assigned to them (Zou et al., 2003). Covering the whole soybean genome for genome-wide analysis requires a large number of molecular markers (Song et al., 2013). Currently, various types of molecular markers are widely used to accelerate gene discovery and marker-assisted selection. Among these, single nucleotide polymorphisms (SNPs) are the most suitable because they occur at high density within genomes (Gaur et al., 2012). Millions of SNPs have been generated in soybean (Lam et al., 2010), Arabidopsis, rice (Subbaiyan et al., 2012; Xu et al., 2012), and other crops (Sim et al., 2012a,b; Sharpe et al., 2013; Delourme et al., 2013). High-throughput SNP genotyping is widely used in plant genomics studies such as genome-wide association (Atwell et al., 2010; Huang et al., 2010; Tian et al., 2011; Branca et al., 2010), comparative genomics (Muchero et al., 2009; Luo et al., 2009), and genetic linkage map construction (Shirasawa et al., 2010; Huo et al., 2011). Genetic linkage maps are important genomic tools for identifying quantitative trait loci (QTL) and candidate genes to enhance marker-assisted selection (MAS) in crop improvement programs.
Large numbers of markers are crucial for constructing dense genetic linkage maps that are useful for identifying QTL of large effect (Lightfoot, 2008). At high marker density, candidate genes for important agronomic traits can be identified in a single mapping population and a set of derived isolines. Highly automated SNP discovery platforms are very useful for identifying and mapping large numbers of SNP markers. A universal linkage panel of soybean containing 1,536 SNPs was developed through the Illumina GoldenGate platform. More than 50,000 SNPs developed using the Illumina Infinium platform were reported for both maize and soybean (Ganal et al., 2011; Song et al., 2013). In this study, we analyzed the 'Maryland 96-5722' by 'Spencer' RIL population using the Infinium BeadChip genotyping array of soybean on the Illumina platform. The objective was to assemble a high-density SNP-based genetic linkage map that can be used for QTL detection and candidate gene discovery for desired agronomic traits.
The cross was made in 2004 by the Southern Illinois University at Carbondale (SIUC) breeding program and advanced to the F5 generation and then to the F5:7 generation by the single-pod descent method. There was no evidence of unintentional selection for yield or disease resistance. Three to four seeds of the parents and RILs were sown in pots containing potting soil and kept in the greenhouse at 28±5 °C under natural daylight. Young leaves were collected from 3-week-old seedlings, and DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN, Inc., USA) following the manufacturer's procedure with minor modifications.

SNP Genotyping and Genetic Map Assembling
SNP genotyping was performed at Michigan State University using the Illumina platform (Illumina, Inc. San Diego, CA). Song et al. (2013) developed and used the SoySNP50K BeadChip to screen >50,000 SNPs in soybean through the Illumina iScan platform (Illumina, Inc. San Diego, CA). The same assay was used here for genotyping with the SoySNP6K BeadChip of >6,000 SNPs to construct the 'MD 96-5722' by 'Spencer' genetic linkage map. The assay procedure comprises a series of steps: incubation, DNA amplification, preparation of the bead assay, hybridization of samples to the bead assay, extension and staining of the samples, and imaging of the bead assay (Song et al., 2013). The SNPs were snapped in each chromosome and then selected with the utterance algorithm. The SNP alleles were called using the Genome Studio Genotyping Module (Illumina, Inc. San Diego, CA; Song et al., 2013).
The SNP-based map was constructed in several steps using JoinMap 4 (Kyazma BV, Wageningen, Netherlands; Van Ooijen, 2006). The regression mapping algorithm and Kosambi's mapping function were used to order the markers and to calculate genetic distances between them. The minimum LOD was 4, the maximum was 10, and the remaining linkage analysis parameters were the JoinMap 4 defaults. MapChart 2.2 for Windows (Voorrips, 2002) was used to draw the linkage maps.
A total of 5,361 SNP markers were produced in the RIL population using the Illumina Infinium SoySNP6K BeadChip. For each SNP marker, the genotyping data represent three possible genotypes: AA homozygote, AB heterozygote, and BB homozygote. The oligo pool all success rate of the BeadChip assay was 96.75%. In our genotyping data set, the average heterozygosity was 3.99% and the rate of missing data for all markers was 3.25% (data not shown). Deviation of each SNP from the expected 1:1 segregation ratio was tested by comparing observed and expected genotype counts. Preliminarily, 1,465 SNPs were selected based on ≤20% missing data plus heterozygosity. Of the 1,465 SNPs with proper segregation among the RILs, 657 SNPs were polymorphic between the two parents and the remaining 808 SNPs were monomorphic. These 657 SNPs were analysed using JoinMap 4.0 software. Subsequently, the linkage group output and map positions were loaded into MapChart 2.2 for Windows (Voorrips, 2002), which computed and displayed 3-D maps (Figure 2) based on marker distances.
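The marker filtering described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the per-RIL list of calls, the use of a chi-square test with one degree of freedom, and the significance level of 0.05 are all assumptions made for illustration, since the text states only that observed and expected (1:1) ratios were compared and that a 20% cutoff on missing data plus heterozygosity was applied.

```python
import math


def chi_square_p_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def segregation_p_value(n_aa, n_bb):
    """Goodness-of-fit p-value for AA:BB counts against the expected 1:1 ratio."""
    expected = (n_aa + n_bb) / 2.0
    chi2 = ((n_aa - expected) ** 2 + (n_bb - expected) ** 2) / expected
    return chi_square_p_1df(chi2)


def passes_filters(calls, max_bad_fraction=0.20, alpha=0.05):
    """Decide whether one SNP is kept.

    calls: one genotype call per RIL, each 'AA', 'BB', 'AB', or None (missing).
    max_bad_fraction: cutoff on missing data plus heterozygosity (20% in the text).
    alpha: assumed significance level; the text does not state one.
    """
    n = len(calls)
    n_aa = calls.count("AA")
    n_bb = calls.count("BB")
    n_bad = n - n_aa - n_bb  # missing calls plus AB heterozygotes
    if n == 0 or n_aa + n_bb == 0:
        return False
    if n_bad / n > max_bad_fraction:
        return False
    return segregation_p_value(n_aa, n_bb) >= alpha


def is_polymorphic(parent1_call, parent2_call):
    """A SNP is informative only if the parents carry different homozygous calls."""
    homozygous = {"AA", "BB"}
    return (parent1_call in homozygous and parent2_call in homozygous
            and parent1_call != parent2_call)
```

Under these assumptions, a SNP with 45 AA, 47 BB, 4 AB, and 4 missing calls is kept, a SNP with 70 AA and 30 BB is rejected for segregation distortion, and a SNP for which both parents are AA is treated as monomorphic and dropped before mapping.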

Results
Five hundred fifty (550) SNPs were distributed on 16 linkage groups (Table 1; Figure 2). The remaining 107 SNPs were not linked and so were excluded from the linkage groups. The LGs were numbered (1 to 20) according to the assigned chromosome numbers of soybean (Zou et al., 2003). The LOD score of the markers/loci ranged from 4 to 10. The basic information on the LGs is presented in Table 1. The current map spanned just 201.57 centiMorgans (cM) with an average marker density of 0.37 cM (Table 1). The genetic length of the LGs ranged from 4.52 cM (Chr/LG 9) to 22.03 cM (Chr/LG 3) (Table 1; Figure 1). On average, one linkage group encompassed about 35 SNP markers that covered an average of 5.76 cM. The linkage group with the most markers was LG 14 (Chr/LG 14), which had 94 markers with an average marker density of 0.39 cM. In contrast, linkage groups 2, 4, 10, and 20 had the fewest SNP markers (only 4 each; Figure 2).
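Two small calculations underlie the map statistics reported above, and a minimal Python sketch may help make them concrete. The first function is the standard Kosambi mapping function, which converts a recombination fraction into a map distance in cM. The second reproduces the average marker density as total map length divided by number of mapped markers, which is the convention that matches the reported value. Both function names are hypothetical, and the sketch does not reproduce JoinMap's internal calculations.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def average_marker_spacing(total_length_cm, n_markers):
    """Average marker density, taken here as map length per mapped marker."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return total_length_cm / n_markers


if __name__ == "__main__":
    # Values taken from the text: 550 SNPs over 201.57 cM.
    print(round(average_marker_spacing(201.57, 550), 2))
    # A recombination fraction of 0.10 is an arbitrary illustrative input.
    print(round(kosambi_cm(0.10), 2))
```

With the values from the text, 201.57 cM divided by 550 markers gives about 0.37 cM per marker, in agreement with the reported density. For small recombination fractions the Kosambi distance is close to 100 times r, and it grows faster than that as r approaches 0.5.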

Discussion
The first genetic map in soybean encompassed about 1,500 cM and was constructed from 150 RFLPs (Keim et al., 1990). Subsequently, different markers have been used to construct genetic maps in soybean of about 2,500 cM by combining markers and maps, including RFLPs (Lark et al., 1995), AFLPs (Shoemaker et al., 1995), SSRs (Njiti et al., 2002; Kassem et al., 2006; Song et al., 2004), and SNPs (Kassem et al., 2012). The 'Maryland' by 'Spencer' genetic linkage map described here is one of the few high-density maps published from a single mapping population. In this mapping population, the Illumina Infinium BeadChip assay produced a high success rate of 96.75%. This percentage is higher than but comparable with the success rates of 89-92% in previously reported studies in barley (Rostoks et al., 2006), soybean (Hyten et al., 2008), cowpea (Muchero et al., 2009), and maize (Yan et al., 2009). The rate of SNP polymorphism among the RILs is approximately 28%, which is in the normal range for two US cultivar parents (here 'Maryland 96-5722' and 'Spencer') according to estimates of pairwise distances among a diverse set of cultivars (Song et al., 2013). The BeadChip assay used several restriction enzymes to create differences in restriction sites, which identify SNPs in both euchromatic and heterochromatic regions of a chromosome (Schmutz et al., 2010; Song et al., 2013). During the development of the Infinium BeadChip, specific attention was also paid to achieving a high allele call success rate with high minor allele frequencies (Song et al., 2013). Rare alleles can generate spurious associations between SNP markers and phenotypes, so SNPs with a minor allele frequency below a certain threshold should be avoided (Song et al., 2013). Thus, SNP markers with more than 20% missing data plus heterozygosity were discarded.
In the past few years, SNP markers have been widely used for assembling linkage maps because of their abundant variation. SNPs are now projected to become the most useful genetic markers, especially for the construction of dense maps (Gaur et al., 2012). The distinguishing feature of the genetic linkage map presented here is that it was assembled from a single population using the segregation data of each SNP marker. It was not constructed by integrating different marker types and populations into one linkage map. In total number of markers (550 SNP markers), this map is comparable to maps reported earlier by Kassem et al. (2012) (642 SNP markers; map length 1,524.7 cM; average marker distance 2.37 cM) and Vuong et al. (2010) (252 SSR markers; map length 2,200.00 cM; average marker distance 8.73 cM). However, the map length (201.57 cM) and average marker distance (0.37 cM) are remarkably smaller than those of these maps. It is possible that the parents shared large regions of their genomes that were identical by descent from common ancestors (Kassem et al., 2006; Lightfoot et al., 2008), but this would be more extreme than reported previously. Soybean transcript and consensus maps were reported by Choi et al. (2007) (2,389 cM; 2,982 markers in total, including 1,361 SNP markers) and Hyten et al. (2010) (2,296.4 cM; 5,500 markers in total, including 3,792 SNP markers), respectively. However, these maps are not comparable to the map reported here because they were composed from multiple marker types and several populations. Consensus maps are computed to increase the number of markers in a particular plant or animal genome, but such maps may lack accuracy in marker order (Gaur et al., 2012). Hence, the map in this study may be considered more accurate with regard to marker order, since the markers were distributed on the linkage groups based on recombination frequencies.
In fact, the genetic distances between markers, as well as the genetic versus physical (position) order of the SNPs, are consistent.
In conclusion, the present study presented a high-density genetic linkage map of soybean (550 SNPs; total map length 201.57 cM; average marker density 0.37 cM) based on single nucleotide polymorphism (SNP) markers. The availability of a large number of SNP markers located in specific regions of the soybean genome will serve as a basis for identifying QTL and candidate genes for agronomic traits of soybean.