Maize Genotypic Datasets
The Seeds of Discovery project has generated several types of genotypic data describing maize germplasm, including accessions from CIMMYT’s germplasm bank (CGB) and pre-breeding materials generated from CGB materials. Data types include Single Nucleotide Polymorphisms (SNPs), Presence/Absence Variations (PAVs), and allele frequencies for thousands of markers. These data are released as a small number of key “datasets” for targeted subsets of germplasm and/or markers.
Product details and features
In general, very high-density genotypic data (with more than one million markers per sample) are released through Dataverse, whereas allele frequency and lower density SNP call datasets are typically released through Germinate.
Dataverse Genotypic Datasets:
- A list of genotypic datasets available for download in the Dataverse repository provides links to individual studies.
- Studies contain fixed data files including the genotypic results file(s), and supporting files such as mapping files to link DNA sample names to germplasm identifiers, protocols for extraction or analysis, or other relevant documents.
- Each study is annotated with standard study-level metadata, including: study title, description, authors, data generators, date of generation, keywords, links to related studies, and links to relevant journal articles.
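As an illustration of how a mapping file might be used, the following minimal Python sketch relabels DNA sample names with germplasm identifiers. It assumes a tab-delimited mapping file with a header row. The column names `sample_name` and `germplasm_id`, the helper functions, and all example values are hypothetical; the actual layout of each study's mapping file should be checked before adapting it.

```python
import csv
import io


def load_sample_map(handle, sample_col="sample_name", germplasm_col="germplasm_id"):
    """Return {DNA sample name: germplasm identifier} from a tab-delimited file.

    The column names are assumptions, not the documented layout of any study.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[sample_col]: row[germplasm_col] for row in reader}


def relabel_samples(sample_names, sample_map):
    """Swap each sample name for its germplasm identifier.

    Names missing from the mapping are kept unchanged so they can be reviewed.
    """
    return [sample_map.get(name, name) for name in sample_names]


# Small in-memory stand-in for a mapping file (all values are made up).
example_mapping = io.StringIO(
    "sample_name\tgermplasm_id\n"
    "DNA_0001\tACC_A\n"
    "DNA_0002\tACC_B\n"
)

sample_map = load_sample_map(example_mapping)
relabelled = relabel_samples(["DNA_0001", "DNA_0002", "DNA_0003"], sample_map)
print(relabelled)
```

With a real study, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`.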
Germinate Genotypic Datasets:
- A list of genotypic datasets available in the Germinate data warehouse provides links to individual datasets. These are separated into general genotypic datasets (SNPs and PAVs) and allele frequency genotypic datasets.
- Users must first log in to Germinate after registering free of charge and agreeing to the terms of the data sharing license.
- Users can then generate “Groups” of germplasm or markers of interest using one or more search or filtering tools and use them immediately to generate customized subsets of data and/or save them for future use.
- SNP genotypic datasets in the Germinate data warehouse are available for direct download in a simple matrix format.
- Users can choose to export marker positions based on specific physical or genetic maps.
- Users can then download the selected genotypic data in a plain text format or as a “Project” immediately available for viewing in the Flapjack software.
Comments
Some of the genotypic datasets, especially the very high-density data available in Dataverse, are very large. Long periods of time may be required to download them, particularly for people who have limited internet connectivity.
Data can be provided in alternative ways if direct download from the internet is not possible. The large file sizes may also make it hard or impossible to work with them on computers with limited memory or using applications, such as Excel, that do not support extremely large files.
Please contact Cimmyt-mab-seed@cgiar.org for additional help with accessing or working with any of these genotypic data files.
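Where memory is the limiting factor, one possible workaround is to read a large genotypic file one line at a time and keep only the markers of interest, so the full file is never held in memory. The Python sketch below assumes an uncompressed, tab-delimited text matrix with one header line, markers in rows, and the marker name in the first column. This layout, the function name `filter_markers`, and the example values are assumptions for illustration and may not match the format of a given dataset.

```python
import io


def filter_markers(in_handle, out_handle, wanted_markers, delimiter="\t"):
    """Stream a genotype matrix and copy only the rows for wanted markers.

    Assumes one header line, then one marker per row with the marker name
    in the first column. Returns the number of marker rows written.
    """
    wanted = set(wanted_markers)
    # Copy the header (sample names) through unchanged.
    out_handle.write(in_handle.readline())
    kept = 0
    for line in in_handle:  # one line in memory at a time
        marker_id = line.split(delimiter, 1)[0].rstrip("\r\n")
        if marker_id in wanted:
            out_handle.write(line)
            kept += 1
    return kept


# Small in-memory stand-in for a large matrix file (all values are made up).
source = io.StringIO(
    "marker\tS1\tS2\n"
    "M1\tA\tG\n"
    "M2\tC\tC\n"
    "M3\tT\tA\n"
)
subset = io.StringIO()
rows_kept = filter_markers(source, subset, ["M1", "M3"])
print(rows_kept)
print(subset.getvalue())
```

For real data, the two `io.StringIO` objects would be replaced by files opened with `open(...)`; the resulting subset may then be small enough for tools that cannot handle the full file.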
Primary Users
Researchers and students working in relevant fields such as biology, breeding, and bioinformatics, as well as maize pre-breeders, molecular breeders, germplasm bank curators, and users of CIMMYT maize germplasm bank materials.
Availability
These products are currently available:
Genotypic Datasets in Dataverse https://data.cimmyt.org/dataverse/seedsofdiscoverydvn?q=%28keywordValue%3AZea+keywordValue%3Amays%29
Allele Frequency Genotypic Datasets in Germinate http://germinate.cimmyt.org/maize/#allele-freq-dataset
Genotypic Datasets in Germinate: Available November 2017
Molecular Maps in Germinate http://germinate.cimmyt.org/maize/#map-details
For more information
- Please send us a message at Cimmyt-mab-seed@cgiar.org