Raw DNA data is the information extracted from a DNA sample in our laboratory. The MyHeritage DNA test produces about 700,000 data points (genetic variants) inherited from both the paternal and maternal sides, which are used to determine your Ethnicity Estimate and calculate DNA Matches.
Raw DNA data is provided in a tab-delimited text file. The file contains a header describing the data, followed by one line per variant with five columns containing the following information:
- The name of the variant, given as its rsID (Reference SNP cluster ID) if available
- The chromosome number
- The variant position on the chromosome
- The pair of observed values (allele 1 and allele 2, representing the paternal and maternal sides, not necessarily in this order) at this location in the genome for that specific DNA sample
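As a rough illustration of the layout described above, the following Python sketch reads such a file into simple records. It rests on several assumptions that are not confirmed here: that header lines are blank or start with `#`, that the five columns appear in the order rsID, chromosome, position, allele 1, allele 2, and that the names `Variant` and `read_variants` are purely hypothetical. The sample rows are made up for illustration and are not real results.

```python
import csv
import io
from typing import NamedTuple


class Variant(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    allele1: str
    allele2: str


def read_variants(handle):
    """Yield one Variant per five-column data line of a tab-delimited file."""
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: header lines are blank or start with '#'.
        if not row or row[0].startswith("#"):
            continue
        # Skip anything that is not a five-column data line.
        if len(row) != 5:
            continue
        rsid, chromosome, position, allele1, allele2 = row
        yield Variant(rsid, chromosome, int(position), allele1, allele2)


# Made-up rows for illustration only, not real results.
sample = io.StringIO(
    "# hypothetical header line\n"
    "rs0000001\t1\t12345\tA\tG\n"
    "rs0000002\tX\t67890\tC\tC\n"
)
variants = list(read_variants(sample))
```

To read a real download instead, the same function could be given an open file handle, for example `open(path, newline="")`, provided the file matches the assumptions above.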
Here is an example of what raw DNA data looks like:
Raw data includes two types of genetic variants in your DNA: SNPs (Single Nucleotide Polymorphisms) and indels (insertions and/or deletions of nucleotides).
- For SNP variants, each allele is represented by one of four letters, each standing for a DNA nucleotide:
A for Adenine
C for Cytosine
G for Guanine
T for Thymine
- For indel variants, each allele is represented by the letters I or D, standing for Insertion or Deletion.
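The allele letters listed above are enough to tell the two variant types apart. The sketch below shows one way to do that; the function name `variant_type` is hypothetical, and returning "unknown" for any other symbol is an assumption of this example, since other values are not described here.

```python
SNP_ALLELES = {"A", "C", "G", "T"}
INDEL_ALLELES = {"I", "D"}


def variant_type(allele1, allele2):
    """Classify a pair of alleles as 'SNP', 'indel', or 'unknown'."""
    alleles = {allele1, allele2}
    if alleles <= SNP_ALLELES:
        return "SNP"
    if alleles <= INDEL_ALLELES:
        return "indel"
    # Assumption: anything else is left unclassified.
    return "unknown"
```

For instance, `variant_type("A", "G")` gives "SNP" and `variant_type("I", "D")` gives "indel".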
Each SNP position carries a combination of two of the four nucleotides (A, C, G and T). This means that there are three possible genotypes for each variant. For example, if a variant can be either G (Guanine) or C (Cytosine), the possible genotypes are G G, C C, or G C (the order of the alleles does not matter, so G C is the same as C G).
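The counting in the G/C example can be checked with a short Python sketch. The function names `possible_genotypes` and `normalize` are hypothetical; sorting the alleles alphabetically is simply one convenient way to treat G C and C G as the same genotype.

```python
from itertools import combinations_with_replacement


def possible_genotypes(allele_a, allele_b):
    """List the unordered genotypes for a variant with two possible alleles."""
    alleles = sorted({allele_a, allele_b})
    return [" ".join(pair) for pair in combinations_with_replacement(alleles, 2)]


def normalize(genotype):
    """Put the two alleles in alphabetical order so that order is ignored."""
    return " ".join(sorted(genotype.split()))


genotypes = possible_genotypes("G", "C")  # ['C C', 'C G', 'G G']
```

Here `normalize("G C")` and `normalize("C G")` both return "C G", which reflects that the order of the alleles does not matter.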
Each chromosome is composed of two strands, commonly called ‘forward strand’ and ‘reverse strand’. Alleles may be reported on either of those two strands.
For example, an SNP genotype that is A A on the ‘forward strand’ will be T T on the ‘reverse strand’.
It is important to know which strand your data refers to. MyHeritage DNA reports SNP data on the forward strand with respect to the human reference genome.
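The A A to T T example follows from base pairing: A pairs with T, and C pairs with G. A minimal Python sketch of that conversion is shown below. The function name `other_strand` is hypothetical, and the sketch only swaps the four nucleotide letters, so any other symbol (such as the indel letters I and D) is passed through unchanged.

```python
# Map each nucleotide letter to its pairing partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def other_strand(genotype):
    """Return the genotype as it would read on the opposite strand."""
    return genotype.translate(COMPLEMENT)


converted = other_strand("A A")  # 'T T'
```

This kind of conversion is only needed when comparing forward-strand data with data from a source that reports alleles on the reverse strand.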