HAPLOGROUPS

 

It sometimes comes as a surprise to people when they first receive their Y test results to discover that the haplogroups are actually defined by a type of marker called unique event polymorphisms (UEPs), which are not normally tested by the commercial laboratories because of the expense.  These biallelic (i.e., two-valued) markers are also called single nucleotide polymorphisms (SNPs).  As the term UEP implies, a change in one of these markers has occurred (except very rarely) only once at a given Y-chromosome location in human history (hence the “unique” in UEP), and a new one arises about every 7000 years in the regions of interest on the Y-chromosome.  Each such mutation involves the substitution of one of the four subunits of DNA for another.  There are 153 known haplogroups and subgroups.  Major haplogroups are labeled with letters of the alphabet, while numerals and lower-case letters designate the subgroups.  For example, the most common subgroup in Europe is R1b, which is the 1b sub-haplogroup of the R haplogroup.

 

The type of DNA testing normally used for genealogical studies tests an entirely different set of markers, the short tandem repeats (STRs).  These are short, usually four-letter, sequences that are repeated between 8 and 36 times.  The “value” at a particular marker or location is simply the number of times that the sequence is repeated.  The commercial testing services normally test 12, 25, or 37 Y-chromosome markers.  The result of such testing is simply a set of 12, 25, or 37 two-digit (or occasionally one-digit) numbers.  The STRs that are tested lie in a region of the Y-chromosome that is thought to have no biological function, the so-called “junk DNA.”  There are no medical or health issues connected to these markers, other than paternity, so this set of numbers raises no more privacy issues than does public awareness of one’s surname.
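To make the idea of an STR “value” concrete, here is a minimal Python sketch that counts how many times a short motif is repeated back to back in a DNA string.  The function name, the GATA motif, and the example sequence are all invented for illustration; they are not real marker data, and this is not how a testing laboratory actually measures repeats.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    step = len(motif)
    best = 0
    # Try every starting position and count consecutive copies from there.
    for start in range(len(sequence) - step + 1):
        count = 0
        pos = start
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Invented sequence: 11 consecutive copies of GATA with flanking bases.
example = "CCT" + "GATA" * 11 + "TTGATAGATAC"
print(count_tandem_repeats(example, "GATA"))  # prints 11
```

In this toy case the “value” of the marker would be reported as 11.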

 

Because of the extensive anthropological research that has been done on the origin and spread of the various (SNP-defined) haplogroups, there is intense interest among “genetic genealogists” in predicting a person’s haplogroup from STR test results.  This is done using a database of STR test results for which the haplogroup is known.  Using the observed allele (test value) frequencies for the different markers, one can often determine which haplogroup a given individual belongs to.  For some of the rarer haplogroups, the prediction can be rather uncertain because of the scarcity of data.
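The frequency-based prediction described above can be sketched roughly as follows.  This is only one simple way to do it, assuming that markers can be treated as independent and that each haplogroup is scored by multiplying the frequencies of the observed alleles.  The frequency values, the floor value for unseen alleles, and the function names are invented for illustration and do not come from any real database or published predictor.

```python
import math

# Hypothetical allele frequencies per haplogroup and marker, invented for
# illustration only.  A real predictor would estimate these from a database
# of STR results whose haplogroups are known from SNP testing.
FREQUENCIES = {
    "R1b": {
        "DYS393": {12: 0.05, 13: 0.90, 14: 0.05},
        "DYS390": {23: 0.25, 24: 0.60, 25: 0.15},
        "DYS19": {14: 0.85, 15: 0.15},
    },
    "I1a": {
        "DYS393": {13: 0.85, 14: 0.15},
        "DYS390": {22: 0.55, 23: 0.40, 24: 0.05},
        "DYS19": {14: 0.80, 15: 0.20},
    },
}

# Assumed small frequency for alleles never observed in a haplogroup.
FLOOR = 0.001


def log_score(haplotype, marker_freqs):
    """Sum of log frequencies of the observed alleles (markers assumed independent)."""
    total = 0.0
    for marker, allele in haplotype.items():
        freq = marker_freqs.get(marker, {}).get(allele, FLOOR)
        total += math.log(freq)
    return total


def predict_haplogroup(haplotype, frequencies=FREQUENCIES):
    """Return the best-scoring haplogroup and the scores for all haplogroups."""
    scores = {hg: log_score(haplotype, mf) for hg, mf in frequencies.items()}
    return max(scores, key=scores.get), scores


best, scores = predict_haplogroup({"DYS393": 13, "DYS390": 24, "DYS19": 14})
print(best, scores)
```

With so few markers and made-up frequencies the scores mean little; the point is only that a haplotype is assigned to the haplogroup in which its alleles are most common, and that sparse data for a rare haplogroup makes the frequencies, and hence the prediction, unreliable.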

 

It is very important to note that membership in a particular haplogroup does not (by itself) indicate the ethnic group from which the paternal line derives.  There is a lot of misinformation posted on the Internet in this regard.  You can see such statements as “R1b means Celtic,” or “I1a means Viking.”  While those two haplogroups are common in those two populations, they also occur in every country in Europe.  In the future, enough subgroup structure may be discovered to indicate more precise origins, but that is not presently possible.

 

A particular set of values for a set of STR or SNP markers is termed a “haplotype.”  The repeat value of a particular STR marker is called an “allele,” and the distribution of values for a marker within a given haplogroup is called the allele frequency distribution.
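As a small illustration of these terms, the sketch below computes an allele frequency distribution for one marker from a handful of haplotypes.  The sample haplotypes and the function name are invented for illustration; real distributions would be built from many tested individuals known to belong to the same haplogroup.

```python
from collections import Counter

# Invented sample: each haplotype maps marker names to repeat values (alleles).
SAMPLE = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 23, "DYS19": 15},
    {"DYS393": 12, "DYS390": 25, "DYS19": 14},
]


def allele_frequency_distribution(haplotypes, marker):
    """Return {allele: fraction of haplotypes carrying it} for one marker."""
    counts = Counter(h[marker] for h in haplotypes if marker in h)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


print(allele_frequency_distribution(SAMPLE, "DYS390"))
# prints {23: 0.25, 24: 0.5, 25: 0.25}
```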

 

Anthropological information about the haplogroups mentioned above, presented in more detail below, is partly from FTDNA (http://www.familytreedna.com):

R1b  Haplogroup R1b is the most common haplogroup in European populations. It is believed to have expanded throughout Europe as humans re-colonized the continent after the last glacial maximum 10,000-12,000 years ago. This lineage also contains what is called the “Atlantic modal haplotype.”

The R1b group is defined by a set of mutations that go back tens of thousands of years, some early and some late.  The following table shows the steps, each one involving a UEP mutation, that led from earliest humans to the origin of the R1b group in time and location:

Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
[no name] (M94)                              | ?                    | In Africa
[no name] (M168)                             | 50,000               | Africa->Middle East
F (M89)                                      | 45,000               | Middle East->South West Asia
K (M9)                                       | 40,000               | South West Asia->North Central Asia
P (M45)                                      | 35,000               | North Central Asia->North West Asia
R (M207)                                     | ?                    | In North West Asia
R1 (M173)                                    | 30,000               | North West Asia->Europe
R1b (P25)                                    | 25,000               |
R1b1b2 (M269)                                | 13,000?              | In Europe, probably the Ice Age Enclave in Spain

 

R1a  The R1a lineage is believed to have originated in the Eurasian Steppes north of the Black and Caspian Seas, in a population of the Kurgan culture, known for the domestication of the horse (approximately 3000 B.C.E.). These people are also believed to have been the first speakers of the Indo-European language group. This lineage is currently found in central and western Asia, India, and in Slavic populations of Eastern Europe, and is less common in Western Europe.

 

The I, I1, I1a and I1c lineages are common in northwestern Europe, while the I1b haplogroup is common in Sardinia (45% of males, mostly the I1b2 subgroup) and Croatia (one-third of males are I1b). The I1a haplogroup would most likely have been common within Viking populations.  About 10-15% of Northwestern Europeans are in the I haplogroup.  The following table shows the steps, each one involving a UEP mutation, that led from earliest humans to the origin of the I1a and I1b groups in time and location:

 

 

Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
[no name] (M94)                              | ?                    | In Africa
[no name] (M168)                             | 50,000               | Africa->Middle East
F (M89)                                      | 45,000               | Middle East->South West Asia
I (M170)                                     | 24,000               | ?
1 (P38)                                      | ?                    | ?
a (P30)                                      | 10-15,000            |
b (P37b)                                     | 10-15,000            | ?

 

E3b Haplogroup:

The E3b haplogroup has a frequency ranging from 1 to 5% at various locations in Britain, with lower frequencies in Ireland.  The haplogroup is much more common in the Middle East and North Africa.

 

Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
[no name] (M94)                              | ?                    | In Africa
[no name] (M168)                             | 50,000               | Africa->Middle East
[no name] (M145)                             | ?                    | ?
E (M96)                                      | ?                    | ?
3 (P2)                                       | ?                    | ?
B (M35)                                      | ?                    | ?

 

G Haplogroup:

The G haplogroup occurs at a frequency of about 1-3% at various locations in Britain, and increases in frequency towards the southeastern part of Europe.  The ancestors of most males in the G haplogroup migrated to Europe from the Middle East with the spread of agriculture 6,000-8,000 years ago.  A map on the 4th page of the paper by King and Underhill shows the distribution of the haplogroups (including G) that were associated with the spread of agriculture:

http://hpgl.stanford.edu/publications/A_2002_v76_p707-714.pdf

 

Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
[no name] (M94)                              | ?                    | In Africa
[no name] (M168)                             | 50,000               | Africa->Middle East
F (M89)                                      | 45,000               | Middle East->South West Asia
G (M201)                                     | ?                    | Middle East

 

 

J Haplogroup

 

The J haplogroup is common in the Middle East.  Many Jews are in the J haplogroup, but so are many Arabs, Kurds, Turks, etc.  The group originated in the Middle East or Ethiopia long before there were any people who identified themselves as Jews or Arabs.  The people who became the Jews have a large J haplogroup frequency simply because this is the genetic background of all of the Middle Eastern peoples.  With the Neolithic expansion of farming from the Fertile Crescent region into Europe, there were naturally many migrants of the J haplogroup, and they spread it all over Europe.  It occurs in most locations at frequencies ranging from 1-5% in the northwest to 5-15% in the southeast.  A small fraction of the J haplotypes in Northwestern Europe were spread in historical times by the Jewish diaspora.