HAPLOGROUPS
It sometimes comes as a surprise to people, when they first receive their Y test results, to discover that haplogroups are actually defined by a type of marker called a unique event polymorphism (UEP), which the commercial laboratories do not normally test because of the expense. These biallelic (i.e., two-valued) markers are also called single nucleotide polymorphisms (SNPs). As the term UEP implies, a change in one of these markers has occurred (except very rarely) only once at a given Y-chromosome location in human history (hence the “unique” in UEP), and a new one arises about every 7000 years in the regions of interest on the Y-chromosome. Each such mutation involves the substitution of one of the four subunits (bases) of DNA for another.
There are 153 known haplogroups and subgroups. Major haplogroups are labeled with capital letters of the alphabet, while numerals and lower-case letters designate the subgroups. For example, the most common subgroup in Europe is the R1b group, which is the 1b sub-haplogroup of the R haplogroup.
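The naming scheme can be illustrated with a short Python sketch. The function names `split_haplogroup` and `lineage` are hypothetical, and the sketch assumes that a name is one capital letter followed by alternating numerals and single lower-case letters, as in R1b or E1b1b1. It does not handle joined names such as BT or CF.

```python
import re

def split_haplogroup(name):
    """Split a name such as 'R1b' into its major haplogroup and subgroup levels."""
    match = re.fullmatch(r"([A-Z])((?:\d+|[a-z])*)", name)
    if match is None:
        raise ValueError(f"unrecognized haplogroup name: {name!r}")
    major, rest = match.groups()
    # Each numeral or lower-case letter marks one further level of subdivision.
    levels = re.findall(r"\d+|[a-z]", rest)
    return major, levels

def lineage(name):
    """List the chain of groups from the major haplogroup down to the named subgroup."""
    major, levels = split_haplogroup(name)
    names = [major]
    for level in levels:
        names.append(names[-1] + level)
    return names

print(split_haplogroup("R1b"))  # major group R, then levels 1 and b
print(lineage("R1b"))           # R, then R1, then R1b
```

Read this way, R1b belongs to R1, which in turn belongs to the major haplogroup R.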
The type of DNA testing normally used for genealogical studies tests an entirely different set of markers, the short tandem repeats (STRs). These are short, usually four-letter, sequences that are repeated between 8 and 36 times. The “value” at a particular marker or location is simply the number of times that the sequence is repeated. The commercial testing services normally test 12, 25, or 37 Y-chromosome markers, so the result is simply a set of 12, 25, or 37 two-digit (or occasionally one-digit) numbers. The STRs that are tested lie in a region of the Y-chromosome that has no known biological function and is thought to represent “junk DNA.” No medical or health issues are connected to these values, only paternity, so this set of numbers raises no more privacy issues than public awareness of one’s surname.
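To make the idea of a repeat count concrete, here is a minimal Python sketch. The DNA string and the four-letter motif GATA are made up for illustration, and the function name `longest_repeat_run` is hypothetical. Laboratories determine STR values from fragment or sequencing data, not from a simple string search like this one.

```python
def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best

# Invented example: 11 copies of GATA between two short flanking sequences.
example = "TTCA" + "GATA" * 11 + "CCGT"
print(longest_repeat_run(example, "GATA"))  # 11
```

The number printed corresponds to the “value” that would be reported for the marker.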
Because of the extensive anthropological research that has been done on the origin and spread of the various (SNP-defined) haplogroups, there is intense interest among “genetic genealogists” in predicting a person’s haplogroup from the STR test results. The prediction relies on a database of STR test results for which the haplogroup is known. Using the observed allele (test value) frequencies for the different markers, one can often determine which haplogroup a given individual belongs to. For some of the rarer haplogroups, the prediction can be rather uncertain because of the scarcity of data.
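One simple way to turn allele frequencies into a prediction is sketched below in Python. It is only an illustration and not necessarily the method used by any actual haplogroup predictor. It assumes that the markers vary independently and that every haplogroup is equally likely beforehand. The frequency numbers are invented, the marker names DYS393 and DYS390 are used only as examples, and the function name `predict_haplogroup` and the `floor` parameter are hypothetical.

```python
import math

# Invented allele frequencies per haplogroup and marker, for illustration only.
ALLELE_FREQ = {
    "R1b": {"DYS393": {12: 0.05, 13: 0.90, 14: 0.05},
            "DYS390": {23: 0.25, 24: 0.60, 25: 0.15}},
    "I1a": {"DYS393": {12: 0.05, 13: 0.85, 14: 0.10},
            "DYS390": {22: 0.60, 23: 0.30, 24: 0.10}},
}

def predict_haplogroup(haplotype, allele_freq=ALLELE_FREQ, floor=0.001):
    """Score each haplogroup by the product of the frequencies of the observed alleles."""
    scores = {}
    for group, markers in allele_freq.items():
        log_score = 0.0
        for marker, value in haplotype.items():
            # An allele never seen in the database gets a small floor frequency
            # instead of zero, so one rare value does not rule a group out entirely.
            freq = markers.get(marker, {}).get(value, floor)
            log_score += math.log(max(freq, floor))
        scores[group] = log_score
    best = max(scores, key=scores.get)
    return best, scores

best, scores = predict_haplogroup({"DYS393": 13, "DYS390": 24})
print(best)  # R1b, given the invented frequencies above
```

With only a handful of database entries for a rare haplogroup, the frequencies themselves are poorly known, which is why predictions for such groups are uncertain.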
It is very important to note that membership in a particular haplogroup does not (by itself) indicate the ethnic group from which the patrilineal line derives. There is a lot of misinformation posted on the Internet in this regard. You can see such statements as “R1b means Celtic,” or “I1a means Viking.” While those two haplogroups are common in those two populations, they also occur in every country in Europe. In the future, enough subgroup structure may be discovered to indicate more precise origins, but that is not presently possible except in a few cases.
A particular set of values for a set of STR or SNP markers is termed a “haplotype.” The repeat value of a particular STR marker is called an “allele,” and the distribution
of values for a marker within a given haplogroup is called the allele frequency
distribution.
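These terms map directly onto simple data structures. In the Python sketch below, each haplotype is a dictionary from marker name to allele (repeat value), and the allele frequency distribution for a marker is the fraction of haplotypes carrying each value. The sample haplotypes are invented, the marker names are used only as examples, and the function name `allele_frequencies` is hypothetical.

```python
from collections import Counter

def allele_frequencies(haplotypes, marker):
    """Return {allele: fraction of haplotypes carrying it} for one marker."""
    values = [h[marker] for h in haplotypes if marker in h]
    counts = Counter(values)
    total = len(values)
    return {allele: n / total for allele, n in sorted(counts.items())}

# Four invented haplotypes, assumed to belong to the same haplogroup.
sample = [
    {"DYS393": 13, "DYS390": 24},
    {"DYS393": 13, "DYS390": 23},
    {"DYS393": 14, "DYS390": 24},
    {"DYS393": 13, "DYS390": 24},
]
print(allele_frequencies(sample, "DYS390"))  # {23: 0.25, 24: 0.75}
```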
The anthropological information about the haplogroups mentioned above, presented in more detail below, is partly from FTDNA (http://www.familytreedna.com):
R1b Haplogroup:
Haplogroup R1b is the most common haplogroup in European populations. It is believed to have expanded throughout Europe as humans re-colonized the continent 10-12 thousand years ago, after the last glacial maximum. This lineage is also the haplogroup containing what is called the “Atlantic modal haplotype.”
The R1b group is defined by a
set of mutations that go back tens of thousands of years, some early and some
late. The following table shows the
steps, each one involving a UEP mutation, that led from earliest humans to the
origin of the R1b group in time and location:
Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
BT (M94) | ? | In Africa
CF (M168) | 50,000 | Africa -> Middle East
F (M89) | 45,000 | Middle East -> South West Asia
K (M9) | 40,000 | South West Asia -> North Central Asia
P (M45) | 35,000 | North Central Asia -> North West Asia
R (M207) | ? | In North West Asia
R1 (M173) | 30,000 | North West Asia -> Europe
R1b (P25) | 25,000 |
R1b1b2 (M269) | 13,000? | In Europe, probably the Ice Age Enclave in Spain
R1a Haplogroup:
The R1a lineage is believed to have originated in the Eurasian Steppes north of the Black and Caspian Seas, in a population of the Kurgan culture, known for the domestication of the horse (approximately 3000 B.C.E.). These people are also believed to have been the first speakers of the Indo-European language group. This lineage is currently found in central and western Asia, India, and in Slavic populations of Eastern Europe, and is less common in Western Europe.
I Haplogroup:
The I, I1, I1a and I1c lineages are common in northwestern Europe, while the I1b haplogroup is common on Sardinia (45% of males, mostly the I1b2 subgroup) and in Croatia (one-third of males are I1b). I1a would most likely have been common within Viking populations. About 10-15% of Northwestern Europeans are in the I haplogroup.
The following table shows the steps, each one involving a UEP mutation, that led from earliest humans to the origin of the
I1a and I1b groups in time and location:
Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
BT (M94) | ? | In Africa
CF (M168) | 50,000 | Africa -> Middle East
F (M89) | 45,000 | Middle East -> South West Asia
I (M170) | 24,000 | ?
E1b1 (formerly E3b) Haplogroup:
The E3b haplogroup has a frequency ranging from 1 to 5% at various locations in Britain, with lower frequencies in Ireland. The haplogroup is much more common in the Middle East and North Africa.
Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
BT (M94) | ? | In Africa
CF (M168) | 50,000 | Africa -> Middle East
[no name] (M145) | ? | ?
E (M96) | 50,000 | ?
E1b1a (M2) | 20,000 | Sub-Saharan Africa
E1b1b1 (M35) | 30,000 | North Africa
G Haplogroup:
The G haplogroup occurs at a frequency of about 1-3% at various locations in Britain, and increases in frequency towards the southeastern part of Europe. The ancestors of most males in the G haplogroup migrated to Europe from the Middle East with the spread of agriculture 6,000-8,000 years ago. There is a map on the 4th page of the paper by King and Underhill that shows the distribution of haplogroups (including G) that were associated with the spread of agriculture.
Haplogroup (defining Y biallelic SNP marker) | Years Before Present | Migration Route
BT (M94) | ? | In Africa
CF (M168) | 50,000 | Africa -> Middle East
F (M89) | 45,000 | Middle East -> South West Asia
G (M201) | 20,000 | Middle East
J Haplogroup:
The J haplogroup is common in the Middle East. Many Jews are in the J haplogroup, but so are many Arabs, Kurds, Turks, etc. The group originated in the Middle East or Ethiopia long before there were any people who identified themselves as Jews or Arabs. The people who became the Jews have a large J haplogroup frequency simply because this is the genetic background of all of the Middle Eastern peoples. With the Neolithic expansion of farming from the Fertile Crescent region into Europe, there were naturally many migrants of the J haplogroup, and they spread it all over Europe. It occurs in most locations at frequencies ranging from 1-5% in the northwest to 5-15% in the southeast. A small fraction of the J haplotypes in Northwestern Europe were spread in historical times by the Jewish diaspora.