There are an awful lot of new terms in DNA research. You won't run across all these in this report, but over the years, as you do your own research, you'll have this handy glossary to use. 

Adenine:  One of the four bases in DNA that make up the letters ATGC, adenine is the "A".  The others are “G” for guanine, “C” for cytosine, and “T” for thymine.  Adenine always pairs with thymine.  Cytosine always pairs with guanine.  These letters are used as shorthand for the sequences of fragments of DNA e.g. CCAAGTAC.  These sequences are the code for genetic information.

Allele: Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent (e.g., at a locus for eye color the allele might result in blue or brown eyes).

Allele Frequency: The proportion of a particular allele among the chromosomes carried by individuals in a population.
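As a rough illustration of that definition, the short Python sketch below tallies a list of alleles (one per chromosome sampled) and reports each allele's share. The function name and the sample values are made up for this example and are not taken from any real study.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return {allele: proportion} for a list of alleles, one entry per chromosome sampled."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical sample: repeat counts at a single Y-chromosome marker in ten men
sample = [24, 24, 23, 24, 25, 24, 23, 24, 24, 25]
freqs = allele_frequencies(sample)
print(freqs[24])  # 0.6
```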

AMH:  See Atlantic Modal Haplotype

Atlantic Modal Haplotype (AMH):  The descriptive term used by James F. Wilson to characterize the most common haplotype in parts of Europe. The markers and most common repeat values for the AMH are:

DYS19   = 14
DYS388 = 12
DYS390 = 24
DYS391 = 11
DYS392 = 13
DYS393 = 13
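One way to use the table above is to check a test result against it marker by marker. The Python sketch below does only that; the function name and the sample result are hypothetical, and real comparisons may use more markers or different conventions.

```python
# Modal values copied from the AMH table above.
AMH = {"DYS19": 14, "DYS388": 12, "DYS390": 24,
       "DYS391": 11, "DYS392": 13, "DYS393": 13}

def differences_from_amh(haplotype):
    """Return {marker: (tested value, AMH value)} for tested markers that differ from the AMH."""
    return {marker: (haplotype[marker], modal)
            for marker, modal in AMH.items()
            if marker in haplotype and haplotype[marker] != modal}

# Hypothetical test result, for illustration only
result = {"DYS19": 14, "DYS388": 12, "DYS390": 23,
          "DYS391": 11, "DYS392": 13, "DYS393": 13}
print(differences_from_amh(result))  # {'DYS390': (23, 24)}
```

An empty dictionary means every tested marker matches the modal value.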

Autosome: A chromosome not involved in sex determination.  The diploid human genome consists of 46 chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes (the X and Y chromosomes).

Base pair (bp): Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds; that is, a set of two bonded nucleotides on opposite strands of DNA.  Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs.  There are two possible base pairs: C-G and A-T.  These letters are used as shorthand for the sequences of fragments of DNA, e.g. CCAAGTAC.  These sequences are the code for genetic information.  Strung together in chains, each base reaches across and forms a pair with its complementary base on the opposite strand, like the rungs of a ladder.  Base pairing ensures that the genetic information, the sequence of bases in the DNA, is passed securely from generation to generation in a process called DNA replication.

Bp:  See base pair.

Chromosome: The self-replicating genetic structure of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes.  In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome.  Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.  A rod-like structure of tightly coiled DNA found in the cell nucleus of plants and animals.  Chromosomes are normally found in pairs; human beings typically have 23 pairs of chromosomes.

Clade: from the Greek word klados, meaning branch.  A branch of biological taxa or species that share features inherited from a common ancestor. A single phyletic group or line. Also cladus.  Monophyletic group of taxa.

Cladistics: School of phylogenetic analysis emphasizing the branching patterns of monophyletic taxa relying on synapomorphies (vs. symplesiomorphies) to unite sister taxa. [See Avise, pp. 34-39, 121-122].

Cladogram: A diagram, in the form of a stylized tree, showing inferred historical branching patterns among taxa.

Cline: Continuous change in a trait or trait frequency over space or time.

Cytosine: One of the four bases in DNA that make up the letters ATGC, cytosine is the "C".  The others are “A” for adenine, “G” for guanine, and “T” for thymine.   Cytosine always pairs with guanine.  Adenine always pairs with thymine.  These letters are used as shorthand for the sequences of fragments of DNA e.g. CCAAGTAC.  These sequences are the code for genetic information.

Diploid: A full set of genetic material, consisting of paired chromosomes, one chromosome from each parental set.  Most animal cells except the gametes have a diploid set of chromosomes.  The diploid human genome has 46 chromosomes.

DNA (deoxyribonucleic acid): The molecule that encodes genetic information.  DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides.  The four nucleotides in DNA contain the bases: adenine (A), guanine (G), cytosine (C), and thymine (T).  In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.
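The pairing rule is simple enough to show in a few lines of Python. This is only a sketch of the idea that one strand determines the other; the names PAIRS and complement are chosen for this example.

```python
# Pairing rule from the definition above: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the partner strand, base by base."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("CCAAGTAC"))  # GGTTCATG
```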

DNA fingerprinting: A term for DNA typing.  The chemical structure of everyone's DNA is the same.  The only difference between people (or any animal) is the order of the base pairs.  There are so many millions of base pairs in each person's DNA that every person has a different sequence.

Using these sequences, every person could be identified solely by the sequence of their base pairs.  However, because there are so many millions of base pairs, the task would be very time-consuming.  Instead, scientists are able to use a shorter method, because of repeating patterns in DNA.

These patterns do not, however, give an individual "fingerprint," but they are able to determine whether two DNA samples are from the same person, related people, or non-related people.  Scientists use a small number of sequences of DNA that are known to vary among individuals a great deal, and analyze those to get a certain probability of a match.

DNA marker: A gene or other fragment of DNA whose location in the genome is known.

DNA sequence: The relative order of base pairs, whether in a fragment of DNA, a gene, a chromosome, or an entire genome.

DNA typing: The analysis of sections of DNA for purposes of identification.

Double helix: The shape that two linear strands of DNA assume when bonded together.

DYS:  D = DNA, Y = Y chromosome, S = a unique DNA segment.  This nomenclature is controlled by the HUGO Gene Nomenclature Committee, which assigns new DYS numbers.  Its guidelines determine each part of the symbol used for naming arbitrary DNA fragments and loci.  See Appendix 1.1 (DNA Segments) at http://www.gene.ucl.ac.uk/nomenclature/guidelines.html#1.4

EST:  Expressed Sequence Tag

Flanking Region: For microsatellites, the flanking regions are the stretches of DNA outside the simple sequence tandem repeat (STR). These sequences are used as primer pairs. The flanking regions are usually invariant across a population or species, but mutations in the flanking region can be a cause of null alleles as well as a potentially serious source of homoplasy (see Pemberton et al. 1995).

Forensic: Of or relating to courts or legal matters. Molecular markers are increasingly common in the context of forensics, both in wildlife and human cases involving identity or relatedness.

Gene: The fundamental physical and functional unit of heredity.  A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule).

Gene expression: The process by which a gene's coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein.

Gene mapping: Determination of the relative positions of genes on a DNA molecule (chromosome or plasmid) and of the distance, in linkage units or physical units, between them.

Genetic Distance: Any of various statistics for measuring the 'genetic distance' between subgroups or populations. Major distance measures include Nei's distance (1972, 1978), Reynolds' distance (Reynolds et al. 1983) and newer distance measures that incorporate the stepwise mutation process in microsatellites (RST of Slatkin 1995a, b; D of Shriver et al., delta mu of Goldstein et al. 1995).

Genetic markers: Alleles of genes, or DNA polymorphisms, used as experimental probes to keep track of an individual, a tissue, a cell, a nucleus, a chromosome, or a gene. Stated another way, any character that acts as a signpost or signal of the presence or location of a gene or hereditary characteristic in an individual in a population.  Four kinds of chromosome change occur from generation to generation, and these are known as markers:

a.      indels: these are insertions or deletions of the DNA at particular locations on the chromosome. An example is the YAP (Y chromosome alu polymorphism).

b.     SNPs: these are single nucleotide polymorphisms in which a particular nucleotide is changed (for example, A is changed to G).   Since SNPs (snips) and indels (stable alus) are very rare, they are also known as unique event polymorphisms (UEPs).

c.      microsatellites: these are short sequences of nucleotides (typically 2 to 5 core base pairs, example: ATCG) which are repeated multiple times in tandem.  Over time changes sometimes do occur, thus the number of repeats may increase or decrease.

d.     minisatellites: these are longer sequences of nucleotides (typically 9 to 80 core base pairs, example: TAAGGGCCA) which are repeated multiple times in tandem. Over time changes sometimes do occur and the number of repeats may increase or decrease. 

Genetic profile: A collection of information about a person's genes.

Genetics: The study of the patterns of inheritance of specific traits.

Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.

Genome project: Research and technology development effort aimed at mapping and sequencing some or all of the genome of human beings and other organisms.

Genotype: The genetic makeup of an organism, or the set of DNA variants found at one or more loci in an individual, as distinguished from its physical appearance or phenotype. Our external features, what scientists call our phenotypes, are different: we have a wide array of skin colors, eye shapes and colors, and hair textures. However, our interior profile, or genotype (the organization of our genes on our chromosomes), identifies us all as Homo sapiens.

Guanine: One of the four bases in DNA that make up the letters ATGC, guanine is the "G".  The others are “A” for adenine, “C” for cytosine, and “T” for thymine.  Guanine always pairs with cytosine. Adenine always pairs with thymine.   These letters are used as shorthand for the sequences of fragments of DNA e.g. CCAAGTAC. These sequences are the code for genetic information.

Haplogroup (Hg):  a collection of closely related haplotypes.

Haploid: A single set of chromosomes (half the full set of genetic material), present in the egg and sperm cells of animals and in the egg and pollen cells of plants. Human beings have 23 chromosomes in their reproductive cells.   Compare diploid.

Haplotype (Ht):  A set of closely linked alleles (genes or DNA polymorphisms) inherited as a unit. A contraction of the phrase "haploid genotype".  Different combinations of polymorphisms are known as haplotypes.   Collectively the results from several loci could be referred to as a haplotype.  "Haplo" comes from the Greek word for "single".

Heredity: The handing down of certain traits from parents to their offspring. The process of heredity occurs through the genes.

Homology: Having the same origin (used for genes or characters deriving from a common ancestor).

Homoplasy: similarity of traits or genes for reasons other than coancestry (e.g., convergent evolution, parallelism, evolutionary reversals, horizontal gene transfer, gene duplications). Homoplasy violates a basic assumption of the analysis of genetic markers -- variants of similar phenotype (e.g., base pair size) are assumed to derive from a common ancestor. [See Sanderson, M., and Hufford. 1996. Homoplasy: The Recurrence of Similarity in Evolution. Academic Press, NY ISBN 618030-X].

HUGO :  See Human Genome Organization

Human Genome Initiative: Collective name for several projects begun in 1986 by DOE to (1) create an ordered set of DNA segments from known chromosomal locations, (2) develop new computational methods for analyzing genetic map and DNA sequence data, and (3) develop new techniques and instruments for detecting and analyzing DNA. This DOE initiative is now known as the Human Genome Program. The national effort, led by DOE and NIH, is known as the Human Genome Project.

Human Genome Organization (HUGO):  The Human Genome Organization (HUGO) is the international organization of scientists  involved in the Human Genome Project (HGP), the global initiative to map and sequence the human genome. HUGO was established in 1989 by a group of the world's leading genome scientists to promote international collaboration within the project.

HUGO currently has over 1000 members representing over 50 countries. HUGO maintains three regional offices, HUGO Americas, HUGO Europe and HUGO Pacific, which carry out the administrative duties of the organization.

HUGO carries out a complex coordinating role within the Human Genome Project.  HUGO activities range from support of data collation for constructing genetic and physical maps of the human genome to the organization of workshops to promote the consideration of a wide range of ethical, legal, social and intellectual property issues.

Human Genome Project (HGP):  The national effort, initially led by DOE and NIH, is known as the Human Genome Project.  It is now an international initiative to map and sequence the human genome.

Human Genome Program:   The current name of the DOE effort previously known as the Human Genome Initiative.

Hypervariability: High degree of variation among individuals within local populations at a given genetic marker. Examples of hypervariable markers include minisatellites and microsatellites. 

Informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.

In vitro: Outside a living organism.

ISOGG: The International Society of Genetic Genealogy (ISOGG) was founded in 2005 by DNA project administrators who share the common vision of the promotion and education of genetic genealogy.

Karyotype: A picture of the chromosomes in a cell that is used to check for abnormalities.  A karyotype is created by staining the chromosomes with dye and photographing them through a microscope. The photograph is then cut up and rearranged so that the chromosomes are lined up into corresponding pairs.

Linkage map: A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans (cM).

Localize: Determination of the original position (locus) of a gene or other marker on a chromosome.

Loci:  See Locus.

Locus (pl. loci): The position on a chromosome of a gene or other chromosome marker; also, the DNA at that position. The use of locus is sometimes restricted to mean regions of DNA that are expressed. The specific physical location of a gene on a chromosome.  From the Latin for 'place'. A stretch of DNA at a particular place on a particular chromosome — often used for a 'gene' in the broad sense, meaning a stretch of DNA being analyzed for variability (e.g., a microsatellite locus).

Marker: An identifiable physical location on a chromosome (e.g., restriction enzyme cutting site, gene) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined. A gene of known location on a chromosome and phenotype that is used as a point of reference in the mapping of other loci.

Meiosis: The process of two consecutive cell divisions in the diploid progenitors of sex cells. Meiosis results in four rather than two daughter cells, each with a haploid set of chromosomes.

Microsatellite: Repetitive stretches of short sequences of DNA used as genetic markers to track inheritance in families. They are short sequences of nucleotides (example: ATCG) which are repeated over and over again a number of times in tandem. Changes sometimes do occur, however, and the number of repeats may increase or decrease. See also Genetic Markers.

Minisatellites: Segments of repeated DNA often used as genetic markers for individual identification (forensic DNA 'fingerprinting') or analyses of relatedness. Can be either single- or multi-locus. Minisatellite technology relies on probe-based hybridization. Advantages include lack of need for specific primers and hypervariability.  Disadvantages include inability to use PCR amplification, the need for Southern blotting, and, for multi-locus minisatellites, the lack of locus-specificity (making population genetic analyses difficult). [See Avise, Fig. 3.16, p. 80].

Mitochondrial DNA:  See mtDNA

Modal Haplogroup:  All Haplogroups are in a sense made up based on similarities. A modal haplogroup is one in which scientists have noticed similarities within a certain set of markers amongst a group of people. The goal is to tie any haplogroup, even a modal one, to a specific point in time and a precise place in geography.

Monophyletic group (clade): Evolutionary assemblage of taxa that includes a common ancestor and all of its descendants. [See Avise, p. 36].

MRCA: Most recent common ancestor.

mtDNA:   Mitochondrial DNA which is passed down from the mother to all her children, males and females. The genetic material of the mitochondria, the organelles that generate energy for the cell.

Mutation: A permanent structural alteration in DNA. In most cases, DNA changes either have no effect or cause harm, but occasionally a mutation can improve an organism's chance of surviving and passing the beneficial change on to its descendants.

MWTBD:  More Work To Be Done.  I use this almost jokingly throughout the report because it's always true of every part of this work.

Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule.

Nucleus: The cellular organelle in eukaryotes that contains the genetic material. The center of a cell, where all of the DNA, packaged in chromosomes, is contained.

PCR:  See Polymerase Chain Reaction.

Pedigree: A simplified diagram of a family's genealogy that shows family members' relationships to each other and how a particular trait or disease has been inherited.

Phenotype:  Our external features are called our phenotypes and are very different. We have a wide array of skin colors, eye shapes and colors, and hair textures. However, our interior profile, or genotype (the organization of our genes on our chromosomes), identifies us all as Homo sapiens.

Phylogeny: the evolutionary history of a species.

Polymerase Chain Reaction (PCR): An in vitro process that yields millions of copies of desired DNA through repeated cycling of a reaction involving the DNA polymerase enzyme.  Technique for amplifying nucleic acids in a thermal cycler. Involves use of forward and reverse primer pairs that start off the reaction. End yield is many orders of magnitude more DNA of the target sequence than one started with. The resulting amplified DNA can then be visualized with stains or radioactive labeling, or sized with fluorescent markers in a sequencer. [See Avise, p. 84, Fig. 3.18, p. 85].
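To see why a few dozen cycles are enough to yield "millions of copies", here is a small Python sketch. It assumes every copy is duplicated in every cycle; that perfect doubling is an idealization introduced for this example, and real reactions fall short of it.

```python
def ideal_pcr_copies(start_copies, cycles):
    """Copies after `cycles` rounds, assuming every copy is duplicated in every cycle."""
    return start_copies * 2 ** cycles

print(ideal_pcr_copies(1, 20))  # 1048576
print(ideal_pcr_copies(1, 30))  # 1073741824
```

Under that assumption a single starting molecule passes one million copies at cycle 20.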

Polymorphism: A naturally occurring or induced variation in the sequence of genetic information on a segment of DNA. The term reflects the fact that mutations occur in the Y chromosome, as they do in other chromosomes.

Primer:  Short, preexisting single-stranded polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase (to 'prime' PCR amplification). The primer anneals to a nucleic acid template (DNA of the organism of interest) and promotes copying of the template, starting from the primer site. To amplify microsatellites one uses a forward and reverse primer pair: [agctcagtccctagtcagtact]acacacacacacacacacacac[ggtacttcggagctatccgaattccct]

In this example the bracketed bp are the forward and reverse primers (which should not differ among individuals), whereas the unbracketed 'ac' repeat is the microsatellite. By running back and forth across the repeat one can amplify a few copies of the microsatellite region by orders of magnitude, yielding sufficient DNA to allow visualization of the amplified product on an acrylamide gel by staining with ethidium bromide. Some primer sequences may be conserved across wide taxonomic gaps (e.g., across families), while others may differ even among congeners.
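The example sequence above can also be read by a short program. The Python sketch below locates the two flanking sequences and counts how many whole copies of the repeat unit lie between them. The function name count_repeats is invented for this illustration, and the approach assumes a clean sequence with a pure repeat; it is not how a laboratory sizes alleles.

```python
def count_repeats(sequence, forward_flank, reverse_flank, unit):
    """Count whole copies of `unit` lying between the two flanking sequences."""
    start = sequence.index(forward_flank) + len(forward_flank)
    end = sequence.index(reverse_flank, start)
    region = sequence[start:end]
    copies = len(region) // len(unit)
    if region != unit * copies:
        raise ValueError("region between the flanks is not a pure repeat of the unit")
    return copies

# Flanking sequences and repeat copied from the primer example above
forward = "agctcagtccctagtcagtact"
reverse = "ggtacttcggagctatccgaattccct"
amplicon = forward + "acacacacacacacacacacac" + reverse
print(count_repeats(amplicon, forward, reverse, "ac"))
```

The number printed is the repeat count, which is the value reported for a marker in a haplotype.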

Protein: A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body’s cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies.

Recombination: Exchange of gene segments by crossing over at chiasmata (exchange of material between non-sister chromatids). The exchanged sections are usually homologous. The likelihood of recombination between two loci increases with the physical distance between them.

Sequence Tagged Site (STS): Short (200 to 500 base pairs) DNA sequence that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories and serve as landmarks on the developing physical map of the human genome. Expressed sequence tags (ESTs) are STSs derived from cDNAs.

Sequencing: Determination of the order of nucleotides (base sequences) in a DNA or RNA molecule or the order of amino acids in a protein.

Sex Chromosome: The X or Y chromosome in human beings that determines the sex of an individual. Females have two X chromosomes in diploid cells; males have an X and a Y chromosome. The sex chromosomes comprise the 23rd chromosome pair in a karyotype. Compare autosome.

Short Tandem Repeats (STR): Multiple copies of an identical DNA sequence arranged in direct succession in a particular region of a chromosome.

Single Nucleotide Polymorphism (SNP): A variation in the genetic code at a specific point on the DNA.  In principle, SNPs could be bi-, tri-, or tetra-allelic polymorphisms.  However, in humans, tri-allelic and tetra-allelic SNPs are rare almost to the point of non-existence, and so SNPs are sometimes simply referred to as bi-allelic markers (or di-allelic to be etymologically correct).  This is somewhat misleading because SNPs are only a subset of all possible bi-allelic polymorphisms (e.g., multiple base variations).  About 30 million SNPs are thought to exist, making them much better markers than alternative markers, such as microsatellite repeats or short tandem repeats. But it has been the discovery that some SNPs are linked to particular diseases that has fueled the rising interest in this field.  See also The SNP Consortium.
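A single-point variation is easy to picture by lining up two sequences of the same length and listing where they differ. The Python sketch below does that; the function name and both sequences are hypothetical, and real SNP discovery involves alignment and quality checks that are left out here.

```python
def point_differences(seq_a, seq_b):
    """Return (position, base in seq_a, base in seq_b) wherever two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Hypothetical sequences; positions are counted from 0
print(point_differences("CCAAGTAC", "CCAGGTAC"))  # [(3, 'A', 'G')]
```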

Slippage Replication: A mutation process whereby a simple sequence tandem (microsatellite) repeat grows or shrinks by addition or subtraction of the "beads" of simple units that make up the "necklace". A dinucleotide AC repeat would change by addition or subtraction of AC units.

SNP: see Single Nucleotide Polymorphism

Species: A single, distinct class of living creature with features that distinguish it from others.

Stepwise mutation: Microsatellite variation appears to result from slippage in replication, which is most likely to add or delete a single repeat unit (steps of one). As a result, alleles more similar in size will presumably be more closely related. This additional 'phylogenetic' information can be used in assessing genetic differentiation or genetic distance.
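The idea that alleles closer in size are probably closer in ancestry can be turned into a simple count of steps. The Python sketch below adds up the repeat-count differences over the markers two haplotypes share. It is only a step count written for this glossary, with made-up values, and is not one of the published statistics (such as RST or delta mu) named under Genetic Distance.

```python
def stepwise_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-count differences over the markers both haplotypes share."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)

# Hypothetical haplotypes, for illustration only
a = {"DYS390": 24, "DYS391": 11, "DYS392": 13}
b = {"DYS390": 23, "DYS391": 11, "DYS392": 15}
print(stepwise_distance(a, b))  # 3
```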

STR:  See Short Tandem Repeats.

Tandem Repeat Sequences: Multiple copies of the same base sequence on a chromosome; used as a marker in physical mapping.

Taxon (plural taxa): Group of organisms linked by common ancestry. Taxa can range in scale from populations to kingdoms.

The SNP Consortium (TSC): Established (April 1999) to identify SNPs and add them to the public domain rather than patenting them for commercial use. A joint enterprise of pharmaceutical companies (AstraZeneca, Bayer, Bristol-Myers Squibb, F. Hoffmann-La Roche, Glaxo Wellcome, Aventis/Hoechst Marion Roussel, Novartis, Pfizer, Searle, and SmithKline Beecham) and the Wellcome Trust, with sequencing carried out by three public genomics institutes (Sanger Centre, Washington University School of Medicine, and the Whitehead Institute), with Stanford University Human Genome Center contributing mapping and Cold Spring Harbor Laboratory coordinating bioinformatics activities. [CHI SNPs report] Motorola joined in October 1999 and IBM in March 2000.

Thymine:  One of the four bases in DNA that make up the letters ATGC, thymine is the "T". The others are “A” for adenine, “G” for guanine, and “C” for cytosine. Thymine always pairs with adenine. Cytosine always pairs with guanine. These letters are used as shorthand for the sequences of fragments of DNA e.g. CCAAGTAC. These sequences are the code for genetic information.

Traits: Ways of looking, thinking, or being. Traits that are genetic are passed down through the genes from parents to offspring.

TSC: see The SNP Consortium

X-Chromosome: A chromosome that is different in the two sexes and involved in sex determination.  The female in our species has two X chromosomes.

Y-Chromosome:  A chromosome that is different in the two sexes and involved in sex determination.  The male in our species has one Y and one  X chromosome.
