Human Genome Project (HGP)
· The Human Genome Project was a 13-year project, launched in 1990 and completed in 2003.
· The project was coordinated by the U.S. Department of Energy and the National Institutes of Health.
· During the early years of the project, the Wellcome Trust (U.K.) became a major partner; other countries such as Japan, Germany, China and France also contributed significantly.
· Its aim was to determine the complete DNA sequence of the human genome.
· The two factors that made this possible were:
(I) Genetic engineering techniques, with which it was possible to isolate and clone any segment of DNA.
(II) The availability of simple and fast techniques for determining DNA sequences.
· The Human Genome Project was called a mega project for the following reasons:
(I) The human genome has approximately 3.3 × 10^9 bp; if the cost of sequencing is US $3 per bp, the total cost comes to roughly US $9.9 billion (about US $9 billion if the genome is taken as 3 × 10^9 bp).
(II) If the sequence obtained were to be stored in typed form in books and if each page contained 1000 letters and each book contained 1000 pages, then 3300 such books would be needed to store the complete information.
(III) The enormous quantity of data expected to be generated also necessitates the use of high speed computational devices for data storage, retrieval and analysis.
· The project was closely associated with a new branch of biology, called bioinformatics.
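The cost and storage estimates quoted above for the mega project follow from simple arithmetic. A minimal Python sketch of that check is given below; the constant and function names are illustrative assumptions, and the inputs are only the round figures quoted in these notes (3.3 × 10^9 bp, US $3 per bp, 1000 letters per page, 1000 pages per book).

```python
# Back-of-the-envelope check of the "mega project" figures.
# All names here are illustrative; the numbers are the round
# values quoted in the notes, not measured data.
GENOME_SIZE_BP = 3_300_000_000   # approx. 3.3 x 10^9 base pairs
COST_PER_BP_USD = 3              # assumed sequencing cost per base pair
LETTERS_PER_PAGE = 1000          # one letter stands for one base
PAGES_PER_BOOK = 1000


def sequencing_cost_usd(genome_bp, cost_per_bp):
    """Total cost if every base pair costs the same to sequence."""
    return genome_bp * cost_per_bp


def books_needed(genome_bp, letters_per_page, pages_per_book):
    """Number of books needed to print the sequence, rounded up."""
    letters_per_book = letters_per_page * pages_per_book
    return -(-genome_bp // letters_per_book)  # ceiling division


cost = sequencing_cost_usd(GENOME_SIZE_BP, COST_PER_BP_USD)
books = books_needed(GENOME_SIZE_BP, LETTERS_PER_PAGE, PAGES_PER_BOOK)
print(f"Estimated cost: US ${cost / 1e9:.1f} billion")  # 9.9 billion
print(f"Books needed: {books}")                         # 3300
```

With a genome size of 3 × 10^9 bp instead, the same calculation gives US $9 billion and 3000 books.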
(A) Goals of HGP
· Some major/important goals of HGP are to:
(I) Identify all the genes (approximately 20000-25000) in human DNA.
(II) Determine the sequences of the three billion base pairs present in human DNA.
(III) Store this information in databases.
(IV) Improve the tools for data analysis.
(V) Transfer the technologies to other sectors (like industries).
(VI) Address the ethical, legal and social issues (ELSI) that may arise from this project.
(B) Advantages/Uses of HGP
(I) Knowledge of the effects of variations of DNA among individuals can revolutionise the ways to diagnose, treat and even prevent a number of diseases/disorders that affect human beings.
(II) It provides clues to the understanding of human biology.
(C) Methodologies of HGP
· The methods involved two major approaches:
(I) One approach, called Expressed Sequence Tags (ESTs), focused on identifying all the genes that are expressed as RNA.
(II) The second approach, called Sequence Annotation, was to sequence the whole genome, including all the coding and non-coding sequences, and later assign functions to the different regions in the sequence.
· The total DNA from the cells was isolated and converted into random fragments of relatively small size. These fragments were then cloned in suitable hosts using specialised vectors. The commonly used hosts were bacteria and yeast, and the vectors were bacterial artificial chromosomes (BAC) and yeast artificial chromosomes (YAC).
· The fragments were then sequenced using automated DNA sequencers, which work on the principle of a method developed by Frederick Sanger.
· The sequences were then arranged on the basis of overlapping regions present in them; this required the generation of overlapping fragments for sequencing.
· Specialised computer-based programmes were developed for alignment of the sequences.
· These sequences were annotated and assigned to the respective chromosomes.
· The next task was to assign genetic and physical maps to the genome; these were generated using information on the polymorphism of restriction endonuclease recognition sites and on certain repetitive DNA sequences called microsatellites.
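How overlapping fragments can be ordered into one longer sequence may be illustrated with a toy example. The Python sketch below uses a simple greedy merge on exact overlaps; this technique is an assumption chosen for illustration only and is not the software actually used in HGP. The function names, the example fragments and the min_overlap parameter are all hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns a list of contigs; a single contig means all fragments
    could be joined through their overlapping regions.
    """
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up fragments, each sharing four bases with the next one.
fragments = ["CTGATTAG", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(fragments))  # ['ATGCGTACCTGATTAG']
```

The example also shows why overlapping fragments had to be generated: fragments with no shared region, such as "AAAA" and "CCCC", cannot be placed relative to each other and remain separate.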
(D) Salient features of Human Genome
· The following are some of the salient observations derived from HGP:
(I) The human genome contains 3164.7 million nucleotides (base pairs).
(II) The size of genes varies; an average gene consists of 3000 bases, while the largest known gene, dystrophin, consists of 2.4 million bases.
(III) The total number of genes is estimated at about 30000, and almost all (99.9%) of the nucleotide bases are the same in all humans.
(IV) The functions of over 50% of the discovered genes are not known.
(V) Less than 2% of the genome codes for proteins.
(VI) Repetitive sequences make up a large portion of the human genome.
(VII) Repetitive sequences throw light on chromosome structure, dynamics and evolution, though they are thought to have no direct coding function.
(VIII) Chromosome 1 has 2968 genes (the maximum) and the Y-chromosome has 231 genes (the least).
(IX) Scientists have identified about 1.4 million locations where the DNA of human beings differs by a single base; these are called single nucleotide polymorphisms (SNPs).
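The idea of a single nucleotide polymorphism can be made concrete by comparing two aligned sequences position by position. The Python sketch below is only an illustration under simplifying assumptions: the two sequences are made up, they are assumed to be already aligned and of equal length, and the function names are hypothetical. Real SNP discovery works on many individuals and much longer sequences.

```python
def find_snps(reference, sample):
    """List positions (1-based) where two aligned sequences differ by one base.

    Each entry is (position, base_in_reference, base_in_sample).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def percent_identity(reference, sample):
    """Percentage of positions at which the two sequences carry the same base."""
    differences = len(find_snps(reference, sample))
    return 100.0 * (len(reference) - differences) / len(reference)


# Two made-up sequences that differ only at position 5 (G in one, A in the other).
reference = "ATGCGTACCTGA"
sample = "ATGCATACCTGA"
print(find_snps(reference, sample))  # [(5, 'G', 'A')]
print(round(percent_identity(reference, sample), 1))  # 91.7
```

In this short example one difference in 12 bases gives about 91.7% identity; across the whole human genome the corresponding figure quoted above is 99.9%.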
(E) Applications/Future challenges of HGP
(I) Having the complete sequence of the human genome will enable a radically new approach to biological research, i.e., a systematic approach on a much broader scale.
(II) All the genes in a genome or all the transcripts in a particular tissue/organ/tumor can be studied.
(III) It will be possible to understand how the enormous number of genes and proteins work together in interconnected networks in the chemistry of life.