The Human Genome Project
Written and produced by Bill Stonebarger
Chemists, physicists, biologists and computer scientists from around the world have now completed the most ambitious biological research project of all time - the Human Genome Project.
One of the lead scientists who worked overtime on this project was Dr. Lloyd Smith, a chemist at the University of Wisconsin in Madison. We asked him just what is the Human Genome Project?
"The pithiest summary of it is that it is a project to determine the sequence of the structure of all the DNA of the human species. And that ends up being a pretty good-sized task because it turns out that every cell in your body has about 3 billion bases of DNA in it and the total amount of DNA sequences that have been determined in the history of mankind over the last 25 years is maybe 100 million bases.”
That was five years ago. Thanks to the invention and hard work of Dr. Smith and a few hundred other genome scientists around the world, the pace of progress increased dramatically from 1998 to 2001. In June 2000, in a White House ceremony, Dr. Francis Collins, head of the Human Genome Project, and Dr. J. Craig Venter, president of the private company Celera Genomics, joined then-President Clinton to announce the first working draft of the human genome. And then in April of 2003 the final draft of the human genome was announced and published, in effect the completion of this monumental project.
We now have, in other words, nature’s blueprint for constructing a human being. Here it is. A four-letter code written in very long unpunctuated sentences on 23 pairs of human chromosomes. This blueprint for making a human being took nature three and a half billion years to create. Dr. Smith along with hundreds of colleagues in laboratories around the world took 13 years to learn to read and record the entire code.
Now comes the hard part—understanding what it says and how it works.
To understand what has been accomplished already as well as what remains to be done, we need to go back and start with the biggest picture of all, as Dr. Smith implied, the history of mankind. He, however, was talking only about the strides taken in the last 25 years in molecular biology laboratories. In a sense the history of the last 4 billion years provides a broader insight into the importance of the Human Genome Project.
Consider: Four or five billion years ago the dust from an exploded star gathered together to make a small planet we now call earth. A billion or so years later, in the ancient oceans, some of the most common of the atoms that made up that dust: carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus, a few more gathered together to make structures that we call living. The important thing to note for our story is that it was in those first days of life on earth that the basic patterns were set. All life thereafter, including human life today, is built on those first basic templates.
The most basic template of all looks like this: a very large molecule called deoxyribonucleic acid, DNA for short. It is the DNA in the nucleus of all living cells that directs the life activities of all living cells and hence of all living organisms, plant, animal and human. All means not only the organisms living today, but all the plants and animals that have lived in the three to four billion year history of life on earth!
DNA is like the software code in a computer, the program that tells the computer hardware what to do and how to do it. Humans invented this computer code. Nature invented the code on DNA that directs all living activities. The Human Genome Project is deciphering that code. You can see why Lloyd Smith says it is a pretty good-sized task. Before looking at some details of just how scientists are deciphering that code, let’s review what we know today about how DNA is built and how it works.
First the structure. Scientists discovered a few decades ago that DNA is a very long molecule shaped like a twisted ladder. The rails of the ladder are made of alternating chemical groups of sugar and phosphate. The all-important code is carried by the DNA molecule on the rungs of the ladder, as in this model diagram.
There are four different kinds of rungs. Each rung has a particular chemical structure as in this diagram. A. G. C. T. Adenine, Guanine, Cytosine and Thymine. Chemically these are four examples of well-known structures called bases, the opposite of acids.
Just as the 26 letters of the English alphabet can be put together in an almost infinite variety of sequences to make an almost infinite number of words, so too, the four letters of the DNA alphabet (the four bases A, G, C and T) can be placed in lengthy sequences in an almost infinite variety of ways to make an almost infinite number of messages. In practice there are only a few hundred thousand English words in even the largest dictionary. In nature there are tens of millions of species of plants and animals that live or have ever lived on planet earth. The amazing thing is that all of these millions of separate species of plants and animals use very similar codes on their very similar DNA molecules to stay alive and to reproduce.
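To put a number on "almost infinite": a strand of n bases can be written in 4 to the power n different ways, since each position holds one of four letters. A minimal Python sketch of that arithmetic follows (the function name `count_sequences` is ours, purely for illustration).

```python
def count_sequences(length, alphabet_size=4):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length

# Compare short DNA "words" (4 letters) with English-style words (26 letters).
for n in (1, 2, 3, 10):
    print(n, count_sequences(n), count_sequences(n, alphabet_size=26))
```

Even a stretch of only 10 bases already allows over a million different sequences.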
The DNA in microscopic yeast cells, for instance, has already been sequenced and shown to have many of the same codes as the DNA in humans. All of the DNA in a common microscopic roundworm has now been sequenced and shown to have many of the same codes as the DNA in humans. We share over 90% of our DNA code with mice and rats and over 99% with our close relatives, the chimpanzees.
How does it work? How does the master DNA code get translated into living activities whether the living organism is a yeast, a worm, a mouse, a chimpanzee or a human? The code works at the basic level of life, the cell. The human body is made of a hundred trillion cells. Each of these cells has a nucleus. Inside that nucleus are spindly structures called chromosomes. Chromosomes come in pairs. Humans have 23 pairs. One of each pair came from the mother and one from the father.
A single chromosome is in fact a single very long DNA molecule coiled inside a protective protein coat. If you take out the DNA from a single chromosome, uncoil it and stretch it out, it would be a few centimeters in length. If you took all the DNA from a single human cell and stretched it end to end it would be about 3 billion bases long and 2 or 3 meters in length. 3 billion bases means that our four letters, A-Adenine, C-Cytosine, G-Guanine, and T-Thymine would appear 3 billion times!
And that’s only one cell. If you could take all the DNA from all of the hundred trillion cells of one human body and uncoil it and stretch it in a line, that line would be long enough to make 15 round trips from the Sun to Pluto, in the outer reaches of the solar system.
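The Sun-to-Pluto comparison can be checked with rough arithmetic. The sketch below takes the cell count and the low end of the "2 or 3 meters" per cell from the narration. The Sun-Pluto distance of about 5.9 billion kilometers is our own assumed average figure, and the names are illustrative.

```python
DNA_PER_CELL_M = 2.0      # meters of DNA per cell (low end of the range quoted above)
CELLS_IN_BODY = 100e12    # a hundred trillion cells, as quoted above
SUN_TO_PLUTO_M = 5.9e12   # ASSUMPTION: average Sun-Pluto distance, about 5.9 billion km

def round_trips(dna_per_cell_m, cells, one_way_m):
    """How many there-and-back journeys the total DNA length would cover."""
    total_length_m = dna_per_cell_m * cells
    return total_length_m / (2 * one_way_m)

trips = round_trips(DNA_PER_CELL_M, CELLS_IN_BODY, SUN_TO_PLUTO_M)
print(f"about {trips:.0f} round trips")
```

With these inputs the result lands in the same range as the 15 round trips quoted above. The exact figure depends on which per-cell length and which distance you plug in.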
How's that for efficient packaging!
The coded instructions that actually govern the cell's activities come on discrete small sections of the DNA. These sections are called genes. Each gene is itself a sequence of between a few thousand and a few hundred thousand, or more, bases! This sequence of bases on a gene is the code that tells the cell how to manufacture a protein. Say a protein like insulin or hemoglobin or one of the digestive juices in your stomach. Or one of the combinations of proteins that make for the color of your eyes, your hair, your skin. Or one of the combinations of proteins that make for your height, your stature, your athletic or musical or mathematical ability. Proteins, in other words, are the stuff of which cells and tissues and whole human beings are built and operated. And since genes make proteins, you can see why an understanding of the genetic code is central to an understanding of life itself, all life. Genes, surprisingly, make up only about three percent of the DNA.
The rest is sometimes called “junk DNA.” This is probably not an accurate term because molecular biologists and geneticists are already finding it does play important parts in turning genes on and off and probably in many other as yet unknown functions of the cell.
So how do scientists today decipher that living code? Let's take a look.
"The basic idea is ... DNA sequencing is enzymatic chemistry that allows you to take a piece of DNA and create a set of fragments that all have a common start point but terminate in a base specific manner, four bases, A, C, G, T."
The chemistry of how to break up a DNA molecule into workable fragments was worked out over the last three decades using test tubes, centrifuges, special chemical reagents called restriction enzymes and, most important of all, a recent invention, the PCR machine. PCR stands for polymerase chain reaction. Here is what one looks like. They are widely available today for a few thousand dollars each. What they do is take one small segment of DNA and, using a heating and cooling routine, multiply that segment, producing in a few hours millions of exact copies. Millions of DNA clones, in other words.
Today much of this molecular biology can be done in automated laboratories like this one, custom-built by Dr. Smith and his colleagues at the University of Wisconsin. Dr. Dave Rank explains ...
"This robotic work station can perform molecular biology tasks and DNA sequencing for isolation of DNA through the sequencing of DNA and purification of the sequencing reaction."
In other words you put DNA samples obtained from a human cell in at one end and in a few hours you have cut up the DNA into small pieces and then purified and multiplied these fragments. You take these purified fragments to another room where you can separate them from one another by a technique called electrophoresis and then the actual sequence of bases on each DNA fragment can be detected by a moving laser beam. Dr. Mike Westphall describes how this works.
"The DNA goes into the top portion and there is a gel, a kind of jello-like substance between two plates of glass and the substance has a bunch of pores which the DNA moves through and we load the DNA at the top using a syringe, into wells formed in the gel, using a device called a sharks-tooth comb. You apply a large electrical field and it causes the DNA to migrate through the gel and it migrates from the top to the bottom ... and this detector is scanning over the gel and it detects the DNA as it passes through ..."
There are multiple lanes in the gel. A different fragment of DNA runs down each lane. Each of the four bases has a different dye attached to it. The detector has a laser probe which moves across each lane and identifies each of the four bases, A, C, G and T, as they pass through the bottom of the machine. It sends this data to a computer nearby.
"After we get done with the DNA sequencing all the data is stored in a single file and we must transfer that file and then reconstruct it with a software program. That program goes by the name of Gel Imager which takes the image produced by the scan in all the individual lanes, each containing one of the separate DNA samples we loaded into the gel using the sharks-tooth comb. What happens is the sample separates out as it goes down the gel exposing its individual bases on a given strand of DNA."
After a special computer program massages this data you can see it here as many lanes of color blips on the computer screen. You can convert these confusing lanes of data to individual graphs on a horizontal line. Each blip and each blip color there stands for an individual base, A, C, G or T, sequenced along a fragment of DNA, a fragment of a human gene.
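The step from colored blips to letters can be sketched in a few lines of Python. This is only a toy model, not the laboratory's actual software: the dye colors in `DYE_TO_BASE` and the function name `read_lane` are made up for illustration. It relies on the idea described earlier that every fragment shares a common start and ends in a dye-tagged base, together with the fact that shorter fragments move through a gel faster and so pass the detector first.

```python
# ASSUMPTION: hypothetical dye colors; a real instrument's assignments may differ.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def read_lane(detections):
    """Turn (fragment_length, dye_color) detections into a base sequence.

    Sorting by fragment length recovers the order of bases along the
    strand, because a fragment of length k ends at the k-th base.
    """
    ordered = sorted(detections, key=lambda d: d[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)

# Detections listed out of order on purpose.
lane = [(3, "yellow"), (1, "green"), (4, "red"), (2, "blue")]
print(read_lane(lane))
```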
"So this would be one very small segment of a human gene?"
"That's right, a very small one is correct.. In the whole sequencing process you take this long segment of DNA and you keep going through different phases of breaking it down into smaller and smaller pieces and hopefully keeping some order to these pieces .. and after you've broken it down to its smallest constituents we're now finding individual base sequences of that smallest chunk. Then you kind of work backwards and it’s kind of like assembling a puzzle, putting it all back together and seeing if you can determine the four letters. You've broken it down into chunks which are about 40 thousand bases long and from that we've broken it down into smaller segments which are one thousand to 1.5 thousand bases long. ..and then you can start tacking these together and when all is said and done you come back to the great big piece you started with."
That great big piece is one human gene! Decoded. It now goes into the computer dictionary as one entry in the predicted one-hundred-thousand-entry human genome encyclopedia to come. This automated laboratory at the University of Wisconsin was one of the pioneer labs funded by the Department of Energy and the National Institutes of Health, partners in the federal Human Genome Project that began in 1990. At the beginning of the Project the cost to discover just one base in the 3 billion-base human DNA sequence was about $10. Building on this technical foundation, the automation technologies and techniques improved dramatically until by the end of the project in 2003 the cost to sequence a single base had decreased 100-fold to less than ten cents a base.
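The puzzle-like reassembly described in the interview above can be sketched as a greedy merge of overlapping fragments. This is a simplified illustration under our own assumptions (error-free reads, short toy sequences, and the invented function names `overlap` and `assemble`), not the software the laboratory actually used.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def assemble(fragments, min_len=3):
    """Greedy assembly: repeatedly merge the pair with the largest overlap."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]  # toy fragments, deliberately shuffled
print(assemble(reads))
```

Real genomes contain long repeated stretches that can fool a simple greedy merge like this one, which is one reason keeping "some order to these pieces" matters.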
At roughly the same time that Dr. Smith and his colleagues around the world were working under the federally financed Human Genome Project, a privately funded company, Celera Genomics in Maryland, was pursuing the same goal using a different automated strategy called “shotgun sequencing.” Both strategies worked and both achieved their final goal at almost exactly the same time in 2003. By then both also had an error rate of only 1 base in every 100,000 base pairs, ten times more accurate than they originally had predicted.
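A back-of-the-envelope calculation shows what these per-base figures mean at genome scale. The sketch simply multiplies the quoted rates by 3 billion bases. The results are illustrative only and are not the project's actual budget or error count. The function names are ours.

```python
GENOME_BASES = 3_000_000_000  # about 3 billion bases, as quoted earlier

def total_cost_dollars(bases, cents_per_base):
    """Cost of reading every base once, in whole dollars."""
    return bases * cents_per_base // 100

def expected_errors(bases, bases_per_error):
    """Expected number of wrong bases at a rate of 1 error per N bases."""
    return bases // bases_per_error

print(total_cost_dollars(GENOME_BASES, 1000))   # at $10 per base
print(total_cost_dollars(GENOME_BASES, 10))     # at ten cents per base
print(expected_errors(GENOME_BASES, 100_000))   # at 1 error per 100,000 bases
```

At the starting rate a single pass over the genome would have run to tens of billions of dollars. The 100-fold drop brings the same pass down to hundreds of millions.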
The big surprise, however, was that contrary to almost all expectations, both groups found far fewer human genes than just about all geneticists had predicted. It seems clear now that human beings are constructed with a blueprint of only 20,000 to 25,000 working genes, rather than the 100,000 that had been predicted just a few years ago.
Up until now we have shown only the basic research. What good will it do?
"If you want to understand biology at a really comprehensive level, a systematic level, you kind of pretty quickly know that you'd want to know at a minimum the information as to what biology is made of. It’s hard to analyze something if you don’t know what you're analyzing. And basically the sequence of all the genes encodes the structure of all the proteins, sort of the minimal fundamental building blocks. There was a lot of resistance to this concept when the genome project started. That resistance has pretty much gone away, in part because the power and utility of this information has become so obvious no one can overlook it.
"So, for instance, the yeast genome has been completely sequenced. It has about 10 million bases and yeast biologists all over the world are now using all the yeast sequence all the time and making maps now and starting to put it all together. ..even then we know remarkably little about what really makes this thing tick .. even with this sequencing information. You know, how does it divide, how does it control its growth, how does it respond to its environment, how does it move and control its gene expression. All these questions are hard questions which you can work on for a long time and what you do when you produce all this sequencing is you give people a way, a very powerful sort of infrastructure tool for help as they go after the hard questions."
Scientists are also rapidly sequencing the genomes of many other species. Besides yeast, the genomic sequences of bacteria, roundworms, mustard plants, fruit flies, mice, monkeys, chimpanzees as well as many other species of plants and animals have already been sequenced. Many gene sequences are the same in species as different as yeast and human beings, showing once again the basic unity of all living things on the earth.
As more and more genomes become sequenced and available to researchers we will have a more and more complete understanding of the evolution of all plant and animal species on earth, including our own species, Homo sapiens. And by comparing DNA sequences in humans from around the world we can already begin to trace a clearer line of evolutionary change and development from our ancestors in Africa to the many ethnic and racial groups, and even down to the individuals, in the world today.
So far we have emphasized the likeness of DNA in all species of living things and talked of human DNA as though it were the same in all humans. It is about 99.9% the same in all humans. But that is not exactly the same. In fact, except for identical twins, each human being has a unique genome. Remember, human chromosomes come in pairs. One chromosome from your mother, one from your father. That means that genes come in pairs too. One set from your mother, one set from your father.
In order to make the protein hemoglobin, for instance, you have not one, but two genes. If both of those genes are exactly the same normal code there is no problem. You have normal hemoglobin, and normal red blood cells. However, sometimes a gene for making this protein can have a single mistake in the base code. What is called a mutation happened at some time in the past. That is, the normal gene changed; maybe just one base in the long sequence got knocked out of order somehow. A single-base difference of this kind is called a SNP, short for single nucleotide polymorphism. This mutated gene, this SNP, gets passed on to children. In the case of the hemoglobin genes an unfortunately fairly common mutation can lead to a serious hereditary disease called sickle cell anemia. In sickle cell anemia the red blood cells take on a sickle shape and cannot as easily carry oxygen to body cells. Now, if one of the hemoglobin gene pair is normal and the other member of the pair is defective, the person will not have the disease. However, this person will be a carrier of the mutated gene. When he or she has a child, the child has a fifty-fifty chance of getting this mutated gene. If both parents are carriers and each passes on the mutated gene, in the child both hemoglobin genes in the pair would be defective and the child would have the disease.
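To make the single-base change concrete: cells read a gene three bases at a time, each triplet (a codon) standing for one amino acid in the protein. That detail is background we are adding, not something stated above. In the well-known sickle cell mutation, one A in the beta-globin gene is replaced by a T, turning the codon GAG (glutamic acid) into GTG (valine). The sketch below uses only the first few codons of that gene and a partial codon table. The function names are illustrative.

```python
# Partial codon table: only the codons needed for this short example.
CODONS = {
    "ATG": "Met", "GTG": "Val", "CAC": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}

def translate(dna):
    """Read the DNA three bases at a time and look up each amino acid."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]

def find_snps(a, b):
    """Positions (0-based) where two equal-length sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Start of the beta-globin coding sequence, normal and sickle versions.
normal = "".join(["ATG", "GTG", "CAC", "CTG", "ACT", "CCT", "GAG", "GAG", "AAG"])
sickle = "".join(["ATG", "GTG", "CAC", "CTG", "ACT", "CCT", "GTG", "GAG", "AAG"])

print(find_snps(normal, sickle))
print(translate(normal))
print(translate(sickle))
```

One changed letter out of the whole gene swaps a single amino acid in the protein, and that is enough to alter how the hemoglobin behaves.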
We think of mutated genes as bad genes, and they usually are. In the long run of evolution, however, human beings are the result of mutated genes that were good. In fact every one of our 20,000 to 25,000 genes is a successful mutant gene, created and then passed on over four billion years by the evolutionary process.
And in fact it is these mutant genes that make us all delightfully different from each other. It is mutant genes, in other words, that have led to the incredible diversity of living things on this living earth.
In the short run, though, mutant genes are often a serious problem in the human species. Diseases like sickle cell anemia, cystic fibrosis, Tay-Sachs and Huntington's disease are each caused by a mutation in a single gene, while Down syndrome comes from an extra copy of an entire chromosome. Here is one place where the human genome project may be of great help in the future.
Dr. Smith explains just one example of benefits already realized from basic research into the human genome.
"Some good examples of that ... another company I work with, Visible Genetics, has been doing sequencing of the retinoblastoma gene. That is a gene that is involved in the generation of cancer of the eye and it turns out that if you’re in a family that has that gene and you don’t do any genetic testing then you don't know which of your children have that gene and which don't. And it turns out since you don't know what you have to do is end up having to do these examinations of the eye under general anesthesia which are pretty expensive and they have to do them every six months on young children to detect if there is going to be an early occurrence of eye cancer. So you can try to avoid it by laser surgery at an early stage.
"So it's a very expensive test. It's very stressful for the children, for the whole family. And everybody's sort of sitting on a bomb and they don’t know when it's going to go off.
"Visible Genetics put out this test that allows them to go in and rapidly sequence those genes from the affected members of the family. Once they do that they can find out what the mutation is in the gene that's causing the problem and that allows them to go and very quickly and easily test the children and find out which children have the bad gene and which don’t. The ones that don’t are right away free and clear. They're out of it. No test, no general anesthesia, no anxiety. So that’s a huge benefit right away. And the ones that do, you can also begin building up a data base to look at what the prognosis is based on different types of mutations and also try to tailor treatment that is specific for those mutations. So in general that's one interesting example of the social payoffs. The social payoffs going toward targeted treatment of people based on their genetic types."
As in all other fields of science and technology, when genuine knowledge increases, there are benefits and there are risks. Automobiles in the 20th century changed the world by making it easier for humans to move around. Few people would want to give up the benefits. On the other hand, more people have been maimed and killed in automobile crashes than in all the wars of the 20th century. So too, the human genome project will dramatically increase our ability to diagnose, to cure and to prevent many diseases.
Some critics worry that it will also give us more information than we want to know. Let's say a doctor finds, in a routine physical that includes a genetic scan, that you carry a gene that predicts a high probability of you getting breast cancer or prostate cancer or a heart attack at an early age. You may or may not want to know this, but you almost certainly do not want your insurance company to know it. In the future will insurance companies require this kind of genetic testing as a prerequisite for giving you insurance? Legislatures are already considering laws to prevent this from happening.
Genetic studies have already shown that at least five or six thousand human genes out of the 20 to 25 thousand total do have mutations that can cause health problems, some minor, others very serious. Examples of the latter would be cystic fibrosis, Huntington’s disease, diabetes, muscular dystrophy and perhaps Alzheimer's disease and some forms of cancer. As with sickle cell anemia, many people are carriers of these genes but do not actually have the disease because the gene on the opposite chromosome in the pair is normal. Knowledge gained in the Genome Project will enable technicians to test and discover whether you carry a mutated gene in your unique genome.
What do you do when you find out that you do carry this gene and your spouse also carries it? In that case there is a 25% chance any children you have will inherit the disease. Here is one possibility that has already been used. Using new in-vitro fertilization techniques doctors can genetically screen a number of very early embryos while they are still in a culture dish and find out whether they do have the gene pair that produces the disease. Only embryos that do not carry the mutated gene pair will then be implanted in the mother's uterus.
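The 25% figure for two carriers, and the fifty-fifty figure mentioned earlier for one carrier parent, both come from listing every way the parents' gene copies can combine. A minimal sketch, assuming each parent passes on one of their two copies with equal chance; the letters "N" (normal) and "m" (mutated) and the function name are our own labels.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_odds(parent1, parent2):
    """Chance of each gene pair in a child, given each parent's two copies."""
    combos = ["".join(sorted(a + b)) for a, b in product(parent1, parent2)]
    counts = Counter(combos)
    return {pair: Fraction(n, len(combos)) for pair, n in counts.items()}

# Two carriers: each has one normal copy (N) and one mutated copy (m).
print(offspring_odds("Nm", "Nm"))
# One carrier and one parent with two normal copies.
print(offspring_odds("Nm", "NN"))
```

For two carriers, the "mm" pair (both copies mutated, so the child has the disease) comes up in one of the four equally likely combinations.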
Of course, using the same techniques, some critics have pointed out, couples could in the more distant future not only prevent serious diseases from being passed on, they could also choose among desirable genes for their offspring. While there is no single gene that leads to good looks, intelligence, athletic ability or musical genius, there is a strong genetic component to these traits. When the complete human genome is worked out we will probably know more about which genes are involved in complex human traits. Armed with this knowledge you might be able to screen embryos for genes that would lead to children who not only have no genetic defects but also have the desired sex, height, body build, athletic talent, musical talent, mathematical talent.
Some have even envisioned that the day will come when prospective parents could sit before a computer screen and choose among many alternatives what traits they want their next child to have. It is not inconceivable that cut and paste techniques already developed in other plant and animal genetics could be used to insert desirable genes or remove undesirable ones. There is even the possibility of adding an entire chromosome with many novel genes to the embryo, genes never before a part of the human genome!
Other animals, for instance, have genes for seeing infrared or ultraviolet light. You might be able to insert a gene for such a capability into a human embryo. Or you could insert a gene to enable the offspring to detect odors as well as a dog or cat. Or to navigate by sonar like a bat. Or generate electricity like an eel. Or navigate magnetically like a bird. Or ....
Lee M. Silver, a professor of Molecular Biology at Princeton University, has gone so far as to predict that such genetic screening and reconstruction is not only possible but inevitable. He thinks that by the 22nd century this may well be the norm, at least for educated, well-to-do parents who can afford the expense. What about parents who either cannot afford the expense or decide on religious or moral grounds that they do not want to interfere with nature in these dramatic new ways? The long term effect of this split may be surprising, says Silver.
Controlling human genetic continuity for the first time in history may lead to a truly revolutionary change in a few centuries, a blink of the eyelash when it comes to evolution. The well-to-do scientifically oriented humans, says Silver, may well make themselves into a different sub-species from the natural humans. And in fact, human controlled evolution may accelerate to the point where there will be at least two varieties of Homo sapiens, so distinct in their genomes that they will be actually different species and will no longer be able to mate and produce fertile offspring.
Silver calls his book Remaking Eden to call attention to the revolutionary events that may accompany that step into the genetic unknown.
The choices individuals and society will face in the coming century are indeed awesome. Whatever actually happens in the 21st century, it seems certain that the words of Neil Armstrong when he first stepped onto the moon will apply even more aptly to the Human Genome Project. "That's one small step for man, one giant leap for mankind."