The Language of Our Genes

One regularly reads that genes contain a language: sometimes DNA is the "book of life," while others would have it be the "program" for the development of an organism. Other times you read about DNA being a "blueprint" for an animal.

In a ceremony at the White House, on June 26, 2000, to celebrate the initial successes of the Human Genome Project, President Clinton said, "Today we are learning the language in which God created life."1

Kevin Davies, the author of a history of the Human Genome Project, the decade-long effort to "read" the sequence of the human genome, writes: "We are the first species with the intelligence to be able to read the text of life."2 Later, he refers to DNA explicitly as a computer program: "DNA is essentially digital information, a 3-billion-year-old Fortran code."3 Matt Ridley, the author of a popular recent book on genetics, thinks it's more than just a comparison:

The idea of the genome as a book is not, strictly speaking, even a metaphor. It is literally true. A book is a piece of digital information written in linear, one-dimensional and one-directional form and defined by a code that transliterates a small alphabet of signs into a large lexicon of meanings through the order of their groupings. So is a genome.4

Any alert reader of popular genetics literature can come up with dozens of similar references. But what are we to make of this comparison? What exactly do these writers mean by it? Ridley goes on to elaborate the analogy:

The filament of DNA is information, a message written in a code of chemicals, one chemical for each letter. It is almost too good to be true, but the code turns out to be written in a way that we can understand.5

As Ridley has said, this is not meant as an analogy. We are meant to understand this as a scientific fact. Ridley even includes a short history of information theory to make the point. None of this is meant to be his own contribution; he is simply representing the consensus view of the subject, shared by every writer who has let slip the unqualified metaphor.

A characteristic of scientific facts is that we can infer conclusions from them. What kind of conclusions do we infer from the equation between DNA and information?

Here's a passage from a recent introductory genetics text:

Our bodies contain billions of cells. In each of those cells is a nucleus that contains all the information required to make a complete human being. The information exists in the form of 50,000 to 100,000 structures called genes. Each gene possesses the ability to encode one protein...6

The authors here have concluded from somewhere that all the information necessary to make an organism is contained in the DNA. They present it as uncontested; this is the opening paragraph of the book. One might complain that this was not inferred from the metaphor, but from data about DNA. But in fact, the assertion that one could duplicate some animal relying solely on the information in its DNA is nothing more than speculation, unsupported by data. No one has ever proven it to be true, and there exists a wealth of data to contradict it, much of which has been available since the dawn of the age of the double helix. So where did this idea come from that all the information needed to create an organism is in the "language" of DNA?

It could have come from Nobel Laureates. Here's what Walter Gilbert, who received the Nobel Prize in 1980 for his work on methods of sequencing DNA, said back at the inception of the Human Genome Project:

Three billion bases of sequence can be put on a single compact disc (CD), and one will be able to pull a CD out of one's pocket and say, 'Here is a human being; it's me!'7

Though few molecular biologists today will admit to having shared in it, this kind of hubris was not at all uncommon in the recent past. In an interview during the 1970s, Jacques Monod, who helped discover the regulatory function of DNA, said:

The secret of life? But this is in large part known--in principle if not in all details. For a simple living creature to be synthesized, in my opinion, there is no further principle that would need to be discovered.8

The Human Genome Project, recently past the first hurdle of completing the human DNA sequence, has put a merciful end to this sort of talk. This was, of course, not the intent of the planners, who, like Gilbert, apparently expected to see the inner workings of life itself unfolded on the computer screens in front of them. But as with any field, the more you learn, the more you realize how much is left to learn. We've learned a lot from the HGP, and a lot of it is about the dimensions of our ignorance. We've learned, for example, that there are far fewer genes encoded in our DNA than we'd thought, that rearrangements of genetic elements play a much more important role than previously supposed, and that duplication of genetic elements is far more common in humans than in yeast, fly, or worm genomes.9 In other words, we've learned a lot of facts whose proper interpretation is far from obvious. That is, we've learned that we are not about to find the Holy Grail, and that though it is possible we may now be in the same fairy land as the Grail, it is a far larger domain than we once could have imagined. (With a more spiteful Faerie Queen, too.) And so, scientists have learned to use a new spice in their gene talk: humility.

One of the two papers presenting the completion of the first sequencing of the human genome contained this admission:

The modest number of human genes means that we must look elsewhere for the mechanisms that generate the complexities inherent in human development.10

So if the idea that all our genetic information is in our DNA didn't come from the data, where did it come from?

