"Captivating... hard to put down." (Choice)
"A bracing insider's account of why gene structure matters to science and commerce." (American Scientist)
How genomics is bringing biology into the Digital Age
In this important book, a scientist gives us an inside account of the historic paradigm shift under way in the life sciences as a result of the Human Genome Project and provides a philosophical framework in which to understand biology and medicine as information sciences. In a story told on many fascinating levels, Gary Zweiger introduces us to the visionaries who first understood genes as information carriers and chronicles how their early efforts led to the birth of the new science of genomics. He provides insights into the uneasy collaboration of private, government, and academic efforts, the role of the pharmaceutical companies, and the influence of venture capitalists on one of the most ambitious and potentially significant scientific undertakings in history. Most important, he explores the profound impact that the transducing of biological information into a digital format already has had on biological research and medicine, and the equally profound effect it is sure to have on our understanding of ourselves and all living creatures.
Although most biology textbooks neglect to mention it, information is as fundamental and unique to life as either metabolism or reproduction. Encoded messages occur in a myriad of forms and are transmitted between a myriad of different types of receivers and senders. Information is sent when a beaver slaps its tail on the water upon sensing danger, when a plant releases a fragrant odor, when a bacterial protein signals a gene to turn on the production of catabolic enzymes, and when a nerve impulse causes a muscle to contract. In each case, whether between organisms or within an organism, a coded message is provided. Communications such as these are not just the foundation of life; they are its essence.
Human beings have distinguished themselves among other species on earth by continually developing and adopting new and improved ways of exchanging information. No other species comes close to matching our language, speech, and writing capabilities. Of course, human ingenuity and innovation have not stopped there. Modern information technology greatly facilitates the storage, processing, and conveyance of information. Weightless or nearly weightless electrons and electromagnetic waves travel at or near the speed of light (almost one billion feet per second). And, after only a few hundred years of development, these information conduits have enabled stunning advances, such as the Internet and other global communication networks and a machine that can outperform the best living chess player.
Welcome to the Information Age, where the movement of speedy electrons and electromagnetic waves has replaced much of our mechanical and mental work. Fewer and fewer people turn knobs on TV sets, rotate dials on telephones, write letters by hand, and/or tally bills on abacuses. We can (and, more often than not, want to) do it faster, better, and cheaper using devices that transduce our thoughts or desires into electronic signals. We telecommute and use e-mail, remote control devices, voice recognition software, and so forth. The resulting electric signals may be readily digitized, processed, stored, replicated countless times over, and transmitted over long distances before being converted into images, sounds, or other stimuli suitable for the human senses.
Information circuitry is not only external. The senses are portals to an internal information network. Ears, eyes, nose, skin, and mouth convert patterns of touch, sound, odor, taste, or light into patterns of nerve impulses. This information is passed along by waves of ions (charged atoms or molecules) moving across the outer membrane of neurons (nerve cells). These waves (action potentials) move at rates of up to 100 feet per second (68 mph). This may seem pitifully slow compared to today's electronic and electromagnetic speedways, but at the time that it developed, beginning about 500 million years ago, neuronal signaling was revolutionary. Neurons provide a much faster and more efficient channel of communication than either chemical diffusion or any sort of fluid pump. Neuronal signaling allowed quick and coordinated responses to environmental stimuli. It also led to the development of the brain, cognition, and consciousness.
The nervous system was derived from and superimposed upon yet another information network, a more ancient network, and the mother of all information networks. Living tissue is composed of millions of different proteins, nucleic acids, fats, and other chemical entities. These are the molecules of life and the subject of countless research studies. They can be understood in terms of their physical structures and their chemical properties. However, they can also be understood in terms of the information that they convey.
What kind of molecular messages are being sent within us? Who are the sender and the receiver, and what is being said? Genes are the most obvious conveyors of information within living beings. Gregor Mendel first characterized genes as units of hereditary information, agents responsible for determining particular traits and characteristics that are passed from parent to offspring. He referred to them as factors in an 1865 publication, but at that time and for many decades thereafter no one knew precisely how these units of hereditary information were stored or how they translated into traits and characteristics. In 1943 Erwin Schrödinger speculated that genes were "some kind of code-script," and this is indeed the case. We now know that genes are encoded by a series of molecules known as nucleotides, which are the components of deoxyribonucleic acid (DNA). Genes provide coded instructions for the production of millions of additional nucleic acids and proteins.
The word factor has a physical connotation, and thus from the very start genes have had dual personalities. Like a photon of light teetering between matter and energy, a gene teeters between matter and information. On the one hand, a gene is not unlike the 100 or so chemical elements of nature. Each gene may be a distinct and rigidly defined composition of nitrogen, oxygen, hydrogen, carbon, and phosphorus, one that participates in a series of chemical reactions that results in the production of additional nucleic acids and proteins. But this is like saying that the United States Constitution is a particular construction of plant pulp and ink. Genes encoded in DNA convey information to additional nucleic acids, which relay the messages to proteins, which convey signals throughout the organism. A complete set of genes, the genome, carries instructions, or a blueprint, for the development and function of an entire organism.
As information, it does not really matter how the gene is encoded, so long as the message can be received and decoded. There is redundancy in the genetic code; the same gene may be encoded by any one of a number of different nucleotide sequences. A gene may also be encoded in an entirely different medium. In a classic instructional film from the early 1970s, a DNA sequence is portrayed by dancers, each wearing one of four brightly colored costumes, representing four types of nucleotides. The dancers simulate the production of a protein through their choreographed movements. Nowadays, no matter whether the gene is encoded by a string of nucleotides, costumed dancers, words, or 0s and 1s (binary code), in a laboratory it can be readily converted into biologically active proteins...
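Both points in this paragraph, the redundancy of the genetic code and the indifference of a message to its medium, can be made concrete with a short Python sketch. The codon table is a deliberately tiny, hypothetical subset of the standard genetic code (just enough for the example), and the two-bits-per-nucleotide mapping (A=00, C=01, G=10, T=11) is an arbitrary choice for illustration rather than any standard file format.

```python
# Hypothetical subset of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "ATG": "M",                                       # methionine (start)
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",   # alanine: four synonymous codons
    "TGG": "W",                                       # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",               # stop codons
}

# Arbitrary illustrative encoding: two binary digits per nucleotide.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def translate(dna: str) -> str:
    """Read a coding sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def to_binary(dna: str) -> str:
    """Re-encode nucleotides as a string of 0s and 1s."""
    return "".join(BASE_TO_BITS[base] for base in dna)

def from_binary(bits: str) -> str:
    """Decode two-bit chunks back into nucleotides."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

# Two different nucleotide sequences for the same short protein.
gene_v1 = "ATGGCTTGGTAA"
gene_v2 = "ATGGCGTGGTGA"

print(translate(gene_v1), translate(gene_v2))        # MAW MAW
print(to_binary(gene_v1))
print(translate(from_binary(to_binary(gene_v1))))    # MAW
```

Two different nucleotide sequences yield the same three-letter protein, and a round trip through a string of 0s and 1s leaves the gene unchanged: the message survives the change of medium.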
Science is about providing truthful explanations and trustworthy predictions for an otherwise poorly understood and unpredictable world. Among the greatest of scientific challenges is cancer. We've been in a state of declared war with cancer for decades, yet despite rising expenditures on research (close to $6 billion in 2000 in the United States alone) and treatment (about $40 billion in 2000 in the U.S.), cancer remains a mysterious and seemingly indiscriminate killer. Each year about 10 million people learn that they have cancer (1.2 million in the U.S.) and 7.2 million succumb to it (600,000 in the U.S.), often after much suffering and pain.
Cancer is a group of diseases characterized by uncontrolled and insidious cell growth. The diseases' unpredictable course and uncertain response to treatment are particularly vexing. Cancer patients show tremendous variation in their response to treatment, from miraculous recovery to sudden death. This uncertainty is heart-wrenching for patients, their loved ones, and their caregivers. Moreover, there is little certainty about what will trigger the onset of uncontrolled cell growth. With cancer, far too frequently, one feels that one's fate relies on nothing more than a roll of the dice. If your aunt and your grandmother had bladder cancer, then you may have a 2.6-fold greater chance of getting it than otherwise. If you butter your bread, you may be twice as likely to succumb to a sarcoma as you would be if you used jam. A particular chemotherapeutic drug may give you a 40 percent chance of surviving breast cancer, or only a 10 percent chance if you have already failed therapy with another chemotherapeutic drug. Clearly, cancers are complex diseases with multiple factors (both internal and external) affecting disease onset and progression. And clearly, despite tremendous advances, science has yet to win any battle that can be seen as decisive in the war against cancer.
Perhaps, a revolutionary new approach, a new framework of thinking about biology and medicine, will allow us to demystify cancer and bring about a decisive victory. The outlines of what may prove to be a successful new scientific paradigm are already being drawn.
Knowing one's enemy often helps in defeating one's enemy, and in the early 1980s Leigh Anderson, John Taylor, and colleagues at Argonne National Laboratory in Illinois pioneered a new method for knowing human cancers. Indeed, it was a new way of knowing all types of cells. Previous classification schemes relied on visual inspection of cells under a microscope or on the detection of particular molecules (known as markers) on the surface of the cells. Such techniques could be used to place cancers into broad categories. A kidney tumor could be distinguished from one derived from the nearby adrenal gland, for example. However, a specific tumor that might respond well to a particular chemotherapeutic agent could often not be distinguished from one that would respond poorly. A tumor that was likely to spread to other parts of the body (metastasize) often could not be distinguished from one that was not. They often looked the same under a microscope and had the same markers. The Argonne team took a deeper look. They broke open tumor cells and surveyed their molecular components. More precisely, they surveyed their full complement of proteins. Proteins are the workhorses of the cell. They provide cell structure, catalyze chemical reactions, and are more directly responsible for cell function than any other class of molecules. Inherent differences in tumors' responses to treatment would, presumably, be reflected by differences in their respective protein compositions.
Anderson, who holds degrees in both physics and molecular biology, was skilled in a technique known as two-dimensional gel electrophoresis. In this procedure the full set of proteins from a group of cells is spread out on a rectangular gel through the application of an electrical current in one direction and a chemical gradient in the orthogonal (perpendicular) direction. The proteins are radioactively labeled, and the intensity of the emitted radiation reflects their relative abundance (their so-called "level of expression"). X-ray film converts this radiation into a constellation of spots, where each spot represents a distinct protein, and the size and intensity of each spot correspond to the relative abundance of the underlying protein. Each cell type produces a distinct constellation, a signature pattern of spots. If one could correlate particular patterns with particular cell actions, then one would have a powerful new way of classifying cell types. Anderson and colleagues wrote:
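Turning a constellation of spots into numbers that a computer can compare requires deciding which spot on a given gel corresponds to which protein. The Python sketch below assumes a hypothetical reference map of spot positions, uses spot intensity alone as the measure of abundance, and matches spots to reference positions within an arbitrary distance tolerance. It is an illustration of the idea, not a reconstruction of the Argonne team's software.

```python
import math

# Hypothetical reference map: protein label -> (x, y) spot position, arbitrary gel units.
REFERENCE_POSITIONS = {"p1": (10.0, 20.0), "p2": (35.0, 42.0), "p3": (60.0, 15.0)}

def profile_from_spots(spots, tolerance=2.0):
    """Convert detected spots (x, y, intensity) into a relative-abundance profile.

    Each reference protein takes the intensity of the nearest spot lying within
    `tolerance` of its expected position, or 0.0 if no spot is close enough.
    Values are then scaled to fractions of the total matched intensity.
    """
    matched = {}
    for label, (rx, ry) in REFERENCE_POSITIONS.items():
        best = None  # (distance, intensity) of the closest qualifying spot
        for x, y, intensity in spots:
            distance = math.hypot(x - rx, y - ry)
            if distance <= tolerance and (best is None or distance < best[0]):
                best = (distance, intensity)
        matched[label] = best[1] if best else 0.0
    total = sum(matched.values())
    return {label: (value / total if total else 0.0) for label, value in matched.items()}

# One hypothetical gel: two spots near known positions, one unassigned stray spot.
gel = [(10.4, 19.8, 300.0), (35.5, 41.0, 100.0), (80.0, 80.0, 50.0)]
profile = profile_from_spots(gel)
print(profile)  # {'p1': 0.75, 'p2': 0.25, 'p3': 0.0}
```

The tolerance is where the practical difficulty lies: if gels stretch or shift from run to run, a spot can land outside its window or inside a neighbor's, and the same protein is then counted inconsistently across samples.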
2-D protein patterns contain large amounts of quantitative data that directly reflect the functional status of cells. Although human observers are capable of searching such data for simple markers correlated with the available external information, global analysis (i.e., examination of the entire data) for complex patterns of change is extremely difficult. Differentiation, neoplastic transformation [cancer], and some drug effects are known to involve complex changes, and thus there is a requirement to develop an approach capable of dealing with data of this type.
If only one protein is assayed, one can readily imagine a classification scheme derived from a simple number line plot. A point representing each cell sample is plotted on the number line at the position that corresponds to the level of the assayed protein. Cell samples are then classified or grouped according to where they lie on the line. This is how classical tumor marker assays work. The marker (which is usually a protein on the surface of the cell) is either present at or above some level or it is not, and the tumor is classified accordingly.
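The number-line scheme amounts to a single comparison per sample. In the Python sketch below, the marker levels and the cutoff are hypothetical values chosen only to show the mechanics.

```python
def classify_by_marker(level, threshold):
    """Place a sample on the marker number line: positive at or above the threshold."""
    return "positive" if level >= threshold else "negative"

# Hypothetical marker levels (arbitrary units) for four tumor samples.
marker_levels = {"A": 0.2, "B": 1.7, "C": 0.9, "D": 3.1}
THRESHOLD = 1.0  # hypothetical cutoff

labels = {name: classify_by_marker(level, THRESHOLD) for name, level in marker_levels.items()}
print(labels)  # {'A': 'negative', 'B': 'positive', 'C': 'negative', 'D': 'positive'}
```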
With two proteins, one can plot tumor cell samples as points in two-dimensional space. For each cell sample, the x-coordinate is determined by the level of one protein, and the y-coordinate is determined by the level of the second protein. A cell sample could then be classified as being high in protein 1 and low in protein 2, or high in both proteins 1 and 2, etc. Thus, having two data points per tumor enables more categories than having just one data point. However, variations in just one or two proteins may not be sufficient to distinguish among closely related cell types, particularly if one does not have any prior indication of which proteins are most informative. The Argonne group had the benefit of 285 protein identifiers for each tumor cell sample.
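Extending the same idea to two proteins gives each sample a pair of high/low labels, and the number of possible categories doubles with every protein added. The levels and cutoffs in this Python sketch are hypothetical; a single cutoff per protein is a simplifying assumption, since real classification need not divide each axis in two.

```python
from itertools import product

def category(levels, thresholds):
    """Label each protein 'high' or 'low' relative to its own threshold."""
    return tuple("high" if level >= t else "low" for level, t in zip(levels, thresholds))

# Hypothetical (protein 1, protein 2) levels for four tumor samples, arbitrary units.
samples = {"A": (1.5, 0.3), "B": (1.8, 2.2), "C": (0.4, 0.6), "D": (0.2, 1.4)}
thresholds = (1.0, 1.0)  # hypothetical cutoffs, one per protein

for name, levels in samples.items():
    print(name, category(levels, thresholds))

# Every possible high/low combination for two proteins: four categories instead of two.
possible = list(product(("high", "low"), repeat=len(thresholds)))
print(len(possible))  # 4

# With one cutoff per protein the count is 2 ** (number of proteins);
# for 285 proteins that number has 86 digits.
print(len(str(2 ** 285)))  # 86
```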
Mathematically, each cell sample could be thought of as a point in 285-dimensional space. Our minds may have trouble imagining so many dimensions, but there are well-established mathematical methods that can readily make use of such information. A computer program instantly sorted Anderson and Taylor's five tumor cell samples into categories based on their 4560 protein values. Another program created a dendrogram, or tree diagram, that displayed the relationships among the five tumor cell types. A powerful new method of cell classification had been born.
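The specific programs the Argonne group used are not described here, so the following Python sketch substitutes a generic method: agglomerative hierarchical clustering with Euclidean distance and average linkage, which repeatedly merges the two closest groups and records each merge as a branch of a tree. The five "tumor" profiles are invented and carry four proteins each instead of 285; the arithmetic is the same in any number of dimensions.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two points in any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members_a, members_b, data):
    """Mean distance over all pairs of samples drawn from two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))

def agglomerate(data):
    """Merge the two closest clusters until one remains; return the tree of merges.

    A leaf is a sample name; an internal node is (left, right, merge_distance).
    """
    clusters = [(name, [name]) for name in data]  # (subtree, member names)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]][1], clusters[pair[1]][1], data),
        )
        dist = average_linkage(clusters[i][1], clusters[j][1], data)
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j, round(dist, 3)), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

def leaves(tree):
    """Set of sample names beneath a node."""
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])

def show(tree, depth=0):
    """Print the tree as an indented text dendrogram."""
    if isinstance(tree, str):
        print("  " * depth + tree)
        return
    left, right, dist = tree
    print("  " * depth + f"+ joined at distance {dist}")
    show(left, depth + 1)
    show(right, depth + 1)

# Invented protein levels for five cultured tumor samples (four proteins each, not 285).
profiles = {
    "T1": [5.0, 1.0, 0.5, 2.0],
    "T2": [5.2, 1.1, 0.4, 2.1],
    "T3": [1.0, 4.0, 3.0, 0.5],
    "T4": [1.1, 4.2, 2.8, 0.6],
    "T5": [4.0, 2.0, 1.0, 1.5],
}
tree = agglomerate(profiles)
show(tree)
```

The resulting tree places T1 and T2 together, attaches T5 to that pair, and sets T3 and T4 on a separate branch. Whether such groupings mean anything clinically is a separate question that the clustering itself cannot answer.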
The five cancer cell culture protein patterns were intended to be a small portion of a potential database of thousands of different cell cultures and tissue profiles. Leigh Anderson and his father Norman Anderson, also of the Argonne National Laboratory, had a grand scheme to catalogue and compare virtually all human proteins. Since the late 1970s they had campaigned tirelessly for government and scientific support for the initiative, which they called the Human Protein Index. The Andersons had envisioned a reference database that every practicing physician, pathologist, clinical chemist, and biomedical researcher could access by satellite. These users' own two-dimensional gel results would be compared to protein constellations in this database, which would include links to relevant research reports. The Andersons had also planned a computer system that would manage this information and aid in its interpretation. They called the would-be system TYCHO, after Tycho Brahe, the famous Danish astronomer who meticulously catalogued the positions of stars and planets in the sky. The Andersons figured that $350 million over a five-year period would be required to make their dream a reality. Their appeal reached the halls of the U.S. Congress, where Senator Alan Cranston of California lent his support for what could have been the world's first biomedical research initiative to come close to matching the size and scale of the U.S. Apollo space initiatives.
The Argonne group's cancer results, the culmination of nearly a decade of work, could have been interpreted as proof of the principles behind the Human Protein Index. Instead, most scientists took little or no notice of their report, which was published in 1984 in the rather obscure journal Clinical Chemistry. Anderson and Taylor did not receive large allocations of new research funds, nor were they honored with awards. And why should they have been? The Argonne group certainly hadn't cured cancer. They hadn't even classified real human tumors. Instead, they used cultured cells derived from tumors, and they used only a small number of samples rather than larger and more statistically meaningful quantities. They hadn't shown that the categories into which they sorted their tumor cell samples were particularly meaningful. They hadn't correlated clinical outcomes or treatment responses with their computer-generated categories.
Indeed, the Argonne team appeared to be more interested in fundamental biological questions than in medical applications. They wrote, "Ideally, one would like to use a method that could, by itself, discover the underlying logical structure of the gene expression control mechanisms." They felt that by electronically tracking protein changes in cells at various stages of development, one could deduce an underlying molecular "circuitry." Thus, the Andersons and their coworkers believed that they were onto a means of solving one of biology's most difficult riddles. How is it that one cell can give rise to so many different cell types, each containing the very same complement of genetic material? How does a fertilized egg cell differentiate into hundreds of specialized cell types, each appearing in precise spatial and temporal order? But these lofty scientific goals also garnered scant attention for the molecular astronomers, in part because the proteins were identified solely by position on the gel. The Andersons and their colleagues couldn't readily reveal their structures or functions. (This would require purification and sequencing of each protein spot, a prohibitively expensive and time-consuming task at that time.) It was hard to imagine the development of a scientific explanation for cellular phenomena that did not include knowledge of the structure and function of the relevant molecular components. Similarly, it was hard to imagine any physician being comfortable making a diagnosis based on a pattern of unidentified spots that was not linked to some plausible explanation. Furthermore, despite the Andersons' and their colleagues' best efforts, at that time two-dimensional protein gels were still difficult to reproduce in a way that would allow surefire alignment of identical proteins across gels. 
In any case, in the mid-1980s too many scientists felt that protein analysis technologies were still unwieldy, and too few scientists were compelled by the Andersons' vision of the future, so the Human Protein Index fell by the wayside. Thus, instead of being a catalyst for biomedicine's moon shot, the Argonne team's cancer work might be dismissed as little more than a historical footnote, or so it may appear.
When asked about these rudimentary experiments 16 years later, Leigh Anderson would have absolutely nothing to say. Was he discouraged by lack of progress or by years of indifference from his peers? Hardly! The Andersons had managed to start a company back in 1985, aptly named Large Scale Biology Inc., and after years of barely scraping by, the Maryland-based company was finally going public. In the year 2000 investors had discovered the Andersons' obscure branch of biotechnology in a big way, and Leigh Anderson's silence was due to the self-imposed "quiet period" that helps protect initial public offerings (IPOs) from investor lawsuits. Leigh Anderson, Taylor, and a few dozen other research teams had made steady progress and, as will be shown in later chapters, the Argonne work from the 1980s was indeed very relevant to both medical applications and understanding the fundamental nature of life.
For the Andersons in 2000 the slow pendulum that carries the spotlight of scientific interest had completed a circle. It began for Norman Anderson in 1959, while at the Oak Ridge National Laboratory in Tennessee, where he first conceived of a plan to identify and characterize all the molecular constituents of human cells and where he began inventing centrifuges and other laboratory instruments useful in separating the molecules of life. The Human Protein Index was a logical next step. "Only 300 to 1000 human proteins have been characterized in any reasonable detail—which is just a few percent of the number there. The alchemists knew a larger fraction of the atomic table." In other words, how can we build a scientific understanding of life processes or create rational treatments for dysfunctional processes without first having a catalogue or list of the molecular components of life? Imagine having your car worked on by a mechanic who is, at most, slightly familiar with only 1 or 2 percent of the car's parts.
The Andersons' early 1980s campaign, their efforts to rally scientists and science administrators for a huge bioscience initiative, their call for a "parts list of man" with computer power to support its distribution and analysis, and their daring in laying forth their dreams ... all of these did not vanish without a trace. They were echoed a few years later when scientists began to seriously contemplate making a list of all human genes and all DNA sequences. This led to the launch of biomedicine's first true moon shot, the Human Genome Project, and, leaping forward, to a December 1999 press release announcing that the DNA sequence of the first entire human chromosome was complete. The accompanying report, which appeared in Nature magazine, contained a treasure trove of information for biomedical researchers and served to remind the public that the $3 billion, 15-year Human Genome Project was nearing its end a full four years ahead of schedule. The entire DNA sequence of all 24 distinct human chromosomes, along with data on all human genes (and proteins), would soon be available. In response, investors poured billions of dollars into companies poised to apply this new resource, including a few hundred million dollars for the Andersons' Large Scale Biology outfit.
As far back as the early 1980s, Leigh and Norman Anderson had contemplated what they referred to as a "list-based biology." They had a vision of an electronic catalogue of the molecular components of living cells and mathematical analyses that would make use of this data. They had even gone so far as to suggest that a "list-based biology, which [the proposed Human Protein Index] makes possible will be a science in itself." The Argonne group's cancer study, despite the fact that the proteins were identified only by position, was a prototype for this new type of biology. Many more would follow.
The search for a cure for cancer played an even bigger role in another landmark information-intensive research effort begun in the 1980s. It was initiated by the world's biggest supporter of cancer research, the U.S. National Cancer Institute (NCI). One of the NCI's charges is to facilitate the development of safer and more effective cancer drugs, and in the mid-1980s Michael Boyd and other NCI researchers devised an anticancer drug-screening initiative that was fittingly grand. About 10,000 unique chemical entities per year were to be tested on a panel of 60 different tumor cell cultures. Each chemical compound would be applied over a range of concentrations. Each tumor cell culture would be assayed at defined time points for both growth inhibition and cell death.
Drug discovery has always been a matter of trial and error, and as medicinal chemists and molecular biologists became adept at synthesizing and purifying new compounds, preliminary testing became a bottleneck in the drug development pipeline. Laboratories from throughout the world would gladly submit compounds to the NCI for testing. At that time, researchers were looking for compounds that gave favorable cellular response profiles, and they were looking to further define those profiles. The NCI initiative would establish a response profile based on the pattern of growth inhibition and cell death among 60 carefully selected tumor cell cultures. A poison that killed all of the cell types at a particular concentration would not be very interesting, for it would likely be toxic to normal cells as well. However, compounds that killed particular types of tumor cells, while sparing others, could be considered good candidates for further studies. The response profiles of both approved cancer drugs and those that failed in clinical testing would be used as guideposts for testing the new compounds, and as new compounds reached drug approval and others failed, retrospective studies could be used to further refine model response profiles.
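The comparison described here, a new compound's response pattern set against the patterns of drugs whose clinical fate is known, can be sketched in Python. Each profile below is a list of hypothetical log10 GI50 values (the logarithm of the molar concentration that halves growth) on an invented six-line panel standing in for the 60 real cell cultures. Pearson correlation as the similarity measure and the standard deviation as a crude selectivity score are assumptions made for illustration, not a description of the NCI's own procedures.

```python
import math
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length profiles; 0.0 if either is flat."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm else 0.0

def selectivity(profile):
    """Spread of log10 GI50 across the panel; zero means every cell line responds alike."""
    return pstdev(profile)

# Hypothetical log10 GI50 values; more negative = growth halted at a lower concentration.
reference_drugs = {
    "drug_X": [-7.0, -5.0, -7.2, -5.1, -6.9, -5.0],
    "drug_Y": [-5.0, -7.1, -5.2, -6.9, -5.1, -7.0],
}
candidate = [-6.8, -5.2, -7.0, -5.0, -6.7, -5.3]
general_poison = [-6.0] * 6  # equally potent against every cell line

ranked = sorted(
    reference_drugs,
    key=lambda name: pearson(candidate, reference_drugs[name]),
    reverse=True,
)
print("closest reference profile:", ranked[0])         # drug_X
print("candidate selectivity:", round(selectivity(candidate), 2))
print("poison selectivity:", selectivity(general_poison))  # 0.0
```

A flat profile, like that of the general poison, has no spread and no pattern to correlate, which is why such a compound would attract little interest; the candidate, by contrast, picks out the same cell lines as one of the reference drugs.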
The NCI's bold initiative, named the Development Therapeutics Program, was launched in 1990, and by 1993, 30,000 compounds had been tested on each of the 60 cell cultures. This work generated over a million points of data, information that had to be digitized and stored on a computer. How else could one build even the most rudimentary understanding of the actions of tens of thousands of different molecules? How else could this information be stored and shared?
The complexities of cancer placed tremendous demands on biologists and medical researchers—demands that could only be met through electronics and computation. The NCI's Development Therapeutics Program, like the Argonne protein studies, required machines to collect information and transduce it into electrical signals. There are many other ways of collecting data: For example, armies of technicians could take measurements by eye and record them in volumes of laboratory notebooks. However, as any citizen of the Information Age knows, for gathering, storing, and manipulating information, nothing beats the speed, cost, versatility, and ease of electronics. Information technology thus greatly facilitated the development of new efforts to understand and attack cancer. The NCI Development Therapeutics Program developed automated devices to read cell density before and after treatment. An electronic growth response curve was generated and for each compound the concentration responsible for 50 percent growth inhibition was automatically calculated. COMPARE, a computer program written by Kenneth Paull and colleagues at the NCI, compared response profiles, ranked compounds on the basis of differential growth inhibition, and graphically displayed the results.
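The automated calculation mentioned here, finding the concentration that produces 50 percent growth inhibition, can be sketched in a few lines of Python. The percent-growth formula, 100 × (T − T0)/(C − T0), with T0 the cell density at the time of treatment, T the treated density, and C the untreated control density, is an assumption made for illustration, as are the densities and the interpolation on a log-concentration scale. The sketch ignores concentrations at which cells die off below their starting density, and it is not the NCI's software.

```python
import math

def percent_growth(t0, treated, control):
    """Growth of treated cells as a percentage of control growth, both measured from t0.

    Assumes treated >= t0 (net growth); net cell kill would need a different formula.
    """
    return 100.0 * (treated - t0) / (control - t0)

def gi50(concentrations, growth):
    """Interpolate, on a log10 scale, the concentration at which growth falls to 50 percent.

    `concentrations` must be increasing and paired with `growth` values.
    Returns None if growth never crosses 50 percent within the tested range.
    """
    points = list(zip(concentrations, growth))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 50.0 >= g_hi and g_lo != g_hi:
            frac = (g_lo - 50.0) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Hypothetical optical densities for one compound on one cell line.
T0, CONTROL = 0.20, 1.00
concentrations = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]  # molar, tenfold steps
treated = [0.98, 0.90, 0.70, 0.30, 0.21]

growth = [percent_growth(T0, t, CONTROL) for t in treated]
result = gi50(concentrations, growth)
print(f"GI50 is about {result:.2e} M")
```

Interpolating on the logarithm of concentration suits experiments spaced in tenfold steps; a straight-line interpolation on the raw concentrations would pull the estimate toward the higher of the two bracketing doses.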
Initially, the NCI's Development Therapeutics Program was only slightly more visible than the Argonne team's early protein studies. However, both initiatives shared a characteristic common to many information-intensive projects. Their utility skyrocketed after incremental increases in data passed some ill-defined threshold and upon the development of so-called "killer applications," computer algorithms that greatly empower users. By 1996, 60,000 compounds had been tested in the Development Therapeutics Program and at least five compounds, which had been assessed in the screen and analyzed by COMPARE, had made it into clinical trials. New databases were then linked to the cell response data set, including a database of the chemical compounds' three-dimensional structures, a database of the compounds' molecular targets, and a database of the pattern of proteins that appear in the 60 targeted tumor cell cultures. With this last data set some of Anderson's two-dimensional gel electrophoresis work became joined with the work of the Development Therapeutics Program. This data was all electronically linked and led to numerous scientific articles, including a prominent piece written by John Weinstein of the NCI, appearing in Science magazine in 1997. In this outstanding article Weinstein and coworkers caution that "it remains to be seen how effective this information-intensive strategy will be at generating new clinically active agents." Skepticism is a trademark of good science, but no one could possibly suppress the wonderment and pride that arises from this account of hundreds of millions of individual experiments distilled by powerful number-crunching algorithms and vivid color graphics into meaningful new medical leads and biological insights.
Is even one decisive victory over at least one major type of human cancer imminent? Four years after Weinstein's paper, the answer is still not clear. It is clear, however, that information technology has enabled fantastic new tools for unraveling the complexities of cancer. Some of these tools, their application to cancer and other disorders, and the prospects for new treatments will be discussed in greater detail in later chapters, as will the profoundly relevant and enormously information-intensive efforts to understand and account for our genetic makeup.
Does an information-intensive approach represent a revolutionary new framework for understanding cancer and other biological phenomena? Absolutely, for it prompts us to look at life in a dramatically new way.