Protein-Nucleic Acid Interactions: Structural Biology by Phoebe A Rice (Editor)
The structural biology of protein-nucleic acid interactions is in some ways a mature field and in others in its infancy. High-resolution structures of protein-DNA complexes have been studied since the mid-1980s and a vast array of such structures has now been determined, but surprising and novel structures still appear quite frequently. High-resolution structures of protein-RNA complexes were relatively rare until the last decade. Propelled by advances in technology as well as the realization of RNA's importance to biology, the number of example structures has ballooned in recent years. New insights are now being gained from comparative studies only recently made possible due to the size of the database, as well as from careful biochemical and biophysical studies. As a result of the explosion of research in this area, it is no longer possible to write a comprehensive review. Instead, current review articles tend to focus on particular subtopics of interest. This makes it difficult for newcomers to the field to attain a solid understanding of the basics. One goal of this book is therefore to provide in-depth discussions of the fundamental principles of protein-nucleic acid interactions as well as to illustrate those fundamentals with up-to-date and fascinating examples for those who already possess some familiarity with the field. The book also aims to bridge the gap between the DNA and RNA views of nucleic acid-protein recognition, which are often treated as separate fields. However, this is a false dichotomy because protein-DNA and protein-RNA interactions share many general principles. This book therefore includes relevant examples from both sides, and frames discussions of the fundamentals in terms that are relevant to both. The monograph approaches the study of protein-nucleic acid interactions in two distinctive ways. First, DNA-protein and RNA-protein interactions are presented together. 
Second, the first half of the book develops the principles of protein-nucleic acid recognition, whereas the second half applies these to more specialized topics. Both halves are illustrated with important real-life examples. The first half of the book develops fundamental principles necessary to understand function. An introductory chapter by the editors reviews the basics of nucleic acid structure. Jen-Jacobson and Jacobson discuss how solvent interactions play an important role in recognition, illustrated with extensive thermodynamic data on restriction enzymes. Marmorstein and Hong introduce the zoology of the DNA binding domains found in transcription factors, and describe the combinatorial recognition strategies used by many multiprotein eukaryotic complexes. Two chapters discuss indirect readout of DNA sequence in detail: Berman and Lawson explain the basic principles and illustrate them with in-depth studies of CAP, while in their chapter on DNA bending and compaction Johnson, Stella and Heiss highlight the intrinsic connections between DNA bending and indirect readout. Horvath lays out the fundamentals of protein recognition of single-stranded DNA and single-stranded RNA, and describes how they apply in a detailed analysis of telomere end binding proteins. Nucleic acids also adopt more complex structures: Lilley describes the conformational properties of helical junctions, and how proteins recognize and cleave them. Because RNA readily folds due to the stabilizing role of its 2'-hydroxyl groups, Li discusses how proteins recognize different RNA folds, which include duplex RNA. With the fundamentals laid out, discussion turns to more specialized examples taken from important aspects of nucleic acid metabolism. Schroeder discusses how proteins chaperone RNA by rearranging its structure into a functional form. 
Berger and Dong discuss how topoisomerases alter the topology of DNA and relieve the superhelical tension introduced by other processes such as replication and transcription. Dyda and Hickman show how DNA transposases mediate genetic mobility and Van Duyne discusses how site-specific recombinases "cut" and "paste" DNA. Horton presents a comprehensive review of the structural families and chemical mechanisms of DNA nucleases, whereas Li, in her discussion of RNA-protein recognition, also covers RNA nucleases. Lastly, Ferré-D'Amaré shows how proteins recognize and modify RNA transcripts at specific sites. The book also emphasises the impact of structural biology on understanding how proteins interact with nucleic acids and it is intended for advanced students and established scientists wishing to broaden their horizons.
Read an Excerpt
Protein-Nucleic Acid Interactions
By Phoebe A. Rice, Carl C. Correll
The Royal Society of Chemistry
Copyright © 2008 Royal Society of Chemistry
All rights reserved.
CARL C. CORRELL AND PHOEBE A. RICE
Nucleic acids are the information storehouse of life and in many cases serve as the regulators and construction workers as well. Indeed, self-replicating RNAs may have been the beginning of life itself, predating evolution of the first protein. Modern organisms, however, depend on a complex interplay between nucleic acids and proteins. This chapter reviews the basic features of the structures that nucleic acids can adopt and highlights the ways that nucleic acids and their cognate proteins interact with one another.
1.2 Fundamentals of DNA and RNA Structure
1.2.1 Stabilizing Forces
The fundamental forces that stabilize nucleic acid and protein structure are the same. The hydrophobic effect drives folding of these molecules as they attempt to simultaneously satisfy multiple goals: minimizing the exposure of hydrophobic surfaces to water, satisfying all the hydrogen bond donors and acceptors that become buried from solvent, maximizing van der Waals interactions, and ensuring that all charges are either solvated or neutralized with opposing charges. Nucleic acids differ from proteins in that the hydrogen bonds that are key to secondary structure formation are between the variable moieties (the bases) rather than the constant ones (the backbones). The backbone of nucleic acids is more flexible than that of proteins, with six variable torsion angles rather than two (Figure 1.1). Nucleic acid backbones also differ from those of proteins in being uniformly negatively charged. To form large tertiary structures with buried backbones, nucleic acids must rely on external sources of positive charge such as solvent cations or helper proteins. Relative to tertiary structure, secondary structure is more stable in nucleic acids than in proteins. As a consequence, nucleic acid folding is generally less cooperative than protein folding, with the formation of tertiary structure following that of secondary structure.
1.2.2 Chemical Differences between DNA and RNA
The defining difference between RNA and DNA is the presence of the 2'-hydroxyl group on the pentose ring of RNA. This group is distinctive in two fundamental ways. First, it is an Achilles heel that renders the RNA chain more susceptible to cleavage than the DNA chain (Figure 1.2). Apparently, susceptibility to cleavage is the reason why the 2'-hydroxyl is removed, at considerable metabolic expense, to make a more stable molecule (DNA) for information storage. Second, the 2'-hydroxyl is the glue permitting RNA to readily fold: it is the only group on the entire phosphodiester backbone that can donate as well as accept hydrogen bonds (Figure 1.1). This hydrogen-bonding capacity plays an important role in stabilizing the large variety of structures adopted by RNA molecules.
DNA also differs from RNA in that the pyrimidine base with two keto groups is thymine rather than uracil. Chemically, the difference is minor: thymine is merely 5-methyluracil, and the additional methyl group does not change the overall structure, but does provide a recognition opportunity. However, like the removal of the sugar ring's 2'-hydroxyl group, the addition of this methyl group requires considerable metabolic expense. Mother Nature's presumed logic in this case is slightly more convoluted. Because cytosine is disturbingly readily deaminated to form uracil, the methyl group added to thymine allows repair enzymes to discriminate between pyrimidines that were intended to have two keto groups (i.e., thymine) and those that are the products of cytosine deamination and need to be removed (i.e., uracil).
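The repair logic described above can be illustrated with a toy model. The sketch below is not a model of any real repair enzyme: strands are plain strings, and the function names and example sequence are hypothetical. It only shows why a genome that writes its two-keto pyrimidine as T can treat every U as damage, whereas a hypothetical genome that used U could not.

```python
def deaminate(strand: str, position: int) -> str:
    """Return a copy of the strand with the cytosine at `position` turned into uracil."""
    if strand[position] != "C":
        raise ValueError("this sketch only deaminates cytosine")
    return strand[:position] + "U" + strand[position + 1:]


def uracil_positions(strand: str) -> list[int]:
    """List every position that holds a uracil."""
    return [i for i, base in enumerate(strand) if base == "U"]


# Hypothetical example sequence, chosen only for illustration.
dna = "GATTCAG"
damaged_dna = deaminate(dna, 4)

# With thymine as the legitimate two-keto pyrimidine, every U marks damage.
print(damaged_dna, uracil_positions(damaged_dna))

# In an imagined genome that used U in place of T, the product of deamination
# is indistinguishable from the uracils that were meant to be there.
uracil_genome = dna.replace("T", "U")
damaged_uracil_genome = deaminate(uracil_genome, 4)
print(damaged_uracil_genome, uracil_positions(damaged_uracil_genome))
```

In the first case the search returns only the damaged site; in the second it returns the damaged site mixed in with the intended uracils, so the sequence alone no longer says which base to remove.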
The chemical repertoire of nucleic acids can be greatly expanded by modifications introduced after replication or transcription. Particularly for functional RNAs, species from all kingdoms of life have evolved a vast number of enzymes, comprising up to approximately 10% of coding genomes, that modify nucleobases after transcription (Chapter 14). At last count, about 100 different modified nucleosides have been identified in RNA. It is becoming clear that base modifications, in particular, are functionally significant: they are required for pre-mRNA splicing, they improve translational fidelity, and they increase RNA stability.
1.2.3 Canonical A- and B-form Helices
Despite the great variety of folds that RNA can adopt and the variation observed in DNA structure, the double helix remains the most common element of nucleic acid structure. The base pairing scheme first suggested by Watson and Crick is special in that the distances and angles between glycosidic bonds are constant, creating a regular structure that is independent of sequence, to a first approximation (Figure 1.3). Thus, nucleic acids with Watson–Crick base pairs adopt a deceptively simple structure: the double helix. When viewed in more detail, however, the double helix is really a family of related conformations, by far the most common of which are the two forms termed A and B. Detailed descriptions of their anatomy can be found in many texts [1]; only a basic reminder is included here.
The backbone conformations of A- and B-form helices differ primarily in the puckers of their sugar rings: C2'-endo for B-form, and C3'-endo for A-form (Figure 1.4). The pucker makes relatively little difference to the placement of the atoms within the sugar ring itself. However, the pucker determines the relative placement of the substituents, namely the base and the flanking phosphates, which are critical to the overall conformation of the duplex.
Duplex RNA is largely limited to the A-form, for two reasons: canonical B-form helices are sterically incompatible with the protruding 2'-hydroxyl groups of RNA, and the intrinsic sugar pucker preferences are affected by the 2'-substituent. Both A- and B-forms are readily accessible to DNA, although the B-form predominates under physiological conditions. Local B-to-A conformational transitions can be triggered by DNA binding proteins, and are often associated with DNA bending (Chapter 8).
Overall, A- and B-form double helices differ most dramatically in the relative sizes and shapes of their grooves, with important consequences for their interactions with proteins (Figure 1.3). In B-form duplexes, the major groove is wider than the minor groove and both are readily accessible for protein recognition. In A-form, the major groove is deeper and narrower and thus less accessible to probing proteins; the shallower and wider minor groove is accessible but offers limited opportunity for sequence specific recognition (Section 1.3.3). Another difference between A- and B-form duplexes is the position of the sugar's C2' atom. In the A-form, the C2' atom positions the 2' hydroxyl groups to line the outside rim of the minor groove, whereas in the B-form, it protrudes toward the major groove. The minor groove of A-form RNA duplexes is thus lined with the hydrogen bond donor and acceptor moieties of the 2'-hydroxyl groups. When DNA adopts A-form geometry, the resulting sugar puckers present more hydrophobic surface area to the minor groove than do B-form ones, a feature sometimes exploited by DNA bending proteins.
1.2.4 Deviation is the Norm
When nucleic acid structures are examined in detail, it becomes clear that very few are strictly canonical. As described in Chapters 4 and 8, even B-form DNA is quite flexible, with wide variations in parameters such as twist and groove width, even in the absence of proteins. The degree of flexibility and the conformational preferences are highly sequence-dependent, and these features are often important in site recognition by proteins. DNA damage can also change the structure and/or the stability of the double helix. Recognizing such damage is crucial for genetic integrity. In fact, the types of possible damage and the recognition strategies used by repair enzymes are so numerous that entire books have been dedicated to them and thus they are not covered here. However, the repertoire of structures seen for DNA is still relatively limited compared to the tremendous variety seen for folded RNA (Section 1.2.6).
1.2.5 Bending and Supercoiling DNA
The double helix is a stiff but not inflexible structure: in the absence of proteins, B-form DNA has a persistence length of ~150 base pairs (Chapter 8). Thus, the probability of two ends of a long DNA molecule meeting peaks at ~450–500 bp. The main features of the double helix that resist bending are the stacking of the aromatic bases, and the mutual repulsion of the negatively charged phosphate groups. Proteins that induce large bends in DNA have evolved mechanisms to counteract one or both of these forces (Chapters 4 and 8).
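To give a feel for what a persistence length of ~150 bp implies, the sketch below uses the standard worm-like chain (Kratky-Porod) expression for the mean-square end-to-end distance, ⟨R²⟩ = 2PL[1 − (P/L)(1 − e^(−L/P))]. This model and the 0.34 nm rise per base pair are assumptions brought in for illustration; neither appears in the text above. The sketch does not compute the end-joining probability itself, only how rod-like a fragment of a given length is.

```python
import math

RISE_NM_PER_BP = 0.34   # assumed axial rise of B-form DNA, not given in the text
PERSISTENCE_BP = 150    # persistence length quoted in the text


def rms_end_to_end(length: float, persistence: float) -> float:
    """Worm-like chain RMS end-to-end distance, in the same units as the inputs."""
    ratio = persistence / length
    mean_square = 2.0 * persistence * length * (
        1.0 - ratio * (1.0 - math.exp(-length / persistence))
    )
    return math.sqrt(mean_square)


if __name__ == "__main__":
    print(f"persistence length ~ {PERSISTENCE_BP * RISE_NM_PER_BP:.0f} nm")
    for n_bp in (50, 150, 500, 5000):
        r = rms_end_to_end(n_bp, PERSISTENCE_BP)
        # A ratio near 1 means the fragment behaves like a straight rod;
        # a small ratio means it is long enough to fold back on itself.
        print(f"{n_bp:>5} bp: RMS end-to-end / contour length = {r / n_bp:.2f}")
```

Fragments much shorter than the persistence length stay close to fully extended, which is one way to see why very short DNA rarely brings its own ends together without help from bending proteins.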
Long DNA segments can become torsionally strained (supercoiled) when the ends are restrained such that one strand of the duplex cannot rotate freely about the other. This is the case not only for circular DNA molecules such as bacterial plasmids but also for chromosomal segments that are restrained by bacterial nucleoid-associated proteins or eukaryotic chromatin. Chapter 10 describes the many families of enzymes that modulate DNA supercoiling.
1.2.6 Folded RNA and Noncanonical DNA
Not all nucleic acids are double-stranded, and the two base pairs proposed by Watson and Crick are by no means the only possible base–base hydrogen bonding schemes. In fact, long stretches of fully Watson–Crick base paired duplex are generally only found in the genetic storage material – and thus, for most organisms, only in DNA (a few viruses use double-stranded RNA as their genetic material). In folded RNA, all types of non-Watson–Crick base pairings have been observed (e.g., A can pair with A, C, G or U).
In contrast with DNA, all RNAs are synthesized as single strands that often then fold into far more elaborate and idiosyncratic structures than the simple A- and B-forms described above. Folding creates surfaces that can form catalytic sites, thereby transforming this polymer from merely a carrier of information into one that also catalyzes chemical reactions (peptide bond formation, pre-mRNA splicing and RNA processing to name a few).
To create a global structure, the RNA backbone twists and turns, permitting it to fold back upon itself. As stated above, 2'-hydroxyl groups play key roles in stabilizing RNA secondary, tertiary and quaternary structure. RNA secondary structures are built of recurring modular motifs reminiscent of "Lego" pieces. These pieces include one Watson–Crick element (A-form helices) and many loop elements that are stabilized by non-Watson–Crick interactions, including bulges, turns, linker regions, and multi-way junctions. RNA tertiary structure is formed by interactions among these elements, often involving the formation of base pairs, triples and/or quadruples. Interestingly, adenine, the most hydrophobic of the four bases, is highly over-represented in such interactions. Perhaps the most common recurring feature of RNA tertiary structure is the "A minor" motif where two adjacent adenosines dock into the minor groove of an RNA helical structure, often resulting in tandem base triples. Base stacking also plays a role by stabilizing docking of co-axial helices and interactions with side-by-side A base pairs (designated as A-platform/receptor interactions).
Although DNA has a limited structural repertoire compared to RNA, DNA is not always found as a canonical duplex. For example, during replication, the strands must be separated so that the precious information within can be copied, and repair and recombination also often involve unusual DNA structures such as hairpins and Holliday junctions (Section 1.3.6).
1.3 Principles of Recognition
Despite the differences between RNA and DNA, proteins use similar strategies to recognize them. Nucleic-acid binding proteins discriminate among potential binding sites based on their sequence, structure, or a combination of both. Although the details of the strategies used by proteins to recognize their cognate sites vary widely, the general principles remain the same.
1.3.1 Forces that Contribute to Complex Formation
As with other macromolecular interactions, the forces involved in noncovalent protein-nucleic acid complex formation are all relatively weak, and overall affinity results from the sum of many interactions, some favorable and others not. Perhaps the most obvious force among these is electrostatics: because nucleic acids are polyanions, most proteins that bind to them are rich in the positively charged amino acids lysine and arginine. As described in Chapter 2, screening by solvent ions modulates the strength of these interactions. However, hydrophobic interactions are surprisingly prevalent: even for proteins that bind B-form DNA, an average of nearly 50% of the surface area buried at the interface is nonpolar. Many complexes involving less canonical nucleic acid structures are surprisingly resistant to high salt, often reflecting strong hydrophobic stacking interactions between the protein and nucleic acid bases (Chapter 6). Polar interactions involving direct and water-mediated hydrogen bonds are also widespread, and often of particular importance to sequence specificity. The above interactions are revealed upon inspection of the structure of a protein-nucleic acid complex. However, the initial states of the partners are also important, since complex formation generally involves desolvation of the interface and changes in the conformation and flexibility of one or both partners (Chapter 2).
1.3.2 Site Recognition Overview
Many proteins display high specificity for a particular nucleic acid sequence or structure, often binding their cognate site several orders of magnitude more tightly than random ones. The challenges presented by this recognition problem vary with the type of nucleic acid to be recognized, and are perhaps the most formidable for Watson–Crick paired duplexes (usually DNA) due to their structural homogeneity (Figure 1.3). Relatively unstructured single-stranded nucleic acids are the most flexible, allowing easy access to discriminating functional groups. In contrast, protein-RNA recognition is in many ways more similar to protein-protein recognition because of the great variety of surfaces that folded RNA can create; both types of recognition involve interacting surfaces that have each evolved to complement the other's shape and electrostatic properties.
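"Several orders of magnitude" in affinity corresponds to a fairly modest difference in binding free energy, which can be estimated from the standard relation ΔΔG = RT ln(K_nonspecific / K_specific) for dissociation constants. The sketch below is a worked illustration under that assumption; the function name, the default temperature of 298.15 K, and the example fold-differences are illustrative choices, not values from the text.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol * K)


def discrimination_energy(fold_tighter: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) for binding `fold_tighter` times more tightly."""
    if fold_tighter <= 0:
        raise ValueError("the affinity ratio must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_tighter)


if __name__ == "__main__":
    for orders in (1, 3, 6):
        ratio = 10.0 ** orders
        print(f"10^{orders}-fold tighter binding: "
              f"{discrimination_energy(ratio):.1f} kcal/mol")
```

Each factor of ten costs roughly 1.4 kcal/mol near room temperature, so even strong discrimination is consistent with the sum of a handful of individually weak interactions.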
1.3.3 Recognizing Duplex DNA via Direct and Indirect Readout
The sequence of a duplex DNA molecule can be read by examining the pattern of unique functional groups exposed in the major groove (Figure 1.3). The minor groove is less interesting: all four bases display hydrogen bond acceptors in similar locations. The protruding 2-amino group of G distinguishes G:C pairs from A:T ones, but is centrally located such that it is still hard for a protein to determine which base is the G and which is the C.
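The difference between the two grooves can be made concrete by writing out the functional groups each base pair presents. The encoding below follows the commonly used textbook summary (A = hydrogen bond acceptor, D = donor, M = methyl, H = nonpolar hydrogen), read across the groove from the first base to the second. The table is an assumption introduced for illustration rather than something taken from this chapter, and the names are hypothetical.

```python
# Functional groups read across each groove, first base to second base.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}


def distinguishable_classes(patterns: dict[str, str]) -> list[list[str]]:
    """Group base pairs that present identical patterns and so cannot be told apart."""
    classes: dict[str, list[str]] = {}
    for pair, pattern in patterns.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(classes.values())


if __name__ == "__main__":
    print("major groove:", distinguishable_classes(MAJOR_GROOVE))
    print("minor groove:", distinguishable_classes(MINOR_GROOVE))
```

In this encoding every base pair has a unique major groove signature, while the minor groove patterns are symmetric: the central donor (the 2-amino group of G) separates G:C-type pairs from A:T-type pairs but does not reveal which strand carries the purine.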
The simplest way for a protein to recognize a specific sequence in DNA is thus to bind in the major groove, and "interrogate" the unique features of the bases that are exposed there by making favorable contacts with the correct sequence and by making unfavorable contacts with incorrect sequences. As proposed by Zubay and Doty in 1959, an alpha helix fits nicely into the major groove of B-form DNA. All combinations of DNA grooves and protein secondary structure elements have now been seen, but a "recognition" helix that inserts into the major groove is still the most common feature of sequence-specific DNA binding proteins (Chapter 3). The major groove of A-form helices is narrower and deeper than that of B-form helices, and thus less accessible to proteins (Figure 1.3). This could be a problem for RNA-protein recognition if RNAs were constrained to forming only A-form helices. However, as described below, the folding of RNA molecules generally produces other features that can guide protein recognition.
Comparison of specific and nonspecific protein-DNA complexes highlights the importance of dehydration in this process: formation of nonspecific complexes generally displaces fewer water molecules than formation of specific ones, and the resulting interfaces are less complementary (Chapter 2).
In contrast to the simple, direct readout described above, proteins can also recognize DNA sequences through sequence-dependent variations in flexibility and structural parameters such as the groove width and the twist between base pairs, a strategy referred to as "indirect" readout. Many proteins use both direct and indirect readout to identify their preferred binding sites. However, as described in Chapters 4 and 8, proteins that bend DNA and those that primarily contact the minor groove generally rely more heavily on indirect readout.
Excerpted from Protein-Nucleic Acid Interactions by Phoebe A. Rice, Carl C. Correll. Copyright © 2008 Royal Society of Chemistry. Excerpted by permission of The Royal Society of Chemistry.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Meet the Author
Dr Rice is an associate professor in the Department of Biochemistry & Molecular Biology at the University of Chicago. Having trained in the labs of Thomas Steitz at Yale and Kiyoshi Mizuuchi at NIH, she has a long-standing interest in the mechanisms and structural biology of DNA recombination. Dr Correll is an associate professor in the Department of Biochemistry & Molecular Biology at Rosalind Franklin University of Medicine and Science. Having trained in the labs of Profs. Martha Ludwig at the University of Michigan and Thomas Steitz at Yale University, he has a long-standing interest in structural biology. For the last decade his research has focused on how proteins cleave and/or rearrange the structure of RNA molecules.