 RNA Helicases
 
By Eckhard Jankowsky
The Royal Society of Chemistry
  Copyright © 2010 Royal Society of Chemistry
 All rights reserved.
 ISBN: 978-1-84755-914-2  
   CHAPTER 1
An Introduction to RNA Helicases: Superfamilies, Families, and Major Themes
ECKHARD JANKOWSKY AND MARGARET E. FAIRMAN-WILLIAMS
1.1 RNA Helicases in the Helicase Universe
1.1.1 The Sequence-Based Helicase Classification
In the late 1970s, the term helicase was proposed to describe enzymes that unwound DNA duplexes in an ATP-dependent fashion. In 1988, T.C. Hodgman and a group around Alexander Gorbalenya and Eugene Koonin noted that proteins with DNA helicase activity contained several highly conserved sequence motifs that were also found in a number of viral proteins. Among these motifs were the NTPase/ATPase signatures of P-loop proteins, a set of functionally diverse, NTP-hydrolysing enzymes that also includes G-proteins, the kinesin and myosin motor proteins, ABC transporters, and F-type ATPases. The reports speculated that the helicase-like proteins encoded by RNA viruses could function as "RNA helicases" that unwound RNA during viral replication, in analogy to the helicases that unwound DNA.
At this time, no viral protein had been shown to have RNA helicase activity. However, a eukaryotic protein, eukaryotic initiation factor 4A (eIF4A), had been shown to unwind RNA duplexes in an ATP-dependent fashion and thus qualified as an RNA helicase (see also the foreword of this volume). Indeed, the presence of the typical "helicase" sequence motifs in eIF4A and in the then "putative" RNA helicase p68 was noted shortly after the reports by Hodgman and by Gorbalenya and colleagues, and extensive similarities between proteins with DNA and RNA helicase activities were highlighted. In early 1989, Patrick Linder and coworkers showed that even more proteins shared extensive sequence similarity with eIF4A and p68 over several conserved sequence motifs. One of these motifs read, in single-letter code, "D-E-A-D", thus marking the birth of the D-E-A-D box.
The following years saw a tremendous increase in the number of proteins containing the characteristic helicase motifs, and in 1993 Gorbalenya and Koonin proposed a systematic, sequence-based classification of helicases. Based on their sequences, proteins containing helicase motifs were divided into three helicase superfamilies (two large and one small) and two smaller helicase families. The helicase superfamilies were further divided into protein families, one of which comprised the DEAD-box proteins. Gorbalenya and Koonin also noted that some of the amino acids in the helicase motifs were conserved across all superfamilies, whereas others were specific to only one protein family or superfamily.
This sequence-based helicase classification, along with the definition of the characteristic helicase motifs, has proven remarkably robust and remains valid despite the exponential increase in the number of available protein sequences and the advent of helicase structures. In fact, the helicase structures confirmed the suitability of the classification by Gorbalenya and Koonin. Exceptional structural conservation is seen within the superfamilies and families, especially in the two large helicase superfamilies 1 and 2. More pronounced structural differences among the remaining helicases prompted Singleton et al. in 2007 to amend the helicase classification for these proteins and reassign them to four different superfamilies (3–6).
This most recent helicase classification thus consists of superfamilies 1–6 (SF1–SF6; Figure 1.1). All helicases are P-loop NTPases and therefore contain the typical Walker A and B sites for NTP binding and hydrolysis (Figure 1.1). Proteins of SF1 and SF2 are characterised by a helicase core formed by two structurally almost identical helicase domains, and the conserved helicase sequence motifs are located in both of these core domains. SF2, by far the largest helicase superfamily in eukaryotes, consists of several well-defined protein families with distinct sequence, structural, and functional signatures (detailed in Chapters 2–7). Helicases of SF3–6 form hexameric toroids. These proteins contain only one helicase domain, with overall similarity to, but also notable differences from, the helicase domains of SF1 and SF2 helicases.
The classification of helicases into superfamilies and families generally correlates with functional characteristics of the enzymes. For example, proteins of the DEAD-box family employ unwinding mechanisms distinct from those of other SF2 helicases (see also Chapter 3). DEAH/RHA proteins and viral DExH proteins can hydrolyse different NTPs, whereas proteins in other SF2 families are generally specific for adenosine triphosphate (Chapters 3–7). These correlations illustrate that the recent helicase classification is a useful foundation for a systematic view of these enzymes. Because the classification is now based on a considerable number of structures and a large number of sequences, many from completely sequenced genomes, this system is likely to endure.
1.1.2 Helicases that do not Unwind Duplexes? Discrepancies Between Sequence-Based and Functional Helicase Definitions
Based on the presence of the characteristic sequence motifs, a large number of proteins qualify as helicases. Helicases, as defined by sequence, are among the largest protein classes. In eukaryotic RNA metabolism, helicases are the largest group of enzymes. However, following the establishment of the sequence-based helicase classification by Gorbalenya and Koonin, it became clear that many of the proteins classified as "helicases", while generally able to hydrolyse ATP in a nucleic-acid-dependent fashion, did not necessarily unwind DNA or RNA duplexes. Proteins of certain SF2 families, including the SWI/SNF proteins and ATP-dependent restriction endonucleases, displayed no unwinding activity. It became apparent that a helicase, as defined by sequence, was not necessarily a helicase as defined by enzymatic function.
Much as certain DNA "helicases" do not actually unwind duplexes, many RNA helicases may not function primarily, or perhaps at all, to unwind RNA duplexes in the cell. Although RNA helicases generally unwind RNA duplexes in vitro, provided appropriate substrates are used, the enzymes are known to have a wide spectrum of biological roles, some of which may have little to do with duplex unwinding (see Section 1.3). A compelling example is the DEAD-box protein eIF4A-III, which functions as a stationary, ATP-dependent RNA clamp around which other proteins assemble on RNA.
Several helicases have been shown to translocate on double-stranded or single-stranded nucleic acids in an ATP-dependent fashion. This translocation seems critical for the biological function of several helicases that cannot unwind duplexes, most notably ATP-dependent restriction enzymes and the chromatin remodeling factors of the SWI/SNF family. The apparent prevalence of ATP-driven translocation among a wide range of DNA helicases prompted suggestions that, on a basic level, helicases might be ATP-driven translocases. This seemingly intuitive notion has become very popular. In recent publications, helicases have occasionally even been recast from their original definition as enzymes that unwind duplexes in an ATP-dependent fashion into enzymes that use ATP hydrolysis to translocate on nucleic acids.
For RNA helicases, translocation on nucleic acid was demonstrated for Rho, a bacterial RNA helicase (Chapter 10), and for some hexameric viral RNA helicases (Chapter 9). Translocation has recently been suggested for the eukaryotic RNA helicase RIG-I. The viral RNA helicases NPH-II (vaccinia virus) and NS3 (hepatitis C virus) also unwind duplexes by a translocation-based mechanism (Chapter 7). However, the biochemical characteristics of duplex unwinding by many cellular RNA helicases, most prominently those of the large DEAD-box protein family, are not consistent with a translocation-based unwinding mechanism (see Chapter 3 for a detailed discussion). Thus, neither translocation nor duplex unwinding is the common functional denominator of all proteins classified as helicases on the basis of sequence. Different "helicases" either unwind by translocation, translocate without unwinding, or unwind without translocation. The common functional denominator of all helicases may be limited to the ability of the enzymes to modulate nucleic acid binding in an ATP-dependent fashion.
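The three-way split described in the preceding paragraph can be made explicit with a small lookup table. This is only a sketch: the entries encode the examples named in this section (NS3, NPH-II, SWI/SNF remodeling factors, ATP-dependent restriction enzymes, DEAD-box proteins) as they are characterised above, and the `Activity` class, its field names, and the helper functions are hypothetical, not an established annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical record of two functional properties of a 'helicase'."""
    translocates: bool  # ATP-driven movement along the nucleic acid
    unwinds: bool       # ATP-dependent duplex separation


# Examples named in the text; the True/False calls follow its wording.
EXAMPLES = {
    "NS3 (hepatitis C virus)": Activity(translocates=True, unwinds=True),
    "NPH-II (vaccinia virus)": Activity(translocates=True, unwinds=True),
    "SWI/SNF chromatin remodeling factor": Activity(translocates=True, unwinds=False),
    "ATP-dependent restriction enzyme": Activity(translocates=True, unwinds=False),
    "DEAD-box protein": Activity(translocates=False, unwinds=True),
}


def mode(a: Activity) -> str:
    """Name the functional mode of one entry."""
    if a.translocates and a.unwinds:
        return "unwinds by translocation"
    if a.translocates:
        return "translocates without unwinding"
    if a.unwinds:
        return "unwinds without translocation"
    return "neither"


def shared_by_all(prop: str) -> bool:
    """True only if every listed example has the given property."""
    return all(getattr(a, prop) for a in EXAMPLES.values())


if __name__ == "__main__":
    for name, activity in EXAMPLES.items():
        print(f"{name:36s} {mode(activity)}")
    # Neither property is a common denominator of the set.
    print("translocation shared by all:", shared_by_all("translocates"))  # False
    print("unwinding shared by all:", shared_by_all("unwinds"))           # False
```

Because the set contains members of all three modes, neither `translocates` nor `unwinds` holds for every entry, which is the point the paragraph makes.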
1.1.3 RNA vs. DNA Helicases: Nucleic Acid Specificity and Processivity
It has long been appreciated that RNA and DNA helicases perform different physiological functions. Yet there is no clear distinction between RNA and DNA helicases based on sequence or structure, and both DNA and RNA helicases are found in all helicase superfamilies except SF6, which contains only DNA helicases (Figure 1.1). In the large SF2, RNA helicases cluster in certain, but not all, families, and many SF2 families include both RNA and DNA helicases. Some helicases catalyse unwinding of both DNA and RNA duplexes, and certain helicases have been implicated in both DNA and RNA metabolism (Chapters 7 and 8).
The lack of a clear distinction between DNA and RNA helicases suggests that discrimination between DNA and RNA substrates may not have been the prevalent evolutionary driving force for the differentiation of the helicase families and superfamilies. Instead, mechanistic features of proteins from the respective families appear to be utilised in both RNA- and DNA-related processes, and RNA vs. DNA specificity may have been acquired after the families were established. While each helicase family has distinct, if sometimes subtle, structural characteristics, structures of DNA and RNA helicases within each family are highly similar, and it is thus not clear which structural features dictate function on DNA, RNA, or both.
Along with the ability to translocate on DNA, the capacity of certain DNA helicases to processively unwind duplexes of hundreds or even thousands of base-pairs has received much attention. For a helicase, processivity (P) is the probability that the enzyme performs the next unwinding step rather than dissociating from the nucleic acid. A highly processive helicase (P → 1) rarely dissociates during unwinding and is thus able to perform many consecutive unwinding steps. A helicase that dissociates more often displays lower processivity (P ≪ 0.9) and unwinds fewer base-pairs per binding event. For a helicase that always dissociates between unwinding steps, the processivity is zero.
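The definition of P lends itself to a simple stochastic model. The sketch below assumes, as a simplification not stated in the text, that every binding event yields at least one unwinding step and that after each step the enzyme either continues with probability P or dissociates with probability 1 − P. Under that assumption the number of steps per binding event is geometrically distributed with mean 1/(1 − P): one step for P = 0 (the zero-processivity case above), 10 steps for P = 0.9 and 100 for P = 0.99. The function names are hypothetical.

```python
import random


def mean_steps_per_binding(p: float) -> float:
    """Analytic mean of the geometric step count, assuming one step is always
    taken on binding and each further step occurs with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("processivity p must satisfy 0 <= p < 1")
    return 1.0 / (1.0 - p)


def simulate_binding_event(p: float, rng: random.Random) -> int:
    """Monte Carlo version of the same model: count steps until dissociation."""
    steps = 1                  # first step after binding
    while rng.random() < p:    # continue with probability p ...
        steps += 1
    return steps               # ... otherwise dissociate


if __name__ == "__main__":
    rng = random.Random(1)
    for p in (0.0, 0.5, 0.9, 0.99):
        trials = [simulate_binding_event(p, rng) for _ in range(20_000)]
        simulated = sum(trials) / len(trials)
        print(f"P = {p:<4}  analytic {mean_steps_per_binding(p):6.1f}  "
              f"simulated {simulated:6.1f}")
```

The steep rise of 1/(1 − P) as P approaches 1 is why small changes in processivity near 1 translate into large differences in the length of duplex unwound per binding event.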
Highly processive DNA helicases are occasionally used as a benchmark for less "potent" enzymes, often with the stated or at least implicit suggestion that each helicase will ultimately function to unwind thousands of base-pairs, provided appropriate processivity factors are present. Indeed, some less-processive DNA helicases become more processive when bound to such factors. However, highly processive duplex unwinding is hardly characteristic of all DNA helicases, and unwinding of thousands of base-pairs at a time may not be desirable for many enzymes.
For RNA helicases, processive duplex unwinding is even less typical. Processivity has been documented only for the bacterial transcription terminator Rho, a unique hexameric helicase with no eukaryotic homologues (Chapter 10), and for members of the SF2 family of viral DExH RNA helicases (Chapter 7). The latter proteins are involved in viral replication, where long RNA (or RNA/DNA) duplexes are thought to occur as replication intermediates (Chapter 7). In cellular RNA and RNA–protein complexes, duplexes generally do not exceed 10–15 consecutive base-pairs. Thus, it is unlikely that cellular RNA helicases have evolved to processively separate duplexes with hundreds of base-pairs. Even if the enzymes had this inherent ability, high processivity would need to be strongly curtailed to avoid unregulated, global disruption of carefully assembled RNPs. Accordingly, most cellular RNA helicases examined to date do not efficiently unwind duplexes longer than one and a half helical turns in vitro, but they efficiently separate shorter duplexes.
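A simple stepwise model makes the length argument concrete. Assume, purely for illustration, that a helicase separates one base-pair per step (real step sizes vary and are not given here), that every binding event yields at least one step, and that each later step occurs with probability P, the processivity defined above. The chance of clearing an n-bp duplex in one binding event is then P^(n − 1). At P = 0.9, a 12-bp or 17-bp duplex is cleared in roughly 31% or 19% of binding events, whereas even at P = 0.99 a 1000-bp duplex is essentially never cleared in one event. One and a half turns of A-form RNA, at roughly 11 bp per turn (an approximate value), is about 16–17 bp. This model describes translocating helicases only; it does not apply to DEAD-box proteins, whose unwinding the text describes as inconsistent with translocation. All names below are hypothetical.

```python
import math

BP_PER_TURN_A_FORM = 11.0  # approximate helical repeat of A-form RNA (assumption)


def turns_to_bp(turns: float, bp_per_turn: float = BP_PER_TURN_A_FORM) -> float:
    """Convert helical turns to base-pairs."""
    return turns * bp_per_turn


def p_unwound_in_one_event(n_bp: int, p: float, bp_per_step: int = 1) -> float:
    """Probability that one binding event completes an n-bp duplex, assuming
    the first step is always taken and each later step occurs with prob. p."""
    steps = math.ceil(n_bp / bp_per_step)
    return p ** (steps - 1)


if __name__ == "__main__":
    print(f"1.5 turns of A-form RNA ~ {turns_to_bp(1.5):.1f} bp")
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(
            f"{n} bp: {p_unwound_in_one_event(n, p):.2g}" for n in (12, 17, 100, 1000)
        )
        print(f"P = {p}: {row}")
```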
It is not clear to what extent processivity, or the lack thereof, is a distinguishing feature between DNA and RNA helicases within a given family. It is possible that processivity is a specific feature of only certain superfamilies and families, regardless of whether the enzymes work on DNA or RNA. The main differences among helicases appear to lie between enzymes of different superfamilies and families rather than between DNA and RNA helicases.
1.2 Classification of RNA Helicases: Superfamilies and Families
For the purpose of this book, RNA helicases are classified according to the sequence- and structure-based helicase superfamily/family system outlined above (cf. Section 1.1). RNA helicases, defined as enzymes that catalyse ATP-dependent separation of RNA duplexes, are found in helicase superfamilies 1–5 (Figure 1.1). Superfamily 2 is subdivided into at least 10 families, based on phylogenetic analysis of the sequences of the helicase core domains (Figure 1.2). Five of these SF2 families (DEAD-box, DEAH/RHA, Ski2-like, RIG-I-like, and the viral DExH or NS3/NPH-II family) consist mainly of RNA helicases and are thus termed "RNA helicase families". RNA helicases have not been identified in the remaining SF2 families.
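The superfamily/family scheme used in this book can be represented as a small nested data structure. The sketch encodes only what this chapter states (core architecture per superfamily, whether RNA helicases occur in it, and the five SF2 "RNA helicase families"); the dictionary layout and the function name are illustrative assumptions, and the remaining SF2 families are deliberately omitted because they are not named here.

```python
from typing import Dict, List

# Encodes only statements made in this chapter (layout is hypothetical).
SUPERFAMILIES: Dict[str, dict] = {
    "SF1": {"core": "two RecA-like domains", "rna_helicases": True},
    "SF2": {
        "core": "two RecA-like domains",
        "rna_helicases": True,
        "rna_helicase_families": [
            "DEAD-box",
            "DEAH/RHA",
            "Ski2-like",
            "RIG-I-like",
            "viral DExH (NS3/NPH-II)",
        ],
    },
    "SF3": {"core": "one helicase domain, hexameric toroid", "rna_helicases": True},
    "SF4": {"core": "one helicase domain, hexameric toroid", "rna_helicases": True},
    "SF5": {"core": "one helicase domain, hexameric toroid", "rna_helicases": True},
    "SF6": {"core": "one helicase domain, hexameric toroid", "rna_helicases": False},
}


def superfamilies_with_rna_helicases() -> List[str]:
    """Superfamilies in which RNA helicases have been found (per the text)."""
    return [sf for sf, info in SUPERFAMILIES.items() if info["rna_helicases"]]


if __name__ == "__main__":
    print(superfamilies_with_rna_helicases())  # ['SF1', 'SF2', 'SF3', 'SF4', 'SF5']
    print(len(SUPERFAMILIES["SF2"]["rna_helicase_families"]))  # 5
```

Keeping families nested under their superfamily mirrors the hierarchy of the classification: a protein is first placed in a superfamily, and only SF2 is further resolved here into named RNA helicase families.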
Proteins have frequently been designated as RNA helicases based on genetic data that suggest a role in RNA metabolism, but without direct evidence of an RNA-related function. Designating enzymes as RNA helicases in the absence of biochemical information is problematic for SF1, 3, 4, and 5, where the degree of sequence conservation between DNA and RNA helicases is not well understood. Moreover, several SF1 proteins work on both RNA and DNA (Chapter 8). For SF2 proteins, sequence conservation within the distinct families is better understood, so assigning a protein to an SF2 "RNA helicase family" is more straightforward than for SF1 proteins. Nevertheless, designating a given protein as an RNA helicase based solely on sequence is problematic because, as mentioned, several "RNA helicase families" also contain DNA helicases or enzymes that function on both DNA and RNA (e.g., Chapters 7 and 8). The term RNA helicase will be used here with the understanding that ATP-dependent unwinding of RNA duplexes has not been shown in all cases, but that compelling evidence implicates a given enzyme in RNA metabolism.
1.2.1 Organisation of this Book
To provide a comprehensive overview of RNA helicases, this book discusses the proteins in the context of their superfamilies and families. Each of the SF2 families containing RNA helicases (Figure 1.2) is discussed in a dedicated chapter (Chapters 2–7). The exception is the DEAD-box family, which, because of the family size and the amount of available data, is described in two chapters, one devoted to the biological function and the second to the molecular mechanism of these enzymes (Chapters 2 and 3). SF1 RNA helicases are discussed in a single chapter (Chapter 8).
RNA helicases of SF3–5 form hexameric toroids. Such RNA helicases have not been identified in eukaryotes, and only one cellular RNA helicase, the bacterial transcription terminator Rho, belongs to SF3–6. Nevertheless, a comparatively large amount of data exists on the structure, mechanism, and biological function of Rho, warranting a dedicated chapter on this enzyme (Chapter 10). Other hexameric RNA helicases appear to occur exclusively in viruses. Well-characterised enzymes include the P4 packaging motor and the SV40 large T antigen. These two enzymes are discussed together in one chapter (Chapter 9). Below, we give a brief overview of the RNA helicase superfamilies and SF2 families, with emphasis on common themes whose discussion would have been beyond the scope of individual chapters.
1.2.2 RNA Helicases of the Superfamilies 1 and 2
Most helicases known to date belong to superfamilies 1 or 2. In humans, there are at least 103 SF2 and 17 SF1 helicases; in Saccharomyces cerevisiae, at least 59 SF2 and 9 SF1 helicases have been identified. As mentioned, all SF1 and SF2 proteins contain a helicase core consisting of two highly similar, conserved helicase domains arranged in tandem (Figure 1.3). The helicase domains each resemble the RecA protein and are therefore often referred to as RecA-like domains. The two helicase domains are connected through a linker, which varies considerably between SF1 and SF2 proteins and also among SF2 families. With bound ATP and nucleic acid, the two helicase domains form a deep cleft. The ATP-binding site is located on one side of this cleft, and the nucleic acid binds on the other side, across both domains (Figure 1.3(C)). In SF1 helicases, the helicase core domains frequently contain large inserts, which generally fold as individual domains without disturbing the helicase core fold (Figures 1.3(B) and (D)). In SF2 helicases, such inserts are seen in only a few cases (e.g., DDX1). However, in many SF2 proteins, the helicase domains contain an additional beta strand at the end of the fold, which extends the helicase core fold (Figures 1.3(C) and (E)).
Excerpted from RNA Helicases by Eckhard Jankowsky. Copyright © 2010 Royal Society of Chemistry. Excerpted by permission of The Royal Society of Chemistry. 
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.