Preface to the Second Edition
Well, I had to write a second edition. Too much of what I predicted in the first edition became history. You know, it feels good to be able to say that. They even code-named the development project at Microsoft “Wolfpack” after the cover. “Dogpack” or “dogfight” didn't have the right connotations, I guess. Sent me a logo T-shirt, too. I also got some nice e-mail from other developers, but nobody else was that classy. Before I completely dislocate my shoulder patting myself on the back, I should mention that I missed a couple of rather major things the first time around. I didn't foresee the importance of mass-market high availability. I also didn't foresee how much confusing “NUMA” rhetoric would be used. Neither was left out entirely, mind you, but they certainly didn't get anywhere near the attention they either deserve or require. That's been corrected, in the form of two major added chapters, major revisions to other chapters, and scattered revisions throughout the book.
Another major change is the inclusion of information about cluster hardware and software acceleration, a subject that literally did not exist in sufficient quantity to take notice of when the first edition was written. Of course the chapter of examples was trash about 40 seconds before the first edition hit the stands. This edition's version probably will be, too. There has to be a better way to do that part; books can't compete with magazines' rates of publication, much less the Internet. I've tried to be more generic in this edition, but you can't ignore the real systems and do that job right.
However, the basic original structure of the book has stood up adequately, for which I'm grateful; this edition would have been far more work were that not true. As a result, readers of the first edition will probably have an odd sense of reverse deja vu (jamais vu?), like “Hey, I thought I read that before, but it didn't say that.” Believe me, literally every page of this thing has been changed. Why? When the first edition was written, it really was true that most people in the computer industry had not heard of clusters, and those that had mostly considered them a lower form of life. Products that really were clusters weren't called that, because there were much cooler things to claim to be: Massively Parallel. Distributed. Hemidemisemicoupled. Whatever.
Now you would have to be deaf to not have heard of clusters. All God's chilluns got a cluster product, or two, or four, and are talking about them (if that's the phrase) with all the power of their collective lungs. The products are mostly (not always) fairly crude, but, hey, you have to start somewhere. At least they recognize the name. I've had to revise things fairly pervasively to take this new milieu into account, and have also removed some of the ranting about how this might actually be a useful thing to do. Not all. Some.
I'd like to think that the publication of the first edition had something to do with this change in the state of affairs. That would be far more satisfying than merely having correctly nailed a few short-term predictions.
I of course remain grateful for IBM's rather enlightened policy towards book authors, which still provides both support for writing and motivation to complete the job. The views expressed here are not necessarily those of the IBM Corporation, of course.
I am also again grateful for the support of my family, who once again put up with my lack of attention while immersed in this project, even though the first edition didn't produce all its promised benefits: my children Danielle and Jonathan, and, of course, my wife, Cyndee Stines Pfister, who originally said that was probably her only chance to get her name in a book. Well, lightning strikes twice.
I also again owe a large debt to the many people who have discussed the subjects of this book with me, both within and outside IBM. I feel privileged to say that the cliché remains true: There are far too many to mention all of them individually. However, my manager, IBM Fellow and Vice President Rick Baum, must certainly be thanked for uttering the fateful words, “Don't you think it's time for you to do a second edition?” And then giving me the time to actually do it (possibly more time than he anticipated, and certainly more than I originally estimated, but it did finally get done). Jim Rymarczyk, Dave Elko, and Pete Sargent at least partly repaired my woeful ignorance of Parallel Sysplex. I'm particularly grateful to Dave for the several sessions at which he endured my intemperate questioning. Tom Weaver and Lisa Spainhower provided extremely useful review feedback, as well as much useful discussion over the years. Renato Recio, Tom Chen, Jeff Weiner, and other members of the System Architecture and Performance I/O team are also to be thanked for the times when, hiding behind the clever façade of “we're just dorky I/O guys who don't know nothing about the serious stuff,” they filled in large gaps in my understanding of that Rodney Dangerfield of computing. Not all the gaps, by any means; but I'm better off than I was before. And Bill (“Rocky”) Rockefeller: May he always keep The System off all our backs.
Once again, however, the people who are most to be thanked are the IBM customers I have met over the years, as well as the members of the IBM field force who brought us together. Those customers gave me the opportunity to find out what was really of importance to them, which of course must be of paramount importance to those of us who serve them. It appears that many of them really do want to understand the kind of information that is in this book, at least when it is properly explained, as I have tried to do. They also unwittingly provided me with numerous opportunities to try out ways of explaining various topics, and to debug the analogies, metaphors, and jokes in response to questions and quizzical looks. As a direct result, getting through many of the chapters that follow will require markedly less caffeine.
Greg Pfister August, 1997
Austin, Texas firstname.lastname@example.org
Anyone planning to purchase, sell, design, or administer a server or multiuser computer system should buy and read this book.
Key needs of those systems (high performance, an ability to grow, high availability, appropriate cost, and so on) imply the use of parallel processing: multiple computing elements used together as a single entity. Parallel processing, with a bit of distributed processing, is what this book is about; it will give you the background needed to understand where the real issues lie in that realm. However, this doesn't mean that this book discusses “highly” or “massively” parallel computers. Those are flamboyant enough to have already attracted a multitude of variably successful explanations and are really of direct interest only in a vanishingly small fraction of the computer market.
Instead, this book uniquely discusses both the hardware and the software of “lowly” parallel computers, the everyday, practical work gangs of computing: symmetric multiprocessors, so-called “NUMA” systems, and, in particular, clusters of computers.
You do not have to be a dyed-in-the-denim “techie” to enjoy and profit from this book. Its form and content reflect the author's experience in explaining these issues quite literally hundreds of times to people with at best a semi-technical computer background. This has included customers who have better things to do with their time than become computer technophiles; marketing reps, both the technically oriented and the Jag-driving backslappers; and development managers, who too often think they have better things to do with their time than understand the technological basis of their business. You do need familiarity with the current computing milieu. An ability to (mostly) understand the thick, monthly computer magazines demonstrates a background adequate to get a lot out of this book. In some areas this may even be overkill. If you've understood this preface so far, you're in good shape. But that this book has been written to be accessible does not mean that it is Ye Compleate Moron's Guide to a Child's Garden of Stupid Tiny BASIC Tricks, either. Its content is not technically trivial. Because it approaches both parallel and distributed systems from a nonstandard viewpoint, that of clusters, it offers a fresh perspective that can potentially enrich both. As discussed below in “History,” technically sophisticated readers have profited from prior versions that weren't publicly published. Of particular interest have been the analysis of single system image and the characterization of the programming models used in commercial computing.
One result of this unusual perspective is that while many different groups of people will find many items of interest herein, many are also going to find something to be annoyed about.
- Promoters of distributed systems will be annoyed when the book points out that their already diametrically challenged systems have ignored a significant area, one whose support will of course add even more expense. Also, they may not have realized they were in league with the next category.
- Vendors and designers of heroically large symmetric multiprocessors will be annoyed when the book warns readers of their necessarily higher costs, and therefore warns them to avoid addiction to their products: a new form of being “locked in,” just like the bad old days, but this time “locked in” to an architecture, not necessarily a specific manufacturer.
- Proponents of so-called “NUMA” systems will be annoyed that this book debunks many of their statements about “maintaining the SMP programming model” and brings out of the closet their never-mentioned high availability issue.
- Cluster proponents will be annoyed when the book airs the real meaning of the industry-standard benchmarks for clusters.
- Highly massive parallelizers are already annoyed. “Flamboyant”? “A vanishingly small fraction of the computer market”? For that matter, “highly massive”?
- All traditional proponents of parallelism will be vexed because, unlike other books talking about parallelism, this one does not try to say that parallel programming can be easy or will be if only enough funding were applied to the problem. Rather, it demonstrates that parallel programming is very hard; and therefore mainstream software cannot directly use it, since any increase in the difficulty of writing software simply cannot be absorbed by the industry. There is a way around this conundrum, an already commonly used but formally ignored technique that is often deprecated by parallelizers; that technique forms a major theme of this book.
- Purchasers of server systems will, I sincerely hope, be happy to be given a straight, comprehensible story for once. It is really for them that this book is being written.
The humor- and irony-impaired will also have a spot of trouble here and there. (That includes one reviewer whose mind is apparently wrapped in sandpaper, to whom my editor thankfully paid no attention.) Finally, this book will also annoy those uncomfortable unless information is presented using the puritanical parody of “the” scientific method taught to my high-school children: Just the dry facts, ma'am, boring is fine, science is rote memorization of terms (God, I hate that one), and one must not appear to be contaminating oneself with preconceived notions. All that practice serves to do is to hide the actual prejudices of the presenter, be they conscious or unconscious. That situation is far from true here. The only reason this book exists is that I am convinced that clusters of machines are good for us, the members of the computer industry; clusters will happen whether or not we like the idea; and perhaps we should try to make the inevitable transition less than maximally painful. Its purpose is to convince you, too. Whether or not you end up agreeing with me, this book will explain an awful lot about lowly parallel computer systems (clusters, symmetric multiprocessors, and “NUMA” systems) without the rose-colored glasses usually coloring most views of those subjects.
History
This book began back in about 1991, while I was participating in an internal IBM work group. Formed from members of several development laboratories and Divisions of IBM, the group's charter was to figure out what to do with, or about, the groups of computers just then being called clusters.
Clusters had, of course, existed for many years as commercially available products from several computer vendors, although just how common they were none of us really understood at that time. They appeared to have been, if not niche products, at least outside the mainstream of the greater computer milieu. But now they seemed to be popping up all over the place. These new clusters were not formal products from vendors but rather informal bunches of computers assembled by customers. Those worthies were using bunches of computers in a number of ways that were both technically interesting and, we hoped, commercially useful and exploitable. In addition, there were some group members making deep, or at least loud, arguments in favor of clusters as a particularly efficient and attractive product offering. Interesting stuff. My own interest was less than overwhelming, however. This was not because of the topic, but because of the situation. I was jaded. This was far from the first working group on various parallel architectures in which I'd been involved. I could tell it wouldn't be the last, either, since like most of the others, it was going nowhere fast. A basic problem was that this collection of generally rather high-level, intelligent, and experienced computer architects, software system architects, technical strategists, product designers, researchers, and market analysts didn't know what in Sam Hill they were talking about. No, that's actually wrong. We did have a problem, but it was nearly the exact opposite of that one. Each individual knew precisely what he or she was talking about. They were (mostly) rather smart folks and were earnestly expressing worthwhile, useful points. But each person's point of view, and often what they meant by seemingly common words, was completely or, even worse, subtly different from most of the others. 
So the discussion was going in circles and everyone was trying, with decidedly varying intensity and success, to believe that not everyone else was a complete dolt.
Communication was just not happening because we differed at every possible level: feasible market and application areas, appropriate performance measures, the amount and type of software support necessary, whether that support could be “open,” what “open” meant in this context, what if anything all of that implied for the hardware, what the “natural” hardware (whatever that was) implied for software, where applications and subsystems were going to come from, and so on ad nauseam. Nobody could even use what seemed simple terms (for example, “single system image”) without significant misunderstanding by others.
It was probably fortunate that I was bored. If my mind hadn't been wandering, I probably would have “contributed” more, listened less, and never realized that this meta-problem of communication was lurking below the surface of the obvious morass. Since anything was obviously better than actively participating in yet another accursed work group, I began to work out how some of the various positions were different and what their relationships were.
That led to my collecting in an organized form various aspects of “clusters,” whatever that meant, simply to give us the common vocabulary without which progress was impossible. The outcome was a presentation that was at first short, but incrementally expanded to include possible hardware organization, aspects of software support, a number of examples, and finally a definition of the term “cluster.” As I picked up more information or realized something else, I just kept fitting it into the presentation's organization, massaging that organization as necessary to make things flow logically. The original work group was soon enough disbanded, but this had become a personal project with a life of its own, and it was becoming larger than I realized at the time. I gave parts of the presentation several times, in various circumstances, and the whole thing exactly once. Yes, to Another Work Group. To my amazement, one of the audience observed when I concluded that the complete presentation took approximately 16 hours, spread over two days. That time did include a large amount of lively discussion; otherwise I'd probably still be hoarse.
I concluded that the circumstances leading to my having a captive audience for that long were at least unusual, and certainly unlikely to be repeated. If this material was to be of use to anybody, it had to be in a different form. So I stayed home for two weeks of uninterrupted, intense effort and wrote out in prose a lengthy white paper based on that presentation.
That white paper, an internal document completed in the early fall of 1992, was version 1 of In Search of Clusters. I made about 100 copies, informally distributed it to people who I suspected might be interested, and, well, just stopped. There were other things to do.
I later found out that the copying machines had been busy. In Search of Clusters had become, informally, what amounted to required reading in the newly formed Power Parallel Systems group in the IBM Kingston laboratory, as well as in the Future Systems group in Austin and several groups in IBM Research. Not that they agreed with all of it or even modelled their projects after its precepts, but it apparently had succeeded in providing a useful set of common terms and concepts in their contexts. Comments and suggestions, both pro and con, of course began arriving.
About a year later, members of the IBM AIX Executive Briefing Center in Austin contacted me. They had been requested to provide the IBM field and marketing forces a paper that would clear up what the differences were between various forms of parallel, distributed, and clustered computing: one of those so-called “market positioning” statements. After spending several months attempting this themselves, they had come across In Search of Clusters. It had the information they needed, and they wanted to distribute it. This implied some work to eliminate the company-confidential elements, but at one year old those were history anyway. So, I did that, added a good amount of information I'd collected over the intervening year, did some substantial reediting, and published it electronically by placing softcopy on the internal IBM distribution facility called mkttools.
That was version 2 of In Search of Clusters, an “IBM Internal Use Only” 80-page document completed in September of 1993. About 1500 people obtained copies from mkttools. The copying machines have also been busy again, since it's not the simplest thing to print a 70-page document in the field in IBM, especially one translated into the common IBM printing format from PostScript® (this process usually creates huge files that print at an incredibly slow rate). I also made it available within IBM development by an easier-to-access electronic means that does not keep track of requests; from inquiries back to me, I know that has seen substantial use. I estimate that at least 2000 copies got into circulation within IBM.
Along with that circulation came a stream of electronic mail and telephone calls from the field with many comments, a fair number of kudos, and the question: could I please make a version that wasn't “internal use only” so they could give it to customers? In addition, word inevitably began to leak out of IBM. I saw several postings to Usenet discussion groups on the Internet of the form “I hear there's this paper on clusters or something that somebody in IBM wrote. Anybody know where I can get a copy?” It sounded like time for a completely unclassified version 3.
Well, somehow despite my day job I managed to put that version together. In the six months that had passed since version 2, the entire chapter on “Examples” had been made obsolete by new product announcements, but the basic structure and content had stood up well. Comments from readers indicated that several other chapters needed expansion, so that was done along with some further general tweaking. By this time it had grown to over 140 pages. Problem.
This was too long to be a white paper. I was wondering what to do with it when I accidentally heard of a seminar being run by the Austin IBM “Technical Author Recognition Program,” a seminar for anyone interested in publishing books, at which representatives of publishing companies would be present. Bingo.
What you are holding is essentially the second edition of version 4 of that original 16-hour presentation. It's been greatly expanded, brought up-to-date, and reshaped by feedback from a large number of internal IBM readers. The book format (translation: more words than I ever thought I'd write in my lifetime) has allowed me to make the treatment of most topics much more self-contained than the original, which more or less assumed that the reader was a competent, practicing computer hardware or software architect. It also allowed me to include several new nonbackground elements, such as the chapter explaining why we need the concept of a cluster.
It's been a long, interesting trip for me. I hope you enjoy the ride, too.
First Edition Acknowledgments
My thoughts about clusters and symmetric multiprocessors, and this book in particular, have benefited tremendously from discussions over many years with a large number of people, both within and outside IBM. There are too many to mention all of them individually, but a few deserve special mention. The elaboration of workload characteristics occurred in discussion with Tilak Agerwala, who set me on the topic of clusters to begin with. (I originally didn't want to do it, since it involved all that ugly “distributed stuff.”) Recognition that there were levels of single system image first occurred in discussions with Patrick Goal; a number of “at large” discussions with Patrick also contributed greatly. Jim Cox helped further my understanding of single system image concepts, along with other members of the Yet Another Work Group that met in the fall of '92. Jim also contributed to my understanding of system management issues. None of these fine people is to be blamed, of course, if I mutilate their knowledge in this book.
But the people who are most to be thanked are the IBM customers I met with during the summer of 1991, as part of a kind of technical market survey primarily targeting another set of issues. I began that exercise with a specific notion of what a “cluster” was. The customers I met had myriad other, very different notions. The disparities between the many expressed views are what first sent me in search of clusters.
Greg Pfister May, 1995
Austin, Texas email@example.com