Mapping out all the preparations necessary for an effective disaster recovery plan and its safeguard—a continuous maintenance program—this guide is aimed at IT managers of small and medium businesses. The opening section covers the initial steps of auditing vulnerability, ranking essential IT functions, and reviewing the storage of tape backups, with the following discussion focused on the elements of the plan itself. The plan includes a mission statement, a definition of disaster, the assignment of staff to teams, methods of compensating for human error, and standards for documenting the steps of recovery. The final portion of the guide covers the all-important initial testing of the system as well as the proper maintenance thereafter and weighs in on the pros and cons of using outside vendors for recovery systems.
Publisher: MC Press, LLC
Product dimensions: 7.00(w) x 9.00(h) x 1.40(d)
About the Author
Richard Dolewski is a certified systems integration specialist and disaster recovery planner. He has extensive experience in Server Enterprise Availability, Disaster Recovery Planning (DRP), Business Continuity Planning (BCP), High Availability, and Backup and Recovery program design. Richard has supported 18 computer room disasters and conducted over 200 disaster recovery tests. He is an award-winning speaker at technical conferences including IBM, COMMON, IBM Europe, IT Executive events, Quest, NEMA, and local user groups. Richard has held positions as subject matter expert at IBM and COMMON, a member of the Advisory Committee for IBM Business Resiliency, and president of local user groups. He also regularly contributes to several technical newsletters. He lives in Toronto, Ontario.
Read an Excerpt
System i Disaster Recovery Planning
By Richard Dolewski
MC Press
Copyright © 2008 Richard Dolewski
All rights reserved.
Building a Disaster Recovery Plan — The Need
Disasters can strike any time, anywhere. Years of organizational success can be lost in minutes. Suppose your organization were to disappear tomorrow and you had to piece it back together. It is a difficult task and a big job. You need a starting point, an ending point, and a roadmap to show you the way to systems recovery. You need a disaster recovery plan in place and staff dedicated to making the plan actually happen. The difference between losing your business and surviving in business depends on how well you're prepared for the unexpected. If a disaster struck today, how would your company do? Organizations must not only protect mission-critical data, but also put disaster recovery plans into place to prevent business outages and to support recovery from them.
Why is it that many organizations today still do not have a disaster recovery plan? Here are some typical excuses I have heard:
There's not enough time.
My résumé is offsite.
It will never happen to me!
It's not in this year's budget.
I just do not know where to start.
Do any of these sound familiar? If you have used any one or even all of these excuses and still don't have a comprehensive written and tested disaster recovery plan, you are not alone. A great amount of creative energy is spent formulating excuses instead of developing a plan. Even if you have a disaster recovery plan, are you really prepared? It is equally important to regularly test the plan to ensure that your business can be recovered as documented. Survival in business depends on how well you're prepared and trained for the unexpected. It is safe to say that disasters come without warning. A disaster will not wait for a time that suits your personal or work schedule.
The need for disaster recovery planning has been recognized by industry as an essential tool for business survival. Don't kid yourself — disasters do occur. The very survival of a business is in question when that business does not have a current, documented, and implemented recovery strategy. Insurance can help fund recovery, but it cannot service or replace your valued customers.
A fully documented and tested disaster recovery plan keeps bad things from happening to good companies. Disaster recovery (DR) means being aware of the threat and supporting the resumption of your business following any man-made or natural disasters. A disaster recovery plan not only protects your organization's most vital asset — your corporate information — it also helps create awareness within your organization. In addition, a DR plan helps you refine your infrastructure processes. Incorporating DR methodology into all IT integration strategies is forward thinking.
Most companies depend on their Information Technology (IT) to remain in business. Planning helps eliminate the need to gamble the livelihood of your business in hopes that a disaster will never strike your organization.
The success of your company can be attributed to years and years of hard work and risks that you successfully managed. Companies simply do not financially recover from a disaster when there is no fully documented and tested plan integrated into the business. Disaster recovery planning helps mitigate risks associated with the failure of the IT services on which your business depends. The most important goal is to enable your company to remain in business. If a disaster strikes, your company has everything to lose: critical data, profits, and information. All of these are critical assets in any company.
After 14 days without access to IT systems, 43 percent of businesses will not reopen; 29 percent of those that do reopen will close for good within two years.
U.S. Bureau of Labor
What happens if the power goes out in your home? You grab a flashlight or light a candle, and look out the window to see if the neighbors' homes are dark. If they are, this is probably a widespread outage rather than just a circuit breaker in your basement. In that case, you know that your lights will come back on when the utility company restores power. You can view your company's computer processing as just another utility, like power and water. It is a utility that supports your business; it's not the business itself.
For over 15 years, I have had the opportunity to study disasters first-hand. The one thing they all have in common is that no one believes it will ever happen to them. You will have a halt in your business activity ... a halt in your flow of information. How quickly you recover will determine if it is business as usual. If the business is worth the investment in the first place, it's probably worth protecting and recovering.
Some companies take for granted that a disaster will never strike. Rather than developing proactive solutions for such an event, importance usually falls on other corporate IT deliverables. Does your company have a comprehensive disaster recovery plan (DRP) that would allow it to continue to function in the event of a disaster?
Recent events around the country have kept us all on our toes. You just cannot pick up a newspaper or watch the news without hearing some bad news that requires some form of disaster-recovery planning response. Hurricanes Katrina and Rita in 2005, and Ivan and Frances in 2004. The great power outage in the American and Canadian northeast. The events of September 11, 2001. There are more everyday disasters, too, such as rotating power shortages or brown-outs. Finally, do not forget hardware failures. (Yes, the System i does break down!) All of these can have major impacts on today's business needs.
The underlying philosophy of disaster recovery planning needs to be deeply rooted in your organization's desire to protect the viability of its business, public image, and information assets. Your sales and marketing teams work extremely hard to build your corporate image and acquire new customers. New customers can be very difficult to get. Statistics show that it takes much more effort to gain a new customer than to maintain a customer. And once customers are lost, it is nearly impossible to get them back. So customer satisfaction is paramount. Trying to get new customers or convincing the old ones to hang around in a disaster is an uphill struggle if your corporate image has been damaged.
What Is a Disaster?
The textbook definition of a disaster is "a sudden, unplanned event that causes great damage and loss to an organization." The time factor determines whether the interruption in IT service delivery is an inconvenience or a disaster.
The time factor varies from organization to organization, of course. What does the face of disaster look like? What types of disasters should you consider? The list in Figure 1.1 is by no means complete, but it should give you an appreciation of the types of disaster you might wish to evaluate.
My own definition of a disaster is quite simple: "A disaster is anything that stops your business from functioning and that cannot be corrected within an acceptable amount of time." Disasters are defined and quantified in relation to time. Time is important from the standpoint of when an interruption occurs and how long the interruption lasts. The bottom line is that a disaster is defined as any interruption of mission-critical business processes for an unacceptable period of time.
This time-related definition reflects the very nature of a disaster and avoids the problems that frequently arise by only applying categorical adjectives to a disaster. We all tend to get caught up in categories and types of disasters instead of the impact they can potentially inflict. A category that constitutes a disaster for Company A might not be a disaster for Company B. For this reason, you need to take a holistic approach to examining what constitutes a disaster and examine the business and regulatory impacts on your specific organization. Whether it is a hardware failure of the RAID5 disk array or the loss of power due to a weather-related event like an ice storm, anything that could severely impact your own company is a type of disaster.
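The time-based definition above can be illustrated with a minimal Python sketch. The function name and the tolerance values are hypothetical, chosen only to show how the same outage can be a disaster for Company A and an inconvenience for Company B; each organization would set its own acceptable downtime.

```python
from datetime import timedelta


def classify_interruption(outage: timedelta, max_tolerable: timedelta) -> str:
    """Label an outage by comparing its length to the acceptable downtime."""
    return "disaster" if outage > max_tolerable else "inconvenience"


# Hypothetical example: the same 8-hour outage hits two companies
# that tolerate very different amounts of downtime.
outage = timedelta(hours=8)
company_a = classify_interruption(outage, max_tolerable=timedelta(hours=4))
company_b = classify_interruption(outage, max_tolerable=timedelta(hours=48))

print(f"Company A: {company_a}")
print(f"Company B: {company_b}")
```

The category of the event (fire, power loss, disk failure) never enters the comparison; only the duration and the organization's own tolerance do.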
What Is Disaster Recovery?
Disaster recovery is your IT response to a sudden, unplanned event that will enable your organization to continue critical business functions until normal IT-related services can resume. Disaster recovery must address the continuation of critical business operations. A major incorrect assumption made in our industry is that disaster recovery can be fully realized by simply prearranging for hardware replacement with your business partner or channel distributor. Write one check and you have a DR plan. Call the supplier and they will come running with all the hardware you require at time of need. Will they? Even if they will, is disaster recovery only about hardware? The obvious answer is NO!
What Is Your Level of Disaster Preparedness?
Most of us initially think our chances of being hit by a disaster are remote. Unfortunately, this view might not change until after the fact — like buying a home alarm system after you have been robbed. While threats of a major disaster from a storm, earthquake, or flood are always present, it is more likely that your IT department would experience an extended communications outage, technology failure, or loss of power. Most organizations are ill-prepared to manage any sort of emergency. Time and money spent on a disaster recovery plan is a good business investment. Planning and preparation before a disaster can minimize the loss of revenue and help ensure an effective, timely recovery.
Suppose you get a phone call in the middle of the night. (We all know those types of calls can only bring bad news!) The IT person at the other end of the call states that there has been a terrible accident in the manufacturing plant. The fire marshal has cut power to the building, and things do not look good. Your centralized data center is there, which supports national manufacturing plants, sales offices, and distribution centers.
Quick — what would you do?
If your answer takes longer than 10 seconds to formulate or includes more than "make one telephone call," you've got a problem. If you simply do not know the answer, or if you answer "Maybe I'd do this ...," you have a serious problem. It might be some comfort to know that, unfortunately, you have plenty of company. Despite the increasing dependence on the integration of technology into nearly every aspect of business, most corporations remain unprepared to recover IT infrastructure supporting critical business functions in a disaster. By remaining unprepared, you are putting your successful enterprise at risk.
Organizations fall into one of four levels of disaster preparedness, compared in Figure 1.2 to popular movies. Which level of preparedness best represents your organization? This question is vital to knowing the organizational culture in the eyes of senior management.
If you don't know what level you're at, there's a relatively quick and easy way to find out. Ask yourself what your organization's key business functions are and which server infrastructures support them. Now assume that you could no longer use those systems because of an unplanned event, whether for one hour, 12 hours, one day, two days, or more. Then, estimate the financial impact this loss would have on your business based on how long your systems would be out. Determining your level of disaster preparedness may be a sobering exercise when you consider lost sales, lost revenues, penalties from regulatory agencies, SLA-driven fines, and worst of all, damage to your public image! Obviously, quite a bit is at stake.
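The estimating exercise described above can be sketched in a few lines of Python. Every figure here is a hypothetical placeholder, and the linear model is an assumption made for illustration; it covers only direct costs such as lost sales and fines, not damage to public image.

```python
# Hypothetical hourly figures; substitute numbers from your own business.
HOURLY_LOST_REVENUE = 5_000.0  # assumed lost sales per hour of outage
HOURLY_PENALTIES = 500.0       # assumed SLA fines and regulatory penalties per hour

# Outage durations named in the text, expressed in hours.
SCENARIOS = {"one hour": 1, "12 hours": 12, "one day": 24, "two days": 48}


def estimate_outage_cost(hours: float) -> float:
    """Linear estimate of the direct financial impact of an outage."""
    return hours * (HOURLY_LOST_REVENUE + HOURLY_PENALTIES)


def impact_table() -> dict[str, float]:
    """Estimated cost for each outage scenario."""
    return {label: estimate_outage_cost(h) for label, h in SCENARIOS.items()}


for label, cost in impact_table().items():
    print(f"{label:>9}: ${cost:,.0f}")
```

A real assessment would likely use costs that grow faster than linearly, since customers who tolerate a short outage may leave during a long one.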
Questions for Preparedness
Here are some questions to help you assess your level of preparedness:
Is your IT department positioned to respond in a disaster situation?
What appropriate steps are currently in place to resume IT services?
Is IT positioned to continue critical business functions during a disaster?
Which daily business functions could IT afford to lose without suffering potential financial loss or disruption of expected services?
Is IT positioned to respond to its business expectations, needs, and commitments in an acceptable manner despite a serious disruption?
Is the IT management team trained in the discipline of crisis management?
Who will make decisions during the disruption, and how will those decisions be communicated through the IT department?
Is there a vital-records program in place that will allow the organization to retrieve and restore information following a major loss?
Is there a contracted commercial or internal solution in place to test and train for disaster preparedness?
Effects of a Disaster
The effects of a disaster include the following:
In a disaster, one of the first things you will notice is a halt to your business momentum. It's not business as usual. The key is to minimize that halt and respond quickly so that your organization remains viable. A halt to your business momentum for an extended period of time could cost you any competitive edge that you hold in the marketplace. If it's a day or two, your customers will roll with you. If it's for an extended time, they will go elsewhere. So, if you are out for an extended period of time, it will start to affect your cash flow at a time when your company needs it most. If you experience a halt, you are going to need cash to control the problem. If you cannot send out your invoices to collect your accounts receivable, for example, you are affecting the thing that hurts the most: the bottom line.
Shock is another important effect of a disaster. Even your most competent staff, the person who's cool day-in and day-out, can experience shock. That's one of the reasons to document your course of action and develop task lists to keep people on track.
Information Technology Dependence
It is not necessarily the size of the disaster, but the likelihood of its occurrence and its potential effect on your IT installation, that you should weigh when evaluating and maintaining a disaster recovery plan.
Today, IT has become a strategic part of everyone's business. If the IT systems go down, it's very likely your business will not be able to continue its day-to-day operations. Disaster recovery planning is all about being able to manage the impact of disasters. More precisely, it must be about the ability to meet your organization's commitments while maintaining reliability, consistency, and dependability. A properly managed disaster recovery response can be a differentiating factor in this highly competitive business world. Most importantly, it supports your organization's commitment to shareholders, employees, customers, and suppliers.
It was not all that long ago that most companies were only open for business from nine to five, and just Monday to Friday. Having a system unavailable did not prevent a sale from happening. Customer transactions were usually conducted in person or over the phone, with details transferred from paper via a data-entry department, usually overnight. If a disaster shut down computing services for a few days, you could simply continue working in a manual business mode. In other words, it was business as usual.
Excerpted from System i Disaster Recovery Planning by Richard Dolewski. Copyright © 2008 Richard Dolewski. Excerpted by permission of MC Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
Building a Disaster Recovery Plan-The Need
The Need
Plan for All Types of Disasters
Reasons for Planning
Let's Get Started
Definitions and Risk Mitigation
Server Criticality and Recovery Strategies
Develop the Plan
Validate the Recovery Plan
Vulnerability Assessment & Risk Analysis
Site Vulnerability Assessment
Vulnerability Assessment Summary
Performing a Risk Analysis
Conducting a Business Impact and Recoverability Analysis
Starting the Business Impact Analysis
Tangible Costs
Intangible Costs
Identifying Mission-Critical Functions
Outage Impact
Recovery Time Objective vs. Recovery Point Objective
Shifting Focus for Return on Investment (ROI)
The Process of the BIA
Critical Server Ranking
Classifying Systems for Recovery Priority
Mission-Critical Only, Please
Rank Your Data Backup Priorities
Backups, and Recovery Time and Point Objectives
Critical Systems Definition, A List
Critical Systems Definition, B List
Is Email Mission-Critical?
Hardware Requirements for Mission-Critical Servers
Building Recovery Strategy Requirements
The Disaster Recovery Challenge
Guidelines for Selecting Recovery Strategies
Market Trends
Recovery Strategies
Data Center Recovery Solutions
Determine the Level of Business Resiliency You Want to Achieve
Overall Site Restoration Strategy Sample
Backup and Recoverability
Plan for Data Recovery
10 Issues for the Administration of Backups
Checklist for Backup and Recovery
Backup Media Management
How Much to Back Up for Disaster Recovery
Backup Recovery and Media Services (BRMS)
A Simple Save Strategy
Save More with Save-While-Active
Richard's Backup Solution
Backups for Planned Maintenance Windows
IBM's Virtual Tape Solution (VTL)
Duplicate Your Removable Media
Restoration Commands
The BRMS System Recovery Report
How the System Restores Access Paths
Backing Up and Recovering a Domino Server
Hardware Management Console (HMC)
Your Business Value of Systems Availability
High Availability-Take the High Road
Recovery on Your High-Availability Investment
Is Your H/A Truly High Availability?
IBM's Capacity Backup Offering
Vital Records and Critical Data Offsite Storage
Vital Record Management
Offsite Storage Considerations
Choosing an Offsite Storage Provider
Building Your Teams
Selecting Candidates: Pick Me! No, Don't Pick Me!
When There Is Loss of Life or Missing People
Building Your Recovery Teams
How to Work Together
The IT Recovery Management Team
The IT Technical Recovery Team
The Network Team
The Hardware Recovery Team
Application Recovery Team
Facility Recovery Team
Replacement Equipment
Disaster Recovery Preparedness
Administrative Responsibilities
Care for Your Recovery Teams During a Disaster
The Team's Meeting Place
Effective Communications
Develop an Employee Call Sheet
Who Do You Contact?
Selecting a Meeting Place for the Command Center
Facing and Dealing with the Media
Notification Solution Design
How to Develop and Document a Disaster Recovery Plan
Disaster Recovery Plan Development Overview
Ready, Set, Write the Plan
The Disaster Recovery Plan's Structure
Developing and Writing the Procedures
Disaster Recovery Teams Overview
Effective Plan-Activation Procedures
The Disaster-Alert Notification Procedure
First-Alert Response
Hotsite Call-up Procedures
Recalling Tapes from Your Offsite Storage Provider
Site Restoration Activities
The Need for System-Related Documentation
A Change in the i5 Philosophy Silos
Write It All Down
I Thought Those Backup Tapes Had Everything!
Collecting and Maintaining System Information
The PRTSYSINF Command
Complete Site Loss versus Server Loss
System i5/iSeries Restoration Procedures
Recovery Procedures
Case Study Sample
System i5/iSeries BRMS Restoration Procedures
Testing Your Disaster Recovery Plan
Practice Just Like the Pros
Satisfy the Need for Testing
The Embarrassment of Testing: What If We Fail?
Open-Book Testing
Define a Complete Testing Project
Passive Testing
Active Testing
Disaster Recovery Coordinator Testing Duties
Introducing Murphy's Law
Evaluation of Test Results
Be a Survivor
Plan Maintenance
Your Plan Design
Implementing a Maintenance Philosophy
Revisit Your Plan-Get into Maintenance Mode
Change Management
Selecting a Commercial Hotsite Provider
Advance Planning = Hotsite
Internal or External Hotsite?
What to Look for in a Hotsite Provider
Cost Considerations
A Family DR Plan
Disaster Recovery Begins at Home
Emergency Supplies
Practice and Maintain Your Plan
Personal and Family Requirements
Awareness Training
Information on Family Disaster Plans
Sample Documents
Business Impact Analysis Questionnaire
Operational Priorities
Operational Impacts
Customer Service
Cash Flow/Revenue
Regulatory (If Applicable)
Increases In Liability
Vendor Relations
Financial Control/Reporting
Mission Critical IT Applications
Server Criticality Analysis