System i Disaster Recovery Planning

by Richard Dolewski




Mapping out all the preparations necessary for an effective disaster recovery plan and its safeguard—a continuous maintenance program—this guide is aimed at IT managers of small and medium businesses. The opening section covers the initial steps of auditing vulnerability, ranking essential IT functions, and reviewing the storage of tape backups, with the following discussion focused on the elements of the plan itself. The plan includes a mission statement, a definition of disaster, the assignment of staff to teams, methods of compensating for human error, and standards for documenting the steps of recovery. The final portion of the guide covers the all-important initial testing of the system as well as the proper maintenance thereafter and weighs in on the pros and cons of using outside vendors for recovery systems.

Product Details

ISBN-13: 9781583470671
Publisher: MC Press, LLC
Publication date: 04/01/2008
Pages: 616
Product dimensions: 7.00(w) x 9.00(h) x 1.40(d)

About the Author

Richard Dolewski is a certified systems integration specialist and disaster recovery planner. He has extensive experience in Server Enterprise Availability, Disaster Recovery Planning (DRP), Business Continuity Planning (BCP), High Availability, and Backup and Recovery program design. Richard has supported 18 computer room disasters and conducted over 200 disaster recovery tests. He is an award-winning speaker at technical conferences including IBM, COMMON, IBM Europe, IT Executive events, Quest, NEMA, and local user groups. Richard has held positions as subject matter expert at IBM and COMMON, a member of the Advisory Committee for IBM Business Resiliency, and president of local user groups. He also regularly contributes to several technical newsletters. He lives in Toronto, Ontario.

Read an Excerpt

System i Disaster Recovery Planning

By Richard Dolewski

MC Press

Copyright © 2008 Richard Dolewski
All rights reserved.
ISBN: 978-1-58347-696-3


Building a Disaster Recovery Plan — The Need

Disasters can strike any time, anywhere. Years of organizational success can be lost in minutes. Suppose it all went away tomorrow and you had to piece your organization back together. It is a difficult task and a big job. You need a starting point, an ending point, and a roadmap to show you the way to systems recovery. You need a disaster recovery plan in place and staff dedicated to making the plan actually happen. The difference between losing your business and surviving in business depends on how well you're prepared for the unexpected. If a disaster struck today, how would your company do? Organizations must not only protect mission-critical data, but also put disaster recovery plans into place to prevent business outages and to support recovery from those that do occur.

Why is it that many organizations today still do not have a disaster recovery plan? Here are some typical excuses I have heard:

• There's not enough time.

• We're downsizing.

• My résumé is offsite.

• It will never happen to me!

• It's not in this year's budget.

• I just do not know where to start.

Do any of these sound familiar? If you have used any one or even all of these excuses and still don't have a comprehensive written and tested disaster recovery plan, you are not alone. A great amount of creative energy is spent formulating excuses instead of developing a plan. Even if you have a disaster recovery plan, are you really prepared? It is equally important to regularly test the plan to ensure that your business can be recovered as documented. Survival in business depends on how well you're prepared and trained for the unexpected. It is safe to say that disasters come without warning. A disaster will not wait for a time that suits your personal or work schedule.

The need for disaster recovery planning has been recognized by industry as an essential tool for business survival. Don't kid yourself — disasters do occur. The very survival of a business is in question when that business does not have a current, documented, and implemented recovery strategy. Insurance can help fund recovery, but it cannot service or replace your valued customers.

A fully documented and tested disaster recovery plan keeps bad things from happening to good companies. Disaster recovery (DR) means being aware of the threat and supporting the resumption of your business following any man-made or natural disasters. A disaster recovery plan not only protects your organization's most vital asset — your corporate information — it also helps create awareness within your organization. In addition, a DR plan helps you refine your infrastructure processes. Incorporating DR methodology into all IT integration strategies is forward thinking.

Most companies depend on their Information Technology (IT) to remain in business. Planning helps eliminate the need to gamble the livelihood of your business in hopes that a disaster will never strike your organization.

The success of your company can be attributed to years and years of hard work and risks that you successfully managed. Companies simply do not financially recover from a disaster when there is no fully documented and tested plan integrated into the business. Disaster recovery planning helps mitigate risks associated with the failure of the IT services on which your business depends. The most important goal is to enable your company to remain in business. If a disaster strikes, your company has everything to lose: critical data, profits, and information. All of these are critical assets in any company.

After 14 days without access to IT systems, 43 percent of businesses will not reopen; 29 percent of those that do reopen will close for good within two years.

U.S. Bureau of Labor

The Need

What happens if the power goes out in your home? You grab a flashlight or light a candle, and look out the window to see if the neighbors' homes are dark. If they are, this is probably a widespread outage rather than just a circuit breaker in your basement. In that case, you know that your lights will come back on when the utility company restores power. You can view your company's computer processing as just another utility, like power and water. It is a utility that supports your business; it's not the business itself.

For over 15 years, I have had the opportunity to study disasters first-hand. The one thing they all have in common is that no one believes it will ever happen to them. You will have a halt in your business activity ... a halt in your flow of information. How quickly you recover will determine if it is business as usual. If the business is worth the investment in the first place, it's probably worth protecting and recovering.

Some companies take for granted that a disaster will never strike. Rather than developing proactive solutions for such an event, importance usually falls on other corporate IT deliverables. Does your company have a comprehensive disaster recovery plan (DRP) that would allow it to continue to function in the event of a disaster?

Recent events around the country have kept us all on our toes. You cannot pick up a newspaper or watch the news without hearing of some event that calls for a disaster recovery response: Hurricanes Katrina and Rita in 2005, and Ivan and Frances in 2004; the great power outage in the American and Canadian northeast; the events of September 11, 2001. There are more everyday disasters, too, such as rotating power shortages or brown-outs. Finally, do not forget hardware failures. (Yes, the System i does break down!) All of these can have major impacts on today's business needs.

The underlying philosophy of disaster recovery planning needs to be deeply rooted in your organization's desire to protect the viability of its business, public image, and information assets. Your sales and marketing teams work extremely hard to build your corporate image and acquire new customers. New customers can be very difficult to get. Statistics show that it takes much more effort to gain a new customer than to maintain a customer. And once customers are lost, it is nearly impossible to get them back. So customer satisfaction is paramount. Trying to get new customers or convincing the old ones to hang around in a disaster is an uphill struggle if your corporate image has been damaged.

What Is a Disaster?

The textbook definition of a disaster is "a sudden, unplanned event that causes great damage and loss to an organization." The time factor determines whether the interruption in IT service delivery is an inconvenience or a disaster.

The time factor varies from organization to organization, of course. What does the face of disaster look like? What types of disasters should you consider? The list in Figure 1.1 is by no means complete, but it should give you an appreciation of the types of disaster you might wish to evaluate.

My own definition of a disaster is quite simple: "A disaster is anything that stops your business from functioning and that cannot be corrected within an acceptable amount of time." Disasters are defined and quantified in relation to time. Time is important from the standpoint of when an interruption occurs and how long the interruption lasts. The bottom line is that a disaster is defined as any interruption of mission-critical business processes for an unacceptable period of time.

This time-related definition reflects the very nature of a disaster and avoids the problems that frequently arise by only applying categorical adjectives to a disaster. We all tend to get caught up in categories and types of disasters instead of the impact they can potentially inflict. A category that constitutes a disaster for Company A might not be a disaster for Company B. For this reason, you need to take a holistic approach to examining what constitutes a disaster and examine the business and regulatory impacts to your specific organization. Whether it is a hardware failure of the RAID5 disk array or the loss of power due to a weather-related event like an ice storm, anything that could severely impact your own company is a type of disaster.
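
The time-based definition above can be illustrated with a minimal Python sketch. The names `Outage`, `is_disaster`, and `max_tolerable_hours`, and the hour values used, are hypothetical and chosen only for illustration; they do not come from the book.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """An interruption of a mission-critical business process (hypothetical model)."""
    process: str
    duration_hours: float


def is_disaster(outage: Outage, max_tolerable_hours: float) -> bool:
    """Apply the time-based definition: an interruption becomes a disaster
    once it lasts longer than the organization can accept."""
    return outage.duration_hours > max_tolerable_hours


if __name__ == "__main__":
    # The same 8-hour outage, judged against two assumed tolerances.
    outage = Outage(process="order entry", duration_hours=8)
    print("Company A (tolerates 4 h):", is_disaster(outage, max_tolerable_hours=4))
    print("Company B (tolerates 24 h):", is_disaster(outage, max_tolerable_hours=24))
```

The category of the event never enters the check. Only the duration and the organization's own tolerance do, which is why the same outage can be a disaster for Company A and an inconvenience for Company B.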

What Is Disaster Recovery?

Disaster recovery is your IT response to a sudden, unplanned event that will enable your organization to continue critical business functions until normal IT-related services can resume. Disaster recovery must address the continuation of critical business operations. A major incorrect assumption made in our industry is that disaster recovery can be fully realized by simply prearranging for hardware replacement with your business partner or channel distributor. Write one check and you have a DR plan. Call the supplier and they will come running with all the hardware you require at time of need. Will they? Even if they will, is disaster recovery only about hardware? The obvious answer is NO!

What Is Your Level of Disaster Preparedness?

Most of us initially think our chances of being hit by a disaster are remote. Unfortunately, this view might not change until after the fact — like buying a home alarm system after you have been robbed. While threats of a major disaster from a storm, earthquake, or flood are always present, it is more likely that your IT department would experience an extended communications outage, technology failure, or loss of power. Most organizations are ill-prepared to manage any sort of emergency. Time and money spent on a disaster recovery plan is a good business investment. Planning and preparation before a disaster can minimize the loss of revenue and help ensure an effective, timely recovery.

Suppose you get a phone call in the middle of the night. (We all know those types of calls can only bring bad news!) The IT person at the other end of the call states that there has been a terrible accident in the manufacturing plant. The fire marshal has cut power to the building, and things do not look good. Your centralized data center is there, which supports national manufacturing plants, sales offices, and distribution centers.

Quick — what would you do?

If your answer takes longer than 10 seconds to formulate or includes more than "make one telephone call," you've got a problem. If you simply do not know the answer, or if you answer "Maybe I'd do this ...," you have a serious problem. It might be some comfort to know that, unfortunately, you have plenty of company. Despite the increasing dependence on the integration of technology into nearly every aspect of business, most corporations remain unprepared to recover IT infrastructure supporting critical business functions in a disaster. By remaining unprepared, you are putting your successful enterprise at risk.

Organizations fall into one of four levels of disaster preparedness, compared in Figure 1.2 to popular movies. Which level of preparedness best represents your organization? This question is vital to knowing the organizational culture in the eyes of senior management.

If you don't know what level you're at, there's a relatively quick and easy way to find out. Ask yourself what your organization's key business functions are and which server infrastructures support them. Now assume that an unplanned event left you unable to use those systems for one hour, 12 hours, one day, two days, or more. Then, estimate the financial impact this loss would have on your business based on how long your systems would be out. Determining your level of disaster preparedness may be a sobering exercise when you consider lost sales, lost revenues, penalties from regulatory agencies, SLA-driven fines, and worst of all, damage to your public image! Obviously, quite a bit is at stake.
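
The estimate described above can be sketched in a few lines of Python. Every figure here is a hypothetical placeholder, and the cost model (a flat hourly revenue loss plus one-time penalties that accumulate as the outage passes each threshold) is an assumption made for illustration, not a method taken from the book. Substitute your own numbers and cost categories.

```python
# Hypothetical figures for illustration only; replace with your own estimates.
HOURLY_LOST_REVENUE = 5_000.0  # assumed revenue lost per hour of outage

# Assumed one-time penalties (SLA or regulatory fines) that apply once the
# outage reaches the given number of hours. Penalties are cumulative.
PENALTIES_BY_HOURS = {24: 10_000.0, 48: 25_000.0}


def estimate_outage_cost(hours, hourly_loss=HOURLY_LOST_REVENUE,
                         penalties=PENALTIES_BY_HOURS):
    """Return the estimated cost of an outage lasting `hours` hours:
    linear lost revenue plus every penalty whose threshold was reached."""
    triggered = sum(fine for threshold, fine in penalties.items()
                    if hours >= threshold)
    return hours * hourly_loss + triggered


if __name__ == "__main__":
    # The outage durations suggested in the text: 1 h, 12 h, 1 day, 2 days.
    for hours in (1, 12, 24, 48):
        print(f"{hours:>2} h outage: ${estimate_outage_cost(hours):,.0f}")
```

A model this simple leaves out intangible losses such as damage to public image, so treat its output as a lower bound on what is at stake.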

Questions for Preparedness

Here are some questions to help you assess your level of preparedness:

• Is your IT department positioned to respond in a disaster situation?

• What appropriate steps are currently in place to resume IT services?

• Is IT positioned to continue critical business functions during a disaster?

• Which daily business functions could IT afford to lose without suffering potential financial loss or disruption of expected services?

• Is IT positioned to respond to its business expectations, needs, and commitments in an acceptable manner despite a serious disruption?

• Is the IT management team trained in the discipline of crisis management?

• Who will make decisions during the disruption, and how will those decisions be communicated through the IT department?

• Is there a vital-records program in place that will allow the organization to retrieve and restore information following a major loss?

• Is there a contracted commercial or internal solution in place to test and train for disaster preparedness?

Effects of a Disaster

The effects of a disaster include the following:

• Business momentum

• Competitive edge

• Cash flow

• Human elements

In a disaster, one of the first things you will notice is a halt to your business momentum. It's not business as usual. The key is to minimize that halt and respond quickly so that your organization remains viable. A halt to your business momentum for an extended period of time could lose you any competitive edge that you hold in the marketplace. If it's a day or two, your customers will roll with you. If it's for an extended time, they will go elsewhere. So, if you are out for an extended period of time, it will start to affect your cash flow at a time when your company needs it most. If you experience a halt, you are going to need cash to control the problem. If you cannot send out your invoices to collect your accounts receivable, for example, you are affecting the thing that hurts the most: the bottom line.

Shock is another important effect of a disaster. Even your most competent staff member, the person who's cool day in and day out, can experience shock. That's one of the reasons to document your course of action and develop task lists to keep people on track.

Information Technology Dependence

It is not necessarily the size of the disaster, but the likelihood of its occurrence and its potential effect on your IT installation, that you should weigh when evaluating and maintaining a disaster recovery plan.

Today, IT has become a strategic part of everyone's business. If the IT systems go down, it's very likely your business will not be able to continue its day-to-day operations. Disaster recovery planning is all about being able to manage the impact of disasters. More precisely, it must be about the ability to meet your organization's commitments while maintaining reliability, consistency, and dependability. A properly managed disaster recovery response can be a differentiating factor in this highly competitive business world. Most importantly, it supports your organization's commitment to shareholders, employees, customers, and suppliers.

It was not all that long ago that most companies were only open for business from nine to five, and just Monday to Friday. Having a system unavailable did not prevent a sale from happening. Customer transactions were usually conducted in person or over the phone, with details transferred from paper via a data-entry department, usually overnight. If a disaster shut down computing services for a few days, you could simply continue working in a manual business mode. In other words, it was business as usual.


Excerpted from System i Disaster Recovery Planning by Richard Dolewski. Copyright © 2008 Richard Dolewski. Excerpted by permission of MC Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

Building a Disaster Recovery Plan-The Need     1
The Need     3
Plan for All Types of Disasters     11
Reasons for Planning     13
Let's Get Started     17
Definitions and Risk Mitigation     26
Server Criticality and Recovery Strategies     33
Develop the Plan     40
Validate the Recovery Plan     43
Summary     46
Vulnerability Assessment & Risk Analysis     49
Site Vulnerability Assessment     50
Vulnerability Assessment Summary     73
Performing a Risk Analysis     74
Summary     85
Conducting a Business Impact and Recoverability Analysis     87
Starting the Business Impact Analysis     89
Tangible Costs     90
Intangible Costs     94
Identifying Mission-Critical Functions     96
Outage Impact     100
Recovery Time Objective vs. Recovery Point Objective     104
Shifting Focus for Return on Investment (ROI)     115
The Process of the BIA     116
Summary     121
Critical Server Ranking     123
Classifying Systems for Recovery Priority     124
Mission-Critical Only, Please     125
Rank Your Data Backup Priorities     127
Backups, and Recovery Time and Point Objectives     129
Critical Systems Definition, A List     132
Critical Systems Definition, B List     134
Is Email Mission-Critical?     135
Hardware Requirements for Mission-Critical Servers     135
Summary     136
Building Recovery Strategy Requirements     139
The Disaster Recovery Challenge     140
Guidelines for Selecting Recovery Strategies     141
Market Trends     144
Recovery Strategies     146
Data Center Recovery Solutions     150
Determine the Level of Business Resiliency You Want to Achieve     156
Overall Site Restoration Strategy Sample     158
Summary     159
Backup and Recoverability     161
Plan for Data Recovery     162
10 Issues for the Administration of Backups     167
Checklist for Backup and Recovery     174
Backup Media Management     176
How Much to Back Up for Disaster Recovery     185
Backup Recovery and Media Services (BRMS)     186
A Simple Save Strategy      192
Save More with Save-While-Active     195
Richard's Backup Solution     198
Backups for Planned Maintenance Windows     199
IBM's Virtual Tape Solution (VTL)     200
Duplicate Your Removable Media     203
Restoration Commands     204
The BRMS System Recovery Report     207
How the System Restores Access Paths     209
Backing Up and Recovering a Domino Server     209
Hardware Management Console (HMC)     213
Summary     214
Your Business Value of Systems Availability     217
High Availability-Take the High Road     219
Recovery on Your High-Availability Investment     220
Is Your H/A Truly High Availability?     232
IBM's Capacity Backup Offering     248
Summary     250
Vital Records and Critical Data Offsite Storage     251
Vital Record Management     253
Offsite Storage Considerations     268
Choosing an Offsite Storage Provider     271
Summary     273
Building Your Teams     275
Selecting Candidates: Pick Me! No, Don't Pick Me!     277
When There Is Loss of Life or Missing People     281
Building Your Recovery Teams     284
How to Work Together     289
The IT Recovery Management Team     293
The IT Technical Recovery Team     298
The Network Team     300
The Hardware Recovery Team     301
Application Recovery Team     302
Facility Recovery Team     303
Replacement Equipment     304
Disaster Recovery Preparedness     304
Administrative Responsibilities     305
Care for Your Recovery Teams During a Disaster     305
The Team's Meeting Place     310
Summary     314
Effective Communications     317
Develop an Employee Call Sheet     319
Who Do You Contact?     323
Selecting a Meeting Place for the Command Center     329
Facing and Dealing with the Media     334
Notification Solution Design     338
Summary     340
How to Develop and Document a Disaster Recovery Plan     341
Disaster Recovery Plan Development Overview     342
Ready, Set, Write the Plan     353
The Disaster Recovery Plan's Structure     359
Developing and Writing the Procedures     365
Disaster Recovery Teams Overview      381
Summary     389
Effective Plan-Activation Procedures     391
The Disaster-Alert Notification Procedure     393
First-Alert Response     396
Hotsite Call-up Procedures     406
Recalling Tapes from Your Offsite Storage Provider     412
Site Restoration Activities     413
Summary     422
The Need for System-Related Documentation     425
A Change in the i5 Philosophy Silos     427
Write It All Down     428
I Thought Those Backup Tapes Had Everything!     429
Collecting and Maintaining System Information     431
The PRTSYSINF Command     432
Complete Site Loss versus Server Loss     434
Summary     441
System i5/iSeries Restoration Procedures     443
Recovery Procedures     444
Case Study Sample     444
Summary     475
System i5/iSeries BRMS Restoration Procedures     477
Summary     506
Testing Your Disaster Recovery Plan     507
Practice Just Like the Pros     510
Satisfy the Need for Testing     511
The Embarrassment of Testing: What If We Fail?      512
Open-Book Testing     514
Define a Complete Testing Project     515
Passive Testing     518
Active Testing     529
Disaster Recovery Coordinator Testing Duties     533
Introducing Murphy's Law     534
Evaluation of Test Results     535
Be a Survivor     536
Summary     538
Plan Maintenance     539
Your Plan Design     541
Implementing a Maintenance Philosophy     542
Revisit Your Plan-Get into Maintenance Mode     545
Change Management     549
Summary     560
Selecting a Commercial Hotsite Provider     563
Advance Planning = Hotsite     564
Internal or External Hotsite?     566
What to Look for in a Hotsite Provider     567
Cost Considerations     576
Summary     581
A Family DR Plan     583
Disaster Recovery Begins at Home     584
Emergency Supplies     585
Practice and Maintain Your Plan     587
Personal and Family Requirements     588
Awareness Training     589
Information on Family Disaster Plans     589
Summary      590
Sample Documents     591
Business Impact Analysis Questionnaire     591
Operational Priorities     592
Operational Impacts     592
Customer Service     593
Cash Flow/Revenue     593
Regulatory (If Applicable)     594
Increases In Liability     594
Vendor Relations     595
Financial Control/Reporting     595
Mission Critical IT Applications     596
Vulnerability     596
Server Criticality Analysis     597
