Chapter 11: Troubleshooting and System Recovery
Networks are amazingly intricate. Their software is even more complex than their hardware, and network operating systems are among the most sophisticated pieces of software that programmers create. It's virtually impossible to make such complicated program elements foolproof. Although Windows NT 4 is as robust an operating system as you will encounter, you will occasionally be called on to fix network problems, which often look much like server problems. It's important that you learn to determine quickly which components are likely to cause various fault conditions.
Troubleshooting is a skill like any other. Certain general principles can be applied to any troubleshooting situation, but you'll also need to know how your specific system works. The more you know, the better you'll be at diagnosing faults. And, as with any other skill, you'll get better at troubleshooting with practice. This chapter introduces you to some general computer troubleshooting principles and then shows you how to troubleshoot Windows NT operating system software and network connections. We will also look at procedures for backing up and restoring the system using tape backup systems.
This chapter covers material related to the "Troubleshoot and optimize server performance" objective only with respect to the "Troubleshoot performance problems by using Task Manager, Event Viewer, or Performance Monitor" subobjective. See Chapter 10, "Optimizing Windows NT 4," for information on the "Move, size, and add new page files" subobjective. See Chapter 1, "Planning the Windows NT Environment," for information on the other subobjectives, "Allocate server hardware based on application requirements..." and "Modify backup domain controller (BDC) placement...".
Principles of Troubleshooting
Troubleshooting is the process of methodically eliminating faults from a system. Although troubleshooting a computer is difficult, you can quickly isolate faults by following a few basic rules that help you focus on the components more likely at fault.
To troubleshoot a network, you first determine the component that is at fault, then change the hardware or software configuration of the suspected component, and then test to see whether the configuration change has eliminated the fault.
If a hardware failure caused the fault, you will have to find and replace the failed component. If a software configuration causes the fault, you will have to reconfigure your system to eliminate the fault. In some cases, you may not be able to reconfigure the system because the problem involves the denial of some service that is required to reconfigure the faulty component. If you run into this catch-22, you may have to reinstall the operating system on the server or client that is faulty.
Working on electronic devices such as computers can be dangerous. Do not attempt to troubleshoot a computer unless you are very familiar with electrical safety, electronic equipment, and computer hardware.
With the latest service packs installed, the Windows NT software is very stable. Windows NT runs well and all of its services operate properly. If you have a persistent problem with a Windows NT server, the most likely cause is an incompatible hardware driver or improperly configured software. That said, bugs do exist in all nontrivial software.
Bugs are most likely to exist in rarely executed code.
Windows NT is very specific about which hardware it will work with. Early in NT's design cycle, Microsoft chose not to support every possible peripheral device for two reasons:
- Supporting DOS mode drivers would open security holes.
- It would be impossible to write drivers for all existing PC-compatible hardware.
Bugs are more likely the fault of a third-party driver than of Windows NT standard components. Consider these drivers primary suspects when troubleshooting.
Focus is important in troubleshooting. Making changes randomly, hoping something will work, is a good way to waste a lot of time and create even more problems with untracked changes. Focus on a specific component. Test your fix thoroughly. If you are not able to correct the fault, it's important to restore the original configuration before moving on to another component.
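The discipline described above (change one component, test, and restore the original configuration if the fault remains) can be sketched as a short loop. This is only an illustration under stated assumptions: the `troubleshoot` function, the dictionary standing in for a system configuration, and the `is_fault_gone` test are hypothetical names invented here, not part of Windows NT or any real tool.

```python
import copy


def troubleshoot(config, candidate_fixes, is_fault_gone):
    """Try one candidate fix at a time, restoring the original settings after each failure.

    config          -- dict of per-component settings (hypothetical stand-in for a real system)
    candidate_fixes -- list of (component, setting, new_value), most likely suspect first
    is_fault_gone   -- callable that tests the system; returns True when the fault is cleared
    """
    for component, setting, new_value in candidate_fixes:
        original = copy.deepcopy(config)         # record the known starting point
        config[component][setting] = new_value   # change one component only
        if is_fault_gone(config):
            return component, setting, new_value # fix confirmed by testing
        config.clear()
        config.update(original)                  # restore before moving on
    return None                                  # no single change cleared the fault


# Hypothetical example: the fault is an IRQ clash between two adapters.
system = {"modem": {"irq": 3}, "lan": {"irq": 3, "duplex": "half"}}
fixes = [("lan", "duplex", "full"), ("modem", "irq", 4)]
result = troubleshoot(system, fixes, lambda c: c["modem"]["irq"] != c["lan"]["irq"])
print(result)
print(system["lan"]["duplex"])
```

In this sketch the first attempted change (the duplex setting) does not clear the fault, so it is rolled back before the second change is tried. The unsuccessful change therefore leaves no untracked residue behind.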
Troubleshooting is relatively easy when you are dealing with only one fault, as is generally the case with a hardware failure. Software failures, on the other hand, are usually more complicated. Sometimes two or more simultaneous problems are causing a fault. Correcting only one of them will change the symptoms, but it will not correct the problem. For instance, suppose your modem doesn't work. You have a hardware conflict because your modem is set to the same IRQ as your LAN adapter, and that conflict has caused your modem software to automatically detect the wrong modem. In fact, you have two problems to fix (a hardware setting and a software configuration) before you can operate your modem. Correcting only one of the two problems will not allow you to use your modem.
Partial troubleshooting success or rotating symptoms usually indicates the presence of more than one problem.
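The modem example above can be modeled in a few lines to show why fixing either problem alone leaves the fault in place. The setting names and the `modem_works` check are invented for illustration; they do not correspond to real Windows NT settings.

```python
def modem_works(system):
    """Hypothetical model: the modem works only when both conditions hold."""
    no_irq_conflict = system["modem_irq"] != system["lan_irq"]
    right_driver = system["configured_modem"] == system["installed_modem"]
    return no_irq_conflict and right_driver


faulty = {
    "modem_irq": 5, "lan_irq": 5,      # hardware conflict: shared IRQ
    "installed_modem": "model A",
    "configured_modem": "model B",     # software detected the wrong modem
}

# Apply each fix alone, then both together.
fix_irq_only = dict(faulty, modem_irq=3)
fix_driver_only = dict(faulty, configured_modem="model A")
fix_both = dict(faulty, modem_irq=3, configured_modem="model A")

for label, system in [("IRQ only", fix_irq_only),
                      ("driver only", fix_driver_only),
                      ("both", fix_both)]:
    print(label, "->", "works" if modem_works(system) else "still faulty")
```

Only the last case reports a working modem. A fix that appears to do nothing may still be one of several required fixes, so a failed test does not by itself prove that the change was wrong.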
Following the general principles of troubleshooting discussed in this section will help you quickly determine what is at fault in your system. However, no book or set of rules will really help you find a problem unless you understand the system you are troubleshooting. That's why we put this coverage here at the end of the book. We hope that by performing the exercises and taking the tests in the first 10 chapters, you've gained the requisite knowledge to effectively troubleshoot Windows NT Server 4....