Statistical Methods in Diagnostic Medicine / Edition 2 available in Hardcover, eBook

- ISBN-10:
- 0470183144
- ISBN-13:
- 9780470183144
- Pub. Date:
- 03/29/2011
- Publisher:
- Wiley

Overview
" . . . the book is a valuable addition to the literature in the field, serving as a much-needed guide for both clinicians and advanced students."—Zentralblatt MATH
A new edition of the cutting-edge guide to diagnostic tests in medical research
In recent years, a considerable amount of research has focused on evolving methods for designing and analyzing diagnostic accuracy studies. Statistical Methods in Diagnostic Medicine, Second Edition continues to provide a comprehensive approach to the topic, guiding readers through the necessary practices for understanding these studies and generalizing the results to patient populations.
Following a basic introduction to measuring test accuracy and study design, the authors successfully define various measures of diagnostic accuracy, describe strategies for designing diagnostic accuracy studies, and present key statistical methods for estimating and comparing test accuracy. Topics new to the Second Edition include:
- Methods for tests designed to detect and locate lesions
- Recommendations for covariate-adjustment
- Methods for estimating and comparing predictive values and sample size calculations
- Correcting techniques for verification and imperfect standard biases
- Sample size calculation for multiple reader studies when pilot data are available
- Updated meta-analysis methods, now incorporating random effects
Three case studies thoroughly showcase some of the questions and statistical issues that arise in diagnostic medicine, with all associated data provided in detailed appendices. A related web site features Fortran, SAS®, and R software packages so that readers can conduct their own analyses.
Statistical Methods in Diagnostic Medicine, Second Edition is an excellent supplement for biostatistics courses at the graduate level. It also serves as a valuable reference for clinicians and researchers working in the fields of medicine, epidemiology, and biostatistics.
Product Details
ISBN-13: | 9780470183144 |
---|---|
Publisher: | Wiley |
Publication date: | 03/29/2011 |
Series: | Wiley Series in Probability and Statistics, #712 |
Pages: | 592 |
Product dimensions: | 6.40(w) x 9.30(h) x 1.40(d) |
About the Author
Nancy A. Obuchowski, PhD, is Vice Chairperson of the Department of Quantitative Health Sciences at the Cleveland Clinic Foundation. A Fellow of the American Statistical Association, she has written more than 100 journal articles on the design and analysis of studies of screening and diagnostic tests.
Donna K. McClish, PhD, is Associate Professor and Graduate Program Director in Biostatistics at Virginia Commonwealth University. She has written more than 100 journal articles on statistical methods in epidemiology, diagnostic medicine, and health services research.
Read an Excerpt
Statistical Methods in Diagnostic Medicine
By Xiao-Hua Zhou, Donna K. McClish, and Nancy A. Obuchowski
John Wiley & Sons
ISBN: 0-471-34772-8

Chapter One
Introduction

1.1 WHY THIS BOOK?
Diagnostic tests play an important role in medical care and contribute significantly to health care costs (Epstein, Begg, and McNeil, 1986), yet the quality of diagnostic test studies has been poor (Begg, 1987). Reid, Lachs, and Feinstein (1995) reviewed articles on diagnostic tests that were published between 1978 and 1993 and reported many errors in design and analysis. These errors have fostered distrust in the conclusions of diagnostic test studies and have contributed to misunderstandings in the selection and interpretation of diagnostic tests.
Some examples of common errors in diagnostic test studies help illustrate the problem. One common error involves how the diagnostic tests are interpreted. Many investigators of new diagnostic tests attempt to develop criteria for interpreting such tests based only on the test results of healthy volunteers. For example, for a new test to detect pancreatitis, investigators might measure the amount of a certain enzyme in healthy volunteers. A typical decision criterion, or cutpoint, is three standard deviations (SDs) from the mean. Patients with an enzyme level of three SDs below the mean of healthy volunteers are labeled positive for pancreatitis; patients with an enzyme level above this cutpoint are labeled negative. In proposing such a criterion, investigators fail to recognize
1. the relevance of natural distributions (i.e., are they Gaussian [normal]?);
2. the amount of potential overlap with test results from patients with the condition;
3. the clinical significance of diagnostic errors, both attributed to falsely labeling a patient without the condition as positive and a patient with the condition as negative; and
4. the poor generalization of results based on healthy volunteers.
In Chapter 2, we discuss factors involved in determining optimal cutpoints for diagnostic tests; in Chapter 4, we discuss methods of finding optimal cutpoints and estimating diagnostic errors associated with them.
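The flawed criterion described above is easy to state concretely. The sketch below (plain Python; the enzyme values and the 94.0 cutpoint that results are invented purely for illustration) derives a cutpoint three SDs below the healthy-volunteer mean and labels patients accordingly, which is exactly the practice the four points above caution against:

```python
import statistics

def three_sd_cutpoint(healthy_values):
    """Cutpoint three standard deviations below the healthy-volunteer mean."""
    mean = statistics.mean(healthy_values)
    sd = statistics.stdev(healthy_values)
    return mean - 3 * sd

def classify(enzyme_level, cutpoint):
    """Label positive for pancreatitis when the level is at or below the cutpoint."""
    return "positive" if enzyme_level <= cutpoint else "negative"

# Hypothetical enzyme levels from eight healthy volunteers
healthy = [98, 102, 100, 97, 103, 101, 99, 100]
cut = three_sd_cutpoint(healthy)   # mean 100, SD 2 -> cutpoint 94
print(classify(90, cut))           # -> positive
print(classify(100, cut))          # -> negative
```

Note that nothing in this calculation looks at patients who actually have the condition, which is the heart of the problem: the cutpoint is chosen without any knowledge of the overlap between the two distributions.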
Another common error in diagnostic test studies is the notion that making a rigorous assessment of a patient's true condition, with the exclusion of patients for whom a less rigorous assessment was made, allows for a scientifically sound study. An example comes from the literature on the use of ventilation-perfusion lung scans for diagnosing pulmonary emboli. The ventilation-perfusion lung scan is a noninvasive test used to screen high-risk patients for pulmonary emboli; its accuracy in various populations is unknown. Pulmonary angiography, on the other hand, is a highly accurate test for diagnosing pulmonary emboli, but it is invasive. In a study that assesses the accuracy of ventilation-perfusion lung scans, the study sample usually consists of patients who have undergone both a ventilation-perfusion lung scan and a pulmonary angiogram, with the angiogram serving as the reference for estimating accuracy. (See Chapter 2 for the definition and some examples of gold standards.) Patients who undergo a ventilation-perfusion lung scan but not an angiogram would be excluded from such a study. This study design can lead to serious errors in test accuracy estimates. These errors occur because the study sample is not truly representative of the patient population undergoing ventilation-perfusion lung scans; rather, patients with positive scans are often recommended for angiograms, and patients with negative scans are often not sent for angiograms because of the unnecessary risks. In Chapter 3, we discuss workup bias and its most common form, verification bias, as well as strategies to avoid them. In Chapter 10, we present statistical methods developed specifically to correct for verification bias.
Another error involves problems with agreement studies, in which investigators often draw conclusions about a new test's diagnostic capabilities based on how often it agrees with a conventional test. For example, digital mammography, a new method of acquiring images of the breast for screening and diagnosis, has many advantages over conventional film mammography, including easy storage and transfer of images. In a study comparing these two tests on a sample of patients, if the results agree often, we will be encouraged by the new test. But what if the digital and film results do not agree often? It is incorrect for us to conclude that digital mammography has inferior accuracy. Clearly, if digital mammography has better accuracy than film mammography, then the two tests will not agree. Similarly, the two tests can have the same accuracy but make mistakes on different patients, resulting in poor agreement. A more valid approach to assessing a new test's diagnostic worth is to compare both tests against the true diagnoses of the patients to estimate and compare the accuracy of both tests. Assessment of diagnostic accuracy is usually more difficult than assessment of agreement, but it is a more relevant, valid approach (Zweig and Campbell, 1993). In Chapter 5, we present methods for comparing the accuracy of two tests when the true diagnoses of the patients are known; in Chapter 11, we present methods for comparing the accuracy of two tests when the true diagnoses are unknown.
There is no question that studies of diagnostic test accuracy are challenging to design and require specialized statistical methods for their analysis. There are few good references and no comprehensive sources of information on how to design and analyze diagnostic test studies. This book fulfills this need. In it, we present and illustrate concepts and methods for designing, analyzing, interpreting, and reporting studies of diagnostic test accuracy. In Part I (Chapters 2-7), we define various measures of diagnostic accuracy, describe strategies for designing diagnostic accuracy studies, and present basic statistical methods for estimating and comparing test accuracies, calculating sample sizes, and synthesizing literature for meta-analysis. In Part II (Chapters 8-12), we present more advanced statistical methods for describing a test's accuracy when patient characteristics affect it, for analyzing multireader studies and studies with verification bias or imperfect gold standards, and for performing meta-analyses.
1.2 WHAT IS DIAGNOSTIC ACCURACY?
A diagnostic test has two purposes (Sox, Jr. et al., 1989): (1) to provide reliable information about the patient's condition and (2) to influence the health care provider's plan for managing the patient. McNeil and Adelstein (1976) added a third possible purpose: to understand disease mechanisms and natural history through research (e.g., the repeated testing of patients with chronic conditions). A test can serve these purposes only if the health care provider knows how to interpret it. This information is acquired through an assessment of the test's diagnostic accuracy, which is simply the ability of a test to discriminate among alternative states of health (Zweig and Campbell, 1993). Although frequently there are more than two states of health, the clinical question can often be appropriately dichotomized (e.g., the presence or absence of Parkinson's disease or the presence or absence of an invasive carcinoma). In this book, we consider these types of situations (i.e., the binary health states).
In assessing the performance of the diagnostic test, we want to know if the test results differ for the two health states. If they do not differ, then the test has negligible accuracy; if they do not overlap for the two health states, then the test has perfect accuracy. Most test accuracies fall between these two extremes. The most important error to avoid is the assumption that a test result is a true representation of the patient's condition (Sox, Jr. et al., 1989). Most diagnostic information is imperfect; it may influence the health care provider's thinking, but uncertainty will remain about the patient's true condition. If the test is negative for the condition, should the health care provider assume that the patient is disease-free and thus send him or her home? If the test is positive for the condition, should the health care provider assume the patient has the condition and thus begin treatment? And if the test result requires interpretation by a trained reader (e.g., a radiologist), should the health care provider get a second opinion of the interpretation?
To answer these critical questions, the health care provider needs to have information on the test's absolute and relative capabilities and an understanding of the complex interactions between the test and the trained readers (Beam et al., 1992). The health care provider must ask, How does the test perform among patients with the condition (i.e., the test's sensitivity)? How does the test perform among patients without the condition (i.e., the test's specificity)? Does the test serve to replace an older test, or should multiple tests be performed? If multiple tests are performed, how should they be executed (i.e., sequentially or in parallel)? How reproducible are interpretations by different readers?
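The first two of these questions, sensitivity and specificity, reduce to simple proportions once each patient's test result is tallied against his or her true condition. A minimal sketch (the 2x2 counts are invented for illustration; formal definitions come in Chapter 2):

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients WITH the condition whom the test calls positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of patients WITHOUT the condition whom the test calls negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical study: 50 patients with the condition, 100 without
print(sensitivity(45, 5))    # 45 of 50 detected -> 0.9
print(specificity(90, 10))   # 90 of 100 correctly cleared -> 0.9
```

Keeping the two proportions separate matters: a test can look excellent on one axis while failing badly on the other, which is why neither alone summarizes diagnostic accuracy.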
Radiographic image quality is often confused with diagnostic accuracy. As noted by Lusted (1971), an image can reproduce the shape and texture of tissues most faithfully from a physical standpoint, but it may not contain useful diagnostic information. Fryback and Thornbury (1991) described a working model for assessing the efficacy of diagnostic tests in medicine. The model delineates image quality, diagnostic accuracy, treatment decisions, and patient outcome and describes how these conditions relate to the assessment of a diagnostic test. Expanding upon other works (Cochrane, 1972; Thornbury, Fryback, and Edwards, 1975; McNeil and Adelstein, 1976; Fineberg, 1978), Fryback and Thornbury (1991) proposed the following 6-level hierarchical model. Level 1, at the bottom, is technical efficacy, which is measured by such features as image resolution and sharpness for radiographic tests and optimal sampling times and doses for diagnostic marker tests; level 2 is diagnostic accuracy efficacy, that is, the sensitivity, specificity, and receiver-operating characteristic (ROC) curve; level 3 is diagnostic thinking efficacy, which can be measured, for example, by the difference in the clinician's estimated probability of a diagnosis before versus after the test results are known; level 4 is therapeutic efficacy, which can be measured by the percentage of time that therapy planned before the diagnostic test is altered by the results of the test; level 5 is patient outcome efficacy, which can be defined, for example, by the number of deaths prevented by, or the change in quality of life attributable to, the test information; and level 6, at the top, is societal efficacy, which is often described by the cost-effectiveness of the test as measured from a societal perspective. A key feature of this model is that for a diagnostic test to be efficacious at a higher level, it must be efficacious at all lower levels.
The reverse is not true; that is, the fact that a test can be efficacious at one level does not guarantee that it will be efficacious at higher levels. In this book, we deal exclusively with the assessment of diagnostic accuracy efficacy (level 2 of the hierarchical model), recognizing that it is only one step in the complete assessment of a diagnostic test's usefulness.
1.3 LANDMARKS IN STATISTICAL METHODS OF DIAGNOSTIC MEDICINE
In 1971, Lusted wrote a highly influential article in the journal Science in which he postulated that to measure the worth of a diagnostic test, one must measure the performance of the observers with the test. Lusted argued that ROC curves provide an ideal means of studying observer performance. Lusted was writing about radiographic tests, but ROC curves are now used to assess diagnostic test accuracy in many disciplines of medicine.
An ROC curve is a plot of a diagnostic test's sensitivity (i.e., the test's ability to detect the condition of interest) versus its false-positive rate (i.e., the test's inability to recognize normal anatomy and physiology as normal). The curve illustrates how different criteria for interpreting a test produce different values for the test's false-positive rate and sensitivity.
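The empirical version of this plot can be traced by sweeping every observed test value as a candidate cutpoint and computing the resulting (false-positive rate, sensitivity) pair. The sketch below is a plain-Python illustration with invented test results, assuming higher values indicate the condition:

```python
def roc_points(diseased, healthy):
    """Empirical (false-positive rate, sensitivity) pairs for an ROC plot.

    Each observed value serves as a cutpoint; a result is called
    positive when it is at or above the cutpoint.
    """
    points = [(0.0, 0.0)]  # cutpoint above every observation
    for c in sorted(set(diseased) | set(healthy), reverse=True):
        sens = sum(x >= c for x in diseased) / len(diseased)
        fpr = sum(x >= c for x in healthy) / len(healthy)
        points.append((fpr, sens))
    return points

# Invented test results for four diseased and four healthy patients
pts = roc_points(diseased=[3, 4, 5, 6], healthy=[1, 2, 3, 4])
print(pts)
```

Lowering the cutpoint moves the operating point up the curve toward (1, 1): sensitivity rises, but only at the cost of a higher false-positive rate, which is precisely the tradeoff the curve makes visible.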
ROC curves and their analyses are based on statistical decision theory; they were originally developed for electronic signal-detection theory (Peterson, Birdsall, and Fox, 1954; Swets and Pickett, 1982). They have been applied in many medical and nonmedical endeavors, including studies of human perception and decision making (Green and Swets, 1966), industrial quality control (Drury and Fox, 1975), and military monitoring (Swets, 1977).
Lusted (1971) indicated that in diagnostic medicine, as in electronic signal-detection theory, a distinction must be made between the criteria that an observer uses for deciding whether a condition is present or absent and the observer's abilities (i.e., the sensory and cognitive attributes used for interpreting the test results) for detecting the condition. ROC curves can be used to make this distinction. Lusted gave the following example: Suppose that the six points in Fig. 1.1 represent the diagnoses of six different physicians. The physicians have identical sensory and cognitive abilities for detecting tuberculosis on a chest radiograph, but they have different criteria for which densities actually should be called tuberculosis. The upper points on the curve represent individuals with more liberal decision criteria (i.e., the low-density nodules are called positive), whereas the lower points on the curve represent individuals with more stringent criteria (i.e., only high-density nodules are called positive). In diagnostic medicine, we are interested in measuring the observer's abilities for interpreting test results rather than his or her criteria for decisions.
Swets and Pickett (1982) noted two other key features of ROC curves that make them ideal for studying diagnostic tests. First, the curves display all possible cutpoints and thus supply estimates of the frequency of various outcomes (i.e., true positives, true negatives, false positives, and false negatives) at each cutpoint. (See Chapter 2 for definitions.) Second, the curves allow the use of prior probabilities of the condition, as well as calculations of the costs and benefits of correct and incorrect decisions, to determine the best cutpoint for a given test in a given setting. (See Chapters 2 and 4.)
Green and Swets (1966) were the first to use the Gaussian model for estimating the ROC curve. They assumed that the various sensory events (i.e., test results) could be mapped on a single line. The numerical value of an observed event (call it T) affects the observer's confidence about whether the condition is present or absent.
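Under a Gaussian model of this kind, with normally distributed test results in each group, the ROC curve takes the conventional binormal form Sens = Phi(a + b * Phi^-1(FPR)), where a is the standardized separation of the two means and b the ratio of the two SDs. This parameterization is the standard one from the signal-detection literature, not taken verbatim from the excerpt; a minimal sketch using only the standard library:

```python
from statistics import NormalDist

def binormal_roc(fpr, a, b):
    """Sensitivity at a given false-positive rate under the binormal model.

    a = (mu_diseased - mu_healthy) / sd_diseased
    b = sd_healthy / sd_diseased
    """
    nd = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^-1
    return nd.cdf(a + b * nd.inv_cdf(fpr))

# With a = 1, b = 1 the curve passes through Phi(1) ~ 0.841 at FPR = 0.5
print(round(binormal_roc(0.5, a=1.0, b=1.0), 3))
```

Setting a = 0 recovers the diagonal "useless test" line (sensitivity equal to the false-positive rate at every cutpoint), while larger a pushes the curve toward the upper-left corner of perfect accuracy.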
Continues...
Excerpted from Statistical Methods in Diagnostic Medicine by Xiao-Hua Zhou, Donna K. McClish, and Nancy A. Obuchowski. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
List of Figures xix
List of Tables xxiii
Preface xxix
Acknowledgements xxxi
Part I. Basic Concepts and Methods
1. Introduction 3
2. Measures of Diagnostic Accuracy 13
3. Design of Diagnostic Accuracy Studies 57
4. Estimation and Hypothesis Testing in a Single Sample 103
5. Comparing the Accuracy of Two Diagnostic Tests 165
6. Sample Size Calculations 193
7. Introduction to Meta-analysis for Diagnostic Accuracy Studies 231
Part II. Advanced Methods
8. Regression Analysis for Independent ROC Data 263
9. Analysis of Multiple Reader and/or Multiple Test Studies 297
10. Methods for Correcting Verification Bias 329
11. Methods for Correcting Imperfect Gold Standard Bias 389
12. Statistical Analysis for Meta-analysis 435
Appendix A. Case Studies and Chapter 8 Data 449
Appendix B. Jackknife and Bootstrap Methods of Estimating Variances and Confidence Intervals 477
What People are Saying About This
"The authors, overall, have done a good job of revising their first edition, addressing the critical reviews as well as expanding and updating their coverage . . . In summary, this is a good book, focusing on medical diagnosis as the name promises, presenting a wealth of methods in detail with good discussion." (Journal of Biopharmaceutical Statistics, 2011)
"Early chapters are accessible to readers with a basic knowledge of statistical and medical terminology, and the second section addresses data analysts with basic training in biostatistics. Later chapters assume deeper background in statistics, but the examples should be accessible to all. The 2002 edition has been updated throughout, and three new case studies have been added." (Booknews, 1 June 2011)