Quantifying the User Experience
Practical Statistics for User Research
By Jeff Sauro and James R. Lewis
Morgan Kaufmann
Copyright © 2012 Jeff Sauro and James R. Lewis
All rights reserved.
ISBN: 978-0-12-384969-4
Chapter One
Introduction and How to Use This Book
INTRODUCTION
The last thing many designers and researchers in the field of user experience think of is statistics. In fact, we know many practitioners who find the field appealing because it largely avoids those impersonal numbers. The thinking goes that if usability and design are qualitative activities, it's safe to skip the formulas and numbers.
Although design and several usability activities are certainly qualitative, the impact of good and bad designs can be easily quantified in conversions, completion rates, completion times, perceived satisfaction, recommendations, and sales. Increasingly, usability practitioners and user researchers are expected to quantify the benefits of their efforts. If they don't, someone else will—unfortunately that someone else might not use the right metrics or methods.
THE ORGANIZATION OF THIS BOOK
This book is intended for those who measure the behavior and attitudes of people as they interact with interfaces. This book is not about abstract mathematical theories for which you may someday find a partial use. Instead, this book is about working backwards from the most common questions and problems you'll encounter as you conduct, analyze, and report on user research projects. In general, these activities fall into three areas:
1. Summarizing data and computing margins of error (Chapter 3).
2. Determining if there is a statistically significant difference, either in comparison to a benchmark (Chapter 4) or between groups (Chapter 5).
3. Finding the appropriate sample size for a study (Chapters 6 and 7).
We also provide:
Background chapters with an overview of common ways to quantify user research (Chapter 2) and a quick introduction/review of many fundamental statistical concepts (Appendix).
A comprehensive discussion of standardized usability questionnaires (Chapter 8).
A discussion of enduring statistical controversies that user researchers should be aware of and able to address in defense of their analyses (Chapter 9).
A wrap-up chapter with pointers to more information on statistics for user research (Chapter 10).
Each chapter ends with a list of key points and references. Most chapters also include a set of problems and answers to those problems so you can check your understanding of the content.
HOW TO USE THIS BOOK
Although a significant proportion of user research practitioners hold advanced degrees (about 10% have PhDs; UPA, 2011), for most people in the social sciences, statistics is the only quantitative course they have to take. For many, statistics is a subject they know they should understand, but it often brings back bad memories of high school math, poor teachers, and an abstract and difficult topic.
While we'd like to take all the pain out of learning and using statistics, there are still formulas, math, and some abstract concepts that we just can't avoid. Some people want to see how the statistics work, and for them we provide the math. If you're not terribly interested in the computational mechanics, then you can skip over the formulas and focus more on how to apply the procedures.
Readers who are familiar with many statistical procedures and formulas may find that some of the formulas we use differ from what you learned in your college statistics courses. Part of this is from recent advances in statistics (especially for dealing with binary data). Another part is due to our selecting the best procedures for practical user research, focusing on procedures that work well for the types of data and sample sizes you'll likely encounter.
Based on teaching many courses at industry conferences and at companies, we know the statistics background of the readers of this book will vary substantially. Some of you may have never taken a statistics course whereas others probably took several in graduate school. As much as possible, we've incorporated relevant discussions around the concepts as they appear in each chapter with plenty of examples using actual data from real user research studies.
In our experience, one of the hardest things to remember in applying statistics is what statistical test to perform when. To help with this problem, we've provided decision maps (see Figures 1.1 to 1.4) to help you get to the right statistical test and the sections of the book that discuss it.
What Test Should I Use?
The first decision point comes from the type of data you have. See the Appendix for a discussion of the distinction between discrete and continuous data. In general, for deciding which test to use, you need to know if your data are discrete-binary (e.g., pass/fail data coded as 1's and 0's) or more continuous (e.g., task-time or rating-scale data).
The next major decision is whether you're comparing data or just getting an estimate of precision. To get an estimate of precision you compute a confidence interval around your sample metrics (e.g., what is the margin of error around a completion rate of 70%; see Chapter 3). By comparing data we mean comparing data from two or more groups (e.g., task completion times for Products A and B; see Chapter 5) or comparing your data to a benchmark (e.g., is the completion rate for Product A significantly above 70%; see Chapter 4).
If you're comparing data, the next decision is whether the groups of data come from the same or different users. Continuing on that path, the final decision depends on whether there are two groups to compare or more than two groups.
To find the appropriate section in each chapter for the methods depicted in Figures 1.1 and 1.2, consult Tables 1.1 and 1.2. Note that methods discussed in Chapter 10 are outside the scope of this book, and receive just a brief description in their sections.
For example, let's say you want to know which statistical test to use if you are comparing completion rates on an older version of a product and a new version where a different set of people participated in each test.
1. Because completion rates are discrete-binary data (1 = pass and 0 = fail), we should use the decision map in Figure 1.2.
2. Start at the first box, "Comparing Data?," and select "Y" because we are comparing a data set from an older product with a data set from a new product.
3. This takes us to the "Different Users in Each Group" box—we have different users in each group so we select "Y."
4. Now we're at the "3 or More Groups" box—we have only two groups of users (before and after) so we select "N."
5. We stop at the "N - 1 Two-Proportion Test and Fisher Exact Test" (Chapter 5).
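The walk through the decision map can be sketched as a small lookup function. This is only an illustration: the function name `choose_test` and its parameters are hypothetical, and it encodes just the branches that this chapter's text spells out (the example above and the answers to review questions 1 and 2). The remaining branches appear only in Figures 1.1 and 1.2, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional


def choose_test(
    binary: bool,
    comparing: bool,
    different_users: Optional[bool] = None,
    three_or_more_groups: Optional[bool] = None,
    benchmark: Optional[bool] = None,
    task_time: Optional[bool] = None,
) -> Optional[str]:
    """Return the method named in the chapter text for a decision-map path.

    Hypothetical helper: only paths described in the text are covered.
    """
    if binary:
        # Figure 1.2 (discrete-binary data)
        if comparing and different_users and three_or_more_groups is False:
            return "N - 1 Two-Proportion Test and Fisher Exact Test (Chapter 5)"
        if not comparing and benchmark is False:
            return "Adjusted Wald Confidence Interval (Chapter 3)"
    else:
        # Figure 1.1 (continuous data)
        if not comparing and benchmark and task_time:
            return "1-Sample t (Log) (Chapter 4)"
    # Branch not described in the text; consult the figures instead.
    return None


# Completion rates for an old and a new version, different users in each test.
print(choose_test(binary=True, comparing=True,
                  different_users=True, three_or_more_groups=False))
```

The call at the end reproduces the five-step example: binary data, comparing data, different users, only two groups.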
What Sample Size Do I Need?
Often the first collision a user researcher has with statistics is in planning sample sizes. Although there are many "rules of thumb" on how many users you should test or how many customer responses you need to achieve your goals, there really are precise ways of finding the answer. The first step is to identify the type of test for which you're collecting data. In general, there are three ways of determining your sample size:
1. Estimating a parameter with a specified precision (e.g., if your goal is to estimate completion rates with a margin of error of no more than 5%, or completion times with a margin of error of no more than 15 seconds).
2. Comparing two or more groups or comparing one group to a benchmark.
3. Problem discovery, specifically the number of users you need in a usability test to find a specified percentage of usability problems with a specified probability of occurrence.
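To make the first and third approaches concrete, here is a rough sketch using two textbook formulas. Both are assumptions on our part for illustration, not necessarily the procedures developed in Chapters 6 and 7: the standard normal-approximation formula n = z²p(1 − p)/d² for a proportion with margin of error d, and the binomial model 1 − (1 − p)^n for the chance of seeing a problem of occurrence probability p at least once in n users. The function names and the input values are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def n_for_margin_of_error(margin: float, p: float = 0.5,
                          confidence: float = 0.95) -> int:
    """Sample size for estimating a proportion within +/- margin.

    Uses the basic normal approximation; p = 0.5 is the most
    conservative planning value (it maximizes p * (1 - p)).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def n_for_problem_discovery(p_occurrence: float, discovery_goal: float) -> int:
    """Users needed so that 1 - (1 - p)^n reaches the discovery goal."""
    return ceil(log(1 - discovery_goal) / log(1 - p_occurrence))


# Completion rate with a margin of error of no more than 5% (95% confidence).
print(n_for_margin_of_error(0.05))            # 385

# Problems that affect 10% of users, with an 80% chance of seeing each one.
print(n_for_problem_discovery(0.10, 0.80))    # 16
```

The first result is large because it assumes nothing about the true rate; a planning value of p closer to 0 or 1 lowers it.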
To find the appropriate section in each chapter for the methods depicted in Figures 1.3 and 1.4, consult Table 1.3.
For example, let's say you want to compute the appropriate sample size if the same users will rate the usability of two products using a standardized questionnaire that provides a mean score.
1. Because the goal is to compare data, start with the sample size decision map in Figure 1.3.
2. At the "Comparing Groups?" box, select "Y" because there will be two groups of data, one for each product.
3. At the "Different Users in Each Group?" box, select "N" because each group will have the same users.
4. Because rating-scale data are not binary, select "N" at the "Binary Data?" box.
5. We stop at the "Paired Means" procedure (Chapter 6).
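The same kind of lookup can be sketched for the sample size maps. As before, this is a hypothetical helper (`choose_sample_size_method` is not from the book) that covers only the paths this chapter's text describes: the example above and the answers to review questions 3 and 4. Any other branch of Figures 1.3 and 1.4 returns `None`.

```python
from typing import Optional


def choose_sample_size_method(
    comparing_groups: bool,
    different_users: Optional[bool] = None,
    binary: Optional[bool] = None,
    estimating_parameter: Optional[bool] = None,
) -> Optional[str]:
    """Return the sample size method named in the chapter text, if any."""
    if comparing_groups:
        # Figure 1.3 (comparisons)
        if binary is False and different_users is False:
            return "Paired Means (Chapter 6)"
        if binary is False and different_users is True:
            return "2 Means (Chapter 6)"
    else:
        # Figure 1.4 (no comparison planned)
        if estimating_parameter is False:
            return "Problem Discovery Sample Size (Chapter 7)"
    # Branch not described in the text; consult the figures instead.
    return None


# Same users rate two products on a standardized questionnaire (mean score).
print(choose_sample_size_method(comparing_groups=True,
                                different_users=False, binary=False))
```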
You Don't Have to Do the Computations by Hand
We've provided sufficient detail in the formulas and examples that you should be able to do all computations in Microsoft Excel. If you use an existing statistical package like SPSS, Minitab, or SAS, you may find that some results differ (e.g., confidence intervals and sample size computations) or that the package doesn't include some of the statistical tests we recommend, so be sure to check the notes associated with the procedures.
We've created an Excel calculator that performs all the computations covered in this book. It includes both standard statistical output (p-values and confidence intervals) and some more user-friendly output that, for example, reminds you how to interpret that ubiquitous p-value and that you can paste right into reports. It is available for purchase online at www.measuringusability.com/products/expandedStats. For detailed information on how to use the Excel calculator (or a custom set of functions written in the R statistical programming language) to solve the over 100 quantitative examples and exercises that appear in this book, see Lewis and Sauro (2012).
KEY POINTS FROM THE CHAPTER
The primary purpose of this book is to provide a statistical resource for those who measure the behavior and attitudes of people as they interact with interfaces.
Our focus is on methods applicable to practical user research, based on our experience, investigations, and reviews of the latest statistical literature.
As an aid to the persistent problem of remembering what method to use under what circumstances, this chapter contains four decision maps to guide researchers to the appropriate method and its chapter in this book.
CHAPTER REVIEW QUESTIONS
1. Suppose you need to analyze a sample of task-time data against a specified benchmark. For example, you want to know if the average task time is less than two minutes. What procedure should you use?
2. Suppose you have some conversion-rate data and you just want to understand how precise the estimate is. For example, in examining the server log data you see 10,000 page views and 55 clicks on a registration button. What procedure should you use?
3. Suppose you're planning to conduct a study in which the primary goal is to compare task completion times for two products, with two independent groups of participants providing the times. Which sample size estimation method should you use?
4. Suppose you're planning to run a formative usability study—one where you're going to watch people use the product you're developing and see what problems they encounter. Which sample size estimation method should you use?
Answers
1. Task-time data are continuous (not binary-discrete), so start with the decision map in Figure 1.1. Because you're testing against a benchmark rather than comparing groups of data, follow the "N" path from "Comparing Data?" At "Testing Against a Benchmark?," select the "Y" path. Finally, at "Task Time?," take the "Y" path, which leads you to "1-Sample t (Log)." As shown in Table 1.1, you'll find that method discussed in Chapter 4 in the "Comparing a Task Time to a Benchmark" section on p. 54.
2. Conversion-rate data are binary-discrete, so start with the decision map in Figure 1.2. You're just estimating the rate rather than comparing a set of rates, so at "Comparing Data?," take the "N" path. At "Testing Against a Benchmark?," also take the "N" path. This leads you to "Adjusted Wald Confidence Interval," which, according to Table 1.2, is discussed in Chapter 3 in the "Adjusted-Wald Interval: Add Two Successes and Two Failures" section on p. 22.
3. Because you're planning a comparison of two independent sets of task times, start with the decision map in Figure 1.3. At "Comparing Groups?," select the "Y" path. At "Different Users in Each Group?," select the "Y" path. At "Binary Data?," select the "N" path. This takes you to "2 Means," which, according to Table 1.3, is discussed in Chapter 6 in the "Comparing Values" section. See Example 6 on p. 116.
4. For this type of problem discovery evaluation, you're not planning any type of comparison, so start with the decision map in Figure 1.4. You're not planning to estimate any parameters, such as task times or problem occurrence rates, so at "Estimating a Parameter?," take the "N" path. This leads you to "Problem Discovery Sample Size," which, according to Table 1.3, is discussed in Chapter 7 in the "Using a Probabilistic Model of Problem Discovery to Estimate Sample Sizes for Formative User Research" section on p. 143.
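As an illustration of answer 2, the conversion-rate example (55 clicks in 10,000 page views) can be run through a minimal sketch of the adjusted-Wald interval. The sketch assumes the usual form of the adjustment, adding z²/2 successes and z²/2 failures (roughly two of each at 95% confidence, as the section title quoted in answer 2 suggests); Chapter 3 is the authority on the exact procedure, and the function name here is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes: int, n: int,
                           confidence: float = 0.95) -> tuple[float, float]:
    """Adjusted-Wald confidence interval for a binomial proportion.

    Assumed form: add z^2 / 2 successes and z^2 / 2 failures, then
    apply the ordinary Wald formula to the adjusted counts.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


low, high = adjusted_wald_interval(55, 10_000)
print(f"{low:.2%} to {high:.2%}")
```

With these inputs the interval runs from about 0.4% to about 0.7%, around an observed rate of 0.55%.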
(Continues...)
Excerpted from Quantifying the User Experience by Jeff Sauro and James R. Lewis. Copyright © 2012 by Jeff Sauro and James R. Lewis. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.