Usability Testing Essentials
Ready, Set ... Test!
By Carol M. Barnum
Copyright © 2011 Elsevier Inc.
All rights reserved.
Chapter One: Establishing the essentials
Communication equals remembering what it's like not to know. —Richard Saul Wurman
From the moment you know enough to talk about a product—any product, whether it's hardware, software, a video game, a training guide, or a website—you know too much to be able to tell if the product would be usable for a person who doesn't know what you know. As Jakob Nielsen, a strong advocate of usability in product design, puts it, "Your best guess is not good enough." That's why usability testing is essential.
With usability testing, we get to see what people actually do—what works for them, and what doesn't—not what we think they would do or even what they think they would do if they were using your product. When usability testing is a part of design and development, the knowledge we get about our users' experience supports all aspects of design and development.
This chapter presents the essentials of usability testing, which include the need to
focus on the user, not the product
start with some essential definitions:
* defining usability
* defining usability testing and differentiating the two main types of testing:
– formative testing
– summative testing
know when to conduct small studies
know how to conduct small studies, which include:
* defining the user profile
* creating task-based scenarios
* using a think-aloud process
* making changes and testing again
know when to conduct large studies
think of usability testing as hill climbing
Focus on the user, not the product
When you focus on the user and not the product, you learn what works for your users, as well as what doesn't work, what pleases, what puzzles, and what frustrates them. You understand your users' experience with the product to determine whether the design matches their expectations and supports their goals.
Usability testing gives you this access to your users using your product to perform tasks that they would want to do, which are matched to goals that are realistic for them. In the testing situation, you have the chance to elicit their comments, to observe their body language (in many cases), to discover their wishes and hopes for the product, and to learn how well the product supports them in their goals. The mantra of usability testing is, "We are testing the product, not you." Many people begin a testing session with this statement. Even if you don't make this statement to the participant, it's important to remember that this is the focus of testing.
Start with some essential definitions
To have a common vocabulary to talk about user experience, we need a common set of definitions for the essential words we use.
Defining usability
The best-known definition of usability is the one from ISO, the International Organization for Standardization (9241-11): "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use."
Although this definition is rather formal, as you might expect for one that has become a standard, I like it because it encompasses the three critical elements of
Specific users—not just any user, but the specific ones for whom the product is designed.
Specified goals—these specific users have to share the goals for the product, meaning that the product's goals represent their goals.
A specific context of use—the product has to be designed to work in the environment in which these users will use it.
I also like this definition because it focuses on the critical measures of usability:
Effectiveness and efficiency support the user's need to achieve a goal for using the product with accuracy and speed. Frequently, this also means that the product supports the user in a way that is better than the current way in which the user works. This is the value-added part of usability. If the product doesn't add value to the way in which the user currently performs tasks or needs to learn to perform tasks, then the user will have no use for the product. For instance, if the user perceives that the online bill-paying feature offered by her bank is not worth the effort to set up and use, then she will continue to write checks, put stamps on envelopes, and mail in her payments. Her rejection of the new product may be because it does not appear to be efficient, even if it proves to be effective.
Beyond effectiveness and efficiency, however, is the critical criterion of satisfaction. Although measures of effectiveness and efficiency are, to some extent, determined by the user's perceptions of these qualities, there is no denying that the measure of satisfaction is derived wholly from the user's perception of satisfaction. Is the user satisfied with the display of the information on the page or screen? Is the design pleasing to the user? Is the overall experience a positive one? If users think that the answer to these questions is "yes," their interest in using the product will often trump recognized problems affecting effectiveness and efficiency. Why? Because satisfaction equates to desirability. And the "desirability factor" is often the elusive brass ring that developers, especially marketing teams, are seeking in new products.
Satisfaction was clearly important when the ISO standard was developed, but it has become even more important today—some would say it is the most important measure of usability. That's because users now expect products to be usable. Whether a product meets users' expectations for satisfaction can determine whether they will resist, reject, or even rebel against using it.
If the ISO definition seems a bit too formal for your tastes, you might find Whitney Quesenbery's definition more to your liking. Quesenbery, a well-known usability consultant, distills the definition of usability into the following easy-to-remember dimensions of usability, which she calls the 5Es:
Effective How completely and accurately the work or experience is completed or goals reached
Efficient How quickly this work can be completed
Engaging How well the interface draws the user into the interaction and how pleasant and satisfying it is to use
Error tolerant How well the product prevents errors and can help the user recover from mistakes that do occur
Easy to learn How well the product supports both the initial orientation and continued learning throughout the complete lifetime of use
Peter Morville, a well-known information architect and co-author of the "polar bear" book, put together many of these concepts of usability in a visual form, which he calls the user experience honeycomb (Figure 1.1). It was originally intended to explain the qualities of user experience that web designers must address, but it can just as easily show the experience that all product designers should address.
The facets in the honeycomb include both behavioral measures and the intangibles of "valuable," "desirable," and "credible" that users determine through their use of the product. You can use the honeycomb as the basis for discussion about what elements are most important to build into your products so that the user experience is a positive one. You can also use the honeycomb to determine what facets you want to learn from users when you conduct usability testing.
Defining usability testing
When I refer to usability testing, I mean the activity that focuses on observing users working with a product, performing tasks that are real and meaningful to them.
Although much has changed in the approaches we may take to doing usability testing, even including the possibility of not observing users when conducting remote unmoderated testing, the core definition remains basically unchanged. Changes in technology, including access to users anywhere at any time, coupled with changes in the scope of testing (from very big to very small studies) mean that the definition of usability testing needs to stretch to encompass the methods and practices that support testing in many different environments and under many different conditions. As you will see in this book, the simple definition I use can make that stretch.
Using this definition for all usability testing, we can now look at subdividing testing into two types, depending on the point at which it is done and the goal for the study:
Formative testing—while the product is in development, with a goal of diagnosing and fixing problems; typically based on small studies, repeated during development.
Summative testing—after the product is finished, with a goal of establishing a baseline of metrics or validating that the product meets requirements; generally requires larger numbers for statistical validity.
With these essential definitions for talking about usability testing, we can now start to apply them.
For those of you who want to take a small detour first, you might want to take a peek at a brief history of usability testing practice in the sidebar. I've put this history in this first chapter because there are still people who question how you can get good results from small studies. I find that I frequently need to explain how—and why—usability testing works when you see only a few users. If you need the ammunition for this argument, you'll get it from this brief history.
Take a peek at a brief history of usability testing—then and now
"Those who don't know history are destined to repeat it." Edmund Burke, a British philosopher and statesman, made that statement in the 18th century, and you have probably heard something like it said in your history class or somewhere else. So, what's its relevance here? A little bit of the history of usability testing can help you see where the practice came from and how it's practiced today. Some people still think the traditional way is the only way it's done. If you want to take a quick peek at how it was practiced in the beginning and how it's practiced today, read on.
Traditional usability testing relies on the practices of experimental design
Usability testing, as it was commonly practiced from its beginnings until well into the 1990s, was a formal process, employing the methods of experimental design. As such, it was expensive, time consuming, and rigorous. Labs, where such tests were conducted, were managed by usability experts who typically had education and training as cognitive scientists, experimental psychologists, or human factors engineers. Because tests were viewed as research experiments, they typically required 30 to 50 "test subjects."
Who could afford to do it? Not many. So, not much usability testing was done.
However, in the early 1990s, some research studies showed that effective testing could be done with smaller numbers. Among those doing this research were Jakob Nielsen and his colleague Tom Landauer, both, by the way, human factors researchers who were well versed in the experimental design method for usability studies. However, they were seeking a quicker way to get results, and they found one.
"Discount" usability testing changed the way we think about testing
Nielsen and Landauer (both working as researchers at Bellcore at that time) determined that the maximum cost–benefit ratio, derived by weighing the costs of testing and the benefits gained, is achieved when you test with three to five participants, as shown in the classic "curve" (Figure 1.2).
Here's what Nielsen says about the curve:
The most striking truth of the curve is that zero users give zero insights. As soon as you collect data from a single test user, your insights shoot up and you have already learned almost a third of all there is to know about the usability of the design. The difference between zero and even a little bit of data is astounding.
According to Nielsen, you should stop after the fifth user because you are seeing the same things repeated, and you will have reached the optimal return of 85% of the findings to be uncovered.
Good ideas have a tendency to bubble up at the same time. Just so, other researchers were publishing similar findings from small usability tests.
Robert Virzi, a researcher at GTE Laboratories at that time, reported his findings from small studies in "Streamlining the Design Process: Running Fewer Subjects" and "Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?" James Lewis, a researcher at IBM, published his findings in "Sample Sizes for Usability Studies: Additional Considerations." Virzi and Lewis each found that small studies uncover 80% of the findings from a particular test. Nielsen and Landauer said the number was 85%. What these researchers gave us is evidence that small studies can be highly effective. Putting these research findings together, we can safely say that small studies can uncover 80–85% of the findings from a particular test. This result is not to be confused with uncovering 80–85% of usability findings for the entire product. That would take many, many studies. However, the findings from a particular study can frequently be applied to other parts of the product not tested.
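The curve behind these numbers comes from a simple model: if each participant independently uncovers roughly the same fraction L of the problems present, the expected proportion found after n participants is 1 − (1 − L)^n. Nielsen and Landauer estimated L at about 0.31 averaged across their projects; the value of L varies by product and study, so treat this as an illustration of the shape of the curve, not a guarantee. A minimal sketch:

```python
# Nielsen-Landauer model of problem discovery in usability testing:
# expected proportion of problems found after n participants is
#   found(n) = 1 - (1 - L)^n
# where L is the fraction of problems a single participant uncovers.
# L = 0.31 is Nielsen and Landauer's published average estimate;
# real values vary by product and study.

def problems_found(n, L=0.31):
    """Expected proportion of usability problems uncovered by n participants."""
    return 1 - (1 - L) ** n

for n in range(6):
    print(f"{n} participants: {problems_found(n):.0%}")
```

With L = 0.31 the model climbs steeply at first (zero users yield zero insights, one user yields about 31%) and flattens near 84–85% at five participants, which is why the cost–benefit ratio peaks with three to five users and why Nielsen advises stopping after the fifth.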
Small usability studies give us the following advantages over large studies. They can be
incorporated into the development of the product at little cost
incorporated into the development of the product without adversely affecting the development timeline
done early and often
These are the reasons why Nielsen called this approach "discount" usability testing. Nowadays we don't need to give it such a formal name. We call it usability testing.
Know when to conduct small studies
Today, the formal methodology of experimental design has largely given way to informal studies (although formal studies are still conducted and for good reasons).
These informal studies are in the category of "formative" usability testing. They are typically small in scope and often repeated during stages of product development. Their value comes from providing the development team with a list of findings to analyze and fix, then conducting another small study to see whether the fixes worked.
Formative studies also reveal what users like. These positive experiences are important to capture in a report or study notes so that they won't be lost as the product moves through development.
Formative studies are also a great tool for ending arguments. With a small study, developers can find out what works best for users, not what a vocal or powerful team member or manager thinks will work best.
Small studies don't provide metrics or statistics, but the list of findings that results from them gives developers great insights that can be put into action right away.
Excerpted from Usability Testing Essentials by Carol M. Barnum. Copyright © 2011 by Elsevier Inc. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.