In A/B testing, you unleash two different versions of a design on the world and see which performs best. For decades, this has been a classic method in direct mail, where companies often split their mailing lists and send out different versions of a mailing to different recipients. A/B testing is also becoming popular on the Web, where it's easy to make your site show different page versions to different visitors.

Sometimes, A and B are directly competing designs and each version is served to half the users. Other times, A is the current design and serves as the control condition that most users see. In this scenario, B, which might be more daring or experimental, is served only to a small percentage of users until it has proven itself.

Finally, in multivariate testing, you vary multiple design elements at the same time, but the main issues are the same as with the more common A/B tests. For simplicity, I'll use the term "A/B" to refer to any study where you measure design alternatives by feeding them live traffic, regardless of the number of variables being tested.


Compared with other methods, A/B testing has four huge benefits:

  • As a branch of web analytics, it measures the actual behavior of your customers under real-world conditions. You can confidently conclude that if version B sells more than version A, then version B is the design you should show all users in the future.
  • It can measure very small performance differences with high statistical significance because you can throw boatloads of traffic at each design. The sidebar shows how you can measure a 1% difference in sales between two designs.
  • It can resolve trade-offs between conflicting guidelines or qualitative usability findings by determining which one carries the most weight under the circumstances. For example, if an e-commerce site prominently asks users to enter a discount coupon, user testing shows that people who don't have a coupon complain bitterly because they don't want to pay more than other customers. At the same time, coupons are a great marketing tool, and without an easy way to enter the code, usability obviously suffers for coupon holders. When e-commerce sites have run A/B tests with and without a coupon-entry field, overall sales have often increased by 20-50% when users weren't prompted for a coupon on the main purchase and checkout paths. The general guideline is therefore to avoid prominent coupon fields. Still, your site might be an exception where coupons help more than they hurt. You can easily find out by running your own A/B test under your own specific circumstances.
  • It's cheap: Once you've created the two design alternatives (or one innovation to test against your current design), you simply put both on your server and use a little software to randomly serve each new user one version or the other. You'll also typically want to cookie users so that they see the same version on subsequent visits rather than suffering from fluctuating pages, but that too is easy to implement. No expensive usability experts are needed to monitor each user's behavior or analyze complicated interaction design questions. You simply wait until you've collected enough statistics to know which design generated the best numbers.
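The random-serving and same-version-on-return mechanics really are a small amount of software. As a rough sketch in Python (the function and experiment names are my own, not from any particular framework), hashing a persistent visitor ID, such as the value of a cookie, deterministically buckets each user, so the same visitor always sees the same version, and the experimental design B can be served to just a small slice of traffic:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically bucket a visitor into variant A or B.

    Hashing the visitor ID (e.g. a persistent cookie value) means the
    same visitor always gets the same version without any server-side
    state. The experiment name salts the hash so that different tests
    bucket users independently of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform value in 0-99
    # 90/10 split: the current design A is the control most users see;
    # the experimental design B goes to a small percentage of users.
    return "B" if bucket < 10 else "A"
```

Because the assignment is a pure function of the visitor ID, it survives server restarts and needs no lookup table; the cookie's only job is to keep the ID stable across visits.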


With these clear benefits, why don't we use A/B testing for all projects? Because the downsides usually outweigh the upsides.

First, A/B testing can only be used for projects that have one clear, all-important goal, that is, a single KPI (key performance indicator). Furthermore, this goal must be measurable by computer, by counting simple user actions. Examples of measurable actions include:

  • Sales on an e-commerce site.
  • Users subscribing to an email newsletter.
  • Users opening an online banking account.
  • Users downloading a white paper, requesting a call from a salesperson, or otherwise moving explicitly through the sales pipeline.

Unfortunately, it's rare for such actions to be a site's only goal. Yes, for e-commerce, the dollars collected through sales might be paramount. But sites that don't close sales online usually can't say that a single desired user action is all that matters. Yes, it's good if users fill out a form requesting a call from a salesperson. But it's also good if they leave the site with a better feeling about your product and keep your company on the list of candidates for a later stage of the buying process; this is especially true for B2B sites. If, for example, your only decision criterion is to determine which design generates the most white paper downloads, you risk undermining other parts of your business.

For many sites, the ultimate goal is not measurable through user actions on the server. Goals like improving brand reputation or supporting the company's public relations efforts can't be measured by whether users click a specific button. Press coverage resulting from your online PR information might be measured by a clippings service, but it can't tell you whether the journalist visited the site before calling your CEO for a quote.

Similarly, while you can easily measure how many users sign up for your email newsletter, you can't assess the equally important issue of how they read your newsletter content without observing users as they open the messages.

A second downside of A/B testing is that it works only for fully implemented designs. Testing a design is cheap once it's up and running, but we all know that implementation can take a long time. Before you can expose it to real customers on your live website, you must fully debug an experimental design. A/B testing is thus suitable for only a very small number of ideas.

In contrast, paper prototyping lets you try out several different ideas in a single day. Of course, prototype tests give you only qualitative data, but they typically help you reject truly bad ideas quickly and focus your efforts on polishing the good ones. Much experience shows that refining designs through multiple iterations produces superior user interfaces. If each iteration is slow or resource-intensive, you'll have too few iterations to truly refine a design.

A possible compromise is to use paper prototyping to develop your ideas. Once you have something great, you can subject it to A/B testing as a final stage to see whether it's truly better than the existing site. But A/B testing can't be the primary driver on a user interface design project.

Short-Term Focus

A/B testing's driving force is the number being measured as the test's outcome. Usually, this is an immediate user action, such as buying something. In theory, there's no reason why the metric couldn't be a long-term outcome, such as total customer value over a five-year period. In practice, however, such long-term tracking rarely occurs. Nobody has the patience to wait years before they know whether A or B is the way to go.

Basing your decisions on short-term numbers, however, can lead you astray. A common example: Should you add a promotion to your homepage or product pages? Unless you're promoting something relevant to a user's current need, every promotion you add clutters the pages and lowers the site's usability.


Sometimes, an A/B test can help you here, if you examine the impact on overall sales, not just sales of the promoted product. Other times, A/B tests will fail you if the negative impact doesn't occur immediately. A cluttered site is less pleasant to use, for example, and might reduce customer loyalty. Although customers might make their current purchases, they might also be less likely to return. However small, such an effect would gradually siphon off customers as they seek out other, better sites. (This is how more-cluttered search engines lost to Google over a four-year period.)

No Behavioral Insights

The biggest problem with A/B testing is that you don't know why you get the measured results. You're not observing the users or listening in on their thoughts. All you know is that, statistically, more people performed a certain action with design A than with design B. Sure, this supports the launch of design A, but it doesn't help you move ahead with other design decisions.


Of course, you also have no idea whether other changes might have brought even bigger improvements, such as changing the button's color or the wording on its label. Or maybe changing the button's page position or its label's font size, rather than the button's size, would create the same or better results. Basically, you know nothing about why button B was not optimal, which leaves you guessing about what else might help. After each guess, you have to implement more variations and wait until you've collected enough statistics to accept or reject the guess.
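The "wait until you've collected enough statistics" step, and the earlier sidebar claim about detecting a 1% sales difference, come down to standard textbook arithmetic. A sketch using the pooled two-proportion z-test and its normal-approximation sample-size formula (the formulas are standard; the function names are illustrative):

```python
from math import ceil, erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using the pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_b - p_a) / se
    # Normal CDF via erf; the p-value is the two-tailed area beyond |z|.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def sample_size_per_variant(p_base: float, rel_lift: float) -> int:
    """Approximate visitors needed per variant to detect a relative lift
    in conversion rate at 5% two-sided significance and 80% power."""
    z_alpha, z_beta = 1.96, 0.84  # standard normal quantiles
    p_alt = p_base * (1 + rel_lift)
    se_null = sqrt(2 * p_base * (1 - p_base))
    se_alt = sqrt(p_base * (1 - p_base) + p_alt * (1 - p_alt))
    n = ((z_alpha * se_null + z_beta * se_alt) / (p_alt - p_base)) ** 2
    return ceil(n)
```

Plugging in a 2% baseline conversion rate, the formula says that reliably detecting a 1% relative lift takes millions of visitors per variant, which is why each guess means a long wait, and why only high-traffic sites can chase differences that small at all.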

Worst of all, A/B testing provides data only on the elements you're testing. It's not an open-ended method like user testing, in which users often reveal stumbling blocks you never anticipated. For example, it's common to discover problems related to trust, where users simply don't want to do business with you because your site undermines your credibility.

Bigger issues like trust or uninformative product information often have effect sizes of 100% or more, meaning that your sales would double if such problems were identified and fixed. If you spend all your time fiddling with 1-2% improvements, you can easily overlook the 100% improvements that come from qualitative insights into users' needs, desires, and fears.


A/B testing has more problems than benefits. You should not make it the first method you choose for improving your site's conversion rate. And it should certainly never be the only method used on a project. Qualitative observation of user behavior is faster and generates deeper insights. Also, qualitative research is less subject to the many errors and pitfalls that plague quantitative research.

A/B testing does have its own advantages, however, and provides a great supplement to qualitative studies. Once your company's commitment to usability has grown to the level where you routinely conduct many forms of user research, A/B testing certainly has its place in the toolbox.

(More on this angle in the full-day course on combining UX methods and analytics methods like A/B testing.)