Risks of Quantitative Studies

By Jakob Nielsen, March 1, 2004

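Summary: Number fetishism leads usability studies astray by focusing on statistical analyses that are often false, biased, misleading, or overly narrow. It is better to emphasize insights and qualitative research.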


There are two main types of user research: quantitative (statistics) and qualitative (insights). Quantitative research has some quaint advantages, but qualitative delivers the best results for the least money. Furthermore, quantitative studies are often too narrow to be useful and are sometimes directly misleading.

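The key benefit of quantitative studies is simple: they boil a complex situation down to a single number that's easy to grasp and discuss. I exploit this communicative clarity myself, for example, in reporting that using websites is 206% more difficult for users with disabilities and 122% more difficult for senior citizens than for mainstream users.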

Of course, boiling down detailed usability findings to a bottom-line score neglects the details that take 273 pages to explain: Why are websites more difficult for these groups? What should you do about it?

Numbers, however, have their own stories to tell:

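  • They tell us that the situation is much worse for users with disabilities than for seniors. Because there are more seniors, and they constitute a particularly affluent audience, websites might nonetheless choose to spend more resources catering to seniors than to the disabled. Knowing the score lets organizations make conscious decisions in how they allocate scarce resources.
  • They tell us that the problems are not small. If the web were 5% more difficult for users with disabilities than for other users, most people would say "whatever; deal with it." But discriminating by 206% is too much for many of us to stomach.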

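Numbers also permit comparisons between designs and tracking over time. In 10 years (2014), if websites are only 50% more difficult for seniors than for younger users, we'll know that we've made substantial progress. (In fact, websites did get better for seniors in the 11 years between 2002 and 2013.)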

Beware Number Fetishism

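When I read reports from other people's research, I usually find that their qualitative study results are more credible and trustworthy than their quantitative results. It's a dangerous mistake to believe that statistical research is somehow more scientific or credible than insight-based observational research. In fact, most statistical research is less credible than qualitative studies. Design research is not like medical science: ethnography is its closest analogy among the traditional fields of science.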

User interfaces and usability are highly contextual, and their effectiveness depends on a broad understanding of human behavior. Typically, designers must combine and trade off design guidelines, which requires some understanding of the rationale and principles behind the recommendations. Issues that are so specific that a formula can pinpoint them are usually irrelevant for practical design projects.

Fixating on numbers rather than qualitative insights has driven many usability studies astray. As the following points illustrate, quantitative approaches are inherently risky in a host of ways.

Random Results

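Researchers often perform statistical analysis to determine whether numeric results are "statistically significant." By convention, they deem an outcome significant if there is less than a 5% probability that it could have occurred randomly rather than signifying a true phenomenon.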

This sounds reasonable, but it implies that 1 out of 20 "significant" results might be random if researchers rely purely on quantitative methods.
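
The 1-in-20 figure can be checked with a small simulation. The sketch below is a minimal illustration, not taken from any particular study: it assumes two groups of 30 measurements drawn from the same normal distribution (so there is no real effect), a z-test with known unit variance, and the conventional 5% threshold. The function names, group size, and seed are all illustrative choices.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def null_experiment(rng, n_per_group=30):
    """Compare two groups drawn from the SAME distribution (no real effect)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    # Standard error of the difference, assuming unit variance in each group.
    z = diff / math.sqrt(2.0 / n_per_group)
    return two_sided_p(z)


def false_positive_rate(n_experiments=5000, alpha=0.05, seed=1):
    """Share of no-effect experiments that still pass the significance test."""
    rng = random.Random(seed)
    hits = sum(null_experiment(rng) < alpha for _ in range(n_experiments))
    return hits / n_experiments


rate = false_positive_rate()
print(f"share of null experiments flagged significant: {rate:.3f}")
```

Under these assumptions, roughly 5% of the no-effect experiments still come out "significant," which is the 1-in-20 risk described above.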

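Luckily, most good researchers, especially those in the user-interface field, use more than a simple quantitative analysis. Thus, they typically have insights beyond simple statistics when they publish a paper, which drives down, but doesn't eliminate, bogus findings.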

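There's a reverse phenomenon as well: Sometimes a true finding is statistically insignificant because of the experiment's design. Perhaps the study didn't include enough participants to observe a major, but rare, finding in sufficient numbers. It would therefore be wrong to dismiss issues as irrelevant just because they don't show up in quantitative study results.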

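The "butterfly ballot" in the 2000 election in Florida is a good example: A study of 100 voters would not have included a statistically significant number of people who intended to vote for Al Gore but instead punched the hole for Patrick Buchanan, because less than 1% of voters made this mistake. A qualitative study, on the other hand, would probably have revealed some voters saying something like, "Okay, I want to vote for Gore, so I'm punching the second hole ... oh, wait, it looks like Buchanan's arrow points to that hole. I've got to go down one for Gore's hole." Hesitations and almost-errors are gold to the observant study facilitator, but to translate them into design recommendations requires a qualitative analysis that pairs observations with interpretive knowledge of usability principles.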
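
To make the sample-size problem concrete, the following sketch computes how many mistaken votes a 100-voter study could expect to see. It assumes an error rate of exactly 1%, which is an upper bound chosen for illustration (the text says only "less than 1%"), and that voters err independently of one another.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k events in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n_voters = 100
error_rate = 0.01  # assumed ceiling; the true rate is stated only as "less than 1%"

expected_errors = n_voters * error_rate
p_none = binom_pmf(0, n_voters, error_rate)
p_at_most_one = p_none + binom_pmf(1, n_voters, error_rate)

print(f"expected mistaken votes in the sample: {expected_errors:.1f}")
print(f"chance of seeing no mistaken vote:     {p_none:.3f}")
print(f"chance of seeing at most one:          {p_at_most_one:.3f}")
```

Even at the assumed 1% ceiling, about three out of four such studies would see zero or one mistaken vote, far too few to register in a statistical analysis.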

Pulling Correlations Out of a Hat

If you measure enough variables, you will inevitably discover that some seem to correlate. Run all your stats through the software and a few "significant" correlations are sure to pop out. (Remember: 1 out of 20 analyses are "significant," even if there is no underlying true phenomenon.)

Studies that measure 7 metrics will generate 21 possible correlations between the variables. Thus, on average, such studies will have one bogus correlation that the statistics program deems "significant," even if the issues being measured have no real connection.

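In my Web Usability 2004 project, we collected metrics on 53 different aspects of user behavior on websites. There are thus 1,378 possible correlations that I could throw into the hopper. Even if we didn't discover anything at all in the study, about 69 correlations would emerge as "statistically significant."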
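
The counts in the two preceding paragraphs follow from simple arithmetic: n metrics give n(n-1)/2 distinct pairs, and at a 5% threshold about one pair in twenty is expected to pass by chance. A minimal sketch (the function names are illustrative, and the 5% default is the conventional threshold discussed earlier):

```python
import math


def pair_count(n_metrics):
    """Number of distinct pairs of metrics that could be correlated."""
    return math.comb(n_metrics, 2)


def expected_spurious(n_metrics, alpha=0.05):
    """Expected count of chance 'significant' pairs if nothing is truly related."""
    return alpha * pair_count(n_metrics)


for n in (7, 53):
    print(f"{n} metrics -> {pair_count(n)} pairs, "
          f"about {expected_spurious(n):.1f} spurious at the 5% level")
```

The expected count does not require the tests to be independent of one another; it only assumes each individual test has a 5% chance of a false positive.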

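Obviously, I'm not going to stoop to correlation hunting; I will only report statistics that relate to reasonable hypotheses founded on an understanding of the underlying phenomena. (In fact, statistics programs assume that the researcher has specified the hypotheses in advance; if you hunt for "significance" in the output after the fact, you're abusing the software.)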

Overlooking Covariants

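Even when a correlation represents a true phenomenon, it can be misleading if the real action concerns a third variable that is related to the two you're studying.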

For example, studies show that intelligence declines by birth order. In other words, a first-born child will on average have a higher IQ than someone who was born second. Third-, fourth-, and fifth-born children, and so on, have progressively lower average IQs. This data seems to present a clear warning to prospective parents: Don't have too many kids, or they'll come out increasingly stupid. Not so.

There's a hidden third variable at play: smarter parents tend to have fewer children. When you want to measure the average IQ of first-born children, you sample the offspring of all parents, regardless of how many kids they have. But when you measure the average IQ of fifth-born children, you're obviously sampling only the offspring of parents who have 5 or more kids. There will thus be a bigger percentage of low-IQ children in the latter sample, giving us the true, but misleading, conclusion that fifth-born children have lower average IQs than first-born children. Any given couple can have as many children as they want, and their younger children are unlikely to be significantly less intelligent than their older ones. When you measure intelligence based on a random sample from the available pool of children, however, you're ignoring the parents, who are the true cause of the observed data.

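(Update added 2007: The newest research suggests that there may actually be a tiny IQ advantage for first-born children after correcting for family size and the parents' economic and educational status. But the point remains that you have to correct for these covariants, and when you do so, the IQ difference is much less than plain averages may lead you to believe.)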
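
The birth-order effect can be reproduced with a toy simulation in which birth order has no influence at all. Everything in the sketch below is an assumption made for illustration, not real data: parental IQ is drawn from a normal distribution, lower parental IQ leads to larger families on average, and each child's IQ is the parents' value plus random noise that is independent of birth order.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def simulate(n_families=20000, seed=7):
    rng = random.Random(seed)
    by_order = defaultdict(list)  # birth order -> IQs pooled across all families
    first_in_big, fifth_in_big = [], []  # first and fifth child, same 5+ families
    for _ in range(n_families):
        parent_iq = rng.gauss(100, 15)
        # Assumption: lower parental IQ -> larger family on average.
        size = round(3 + (100 - parent_iq) / 10 + rng.gauss(0, 1))
        size = max(1, min(8, size))
        # Assumption: a child's IQ depends on the parents only, NOT on birth order.
        kids = [parent_iq + rng.gauss(0, 10) for _ in range(size)]
        for order, iq in enumerate(kids, start=1):
            by_order[order].append(iq)
        if size >= 5:
            first_in_big.append(kids[0])
            fifth_in_big.append(kids[4])
    pooled = {order: mean(iqs) for order, iqs in sorted(by_order.items())}
    within_gap = mean(first_in_big) - mean(fifth_in_big)
    return pooled, within_gap


pooled, within_gap = simulate()
for order in range(1, 6):
    print(f"birth order {order}: pooled mean IQ {pooled[order]:.1f}")
print(f"first-born minus fifth-born within the same families: {within_gap:.2f}")
```

In this toy model the pooled averages fall steadily with birth order, yet the gap nearly vanishes when first-born and fifth-born children from the same families are compared. The decline comes entirely from which families contribute to each sample.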

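As a web example, you might observe that longer link texts are positively correlated with user success. This doesn't mean that you should write long links. Website designers are the hidden covariant here: clueless designers tend to use short text links like "more," "click here," and made-up words. Conversely, usability-conscious designers tend to explain the available options in user-centric language, emphasizing text and other content-rich design elements over more vaporous elements such as "smiling ladies." Many of these designers' links might indeed have a higher word count, but that's not why the designs work. Adding words won't make a bad design better; it'll simply make it more verbose.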

Over-Simplified Analysis

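To get good statistics, you must tightly control the experimental conditions, often so tightly that the findings don't generalize to real problems in the real world.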

This is a common problem for university research, where the test subjects tend to be undergraduate students rather than mainstream users. Also, instead of testing real websites with their myriad contextual complexities, many academic studies test scaled-back designs with a small page count and simplified content.

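For example, it's easy to run a study that shows breadcrumbs are useless: just give users directed tasks that require them to go in a straight line to the desired destination and stop there. Such users will (rightly) ignore any breadcrumb trail. Breadcrumbs are still recommended for many sites, of course. Not only are they lightweight, and thus unlikely to interfere with direct-movement users, but they're helpful to users who arrive deep within a site via search engines and direct links. Breadcrumbs give these users context and help users who are doing comparisons by offering direct access to higher levels of the information architecture.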

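Usability-in-the-large is often neglected by narrow research that doesn't consider, for example, revisitation behavior, search engine visibility, and multi-user decision-making. Many such issues are essential for the success of some of the highest-value designs, such as B2B websites and enterprise applications on intranets.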

Distorted Measurements

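It's easy to prejudice a usability study by helping users at the wrong time or by using the wrong tasks. In fact, you can prove virtually anything you want if you design the study accordingly. This is often a factor behind "sponsored" studies that purport to show that one vendor's products are easier to use than a competitor's products.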

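Even if the experimenters aren't fraudulent, it's easy to get hoodwinked by methodological weaknesses, such as directing the users' attention to specific details on the screen. The very fact that you're asking about some design elements rather than others makes users notice them more and thus changes their behavior.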

One study of online advertising attempted to avoid this mistake, but simply made another one instead. The experimenters didn't overtly ask users to comment on the ads. Instead, they asked users to simply comment on the overall design of a set of web pages. After the test sessions, the experimenters measured users' awareness of various brands, resulting in high scores for companies that ran banners on the web pages in the study.

Does this study prove that banner ads work for branding, even though they don't work for getting qualified sales leads? No. Remember that users were asked to comment on the page design. These instructions obviously made users look around the page much more thoroughly than they would have during normal web use. In particular, someone who's judging a design typically inspects all individual design elements on the page, including the ads.

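Many web advertising studies are misleading, possibly because most such studies come from advertising agencies. The most common distortion is the novelty effect: whenever a new advertising format is introduced, it's always accompanied by a study showing that the new type of ad generates more user clicks. Sure, that's because the new format enjoys a temporary advantage: it gets users' attention simply because it is new and because users have yet to train themselves to ignore it. The study might be genuine as far as it goes, but it says nothing about the new advertising format's long-term advantage once the novelty wears off.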

Publication Bias

Editors follow the "man bites dog" principle to highlight new and interesting stories. This is true for both scientific journals and popular magazines. While understandable, this preference for new and different findings imposes a significant bias in the results that get exposure.

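Usability is a very stable field. User behavior is pretty much the same year after year. I keep finding the same results in study after study, as do many others. Every now and then, a bogus result emerges and publication bias ensures that it gets much more attention than it deserves.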

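Consider the question of web page download time. Everyone knows that faster is better. Interaction design theory has documented the importance of response times since 1968, and this importance has been confirmed empirically in countless web studies since 1995. Ecommerce sites that speed up response times sell more. The day your server is slow, you lose traffic. (This happened to me: on January 14, 2004, Tog got "slashdotted"; because we share a server, my site lost 10% of its normal pageviews for a Wednesday when AskTog's increased traffic slowed useit.com down.)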

If 20 people study download times, 19 will conclude that faster is better. But again: 1 of every 20 statistical analyses will give the wrong result, and this 1 study might be widely discussed simply because it's new. The 19 correct studies, in contrast, might easily escape mention.

Judging Bizarre Results

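Bizarre results are sometimes supported by seemingly convincing numbers. You can use the issues I've raised here as a sanity check: Did the study pull correlations out of a hat? Was it biased or overly narrow? Was it promoted purely because it's different? Or was it just a fluke?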

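Typically, you'll find that deviant findings should be ignored. The broad concepts of human behavior in interactive systems are stable and easy to understand.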

The exceptions usually turn out to be exactly that: exceptions.

Of course, sometimes a strange finding turns out to be revolutionary rather than illusory. This is rare, but it does happen. The key differentiators are whether the finding is repeatable and whether others can see it now that they know where to look.

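In 1989, for example, I published a paper on discount usability engineering, stating that small, fast user studies are superior to larger studies, and that testing with about 5 users is typically sufficient per design iteration. This was quite contrary to the prevailing wisdom at the time, which was dominated by big-budget testing. During the 15 years since my original claim, several other researchers have reached similar conclusions, and we developed a mathematical model to substantiate the theory behind my empirical observation. Today, almost everyone who does user testing has concluded that they learn most of what they'll ever learn with about 5 users.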
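
The model itself is not given here. One widely cited form, from Nielsen and Landauer's work on the topic, estimates the share of usability problems found by n test users as 1 - (1 - L)^n, where L is the share a single user uncovers. The sketch below assumes L = 0.31, a value often quoted as typical; both the function name and that default are assumptions for illustration, and L varies from project to project.

```python
def share_found(n_users, per_user_rate=0.31):
    """Estimated share of usability problems found after n_users test sessions.

    per_user_rate (L) is an assumed value, not a constant of nature.
    """
    return 1 - (1 - per_user_rate) ** n_users


for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users -> {share_found(n):.1%} of problems found")
```

With the assumed rate, 5 users already surface roughly 84% of the problems, and each additional user adds less than the one before, which is the diminishing-returns argument behind small, iterated tests.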

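As another example, my conclusion that PDF documents are bad for online information access has been supported by 4 different studies. We found the same issue in our latest study, so the conclusion also holds over several years. I was initially hesitant to come out against online PDFs, because the format works so well in other contexts (most notably, downloading documents to print, which is what it was designed for). As the evidence kept mounting, however, it became clear that the conclusion for online PDFs is very different from that for print PDFs.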

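You might have dismissed a single study that concluded that the otherwise-fine PDF was actually bad online. But 4 or 5 studies constitute a trend, which vastly increases the finding's credibility as a general phenomenon.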

Quantitative Studies: Intrinsic Risks

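All the reasons I've listed for quantitative studies being misleading point to bad research; it is possible to do good quantitative research and derive valid insights from measurements. But doing so is expensive and difficult.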

Quantitative studies must be done exactly right in every detail, or the numbers will be deceptive. There are so many pitfalls that you're likely to land in one of them and get into trouble.

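If you rely on numbers without insights, you don't have backup when things go wrong. You'll stumble down the wrong path, because that's where the numbers lead.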

Qualitative studies are less brittle and thus less likely to break under the strain of a few methodological weaknesses. Even if your study isn't perfect in every last detail, you'll still get mostly good results from a qualitative method that relies on understanding users and their observed behavior.

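Yes, experts get better results than beginners from qualitative studies. But for quantitative studies, only the best experts get any valid results at all, and only then if they're extremely careful.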