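Any UX-research study aims to answer general questions about our design or about our users. What percentage of our user population will be able to subscribe to our newsletter? What major usability issues do people encounter on our website? Is one design more usable than another for our target audience? But any time we set up a UX-research study, whether quantitative or qualitative, there is a danger that it will not reflect the reality we want to capture because the study was poorly designed.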

There are two broad classes of study-design errors:

  1. Internal-validity errors that bias participants toward a certain response or behavior
  2. External-validity errors that capture behaviors or situations that are not characteristic of our target audience

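We’ll talk about each of these separately. But before we do, let’s note that validity is separate from reliability. A study’s reliability simply means that, if you repeat the study, you will get the same results. In other words, the findings are not random. There are many statistical methods for calculating the degree of a study’s reliability, and the main way to improve reliability is to test more participants. But reliability is no good without validity: a study with high reliability and low validity is one in which you are really measuring the wrong thing.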

Internal Validity for UX Studies

Consider a study that compares two sites, site A and site B. You are trying to decide which of the two is better, and you always show participants site A first, ask them to complete some tasks on it, then move to site B and give them the same tasks. Is this study design likely to produce accurate results that reflect reality? In other words, will this study identify the better design?

Not necessarily. This study setup favors site B because, by the time participants get to it, they will already be familiar with the testing situation and with the task domain. If they’re testing car-rental sites, for example, they will already know what an LDW (loss-damage waiver) is when they get to site B, and they may have certain expectations regarding the steps of the rental process. They will also know what you expect them to do and how they’re supposed to perform the task. Therefore, this study lacks internal validity. (The usual fix for this problem is to alternate which site goes first and have half of the users try site B first.)

Definition: A study has internal validity if it does not favor or encourage any particular participant response or behavior.

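Internal validity is a concern for both qualitative and quantitative studies. With qualitative studies, the facilitator may inadvertently bias participants or elicit a certain response from them. For example, even a simple question such as “Have you found the checkout difficult?” may invalidate the study results because participants are primed to think of difficulties, so they may identify more of them than they normally would (just like Richard Nixon’s “I am not a crook” statement).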

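With quantitative studies, a lack of internal validity can produce skewed results that do not reflect reality. For example, you may run a benchmarking study and find that time on task is better for the redesigned version of your site than for the original version, and you may infer that you did a good job with the redesign, when in fact the difference is due to different study protocols: the test of the original used a think-aloud protocol, but the test of the redesign didn’t. (And thinking aloud does take some extra time, so it can cause longer task times.)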

In this example, the protocol is a confounding variable: a hidden variable that can affect your study results but that you did not account for when you designed the study.

External Validity

External validity is about how naturalistic your study is.

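If you are designing for seniors and you recruit study participants from the general population, will the study be valid? Will it tell you anything about your real audience? Probably not, because younger participants are likely to behave differently than older ones. Or, if you test a mobile design on a desktop, will your findings generalize to use of the design in the wild? Maybe yes, maybe no; it is impossible to tell for sure (unless you run another study). In both cases, the study lacks external validity.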

Definition: A study has external validity if the participants and the study setup are representative of the real-world situation in which the design is used.

The concept of external validity also applies to both qualitative and quantitative studies — for obvious reasons.

Recommendations for Study Design

Here are some recommendations to help you set up studies that are both internally and externally valid.

Internal validity

Randomization is essential for ensuring internal validity.

  1. Randomize the order of the tasks.

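Task order can bias task responses. At the beginning of a study, people are usually new to the study environment and to the system they are testing. They will take longer to perform the first tasks in a session and may make more errors than normal. On the other hand, tasks presented at the end of the session may show the effects of participant fatigue.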

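That is why we strongly recommend that, in any test, whether qualitative or quantitative, you randomize the order of the tasks as much as possible. (Sometimes, however, following this recommendation may not be completely feasible. For example, if the tasks are Login and Deposit check, it may be impossible to have Deposit check come before Login.)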

Additionally, to mitigate the learning phase at the beginning of every session, we recommend that you prepare 1–2 warmup tasks (psychologists call them practice trials) that are irrelevant for your study and that are meant to get participants familiar and comfortable with the study environment and the study procedure. I like to pick easy tasks that bolster participants’ confidence and make them feel relaxed. But, if you do use warmup tasks, make sure that you do not include them in your analysis.
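As a rough illustration of these two recommendations, here is a minimal Python sketch that puts the warmup tasks first and shuffles the remaining tasks while respecting ordering constraints such as Login before Deposit check. The function name, its parameters, and the task names are hypothetical and chosen only for this example; the retry-until-valid approach is an assumption that works for the handful of tasks in a typical session.

```python
import random


def randomized_task_order(tasks, warmups=(), must_precede=(), rng=None, max_tries=10_000):
    """Return the warmup tasks followed by the study tasks in random order.

    must_precede holds (first, second) pairs: `first` has to appear
    before `second` in the session (e.g., Login before Deposit check).
    """
    rng = rng or random.Random()
    order = list(tasks)
    for _ in range(max_tries):
        rng.shuffle(order)
        # Keep this shuffle only if every ordering constraint holds.
        if all(order.index(first) < order.index(second)
               for first, second in must_precede):
            return list(warmups) + order
    raise ValueError("no valid order found; check must_precede for contradictions")


# Hypothetical banking tasks, one order per participant.
tasks = ["Login", "Deposit check", "Find an ATM", "Transfer funds"]
for participant in range(3):
    print(participant, randomized_task_order(
        tasks,
        warmups=["Warmup: open the home page"],
        must_precede=[("Login", "Deposit check")],
    ))
```

Because the warmup tasks always come first and are kept in a separate list, it is easy to leave them out when you analyze the results.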

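  1. If your study contrasts two or more conditions (e.g., you want to compare your site with a competitor site) and each participant will be exposed to all conditions (i.e., a within-subject design), you should counterbalance or randomize the order in which each participant is exposed to these conditions (e.g., the order in which they see your site and the competitor’s site).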

This recommendation is related to the previous one, randomizing the task order. However, if you’re testing, say, 2 ecommerce sites, sometimes it may be unrealistic or unfeasible to ask the participant to shop on site 1, then add an item to a wishlist on site 2, then go back to site 1 and subscribe to the newsletter, then shop on site 2. This would be a detrimental and possibly confusing setup if you want, for instance, to collect post-test questionnaires such as NPS for both designs at the end of the session.

In that situation, we recommend that you group all the tasks for design 1 together and all the tasks for design 2 together. You should, however, randomize the order in which participants see the two designs — with some participants seeing design 1 first and others seeing design 2 first. And, within each design itself, the order of tasks should be randomized.
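The grouping described above can be sketched in a few lines of Python. This is only one possible way to do it: the function name `session_plan`, the design labels, and the task names are hypothetical. The sketch rotates which design comes first from one participant to the next and shuffles the tasks separately within each design. With two designs, rotation gives full counterbalancing; with three or more, it only balances which position each design occupies, not every possible sequence.

```python
import random


def session_plan(participant_index, designs, tasks, rng):
    """Build one participant's plan as a list of (design, task order) pairs."""
    designs = list(designs)
    # Rotate the design order so that each design goes first equally often.
    shift = participant_index % len(designs)
    design_order = designs[shift:] + designs[:shift]
    plan = []
    for design in design_order:
        task_order = list(tasks)
        rng.shuffle(task_order)  # fresh random task order within each design
        plan.append((design, task_order))
    return plan


rng = random.Random()  # pass a seed if you need a reproducible schedule
tasks = ["shop", "add to wishlist", "subscribe to newsletter"]
for i in range(4):
    print(i, session_plan(i, ["design 1", "design 2"], tasks, rng))
```

Recruiting a number of participants that is a multiple of the number of designs keeps the first-position counts exactly equal.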

  1. Control the study setup from one session to the next and look for confounding variables: hidden factors that could affect your results.

For example, assume a researcher is interested in comparing two sites and uses a between-subject design. She decides to study site A with the participants in the morning sessions and site B with those participants coming for afternoon sessions. If she ends up finding that participants perform better on, say, site A, it could be because site A is better, or it could be because people are less tired in the morning.

Similarly, if a colleague helps you facilitate a study and you divide the sites so that you take the sessions with site A and she takes the sessions with site B, the facilitator is a hidden variable. It could be that one facilitator’s style is more biasing than the other’s, or that one facilitator is naturally a more pleasant person and participants feel more talkative and relaxed with her.

Thus, if you know that there will be any factors that will need to vary from one session to the next, ensure that they vary for all the conditions in your study.
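One way to follow this advice in a between-subject study is to spread the conditions evenly within every combination of the factors that vary. The Python sketch below is a minimal illustration under assumptions: the function name `assign_conditions`, the `facilitator` and `slot` keys, and the facilitator names are all hypothetical. Within each facilitator and time-slot combination, it hands out the sites in a random but balanced way.

```python
import random
from collections import defaultdict


def assign_conditions(sessions, conditions, rng):
    """Assign one condition to each session, balanced within each stratum.

    sessions is a list of dicts with 'facilitator' and 'slot' keys; the
    result is a list of conditions in the same order as `sessions`.
    """
    strata = defaultdict(list)
    for index, session in enumerate(sessions):
        strata[(session["facilitator"], session["slot"])].append(index)

    assignment = [None] * len(sessions)
    for indices in strata.values():
        rng.shuffle(indices)                     # random order within the stratum
        start = rng.randrange(len(conditions))   # random starting condition
        for offset, index in enumerate(indices):
            assignment[index] = conditions[(start + offset) % len(conditions)]
    return assignment


sessions = [
    {"facilitator": name, "slot": slot}
    for name in ("Facilitator 1", "Facilitator 2")
    for slot in ("morning", "afternoon")
    for _ in range(2)
]
for session, site in zip(sessions, assign_conditions(sessions, ["site A", "site B"], random.Random())):
    print(session, site)
```

With this kind of assignment, each facilitator and each time of day sees both sites, so neither factor lines up with a single condition.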

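When you put together a benchmarking program for your organization, careful planning for internal validity is essential. You must thoroughly document your study conditions (task wording, study protocol, and so on) so that they can be replicated in later studies and design improvements can be tracked over time. Otherwise, differences between the current version of the system and previous iterations may be due only to the study setup rather than to usability improvements.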

External validity

  1. Recruit participants who are representative of your target audience, both in terms of demographics and user goals.

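In general, researchers are very careful to create screeners that match the exact demographics of their user population, but this may not be enough to ensure external validity. It could be that your participants fit the right demographic but have different goals than your users (or they are simply not motivated enough). Always strive to find participants who are likely to have the same goals as your users.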

  1. To the best of your ability, replicate the natural situation in which participants would use the UI that they are testing.

Are your participants supposed to use your car-repair mobile application in their garage? Then don’t have them test it in a conference room. Environmental factors (light, dirty hands, where the phone is positioned, time available, tools available) are all likely to play a role in how usable this app is.

However, sometimes it may be impossible for a study to be externally valid.

Is External Validity Always Possible?

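In some sense, any study will lack external validity: we rarely use an interface with a stranger looking over our shoulder, sitting at a desk or in a lab. (To some extent, one could even argue that remote studies are more externally valid than in-person ones, because at least participants are likely to be in their natural environment.) We also know that participants tend to behave slightly differently in a usability test than they do outside of one: they are more compliant and more persistent.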

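Moreover, sometimes testing a design in its natural environment may be too costly. For example, we are great advocates of paper prototyping, but these types of tests will always lack external validity. So, what should we do?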

In these situations, some testing is better than no testing. With paper prototyping, it may be that your results are not externally valid and you will have to retest later on in naturalistic conditions. But the goal of paper prototyping is to identify any big hurdles so that you won’t spend money implementing something that is completely off. So, run a paper-prototyping study, identify the big issues, fix them, then move forward to a high-fidelity prototype that you could test in naturalistic conditions, on the device that participants will use to complete the task.

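Another common situation that lacks external validity is mobile testing: most participants will not use a mobile design uninterrupted, sitting at a desk, and connected to WiFi. However, it can be acceptable to test in that setup in order to identify the problems that people would encounter even in the best-case scenario of a great connection and no interruptions. These are probably the first problems that a mobile site needs to address: if the site has problems even under ideal conditions, the design needs to be fixed. Once you have ironed out these problems, you still need to retest in more realistic conditions.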

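Similarly, some quantitative-research professionals recommend including only expert participants in certain quantitative studies in order to reduce variability (lower variability translates into a lower margin of error for the study results and may allow researchers to reduce the number of participants). Expert users will give you a best-case scenario; as long as you don’t assume that the results will generalize to all your users, you should be fine.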
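To make the link between variability, margin of error, and number of participants concrete, here is a small Python sketch. It relies on assumptions that are not part of the recommendation above: a normal approximation with z = 1.96 for a 95% confidence level, and made-up standard deviations for time on task. With small samples, a t-distribution would be the more careful choice and would give somewhat wider margins.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Approximate margin of error for a mean (normal approximation)."""
    return z * std_dev / math.sqrt(n)


def participants_needed(std_dev, target_margin, z=1.96):
    """Smallest n whose approximate margin of error is at most target_margin."""
    return math.ceil((z * std_dev / target_margin) ** 2)


# Hypothetical task times in seconds: a mixed group vs. experts only.
for label, std_dev in (("mixed users", 40), ("experts only", 20)):
    print(label,
          "margin with 20 participants:", round(margin_of_error(std_dev, 20), 1),
          "participants for a 10-second margin:", participants_needed(std_dev, 10))
```

Under these assumptions, halving the standard deviation halves the margin of error for the same number of participants and cuts the number of participants needed for a given margin to roughly a quarter.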

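In general, if you find yourself forced to sacrifice some external validity, it is essential that you always interpret your findings in context and be aware that they may not hold if the study were replicated under realistic conditions.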

Conclusion

Poorly planned research will translate into results that are invalid. You may end up wasting time and money on a study that doesn’t tell you anything about your product or your audience. Pay attention to your study’s internal and external validity: strive to recruit participants who are representative of your target audience, and make sure that the study setup replicates how your users will use the system in real life and that it does not encourage any one behavior or response.