One of the biggest challenges in website and intranet design is creating the information architecture: what goes where? A classic mistake is to structure the information space based on how you view the content, which often results in different subsites for each of your company's departments or information providers.

Rather than simply mirroring your org chart, you can better enhance usability by creating an information architecture that reflects how users view the content. In each of our intranet studies, we've found that some of the biggest productivity gains occur when companies restructure their intranet to reflect employees' workflow. And in ecommerce, sales increase when products appear in the categories where users expect to find them.

All very good, but how do you find out the users' view of an information space and where they think each item should go? For researching this type of mental model, the primary method is card sorting:

  1. Write the name (and perhaps a short description) of each of the main items on an index card. Yes, good old paper cards. (Take care not to use terms that bias the users.)
  2. Shuffle the cards and give the deck to a user. (The standard recommendations for recruiting test participants apply: they must be representative users, etc.)
  3. Ask each user to sort the cards into piles, placing items that belong together in the same pile. Users can make as many or as few piles as they want; some piles can be big, others small.
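  4. Optional extra steps include asking users to arrange the resulting piles into bigger groups, and to name the different groups and piles. The latter step can give you ideas for words and synonyms to use for navigation labels, links, headlines, and search engine optimization.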

Because card sorting uses no technology, this photo of a 1995 card sort looks the same as one conducted today.


Fidelity Investments has one of the world's best usability teams, led by Dr. Thomas S. Tullis, senior VP of human interface design. Tullis and co-author Larry Wood recently reported the results of a study measuring the trade-off curve for testing various numbers of users in a card sorting exercise.

First, they tested 168 users, generating very solid results. They then simulated the outcome of running card sorting studies with smaller user groups by analyzing random subsets of the total dataset. For example, to see what a test of 20 users would generate, they selected 20 users randomly from the total set of 168 and analyzed only that subgroup's card sorting data. By selecting many such samples, it was possible to estimate the average findings from testing different numbers of users.

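The main quantitative data from a card sorting study is a set of similarity scores that measures how similarly users rated various pairs of items. If all users sorted two cards into the same pile, then the two items represented by the cards would have 100% similarity. If half the users placed two cards together and half placed them in separate piles, those two items would have a 50% similarity score.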
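To make the similarity score concrete, here is a minimal Python sketch. It assumes each participant's sort is recorded as a list of piles, where each pile is a set of card names; the function name `similarity_scores` and the three sample cards are made up for illustration and are not taken from the Fidelity study.

```python
from itertools import combinations

def similarity_scores(sorts, cards):
    """Map each card pair to the fraction of participants who piled them together."""
    pairs = list(combinations(sorted(cards), 2))
    counts = {pair: 0 for pair in pairs}
    for piles in sorts:
        # Look up which pile this participant chose for each card.
        pile_of = {card: i for i, pile in enumerate(piles) for card in pile}
        for a, b in pairs:
            if pile_of[a] == pile_of[b]:
                counts[(a, b)] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

# Hypothetical data: two participants sorting three intranet items.
cards = ["Benefits", "Payroll", "Press releases"]
sorts = [
    [{"Benefits", "Payroll"}, {"Press releases"}],
    [{"Benefits"}, {"Payroll", "Press releases"}],
]
scores = similarity_scores(sorts, cards)
print(scores[("Benefits", "Payroll")])  # 0.5: one of the two users piled them together
```

Pairs are keyed in alphabetical order, so look up `("Benefits", "Payroll")` rather than the reverse.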

We can assess the outcome of a smaller card sorting study by asking how well its similarity scores correlate with the scores derived from testing a large user group. (A reminder: correlations run from -1 to +1. A correlation of 1 shows that the two datasets are perfectly aligned; 0 indicates no relationship; and negative correlations indicate datasets that are opposites of each other.)
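The resampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic (random sorts around three made-up "true" groups), the helper names are hypothetical, and I use a plain Pearson correlation, which may not match the exact statistic in the Fidelity analysis.

```python
import random
from itertools import combinations
from math import sqrt

def similarity_vector(sorts, n_cards):
    """Similarity score for every card pair, in a fixed pair order.

    Each sort is a list holding the pile label chosen for each card index.
    """
    pairs = combinations(range(n_cards), 2)
    return [sum(s[a] == s[b] for s in sorts) / len(sorts) for a, b in pairs]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either list has zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def mean_subsample_correlation(all_sorts, n_cards, sample_size, trials, rng):
    """Average correlation between random subsets and the full dataset."""
    full = similarity_vector(all_sorts, n_cards)
    total = 0.0
    for _ in range(trials):
        subset = rng.sample(all_sorts, sample_size)
        total += pearson(similarity_vector(subset, n_cards), full)
    return total / trials

# Synthetic stand-in data (NOT the study's data): 168 simulated participants,
# 12 cards, each card placed in its "true" group 70% of the time.
N_CARDS, N_PARTICIPANTS = 12, 168
rng = random.Random(0)
all_sorts = [
    [i % 3 if rng.random() < 0.7 else rng.randrange(4) for i in range(N_CARDS)]
    for _ in range(N_PARTICIPANTS)
]

for size in (5, 15, 30, 60):
    r = mean_subsample_correlation(all_sorts, N_CARDS, size, trials=200, rng=rng)
    print(f"{size:>2} users: mean correlation with full dataset = {r:.3f}")
```

The printed values depend entirely on the synthetic noise level, so they will not reproduce the study's numbers; the point is only the procedure of drawing many random subsets and averaging their correlation with the full set.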

How Many Users?

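For most usability studies, I recommend testing 5 users, because that's enough data to teach you most of what you'll ever learn in a test. For card sorting, however, there's only a 0.75 correlation between the results from 5 users and the ultimate results. That's not good enough.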

You must test 15 users to reach a correlation of 0.90, which is a more comfortable place to stop. After 15 users, diminishing returns set in and correlations increase very little: testing 30 people gives a correlation of 0.95 — certainly better, but usually not worth twice the money. There are hardly any improvements from going beyond 30 users: you have to test 60 people to reach 0.98, and doing so is definitely wasteful.


Why do I recommend testing fewer users? I think that correlations of 0.90 (for 15 users) or maybe 0.93 (for 20) are good enough for most practical purposes. I can certainly see testing 30 people and reaching 0.95 if you have a big, well-funded project with a lot of money at stake (say, an intranet for 100,000 employees or an ecommerce site with half a billion dollars in revenues). But most projects have very limited resources for user research; the remaining 15 users are better "spent" on 3 qualitative usability tests of different design iterations.
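A quick bit of arithmetic on the correlations quoted in this article makes the diminishing returns easy to see. The sketch below only uses the five reported data points; the function name is mine, and the straight-line gain between points is a simplification, not something the study reported.

```python
# (number of users, correlation with the full dataset) as quoted in this article.
reported = [(5, 0.75), (15, 0.90), (20, 0.93), (30, 0.95), (60, 0.98)]

def gain_per_added_user(points):
    """Average correlation gained per extra user between consecutive points."""
    gains = []
    for (n0, r0), (n1, r1) in zip(points, points[1:]):
        gains.append((n0, n1, (r1 - r0) / (n1 - n0)))
    return gains

gains = gain_per_added_user(reported)
for n0, n1, gain in gains:
    print(f"{n0:>2} -> {n1:>2} users: {gain:.4f} correlation per added user")
```

Going from 5 to 15 users buys about 0.015 per added participant; going from 30 to 60 buys about 0.001 each.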


I don't recommend designing an information architecture based purely on a card sort's numeric similarity scores. When deciding specifics of what goes where, you should rely just as much on the qualitative insights you gain in the testing sessions. Much of the value from card sorting comes from listening to the users' comments as they sort the cards: knowing why people place certain cards together gives deeper insight into their mental models than the pure fact that they sorted cards into the same pile.


We know that 5 users are enough for most usability studies, so why do we need three times as many participants to reach the same level of insight with card sorting? Because the methods differ in two key ways:

  • User testing is an evaluation method: we already have a design, and we're trying to find out whether or not it's a good match with human nature and user needs. Although people differ substantially in their capabilities (domain knowledge, intelligence, and computer skills), if a certain design element causes difficulties, we'll see this after testing a few users. A low-end user might experience more severe difficulties than a high-end user, but the magnitude of the difficulties is not at issue unless you are running a measurement study (which requires more users). All you need to know is that the design element doesn't work for humans and should be changed.
  • Card sorting is a generative method: we don't yet have a design, and our goal is to find out how people think about certain issues. People's mental models, and the vocabulary they use to describe the same concepts, vary greatly. We must collect data from a fair number of users before we can achieve a stable picture of the users' preferred structure and determine how to accommodate differences among users.

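If you have an existing website or intranet, testing a few users will tell you whether people have trouble with the information architecture. To generate a new structure from scratch, you must sample more people.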

Fortunately, you can combine the two methods: First, use generative studies to set the direction for your design. Second, draft up a design, preferably using paper prototyping, and run evaluation studies to refine the design. Because usability evaluations are fast and cheap, you can afford multiple rounds; they also provide quality assurance for your initial generative findings. This is why you shouldn't waste resources squeezing the last 0.02 points of correlation out of your card sorts. You'll catch any small mistakes in subsequent user testing, which will be much cheaper than doubling or tripling the size of your card sorting studies.

Study Weaknesses

The Fidelity study has two obvious weaknesses:

  • It's only one study. It's always better to have data from multiple companies.
  • The analysis was purely quantitative, focusing on a statistical analysis of similarity scores and ignoring user comments and other qualitative data.


Although more data would be comforting, I have confidence in the Fidelity study's conclusions because they match my own observations from numerous card sorting studies over many years. I've always said that it was necessary to test more users for card sorting than for traditional usability studies. And I've usually recommended about 15 users, though we've also had good results with as few as 12 when budgets were tight or users were particularly hard to recruit.

There are myriad ways in which quantitative studies can go wrong and mislead you. Thus, if you see a single quantitative study that contradicts all that's known from qualitative studies, it's prudent to disregard the new study and assume that it's likely to be bogus. But when a quantitative study confirms what's already known, it's likely to be correct, and you can use the new numbers as decent estimates, even if they're based on less data than you would ideally like.

Thus, the current recommendation is to test 15 users for card sorting in most projects and 30 users in big projects with lavish funding.


Tullis, Tom, and Wood, Larry. (2004) How Many Users Are Enough for a Card-Sorting Study? Usability Professionals Association (UPA) 2004 Conference, Minneapolis, MN, June 7–11, 2004.