Much of the implicit value proposition for using intelligent, conversational assistants is the use of natural language. But language is an essentially social interface — it evolved to help us communicate with other people. How do the social aspects of language shape our interactions with language-based user interfaces? This is one of the questions that emerged in our research on conversational assistants.
To better understand the challenges that these assistants face today and where they help users, we ran two usability studies (one in New York City and one in the San Francisco Bay Area). A total of 17 participants — 5 in New York and 12 in California — who regularly used at least one of the major intelligent agents (Alexa, Google Assistant, or Siri) were invited into the lab for individual sessions. Each session consisted of a combination of usability testing (in which participants completed facilitator-assigned tasks using Alexa, Google Assistant, or Siri) and an interview.
The social nature of language leads people to project anthropomorphic qualities onto computers. Most of our participants referred to the assistant using a gendered pronoun ("she", or "he" when they had selected a male voice). Some also inserted politeness markers such as "please" or "thank you", or started questions with "Do you think…" or "Can you…".
When talking about the assistants, they often used human metaphors. For example, one user said, "Her mind went blank. She's like 'What does this lady want from me? Leave me alone!'" when Alexa beeped, presumably to signal lack of understanding.
Another participant recounted, "I swear at her [Siri] when she doesn’t understand, and then she says something funny back — and we have fun." And yet another rationalized inappropriate answers by saying, "These are [complex] questions that I normally would not ask, so I guess it takes more thinking. It can’t just answer something like that [right away]. The questions I ask usually aren’t that deep; they don’t take that much thought."
People believed that the assistants were not good at detecting emotion and would not pick up on their frustration from tone of voice or word choice. They had mixed attitudes when it came to using slang or idiomatic language: some felt that slang might not be understood by the assistant and purposefully avoided it; others said that they had tried it and that the assistant worked just fine.
Users did not expect the agents to pick up on fine distinctions in meaning. For example, a user who asked "How much is a one-bedroom apartment in Mountain View?" commented that her question was really too vague for Alexa, because, to her, the word "apartment" implies "rent" — if she had been interested in a sale price, she would have used the word "condo." However, she did not expect Alexa to pick up on that difference.
We don’t usually talk out loud to a computer in public places, and this social norm carried over to intelligent assistants. Users in our study expressed a strong preference for using voice assistants only at home or when alone. Most said that they would not interact with phone-based agents such as Siri or Google Now in public settings. Some, however, were willing to ask for directions while walking.
One participant put this explicitly, noting, "When I’m in public, I usually don’t use [Siri] — I just feel like it looks a little awkward. It also feels embarrassing to use her in public. I don’t see other people using it in public either."
Whereas people usually reported using the assistants for simple, black-and-white queries, often in situations when their hands were busy, another common use was entertainment: many told us that at least occasionally they or someone in their family (usually their child) enjoyed hearing a joke or playing a game with the assistant. Some parents reported using Alexa as a way to entertain their children and keep them away from screen-based devices such as tablets or smartphones.
When asking fun, silly questions (for example, about the agent’s preferences, such as a favorite food), users in our study understood that they weren’t getting authentic artificial-intelligence responses, but only a series of preprogrammed jokes written by the engineering team.
- Those who used the same language structures they would use with a human. These participants were often polite in their query phrasing; they ended their interactions with "Thank you!" and often formulated questions that started with "Please…", "Do you think…", or "Can you tell me…". These participants usually had a positive attitude towards the agents.
- Those who tried to make their language more efficient to increase the chances of being understood. In this category, we saw a continuum of behaviors — from participants who changed sentence word order to make sure the query started with a keyword, to instances where they eliminated articles such as "a" or "the", and, finally, to examples where participants simply used the assistant as a voice interface for a search engine and compressed their queries to a few keywords devoid of grammatical structure (such as "OK Google, events in London last week of July").
For example, one participant noted that a query like “Restaurants near Storm King” might retrieve different results than “Storm King Restaurants”, but “a person would get what I meant either way.”
Some participants kept their queries short and composed of just a few keywords. One said, "I don’t speak to it as if it were a real person — unless I have one of the 5W1H (Who, What, Where, When, Why, How) questions. […] I would speak to a person like this only if they were ESL [an English-as-a-second-language speaker] and I knew they might be overwhelmed by too many words in a sentence."
While some users did refer to the assistant with gendered pronouns, most users did not use pronouns such as "it" or "this" in their actual queries — they preferred to refer to the object of their query explicitly, even if the name was long and complicated. People did not expect the assistant to understand what a pronoun like "this" or "it" might refer to, especially when the pronoun’s antecedent was part of a previous query. (This is otherwise one of the key advantages of true natural-language understanding.) Although assistants are getting better at follow-up queries (and Google recently announced a "conversation" feature for its Google Assistant), most participants had learned to not expect them to preserve context from one query to another. One participant said, "[Siri] usually doesn’t save things like that [e.g., a text-message draft]. When she’s done with something, she’s done with it."
Even though people used humanlike language with the assistants and projected humanlike qualities onto them, they had clear expectations about the tasks the agents could accomplish. Some said that the assistant was like a young child who could not understand complicated things; others compared it with an old person who did not hear very well. One person noted that you cannot "talk too long, because [the assistant] will get distracted," while another participant said that queries should be shorter than 10 words. But many said that the assistant was even better than a human in some respects, because it "knows everything" and is objective — with no emotions or feelings.
Complex questions were thought unlikely to produce good results, and complexity was often closely tied to whether the question or task would first need to be broken down into multiple parts. One user noted that she had had poor luck in the past asking Siri and Alexa when Adele’s next concert in New York would be. Another user noted, "You can talk to it like you would to a child. No complex questions; weather, math, sports, calendar, internet [will work]; it won’t do your taxes and might not even be able to order you pizza." (Note that the "child" analogy is yet another example of anthropomorphizing the assistant.) And yet another participant said that the most complicated thing one could ask an assistant was a place-based reminder.
However, there was a sense that even complex questions could be answered by the assistants if one “learned how to ask the question.” One participant compared the assistant with a file system with many files: the trick is to find the right questions with which to access the information in this database.
Another person complained that she did not want to have to think about how to ask the right question. Many participants mentioned efficiency as an important factor in whether they would bother to ask a question at all — if it was faster to "do it myself," then they felt the interaction with the assistant was not worth the effort. This phrasing is a key indicator of our participants’ mental models — it reflects a belief that using an assistant should be easy, rather than require extensive interaction.
For example, some participants described how setting a timer with the assistant was faster than using other devices, whereas checking weekend traffic to Montauk could be done more quickly with an app like Waze or with Google Maps on a computer. The decision came down to a cost-benefit analysis that balanced the expected interaction cost of completing the task against their expectation of whether the assistant would be able to do the task at all.
Interpretation, Judgement, and Opinion Weren’t Trusted
All users in the study noted that they were not interested in using the agents for opinion-based information (such as recommendations). Any task involving judgment or opinion was regarded with suspicion by our participants. "I would never ask her that" was a common response to tasks involving personal or highly subjective information, such as figuring out how long a vacation in Prague should be. One user said, "I would look up a forum about Prague for locals; I don’t want to do the tourist things that everybody does."
However, it was not only subjective information that participants scoffed at: one user thought it unlikely that Alexa would be able to tell him who the star player was in the previous night’s Boston Celtics game because that involved interpretation; this participant formulated his query to Alexa initially as "Who scored the most points in the last Celtics game", and, when Alexa failed to answer his question, he changed it to "Provide me player statistics for the most recent Celtics game." (Both are questions with an objectively true answer, rather than questions of judgment.)
One participant noticed another important difference between intelligent agents and humans: voice assistants do not ask you to clarify an ambiguous request. Humans will typically respond to an ambiguous statement with followup questions, but intelligent assistants attempt to carry out the request without getting additional information.
Participants in our study frequently noted that certain simple tasks were well suited to voice assistants. Questions that were considered to work well were typically fact-based, such as checking the weather, finding out a celebrity’s age, getting directions to a destination with a known name, and reviewing sports scores.
Some described the assistants as "just" doing an internet search and acting as a voice interface to Google. Others mentioned a list of preprogrammed things that agents could respond to; anything outside of that preprogrammed menu of choices would not work and would default to a search. Several described a belief that these conversational assistants are not truly "smart" — not even a real form of artificial intelligence that can meaningfully understand a query or creatively solve a new problem.
One participant even said, "Stop calling it AI (artificial intelligence) until it’s real. The things it can do aren’t complex. It’s not learning. People form expectations based on the term ‘artificial intelligence,’ and that’s misleading. Driverless cars — that’s AI. That’s closer."
When talking about Alexa, another person described a similar belief: "There’s a difference between knowing what you’re saying and understanding what you’re saying. You might say, ‘a tomato is a fruit’ and it wouldn’t know not to put it in a fruit salad. I don’t think she necessarily understands; maybe one day computers will get there, where they can teach or learn. Right now, I think it’s more like a trigger word: this word, combined with this word, combined with this word, gives this answer. If you take this word and combine it with this other word, it will give this answer."
One of the biggest concerns expressed in our study was that the conversational assistants were always listening (and transmitting audio to the cloud). Several users expressed strong skepticism (or outright distrust) toward the claim that agents only listen when triggered by their keyword. During the interview portion of the study, some participants reported seeing advertisements for things that they normally never shopped for after mentioning them in a conversation near their assistant. A few said they had even engaged in informal tests of this hypothesis: they had mentioned an invented new hobby near their smart speaker or phone, and then saw advertisements for related products soon thereafter.
Some users also believed that the agents were recording and transmitting full audio files (rather than some sort of abstracted data version of what they said) to the cloud for interpretation. One user was very surprised that Google Assistant could take dictation (and do an excellent job of it) when the phone was not connected to the internet.
Participants reported some discomfort with using agents when an error or a misunderstanding could have consequences; the most common examples were making incorrect purchases or using the assistant for work.
One user related how he used a voice assistant to dictate a work email while walking home from the subway, and froze in panic when he noticed later on, while proofreading his email, that the agent had replaced something he said with an inappropriate word. As he explained it, “it was like my phone had turned into a live grenade in my hand — I now had to defuse it very carefully.”
Another user mentioned that he owned a smart air conditioner but would not use it with Alexa, because he did not trust it to work flawlessly and keep his home at the proper temperature. He mentioned that he had pets and worried that, if it malfunctioned, it could leave them to suffocate in the heat: "I care about my animals — I don’t trust Alexa."
While science-fiction film and television, from 2001 to Her, provide us a wealth of examples of people comfortably using voice interactions with their computers for a range of complex interactions (especially tasks that involve interpretation, judgement, or opinion), the participants in our study were hesitant to even try more complicated tasks and did not trust that the assistants would perform well with such queries.
As these systems improve their capabilities, a huge challenge will be revising users’ existing mental models so that they can incorporate some of these new capabilities. Users’ mental models are typically much more stable than the feature set of a product that’s being constantly updated, so a catch-22 emerges: users don’t know that the systems can handle more complex queries than before, and so they don’t use them for those types of tasks, which then reduces the amount of training data available to improve the assistants.