
Intelligent Assistants: Creepy, Childish, or a Tool? Users’ Attitudes Toward Alexa, Google Assistant, and Siri

by Page Laubheimer and Raluca Budiu on August 5, 2018

Summary: Users perceive the assistants as having limited capability for complex tasks and find the social aspects of interacting with them difficult.


Introduction

Although the Star Trek-like future of talking naturally to an everyday computer has not yet arrived, recent years have brought many advances. Intelligent assistants such as Alexa, Siri, and Google Assistant are becoming more and more common.

Much of the implicit value proposition for using intelligent, conversational assistants is the use of natural language. But language is an essentially social interface — it evolved to help us communicate with other people. How do the social aspects of language shape our interactions with language-based user interfaces? This is one of the questions that emerged in our research of conversational assistants.

To better understand the challenges that these assistants face today and where they do help users, we ran two usability studies (one in New York City and one in the San Francisco Bay Area). A total of 17 participants — 5 in New York and 12 in California — who frequently used at least one of the major intelligent agents (Alexa, Google Assistant, and Siri) were invited to the lab for individual sessions. Each session consisted of a combination of usability testing (in which participants completed facilitator-assigned tasks using Alexa, Google Assistant, or Siri) and an interview.

This article summarizes our main findings about users' perceptions, mental models, and the social dimensions of using these agents; a separate article discusses findings related to usability and interaction.

We saw that users are highly aware that their so-called intelligent assistants are not fully intelligent. Although people do not necessarily understand the assistants' limitations completely or correctly, their views ranged from finding the assistant's apparent "mind" a bit creepy or childish, to simply treating it as an alternative tool for operating a computer. We are far from the potential future state in which users would trust a computer-based intelligent assistant the way they would trust a good human administrative assistant.

Anthropomorphic Qualities

The social nature of language leads people to project anthropomorphic qualities onto computers. Most of our participants referred to the assistant using a gendered pronoun ("she", or "he" when they had selected a male voice). Some also inserted politeness markers such as "please" or "thank you", or started questions with "Do you think…" or "Can you…".

When talking about the assistants, participants often used human metaphors. For example, one user said, "Her mind went blank. She's like 'What does this lady want from me? Leave me alone!'" when Alexa beeped, presumably to signal lack of understanding.

Another participant recounted, "I swear at her [Siri] when she doesn't understand, and then she says something funny back — and we have fun." And yet another rationalized inappropriate answers by saying, "These are [complex] questions that I normally would not ask, so I guess it takes more thinking. So it can't just answer something like that [right away]. The questions I ask usually aren't that deep; they don't take that much thought."

Our study participants were aware of this anthropomorphizing, and many mocked or commented on it. One Google Assistant user deliberately tried to refer to the assistant as "it," saying, "I just want the answer. I don't want to talk to a person [so I don't want to say 'OK Google']. The artificial-intelligence stuff already frustrates me."

People felt that the assistants were not good at detecting emotion and would not pick up on their frustration from tone of voice or word choice. They had mixed attitudes when it came to using slang or idiomatic language: some felt that slang might not be understood by the assistant and purposefully avoided it; others said that they had tried it and that the assistant worked just fine.

Users did not expect the agents to pick up on fine distinctions in meaning. For example, a user who asked "How much is a one-bedroom apartment in Mountain View?" commented that her question was really too vague for Alexa, because, to her, the word "apartment" implies "rent" — if she had been interested in a sale price, she would have used the word "condo." However, she did not expect Alexa to pick up on that difference.

When People Use the Assistants

We usually have no problem talking with real people in public places. However, this behavior does not extend to intelligent assistants. Users in our study reported a strong preference for using voice assistants only at home or when alone. Most said that they would not interact with phone-based agents such as Siri or Google Now in public settings. Some, however, were willing to ask for directions while walking.

One participant stated this explicitly, noting, "When I'm in public, I usually don't use [Siri] — I just feel like it looks kind of awkward. It also feels embarrassing to use her in public. And I don't see other people using it in public, either."

Whereas people usually reported using the assistants for simple, black-and-white queries, often in situations when their hands were busy, another common use was entertainment: many told us that at least occasionally they or someone in their family (usually their child) enjoyed hearing a joke or playing a game with the assistant. Some parents reported using Alexa as a way to entertain their children and to keep them away from screen-based devices such as tablets or smartphones.

When asking fun, silly questions (for example, about the agent’s preferences, such as a favorite food), users in our study understood that they weren’t getting authentic artificial-intelligence responses, but only a series of preprogrammed jokes written by the engineering team. 

Language Used with the Assistants

When it came to how they talked with the assistants, participants fell into two categories:

  1. Those who used the same language structures that they would use with a human. These participants were often polite in their query phrasing; they ended their interactions with "Thank you!," and often formulated questions that started with "Please…", "Do you think…", "Can you tell me…". These participants usually had a positive attitude towards the agents.
  2. Those who tried to make their language more efficient in order to increase the chance of being understood. In this category, we saw a continuum of behaviors — from participants who changed sentence word order to make sure the query started with a keyword, to instances where they eliminated articles such as "a" or "the", and, finally, to examples where participants simply used the assistant as a voice interface for a search engine and compressed their queries to a few keywords devoid of grammatical structure (such as "OK Google, events in London last week of July").

For example, one participant noted that a query like “Restaurants near Storm King” might retrieve different results than “Storm King Restaurants”, but “a person would get what I meant either way.”  

Some participants kept their queries short and composed of just a few keywords. One said, "I don't speak to it as if it were a real person — unless I have one of 5W1H (Who, What, Where, When, Why, How) questions. […] I would speak to a person like this only if they were ESL [English-as-a-second-language speakers] and I knew they may be overwhelmed by too many words in a sentence."

While some users did refer to the assistant with gendered pronouns, most users did not use pronouns such as "it" or "this" in their actual queries — they preferred to refer to the object of their query explicitly, even if the name was long and complicated. People did not expect the assistant to understand what a pronoun such as "this" or "it" might refer to, especially when the pronoun's antecedent was part of a previous query. (This is otherwise one of the key advantages of true natural-language recognition.) Although assistants are getting better at follow-up queries (and Google recently announced a "conversation" feature for its Google Assistant), most participants had learned not to expect them to preserve context from one query to another. One participant said, "[Siri] doesn't usually save stuff like that [e.g., a text-message draft]. When she's done with something, she's done with it."

Which Queries Were Expected to Work

Even though people used human-directed language with the assistants and projected human-like qualities onto them, they had clear expectations about the tasks that the agents could accomplish. Some said that the assistant was like a young child who could not understand complicated things; others compared it with an old person who did not hear very well. One person noted that you couldn't "talk for too long, because [the assistant] will get distracted," while another participant said that queries should be shorter than 10 words. But many said that the assistant was even better than a human in some ways, because it "knows everything" and is objective, with no emotions or feelings.

(As a detour into the futuristic user interfaces of science fiction, we should note that a lack of emotion is often depicted as a drawback, with Star Trek's Lt. Data as the prime exhibit. However, objectivity, and not having to worry about the computer's feelings, is surely an advantage.)

Complex and Compound Questions Were Considered Difficult or Impossible

Complex questions were considered unlikely to produce good results, and complexity was often closely tied to whether the question or task would first need to be broken down into multiple parts. One user noted that she'd had poor luck in the past asking Siri and Alexa when Adele's next concert in New York would be. Another user remarked, "You can talk to it like a child. No complex questions; weather, math, sports, calendar, internet [will work]; it won't do your taxes and might not even be able to order you pizza." (Note that the "child" analogy is yet another example of anthropomorphizing the assistant.) And yet another participant said that the most complicated thing one could ask an assistant was a place-based reminder.

However, there was a sense that even complex questions could be answered by the assistants if one “learned how to ask the question.” One participant compared the assistant with a file system with many files: the trick is to find the right questions with which to access the information in this database.

Yet another person complained that she didn't want to have to think about how to ask the right question. Many participants mentioned efficiency as an important factor in whether they would bother to ask a question at all: if it was faster to "do it myself," then they felt the interaction with the assistant wasn't worth the cost. This phrasing was a key indicator of our participants' mental models, and reflects the belief that using an assistant should be easy rather than require extensive interaction.

For example, some participants described how setting a timer was faster with the assistant than with other devices, whereas figuring out the weekend traffic to Montauk was faster with an app like Waze or with Google Maps on a computer. The decision was based on a cost-benefit analysis that balanced the expected interaction cost of completing the task against their expectation of whether the assistant would be able to do the task.

Interpretation, Judgement, and Opinion Weren’t Trusted  

All users in the study noted that they were not interested in using the agents for opinion-based information such as recommendations. Any task involving judgment or opinion was met with skepticism by our participants. "I would never ask her that" was a common response to tasks involving personal or highly subjective information, such as figuring out how long a vacation in Prague should be. One user said, "I would look up a forum about Prague for locals; I don't want to do the tourist things that everybody does."

However, it was not only subjective information that participants scoffed at: one user thought it unlikely that Alexa would be able to tell him who the star player was in the previous night's Boston Celtics game because that involved interpretation; this participant formulated his query to Alexa initially as "Who scored the most points in the last Celtics game", and, when Alexa failed to answer his question, he changed it to "Provide me player statistics for the most recent Celtics game." (Both were questions with an objectively true answer, rather than questions of judgment.)

One participant noticed another important difference between intelligent agents and humans: if you give them an ambiguous request, voice assistants don't ask you to clarify the question. Humans will typically respond to an ambiguous statement with followup questions, but intelligent assistants attempt to carry out the request without getting additional information.

Fact-Based Tasks Were Considered Likely to Succeed

Participants in our study often noted that certain simple tasks were well suited to voice assistants. Questions that were considered to work well were typically fact-based, such as checking the weather, finding out a celebrity's age, getting directions to a destination with a known name, and reviewing sports scores.

Mental Models

One question we asked participants was what their assistant knew about them. The answer was invariably "not much." They felt that the assistant might track some of their searches and use them, but they did not believe that the assistant noticeably adapted its behavior to serve them better.

Some described the assistants as "just" doing an internet search and acting as a voice interface to Google. Others mentioned a list of preprogrammed things that agents could respond to, and anything outside of that preprogrammed menu of choices would not work and would default to a search. Some described a belief that these conversational assistants were not really "smart," or even a true form of artificial intelligence that could meaningfully understand queries or creatively solve new problems.

One participant even said, "Stop calling it AI [artificial intelligence] until it's real. What it can do isn't complicated. It's not learning. People form expectations based on the term 'artificial intelligence,' but that's misleading. Self-driving cars — that's AI. That's much closer."

Speaking about Alexa, another person described a similar belief: "There's a difference between knowing what you're saying and understanding what you're saying. You might say, 'a tomato is a fruit' and it wouldn't know not to put it in a fruit salad. I don't think she necessarily understands; maybe one day computers will get there, where they can teach or learn. Right now, I think it's more like a trigger word: this word, combined with this word, gives this answer. If you take this word and combine it with this other word, it will give this answer."

Attitudes Related to Trust and Privacy

Users in our study had problems trusting the intelligent agents, with a range of concerns:

  • Privacy and social awkwardness
  • Always-on recording and transmission of audio to the cloud
  • Consequences of misinterpreting what the user said
  • Contacting other people in unauthorized ways
  • Bugs that cause smart-home features to stop working properly
  • Using too much mobile data

One of the most common concerns expressed in our study was that the conversational assistants were always listening (and transmitting audio to the cloud). Several users expressed strong skepticism (or outright distrust) toward the claim that agents listen only when triggered by their keyword. During the interview portion of the study, some participants reported seeing advertisements for things that they normally never shopped for after mentioning them in a conversation near their assistant. A few said they had even engaged in informal tests of this hypothesis: they had mentioned an invented new hobby near their smart speaker or phone, and then saw advertisements for related products soon thereafter.

Some users also believed that the agents were recording and transmitting full audio files (rather than some sort of abstracted data version of what they said) to the cloud for interpretation. One user was quite surprised that Google Assistant was able to take dictation (and do an excellent job of it) while the phone was not connected to the internet.

Participants reported some discomfort with using agents when an error or a misunderstanding could have consequences; the most common examples were making incorrect purchases or using the assistant for work.

One user related how he used a voice assistant to dictate a work email while walking home from the subway, and froze in panic when he noticed later on, while proofreading his email, that the agent had replaced something he said with an inappropriate word.  As he explained it, “it was like my phone had turned into a live grenade in my hand — I now had to defuse it very carefully.”

Another user mentioned that he owned a smart air conditioner but would not use it with Alexa, because he did not trust it to be free of bugs and to maintain an appropriate temperature in his home. He mentioned that he had pets and worried that, if it malfunctioned, it might leave them suffocating in the heat: "I care about my animals — I don't trust Alexa."

The Future Potential of Intelligent Assistants

While science-fiction film and television, from 2001: A Space Odyssey to Her, provide us a wealth of examples of people comfortably using voice interactions with their computers for a range of complex interactions (especially tasks that involve interpretation, judgement, or opinion), the participants in our study were hesitant to even try more complicated tasks and did not trust that the assistants would perform well with such queries.

Even if these systems get better, discoverability of new features will be low: in our previous article on this topic, we noted how users accessed only a small portion of the features available in intelligent assistants, and they often even had to memorize query formulations that worked. Offering new options that were previously unavailable is tricky, because most users ignore tutorials, release notes, and tips.

As these systems improve their capabilities, a huge challenge will be revising users' existing mental models so that they can accommodate some of those capabilities. Users' mental models are typically much more stable than the feature set of a product that's being constantly updated, so a catch-22 emerges: users don't know that the systems can handle more complex queries than before, and so they don't use them for those types of tasks, which then reduces the amount of training data available to improve the assistants.

In other words, the early release of intelligent assistants with low usability may hinder the future adoption of substantially improved assistants.

Summary

Even though intelligent conversational assistants have rapidly improved their ability to correctly understand users' speech, major social and mental-model challenges remain that prevent users from interacting naturally with these systems. Issues of trust and users' expectations of these systems hold back adoption of the agents for anything beyond simple dictation and fact-finding requests.