For most of the past few decades, integrating voice interaction with technology wasn't really successful outside of sci-fi movies. A bit of progress was made in the 1990s with voice dialing; then in the 2000s the first Interactive Voice Response (IVR) systems emerged for banking and airline reservations. And then Siri, Cortana, Alexa, Nina, Watson and others came to the party.
In 2016, we use voice in home entertainment, automation and security, retail, safe driving, workplace productivity, health and fitness – literally everywhere! Designing voice interfaces for IoT devices is more complex than adding a voice recorder to a wearable gadget. Designing for voice input and audio output brings two major challenges – recognition and understanding – all in service of preserving the impression of a real conversation that people are used to starting, maintaining and finishing.
Usually, voice interfaces are divided into the following types of interaction:
One of the earliest and most common forms of speech input. The system is trained to recognize a small set of commands (e.g., Play, Stop, Open, Dial), sometimes called "finite state grammars". The user is often prompted with the available options ("You can say Tickets, Booking or Checking after the tone") so that the input matches what the system's algorithm can handle.
Speech recognition is one of the most desirable yet unstable technologies we have been looking forward to since the introduction of "hands-free". Transcribing speech and skipping the typing altogether could bring communication to a new standard of speed and availability, not to mention the safety benefits along the way.
A program that can accept natural language input ("Would I make it on time to the airport if I go now?"), process it and respond appropriately. This type of interface is usually presented as a persona and is widely used not only for assistance but also as part of brand recognition and loyalty, because it creates the impression of a fully comprehensive conversation.
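At its simplest, this kind of assistant maps a free-form utterance to an intent and then answers in its persona's voice. The toy sketch below uses keyword matching; the intents, keywords and replies are illustrative assumptions, and real assistants rely on statistical language-understanding models rather than keyword lists:

```python
# Toy intent detection: map an open-ended utterance to a known intent,
# then produce a persona-style reply. Unmatched input falls back to a
# clarification request instead of failing silently.
INTENT_KEYWORDS = {
    "travel_time": {"airport", "on time", "traffic"},
    "weather": {"rain", "umbrella", "forecast"},
}

REPLIES = {
    "travel_time": "If you leave now, I can check the traffic for you.",
    "weather": "Let me look up the forecast.",
    "unknown": "Sorry, I didn't catch that. Could you rephrase?",
}

def detect_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "unknown"

def respond(utterance: str) -> str:
    """Answer in the assistant's voice based on the detected intent."""
    return REPLIES[detect_intent(utterance)]
```

The fallback reply matters for the conversational impression: a persona that asks the user to rephrase feels closer to a real dialogue than one that simply errors out.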
This interaction type is based on biometric voice identification, which makes it feel slightly 'movie-ish'. Identification is important to ensure the user's privacy and to keep more than one account synced with the same device (e.g., in smart homes or family cars).
Designing for voice user interfaces (VUI) requires a deep understanding of user needs and the possible contexts of use, much like designing graphical user interfaces (GUI). But voice makes it almost impossible to constrain the input and correlate it with predictable responses. This brings us to the challenges and prospects of VUI design:
- Recognition. Products in the late 1990s achieved roughly a 65% recognition rate; today they are around 92%, which still leaves that 8% of frustration and misunderstanding.
- Discoverability and predictability. Voice interfaces are still unable to fully handle multi-step or complex tasks. People don't usually spell out the path to the information they need; they expect a prompt result, which might require several background web searches, launched apps or access to personal data.
- Accessibility. We don't usually talk in a silent room free of background noise. Also, people with limited mobility can't always reach the device to enable voice control. Many more scenarios like these should be considered during development.
- Privacy. It's only a matter of time before we share all of our personal data with voice agent personas. Emails, contacts, bank accounts and health records – all in one place, speaking loud and clear right back at you and at anyone who might be listening.
- Multi-user and multi-device identification. A smart home should distinguish the temperature settings of a thermostat from those of an oven. A car should adjust the seat differently for mom and for dad. My Apple devices should know which one I'm referring to when I say "play some music" – iPad, iPhone or Apple TV.
We are currently witnessing a spectacular time for IoT product development. Despite its nascent stage, IoT keeps technology companies excited about the expanding possibilities of voice interface design.