At this point, all of us have become acquainted with Siri, Alexa, Google, and Cortana. Voice interfaces are growing more popular every month, and in this article we'll try to explain why, to what extent, and whether they're here to stay.
The development of tech interfaces
Technology interfaces have come a long way: from punch cards and hierarchically organised lists that only made sense to operators and programmers, to easy, clickable, drag-and-drop designs so intuitive that even children can use them.
Just for fun: kids reacting to old computers, a reminder of just how unintuitive they used to be.
It didn't happen overnight. It's a process that started with keyboards, took its first important step forward in the 1960s with the introduction of the mouse, and it's still a work in progress.
The big milestone came in the 1980s, though, when a new kind of computer was introduced: one that anyone could use, with a desktop and the option to click to select and move things around the screen.
Technology became increasingly accessible and popular. There was finally a monetary incentive to design easy-to-use devices. The solution in most cases was to apply skeuomorphic design; that is, making the interface look and behave like a real-life object the user already knows. Early examples include the keyboard, modelled on the typewriter, and the desktop metaphor itself.
All of this is called the GUI (pronounced "gooey"): the Graphical User Interface.
What's the next step?
For some experts, it's now the time of the CUI (Conversational User Interface), which would allow us to overcome the limitations and complexity of certain GUIs.
Touchscreens were the last big change that enhanced our interactions with computers, especially after the introduction of the iPhone. They remove the abstraction of keyboard and mouse, making the actions on graphic interfaces as direct as they can be.
Like many other UX and UI design improvements, touchscreens resemble (or have been directly inspired by) science fiction interfaces.
How long do I have to wait for a Jarvis? Edit: Oh, wait. Mark Zuckerberg just built one.
Looking at the most recent developments in user interfaces, it seems the next steps in human-technology interaction will aim to make the approach even closer and more direct. If we keep drawing inspiration from sci-fi, the ideal machines of the future would be those capable of communicating with us just like other people do.
Can voice recognition replace keyboards?
It depends on what the end user needs to get done. When it comes to functions such as researching information and giving simple commands, it clearly can.
Voice search is rapidly growing in popularity as speech-to-text technologies improve, with research indicating that 50% of consumers are using voice search more frequently now than they did 12 months ago. (Source: ‘What’s Next in Search?‘, Anne Ahola Ward)
Looking at the size and design of some of the most modern devices, such as the Apple Watch, many of them don't even have room for a keyboard. Voice recognition software will have to improve quickly to allow good, fluid communication between these devices and their users.
"The conversational paradigm is more social, and therefore less technologic. We use humane verbs like “add”, “invite”, “contact”, “mute”, “block”, and “message”. The language of conversation is more accessible to a broader audience, which will in turn accelerate the adoption of conversational agents faster than we saw with desktop apps." (Source: Chris Messina)
Interfaces that act like humans, such as artificially intelligent chatbots, are also capable of creating an emotional bond, unlike most graphic interfaces.
On top of that, a recent Stanford University study showed that speech recognition has finally become faster than typing for composing messages on a phone.
Google Home and Alexa are two good examples of devices that use voice recognition to make life more convenient and fun.
They can do lots of stuff already. True, sometimes they still disappoint, but their responses can also be surprisingly accurate or witty.
Starting in December 2016, Google is opening Home to third-party developers so they can help the device learn more things. Third parties can also teach new skills to Alexa and soon, maybe, to Cortana. There seems to be a trend.
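At its core, the "skills" model these platforms share is a registry that maps user intents to handler functions. The sketch below is a hypothetical, simplified illustration in Python; none of the class or method names come from the real Alexa or Google Home SDKs.

```python
# Hypothetical sketch of a third-party "skill" registry, loosely modelled on
# how voice platforms map user intents to developer-supplied handlers.
# All names here are invented for illustration, not taken from a real SDK.

class SkillRegistry:
    def __init__(self):
        self._handlers = {}

    def skill(self, intent):
        """Decorator that registers a handler for a named intent."""
        def register(func):
            self._handlers[intent] = func
            return func
        return register

    def handle(self, intent, **slots):
        """Dispatch a recognised intent to its handler, with a fallback."""
        handler = self._handlers.get(intent)
        if handler is None:
            return "Sorry, I can't do that yet."
        return handler(**slots)

assistant = SkillRegistry()

@assistant.skill("turn_on_lights")
def turn_on_lights(room="living room"):
    return f"Turning on the lights in the {room}."

print(assistant.handle("turn_on_lights", room="kitchen"))
# -> Turning on the lights in the kitchen.
print(assistant.handle("order_pizza"))
# -> Sorry, I can't do that yet.
```

Opening the platform to third parties then amounts to letting outside developers register new intent handlers like the one above, which is why these assistants can "learn" new things without the core software changing.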
But there are many unsolved challenges ahead
Human languages are incredibly complex and nuanced. The literal meaning of a word is one thing; contextualising it and understanding the real intention behind it, depending on mood or tone, is far more difficult.
Poor bots, even some humans struggle with things like sarcasm!
Read more: Can a computer pass for a human? The Turing Test.
And sarcasm isn't the only issue. We simply don't talk the way we type. When using voice commands, we start doing odd things such as thanking the machine or speaking to it as we would to another person. This different behaviour has to be taken into account when programming voice interfaces.
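One concrete consequence: before matching an utterance to a command, a voice interface has to normalise away the social padding that people add when speaking but rarely type. A toy Python sketch (the filler list and function name are invented for illustration, not taken from any real assistant):

```python
import re

# Social padding people add when speaking to machines but rarely type.
# This filler list is illustrative only.
POLITENESS_FILLERS = ("please", "thank you", "thanks", "could you", "would you")

def normalise_utterance(utterance: str) -> str:
    """Strip punctuation and politeness fillers so the intent matcher
    sees only the underlying command."""
    text = utterance.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation
    for filler in POLITENESS_FILLERS:
        text = text.replace(filler, " ")
    return re.sub(r"\s+", " ", text).strip()

print(normalise_utterance("Could you please set a timer for ten minutes, thank you"))
# -> set a timer for ten minutes
```

Real assistants do something far more sophisticated with language models, of course, but the underlying point stands: the spoken form has to be cleaned up before the system can act on it.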
There is an adorably polite 86-year-old lady who didn't even need a human-like interface to type "please" and "thank you" on a Google search query.
The list of potential challenges is long. Consider Spanglish. The interface that seems to be doing the best job in this particular field so far is OK Google, because it can learn new words based on how often they're used in particular areas.
“It’s something that we know it happens in places like Texas, and other places around the world, like Hong Kong.” “[...] places like Austin or California, where Spanish words are used in street and restaurant names.” (Ignacio López Moreno, developer at Google)
Both Google Now and Siri are far from understanding proper Spanglish, but the former does a better job at recognising the name of a Spanish person in the contact list, or of a street or restaurant named after something in Spanish. That’s because every search becomes part of OK Google’s huge database.
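The frequency-based learning López Moreno describes can be illustrated with a toy sketch: count how often an out-of-vocabulary word is heard in a region, and promote it to the recogniser's vocabulary once it crosses a threshold. Everything here (class name, API, threshold) is invented for illustration, not Google's actual mechanism.

```python
from collections import Counter, defaultdict

class RegionalVocabulary:
    """Toy model of learning region-specific words from usage frequency,
    loosely inspired by the idea that a recogniser can pick up local words
    (street names, Spanglish terms) once they are heard often enough."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = defaultdict(Counter)   # region -> word -> times heard
        self.vocabulary = defaultdict(set)   # region -> recognised words

    def hear(self, region, word):
        """Record one occurrence; promote the word once it's frequent enough."""
        word = word.lower()
        self.counts[region][word] += 1
        if self.counts[region][word] >= self.threshold:
            self.vocabulary[region].add(word)

    def recognises(self, region, word):
        return word.lower() in self.vocabulary[region]

vocab = RegionalVocabulary(threshold=3)
for _ in range(3):
    vocab.hear("Texas", "taqueria")
vocab.hear("Texas", "panaderia")

print(vocab.recognises("Texas", "taqueria"))   # -> True
print(vocab.recognises("Texas", "panaderia"))  # -> False (heard only once)
```

Keeping the counts per region is what lets "taqueria" become recognisable in Texas without affecting users elsewhere, which matches the localised behaviour described in the quote above.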
The goal is to optimise the system to suit the needs of the majority of users.
As one final downside, many users argue that even if voice recognition is convenient for simple tasks and for interacting with technology in some cases, it will never replace keyboards for activities that demand precision, such as programming.
Imagine giving voice commands to write code. Yes, that's right. Not going to happen.