
Getting Clear On Voice Recognition 

A lot of work has gone into making voice the preferred input method for technology users. In 1952, Bell Laboratories designed the “Audrey” system, which recognized digits spoken by a single voice. Ten years later, IBM demonstrated its “Shoebox” machine, which could understand 16 words.

Only now has voice become commonplace enough to complement, or even replace, other input methods such as keyboard and mouse, touch, and gesture control.

As with smartphones and touch interfaces, the key to getting users to adopt voice control is to make it feel like a new kind of experience.

Is it working? The short answer is yes.

A global survey of 5,000 consumers by research firm Cint found that by July 2018 almost a fifth (19%) of households owned an Amazon Echo, Google Home or Apple HomePod, and a further 38% of respondents said they planned to buy one. In short, there is a market for devices that do the same things as a PC, tablet or Bluetooth-connected speaker, but with a voice interface. This year, Amazon even ran short of Alexa-powered speakers in the run-up to Christmas.

Voice First Generation  

Beyond the hardware, the appeal of voice assistants is expanding to the point where so-called Gen Z (those born between the mid-90s and the early 00s) is on track to become a ‘voice first’ generation. Google reports that 20% of its searches are now made by voice. As a parent of two, I can attest to the immediate impact of introducing voice-activated devices to our household. With two daughters (aged 5 and 2) running riot, every piece of furniture has at some point been addressed with “Hey Cortana”, “OK Google” or “Alexa” in the hope of getting Frozen played just one more time.

The trick to making sure this momentum doesn’t stall is ensuring that what users say is heard, understood and acted upon. That’s a big challenge.

You often hear of artificial intelligence projects being undermined by a ‘junk in = junk out’ failure, where poor data leads to bad results. In voice recognition, the line between quality input and junk input is so thin that the fault lies not with the user for providing ‘bad input’ but with the software for failing to recognise it. This is one reason digital personal assistants roll out slowly across the globe: they have to learn to communicate with each market before they launch there, and that means reaching at least 95% accuracy. Even as a native English speaker with what I consider a very flat Irish accent, I run into challenges with voice recognition and “localisms”.
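The post doesn’t say how that 95% figure is measured. A common convention in speech recognition is word accuracy, meaning 1 minus the word error rate (WER). WER counts the substitutions, deletions and insertions needed to turn the recogniser’s transcript into what was actually said, then divides by the number of words actually said. The following Python sketch assumes that definition. The function name `word_error_rate` and the sample utterance are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed with a word-level Levenshtein (edit distance) table. This is an
    illustrative sketch that assumes 'accuracy' means 1 - WER.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one word misheard in a ten-word request.
reference = "hey cortana play the frozen soundtrack one more time please"
hypothesis = "hey cortana play the frozen soundtrack won more time please"

wer = word_error_rate(reference, hypothesis)
accuracy = 1 - wer
meets_target = accuracy >= 0.95  # the 95% accuracy threshold from the text
print(f"WER: {wer:.0%}, accuracy: {accuracy:.0%}, meets 95% target: {meets_target}")
# prints "WER: 10%, accuracy: 90%, meets 95% target: False"
```

A single misheard word in a ten-word request already drops accuracy to 90%. That shows how little room a 95% target leaves for accents and “localisms”.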

That learning is an ongoing process, and there are two ways to deliver it: ship updated software with a block of feature updates, or update continuously using data gathered in the field. Yes, this is another case where the cloud is a better solution than boxed software.

The best example of problems identified in the wild that need rapid iteration is voice control for children. To date, much of the work on voice recognition has focused on adults. However, children have their own intonation, play in different environments, and come with a raft of safety and security concerns from parents.

We’ve all heard anecdotes of children using service accounts linked to their parents’ credit cards to run up huge bills. Wouldn’t it be useful for a device to recognise when the card holder is speaking to it, as opposed to their child, who shouldn’t have unfettered access to the card, if any access at all? Furthermore, we want to trust our devices not only with our credit but with our data, and especially with our children’s data.

As with any project that relies on user data to improve itself, we have to be concerned about how that data is collected, stored and used, and users themselves have to be comfortable with that process.

One Irish company worth checking out in this regard is Soapbox Labs. Fronted by Patricia Scanlon, Soapbox uses a ‘privacy by design’ approach to create voice interfaces that are accurate, age-appropriate and safe for children to use. Drawing on years of research and unique expertise in this field, Soapbox plans to create a ‘magical world for children’, which means you’ll find its technology in toys, robots, virtual and augmented reality, and games. Even more exciting is its vision of applying the technology to improve child literacy across the planet. A UNICEF report from July 2018 notes that while global literacy rates have improved from 83% to 91% over the last two decades, 115 million young adults remain illiterate, and 59% of them are young women.

Soapbox is all about safety and security, constantly learning to think like a child and protect like an adult. If you’ve heard of ‘privacy by design’, you could call this ‘parental by nature’.

 

Paul Shanahan

Intelligent Cloud Business Group Lead
