Compared to newcomers like Google Assistant and Amazon's Echo-based Alexa, Apple's Siri has been around for quite a while. Cupertino first unveiled its proprietary voice assistant alongside the iPhone 4s back on October 4, 2011, and it has gone through a number of dramatic and intermittent refinements in the years since, making it one of the oldest, yet most inherently powerful, voice assistants on the market today.
Yet even as powerful and robust as she is, Siri has struggled to keep pace with her new-age competitors from Google and Amazon — platforms that offer customers a wider set of skills, albeit at the expense of a fairly limited linguistic capacity. While Amazon's Echo-only Alexa platform speaks and responds to only two languages — English and German — and Google's Assistant ups the ante just a notch to four, Apple's Siri takes the multicultural cake: it currently boasts support for 21 distinct languages spoken in 36 countries around the world.
So as you can see, Siri has a clear leg up when it comes to interacting with users in more regions of the globe — a skill set she's had years to develop, and to her benefit, because the process of teaching her to understand and respond to even a single new language is so complicated that the sheer amount of time, patience, and resources involved just might blow your mind.
How Does Apple Teach Siri a New Language?
According to Alex Acero, head of Apple’s Siri Language Development team, the company starts off by recruiting multiple human subjects — proficient in the dialect at hand, but varying in accent — and asks them to read from a series of pre-selected passages.
Apple then “captures a range of sounds in a variety of voices,” according to Reuters, from which the most accurate language model is built with the sole intent of trying to predict words and sequences thereof.
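To make the idea of "predicting words and sequences thereof" concrete, here is a minimal sketch of a statistical bigram language model in Python. The transcripts, function names, and training approach here are purely our own illustration; Apple's production models are vastly more sophisticated.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows another across a corpus of transcripts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical transcripts of the kind a voice assistant might collect.
transcripts = [
    "set a timer for ten minutes",
    "set an alarm for seven",
    "set a reminder for tomorrow",
]
model = train_bigram_model(transcripts)
print(predict_next(model, "set"))  # "a" follows "set" most often in this corpus
```

The same counting principle, scaled up to billions of utterances and far richer models, is what lets a recognizer prefer plausible word sequences over acoustically similar nonsense.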
Apple starts out by adding the new language to its dictation (speech-to-text) platform for iOS and macOS. As Apple users speak and interact with dictation on their desktop, laptop, or mobile device, the company anonymously collects samples of the spoken data and associated background noise, which pass through its servers for processing.
Once these recordings are transcribed into written text and fed back through the system as training data, Apple is able to reduce the overall speech recognition error rate by over 50%.
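The standard way to measure that kind of improvement is word error rate (WER): the word-level edit distance between the recognizer's output and a human transcript, divided by the transcript's length. Below is a small sketch of the computation; the example sentences are ours, and the article's "over 50%" figure refers to Apple's internal metrics, not this toy calculation.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Three word errors out of six, then one out of six after more training data.
before = word_error_rate("set a timer for ten minutes", "set a time for tin minute")
after = word_error_rate("set a timer for ten minutes", "set a timer for ten minute")
print(before, after)  # 0.5 and roughly 0.167: fewer word errors, lower WER
```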
After this lengthy process is completed, Apple starts all over again, recruiting yet another round of voice actors for an entirely different language.
And if you thought for a second that was the end of the road — think again! Siri's learning is constant and ongoing: she receives updates from Apple at least once every two weeks or so. As users continue to interact with her on their iPhones, iPads, and macOS computers, Siri learns more and more about the dialects fed to her — information that is sent back to Apple to help make her a true powerhouse among the growing number of virtual voice assistants.