How our scientists are making Alexa smarter

Rohit Prasad is Vice President and Senior Scientist at Amazon Alexa. He is responsible for research and development in the areas of speech recognition, natural language understanding and machine learning, which are used to improve the customer experience on the Alexa-powered Echo devices. Here, Rohit answers five questions about the technology and future of Alexa.

The U.S. Defense Advanced Research Projects Agency began work on speech technology in the early 1970s. Why is this technology only now showing up in conversational AI products like Alexa?

Conversational AI has been actively researched for nearly 50 years. The aim is to make interaction with machines as smooth as communication between people. This is one of the most difficult areas of artificial intelligence, because machines have to be extremely intelligent to understand human language and communicate in it, whether spoken or written, or in combination with touch or visuals.

Speech has always been seen as the ideal interface between humans and machines, but the biggest hurdle to its adoption has been how hard it is for machines to recognize and understand hands-free spoken input at all. This is the challenge of far-field speech recognition, in which an ambient device such as an Echo recognizes words spoken from a distance with high accuracy.

With the launch of Echo in November 2014, we showed that far-field speech recognition is possible with high accuracy even in environments with background noise thanks to the combination of machine learning algorithms, data and immense computing power.

Another important reason for Alexa's adoption is the multitude of intents that it can recognize and act on. It transforms everyday conveniences such as accessing music, books and videos, controlling smart home devices, communicating with friends and family, shopping, setting reminders or looking up information.

What are the key conversational AI and machine learning technologies behind Alexa?
Alexa was designed to decide on the best action on the user's behalf, based on its interpretation of the user's intent. Unlike a search engine, it does not simply respond with ten blue links from which the user then has to pick the most suitable one. Instead, Alexa acts on the user's behalf and asks clarifying questions where needed. Several key technologies in Alexa make this possible.

It starts with recognition of the wake word, which triggers Alexa to listen to what the user says next. Wake word recognition is based on deep learning technology that runs on the device and detects the wake word chosen by the user. Far-field automatic speech recognition (ASR) in the Amazon Web Services (AWS) cloud then converts the audio following the wake word into text and determines when the user has stopped speaking to Alexa.
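Deciding when the user has stopped speaking is often called endpointing. The interview does not say how Alexa does this, so the following is only a minimal sketch of one simple idea: treat a run of low-energy audio frames after speech as the end of the utterance. The function name `find_endpoint`, the energy values, and both thresholds are hypothetical choices made for illustration.

```python
def find_endpoint(frame_energies, silence_threshold=0.1, trailing_silence_frames=3):
    """Return the index of the first frame of the silence run that ends the
    utterance, or None if no endpoint is found.

    Assumption: each element of frame_energies is a per-frame loudness value;
    a real system would likely use a trained model rather than a fixed threshold.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: remember that speech started and reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence after speech: count it toward the trailing-silence requirement.
            silent_run += 1
            if silent_run == trailing_silence_frames:
                return i - trailing_silence_frames + 1
    return None


# A short pause (one quiet frame) does not end the utterance; three in a row do.
energies = [0.0, 0.02, 0.5, 0.7, 0.05, 0.6, 0.03, 0.02, 0.01, 0.0]
endpoint = find_endpoint(energies)
```

The trailing-silence requirement is where a latency cost appears: a longer wait reduces the risk of cutting the user off mid-sentence, but delays the response.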

"The success and acceptance of Alexa make us very happy, but we are still at the very beginning of what is possible."
Rohit Prasad, Vice President and Senior Scientist, Amazon Alexa

Once speech has been converted to text, Alexa uses natural language understanding (NLU) to translate the words into a structured interpretation of intent, and then formulates a response by drawing on more than 30,000 Alexa skills built by Amazon and external developers.
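To make "structured interpretation of intent" concrete, here is a minimal sketch that turns recognized text into an intent name plus a dictionary of slots. The intent names, slot names, and regular expressions are hypothetical and purely illustrative; a production NLU system would rely on trained statistical models rather than hand-written patterns.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical patterns for illustration only.
PATTERNS = [
    ("PlayMusic", re.compile(r"^play (?P<song>.+?)(?: by (?P<artist>.+))?$")),
    ("SetTimer", re.compile(r"^set a timer for (?P<minutes>\d+) minutes?$")),
]


def interpret(text: str) -> Interpretation:
    """Map recognized text to an intent and the slot values found in it."""
    normalized = text.lower().strip()
    for intent, pattern in PATTERNS:
        match = pattern.match(normalized)
        if match:
            # Keep only the slots that were actually present in the utterance.
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            return Interpretation(intent, slots)
    return Interpretation("Unknown")


result = interpret("Play Yesterday by The Beatles")
# result.intent is "PlayMusic"; result.slots holds the song and the artist.
```

The point of the structure is that whatever comes next can work with an intent and its slots instead of raw text.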

This structured interpretation is used in combination with various forms of context, for example what kind of device the user is interacting with, which skills are most likely to provide an answer, or who is speaking. This context helps determine the most appropriate response for Alexa to give. Alexa can either respond with the best answer from a skill or ask the user for more information.
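The choice between answering and asking for more information can be sketched as a small decision rule. This is an illustration under stated assumptions, not a description of Alexa's actual logic: the `Candidate` type, the relevance scores, the `needs_screen` flag, and the threshold and margin values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    skill: str
    score: float               # hypothetical relevance score in [0, 1]
    needs_screen: bool = False


def choose_action(candidates, device_has_screen, threshold=0.6, margin=0.15):
    """Return ("respond", skill) or ("clarify", None)."""
    # Device context: drop skills that cannot work on the current device.
    usable = [c for c in candidates if device_has_screen or not c.needs_screen]
    if not usable:
        return ("clarify", None)
    ranked = sorted(usable, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    # Respond only if the best skill is both confident and clearly ahead.
    if best.score >= threshold and best.score - runner_up >= margin:
        return ("respond", best.skill)
    return ("clarify", None)


candidates = [Candidate("music", 0.9), Candidate("video", 0.95, needs_screen=True)]
on_speaker = choose_action(candidates, device_has_screen=False)
on_display = choose_action(candidates, device_has_screen=True)
```

On a screenless device the video skill is filtered out and the music skill wins outright. On a device with a screen the two scores are too close, so the sketch asks a clarifying question instead.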

For a dialogue to feel natural, how Alexa responds and sounds also matters. This is achieved through text-to-speech synthesis (TTS), which converts any word sequence into natural-sounding, intelligible audio.

All of the above technologies rely on data-driven machine learning and on the fastest possible processing in order to give an accurate answer in the shortest possible time. As scientists and developers, we constantly wrestle with this inherent tension between accuracy and latency, the time between the user finishing speaking and Alexa responding.

Like other AI-based technologies, Alexa becomes smarter the more it is used and the more it learns about users. How are Amazon scientists and developers making Alexa smarter?
Since Alexa's brain is mainly in the cloud, she learns with every interaction. Alexa uses a number of learning techniques: supervised, semi-supervised, and unsupervised learning. Supervised learning is the most effective, but it doesn't scale, because we can't produce manually labeled data at the pace it would take to continuously improve Alexa for our customers. As a result, our scientists and developers are constantly applying new learning techniques that reduce the reliance on manual labeling when training our statistical models. One example is active learning, a form of semi-supervised learning in which the system itself determines which interactions it needs a human expert to label. This type of learning is used throughout our technologies. We also use unsupervised learning, which needs no predefined answers, to make Alexa more intelligent, especially in speech recognition. And we use transfer learning, so that Alexa can carry what it has learned from one skill over to another, or even from one language to another.
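One common way for a system to decide which interactions to send to a human expert is uncertainty sampling: pick the examples on which the current model is least sure. The interview does not say which selection strategy Alexa uses, so the sketch below is only an illustration. The function names, the example utterances, and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the `budget` utterances whose predicted intent distribution
    has the highest entropy, i.e. where the model is least certain."""
    ranked = sorted(predictions.items(), key=lambda item: entropy(item[1]), reverse=True)
    return [utterance for utterance, _ in ranked[:budget]]


# Hypothetical model outputs: probabilities over three candidate intents.
predictions = {
    "play jazz": [0.98, 0.01, 0.01],
    "turn it up": [0.40, 0.35, 0.25],
    "call mom": [0.70, 0.20, 0.10],
}
to_label = select_for_labeling(predictions, budget=2)
```

Only the selected utterances go to human annotators, which concentrates scarce manual effort where the model stands to gain the most.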

What is unique about conversational AI research at Amazon?

What makes us unique is how we approach research in general. Every research problem begins with a working-backwards methodology that stems from our approach to product development at Amazon. The basic idea is simple. We start with a blueprint that defines what the research, if successful, would ultimately achieve or revolutionize. Then we work backwards from that goal, designing our experiments and the milestones against which we check the progress of the research. We believe in rapid experimentation and in proving or refuting our hypotheses as early as possible.

Another unique aspect of conversational AI research at Amazon is that we have a breakthrough product in Alexa, which lets us test new algorithms and technology at scale. This also underpins the technical advances that we publish at conferences and in journals.

The combination of large amounts of data, almost unlimited computing power, a team with extensive AI expertise to learn from, and our willingness to take risks makes Amazon the best company in the world for realizing your AI research dreams.

And what does the future of conversational AI look like?
Overall, I find the future of AI extremely exciting. AI will have a profound impact on society and help us learn new skills that we cannot even imagine today. As for conversational AI, I think we're still on day one. The success and acceptance of Alexa make us very happy, but we are still at the very beginning of what is possible.

In the next five years, AI will evolve along multiple dimensions as we make further advances in machine learning and reasoning. Based on these advances, Alexa will become more context-aware in recognizing, interpreting and responding to user requests. Alexa will also learn faster and faster as unsupervised learning comes to dominate her "training".

Alexa will soon be able to converse naturally with people on everyday topics and news events. This is exactly what we are concentrating on with the Alexa Prize, a university competition to build “socialbots” that can hold a coherent and engaging 20-minute conversation with a person. Our customers logged more than 100,000 hours of conversations with the 2017 Alexa Prize socialbots. The socialbots for the Alexa Prize 2018 will go online in May. It's great fun to try them out. Just say "Alexa, let's chat".