Call them Apple Siri, Amazon Echo, Google Voice, Microsoft Clippy or whatever you like; the fact of the matter is that voice control and virtual assistants are coming, and in a big way!
- Why? To improve the user experience by letting you interact with computers just as you would with humans.
- So what’s the big deal? Having a computer understand human language is extremely difficult.
Every major IT company, including Google, Microsoft, Amazon, Salesforce, Apple and many others, has invested significant effort in the areas of Artificial Intelligence (AI), Neural Networks and Natural Language Processing (NLP). I will group these terms together for the purpose of this article, and I provide some specific definitions below, but please know that the essence is machine learning, where a computer learns how to respond from examples rather than following a fixed, hand-written algorithm.
- Artificial Intelligence definition: the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
- Neural Network definition: a computer system modeled on the human brain and nervous system.
- Natural Language Processing definition: a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction.
As I mentioned earlier, there has been a lot of investment and research in the areas of AI and NLP, which raises the question: why are we seeing such a big push for voice-control technology in the consumer market all of a sudden?
That is a reasonable question, and I think it’s easily answered in two words: ‘The Cloud’! First, you need unimaginable amounts of data to ‘train’ a computer. For example, IBM Watson, at its core, only understands binary (1’s and 0’s, yes or no, true or false). To make IBM Watson Jeopardy-worthy, it had to be fed years and years of library data so that it could ‘learn’. My point is that without this very specific training, even a computer system as powerful as Watson cannot communicate the way humans naturally do every day. Second, massive compute power is required. ‘The Cloud’ offers both access to unimaginable amounts of data and massive compute power, so that computer systems can finally achieve machine-learning capabilities in earnest.
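To make the idea of ‘training’ a little more concrete, here is a minimal sketch in Python. It is a toy naive Bayes text classifier, not how Watson or any commercial assistant actually works, and the four training phrases and the two labels are hypothetical examples I made up for illustration. The only point is that the program is never told any rules; it counts words in labeled examples and uses those counts to guess the label of a new phrase.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Pick the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Start from how common the label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, hand-made training data (an assumption for this sketch).
examples = [
    ("play some music", "music"),
    ("play my favorite song", "music"),
    ("turn on the lights", "lights"),
    ("dim the lights please", "lights"),
]
model = train(examples)
print(classify("play a song", model))
print(classify("lights on", model))
```

With only four examples the classifier is easy to fool, which is exactly the point made above: the quality of what the system ‘learns’ depends on how much data you can feed it.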
I was fortunate enough to work at a company called ABBYY for the past 4+ years. ABBYY is known in the industry as a leader in Optical Character Recognition (OCR), which is true and is one of the reasons I decided to join the company. What I was pleasantly surprised to learn, however, was that high-quality OCR was the result of a much bigger vision. This bigger vision came from David Yang, ABBYY's founder, and his passion for helping people understand each other. It might sound simple, but with so many different languages, interpretations of language and language structures, it is nearly impossible to capture meaning or intention in strict logical rules that a computer can apply.
- Introduction of ABBYY Machine Translation
https://www.youtube.com/watch?v=_Gb9cgzxPWk (click here to see some great use case examples for NLP)
Eugene describes the use cases for NLP technology including:
- Keyword vs. Semantic Search
- Syntactic language parsing
- Semantic indexing
- Contextual understanding
- Document classification
- Similar documents
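The first item in the list, keyword vs. semantic search, can be illustrated with a small Python sketch. This is a deliberately simplified toy and not ABBYY's technology: the `CONCEPTS` table, the sample documents and the function names are all hypothetical, and a real semantic engine would rely on full linguistic analysis rather than a hand-written lookup table. The sketch only shows why matching on meaning can find a document that matching on literal words misses.

```python
# Hypothetical concept table: maps surface words to a shared concept.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "buy": "purchase", "purchase": "purchase", "bought": "purchase",
}

# Hypothetical sample documents.
DOCUMENTS = [
    "She bought an automobile last week",
    "The truck needs new tires",
    "He likes to buy fresh bread",
]

def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def keyword_search(query, documents):
    """Return documents that contain every query word literally."""
    wanted = set(tokens(query))
    return [d for d in documents if wanted <= set(tokens(d))]

def concepts(text):
    """Map each word to its concept; words not in the table are dropped."""
    return {CONCEPTS[w] for w in tokens(text) if w in CONCEPTS}

def semantic_search(query, documents):
    """Return documents that cover every concept found in the query."""
    wanted = concepts(query)
    return [d for d in documents if wanted and wanted <= concepts(d)]

print(keyword_search("buy car", DOCUMENTS))   # no literal match
print(semantic_search("buy car", DOCUMENTS))  # matches the 'bought an automobile' document
```

The keyword search comes back empty because no document contains both "buy" and "car", while the concept-based search finds the sentence about buying an automobile.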
It was my great honor to work with some serious linguistic scientists and to learn a little bit about the complications of NLP and true language understanding. Previously I had thought, probably like most of us reading this article, ‘Why can’t Apple Siri understand what I’m saying? Why is she so dumb about her answers?’ I do recall those reactions, and it still happens to this day. The magic of Siri is not the voice recognition itself, the word-for-word conversion of speech to text, but rather the understanding of the complete question or command as a full sentence. I learned that understanding human language, meaning and intent is nearly impossible, even with the greatest scientific minds and compute power completely focused on the problem.
In summary, and probably in the not-so-distant future, I can imagine our everyday lives being controlled by voice command. It is an undeniable trend. The domain names NLPIsAFad.com, AIisAFad.com and ArtificialIntellingenceIsAFad.com will become hot commodities as collectors gobble up such forward-thinking treasures. Picture a day when you walk in your front door and say something such as ‘summer-time’: your stereo immediately starts playing summer-time music, the shades automatically switch to a sandy beach scene, and your television instantly clicks on to a refreshing beach sunset! Ahhhhh, so tropical. This is all within our grasp now and will soon be a reality for you.