Conversational AI and Speech Recognition

Conversational AI and speech recognition are two of the hottest tech terms you may have seen on Twitter and LinkedIn. You may also have seen our blogs about dictation and speech recognition. What we have yet to do is dig deeper into what these terms mean in the workplace context, even though the future of the workplace is heading towards these technologies. Today, we lay out what the two terms mean, what the combination of the two looks like, and how it could apply to an office workflow.

What is Conversational AI?

Sources such as IBM and ADA CX explain conversational artificial intelligence simply as combining natural language processing (NLP) with software such as virtual agents and voice assistants that users can talk to. NLP is a branch of computer science concerned with giving computers the ability to understand text and spoken words the way human beings can, including meanings beyond the literal sentence. Think of your at-home voice assistant when you ask a follow-up question such as "Could you tell me more?" or "Why is that?" It knows you are following up on the previous exchange rather than taking the sentence in isolation.

Another key feature of conversational AI is the machine learning component. As we interact more and more with AI, the computer learns more about communication behavior through repeated interactions. Think of how a child learns to communicate and, after years of talking to people of different backgrounds, eventually communicates as an adult.

What is Speech Recognition?

Conversational AI has a spoken component, and it can only be as good as the computer’s ability to understand our speech. Speech recognition, as defined by IBM, is the ability of a program to process human speech into a written format. Voice recognition is often confused with speech recognition. To dispel any confusion: voice recognition is when the computer can distinguish one person’s voice from another’s.

Speech recognition can be found in VoIP calling, when the AI transcribes your caller’s voicemail into text, or when you dictate into Google Translate to have your words translated into another language. The more we interact with our robotic counterparts, the more they learn to understand us and make work easier for us.

A match made in heaven for the office

While these technologies have existed for a while now, how close are they to supporting day-to-day work in an office environment? Speech recognition is already widely used in particular professions and has become more ubiquitous with smartphones and their integrated voice-to-text capabilities. However, it reaches its limits when more than one person speaks, and it cannot understand context. That’s where conversational AI picks up.

With AI recording technology that can discern the various speakers while recording, speech recognition can then transform voice to text even with multiple speakers. In combination with NLP, the software could then, for example, suggest a next best action after a meeting, or at least fill in the punctuation that was not spoken explicitly.
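To make the multi-speaker idea a little more concrete, here is a minimal Python sketch of how speaker information and recognized words might be combined into a transcript. It is an illustration only, not how any particular product works. It assumes the recording step has already produced time-stamped speaker segments and the speech recognition step has produced time-stamped words; the names SpeakerSegment, Word, and build_transcript are hypothetical and made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SpeakerSegment:
    """Hypothetical output of the speaker-discerning step: who spoke when."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Word:
    """Hypothetical output of speech recognition: one word with timestamps."""
    text: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_transcript(words, segments):
    """Assign each word to the speaker whose segment overlaps it most,
    then merge consecutive words from the same speaker into one turn."""
    turns = []
    for word in words:
        speaker = "unknown"  # used when no segment overlaps the word
        best = 0.0
        for seg in segments:
            shared = overlap(word.start, word.end, seg.start, seg.end)
            if shared > best:
                best, speaker = shared, seg.speaker
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word.text)
        else:
            turns.append((speaker, [word.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


# Made-up example data for a short two-person exchange.
segments = [
    SpeakerSegment("Speaker A", 0.0, 2.0),
    SpeakerSegment("Speaker B", 2.0, 4.0),
]
words = [
    Word("hello", 0.1, 0.5),
    Word("team", 0.6, 1.0),
    Word("hi", 2.2, 2.5),
    Word("all", 2.6, 3.0),
]

for speaker, text in build_transcript(words, segments):
    print(f"{speaker}: {text}")
```

The resulting speaker-labelled turns are the kind of input an NLP step could then work on, for example to restore punctuation or to suggest follow-up actions.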

We are just at the beginning of combining many technologies that exist separately today to transform the office workday from typing to talking one step at a time.


At Speech Processing Solutions, we are committed to investing in the development of state-of-the-art voice technology that drives the change to new ways of working. What combination of technologies do you envision as a driver for the modern office? Let us know in the comments below!
