We use language to communicate from birth: first by crying to show our feelings, and later with our first words, even before we know what they mean. Language accompanies us throughout our lives, whether we are reading a fantasy story, catching up on the news or filling in a form at the bank.
Given the importance of language, it soon became a subject for computers: in 1967, Henry Kucera and W. Nelson Francis published Computational Analysis of Present-Day American English, one of the first computer-based analyses of language. In the following decades, statistical methods were incorporated to improve language analysis, and with them came the first machine learning approaches. Later, as computers became more powerful and the Internet made enormous amounts of data available, Speech to Text (STT) technologies advanced rapidly. Technology companies also began to build natural language recognition techniques, known as Natural Language Processing (NLP), into their processes and products, such as Microsoft Word and Google Translate. [1]
Below, we briefly explain how these two technologies work.
Speech Recognition or Speech to Text (STT)
Image source: How to Build Domain Specific Automatic Speech Recognition Models on GPUs
STT is a set of algorithms that classify audio signals, such as a recorded conversation, and transform them into text. But how does it work?
Speech-to-text conversion relies on a complex machine learning model that processes the audio in several steps [2].
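As a rough illustration of two stages that many STT systems share, the Python sketch below cuts a waveform into short overlapping frames and turns one predicted label per frame into text with greedy CTC (Connectionist Temporal Classification) decoding. It assumes 16 kHz audio, 25 ms frames with a 10 ms hop, and an acoustic model (not shown) that outputs the per-frame labels; these values, the blank symbol and all function names are our own assumptions, not the steps described in [2].

```python
# Minimal sketch of two stages found in many speech-to-text systems.
# Assumptions (not from the source): 16 kHz mono audio as a list of floats,
# 25 ms frames with a 10 ms hop, and an acoustic model (not shown) that
# predicts one label per frame, decoded here with greedy CTC decoding.

SAMPLE_RATE = 16_000  # samples per second (assumed)
FRAME_LEN = 400       # 25 ms at 16 kHz
HOP_LEN = 160         # 10 ms at 16 kHz
BLANK = "_"           # hypothetical symbol for the CTC "blank" label


def frame_signal(samples, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Split a waveform into overlapping, fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def frame_energy(frame):
    """Mean squared amplitude: a very simple per-frame acoustic feature."""
    return sum(x * x for x in frame) / len(frame)


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Turn one predicted label per frame into text.

    Greedy CTC decoding merges consecutive repeated labels and drops blanks,
    so a blank between two identical labels keeps both ("l_l" -> "ll").
    """
    text = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            text.append(label)
        previous = label
    return "".join(text)


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE
    print(len(frame_signal(one_second)))  # 98 frames
    # Pretend the acoustic model predicted these labels for 13 frames.
    print(ctc_greedy_decode(list("hh_e_ll_ll_oo")))  # hello
```

The blank label is what lets the decoder tell a letter held across several frames ("hh") apart from a genuinely doubled letter ("l_l").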
Natural Language Recognition or Natural Language Processing (NLP)
Image source: How to Build Domain Specific Automatic Speech Recognition Models on GPUs
NLP is the ability of a computer to understand human language much as we do. But how does a computer understand language? In very simplified terms, the process can be broken down into several steps [3].
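To make this more concrete, the following Python sketch shows a toy text-preprocessing pipeline of the kind that often comes before language understanding: lowercasing and tokenization, stop-word removal and a bag-of-words representation. The stop-word list and function names are hypothetical, and these steps are a common textbook simplification rather than necessarily the ones listed in [3].

```python
# Toy NLP preprocessing pipeline: tokenize -> filter stop words -> count.
# The stop-word list and function names are hypothetical illustrations,
# not the exact steps described in reference [3].
import re
from collections import Counter

# Tiny hand-picked stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "to", "at", "my", "me", "is", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into tokens of ASCII letters,
    digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop very frequent words that carry little meaning on their own."""
    return [token for token in tokens if token not in STOP_WORDS]


def bag_of_words(tokens):
    """Represent the text as word counts, a simple input for a classifier."""
    return Counter(tokens)


def preprocess(text):
    """Run the three steps in order."""
    return bag_of_words(remove_stop_words(tokenize(text)))


if __name__ == "__main__":
    sentence = "Remind me to take my pills at 9."
    print(tokenize(sentence))
    # ['remind', 'me', 'to', 'take', 'my', 'pills', 'at', '9']
    print(remove_stop_words(tokenize(sentence)))
    # ['remind', 'take', 'pills', '9']
```

A bag of words discards word order, which is why modern systems usually replace the last step with learned representations; it is used here only because it is easy to follow.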
Image of a connected device with voice recognition.
Large technology companies have recognized the potential of combining these two branches of artificial intelligence, and countless applications that make the most of STT and NLP have appeared in many fields. These companies have also found a market niche in what is known as "age tech": according to a report by the Mapfre Foundation, more than 60% of people aged 55 and over are connected to the internet.
Virtual assistants are among the most widespread applications and are already part of many people's daily lives: Siri (created by Apple), Alexa (Amazon's star product) and Cortana (the voice assistant in Microsoft's operating systems). Virtual assistants can help remove technological barriers for older people, because they let them perform everyday tasks by voice. For example, they can call a family member or another contact without typing the phone number or looking it up in the phone book, simply by saying "Alexa, call the medical centre" or "Siri, call Maria".
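A very simplified sketch of how a transcript produced by STT might be turned into an action follows: a regular expression detects a wake word, a "call" intent and the name of the contact, which is then looked up in a contact book. The wake-word list, contact book and placeholder numbers are hypothetical, and commercial assistants such as Alexa or Siri rely on trained language-understanding models rather than hand-written rules like these.

```python
# Rule-based sketch of turning an STT transcript into an action.
# Hypothetical: the wake words, contact book and placeholder numbers.
# Real assistants use trained language-understanding models instead.
import re
from dataclasses import dataclass
from typing import Optional

WAKE_WORDS = {"alexa", "siri"}
CONTACTS = {
    "maria": "555-0101",               # placeholder number
    "the medical centre": "555-0102",  # placeholder number
}

# "<wake word>, call <contact>" with an optional comma and final full stop.
CALL_PATTERN = re.compile(
    r"^(?P<wake>\w+),?\s+call\s+(?P<contact>.+?)\.?$", re.IGNORECASE
)


@dataclass
class Intent:
    name: str              # what the user wants to do, e.g. "call"
    contact: str           # slot extracted from the utterance
    number: Optional[str]  # looked up in CONTACTS; None if unknown


def parse_command(transcript: str) -> Optional[Intent]:
    """Return a "call" intent, or None if the utterance is not recognized."""
    match = CALL_PATTERN.match(transcript.strip())
    if match is None or match.group("wake").lower() not in WAKE_WORDS:
        return None
    contact = match.group("contact").lower()
    return Intent("call", contact, CONTACTS.get(contact))


if __name__ == "__main__":
    for utterance in ["Alexa, call the medical centre", "Siri, call Maria",
                      "What time is it?"]:
        print(utterance, "->", parse_command(utterance))
```

Separating intent detection (what to do) from slot filling (whom to call) is a common way to structure assistant commands; supporting a new request would mean adding another intent rather than changing the contact lookup.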
Another particularly useful feature for older people is the ability to save calendar reminders for medical appointments and medication. Other functions are also very helpful for people with reduced mobility or impaired vision.
By making technology more enjoyable, simple and functional to use, STT and NLP are showing that they can help improve quality of life, producing tools that are accessible to everyone and, above all, have practical benefits for society as a whole.
Bibliography
[1] ¿Qué es Natural Language Processing? (What is Natural Language Processing?)
[2] What is Speech to Text? - Transcription Beginner's Guide - AWS
[3] NLP explained - What is Natural Language Processing? - MoreThanDigital