CENIE · 25 May 2022

Approachable, friendly and simple: that is natural language processing technology. Here is how it works.

We use language to communicate from birth, starting with crying to show our feelings and later with our first words, even if we don't know what they mean. This language accompanies us throughout our lives, whether it is to read a fantasy story, to find out the news or to fill in a document at the bank.

Recognising the importance of language, in 1967 Henry Kučera and W. Nelson Francis published Computational Analysis of Present-Day American English, one of the first computer-based analyses of language. In the following years, statistics were brought in to improve language recognition, and with them came the first machine learning algorithms. Later, as computer systems multiplied and improved and an enormous amount of data became available on the Internet, Speech to Text (STT) technologies were developed, and technology companies began to build natural language recognition techniques, known as Natural Language Processing (NLP), into their processes and products, such as Microsoft Word and Google Translate [1].

Below, we briefly explain how these two technologies work:

Speech Recognition or Speech to Text (STT)

 

Image source: How to Build Domain Specific Automatic Speech Recognition Models on GPUs

STT is a set of algorithms that classify audio signals, for example from a conversation, and transform them into text. But how?

Speech-to-text conversion works through a complex machine learning model consisting of several steps [2]:

  1. Speech produces a series of vibrations, and the STT technology picks up these vibrations and converts them into a digital signal.
  2. The signal is segmented into slices of hundredths or thousandths of a second, and these slices are matched to phonemes. A phoneme is a unit of sound that distinguishes one word from another in a given language.
  3. The phonemes are then passed through a statistical model that compares them with known words, phrases and sentences.
  4. Finally, the result is presented in text format, as close as possible to what was spoken.
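
The four steps above can be illustrated with a deliberately tiny sketch. It is not how a production STT system works: real systems use spectral features and trained acoustic and language models. Everything here is an assumption made for illustration: the sample rate, the two pure tones standing in for speech sounds, the zero-crossing feature, and the made-up templates and word probabilities.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed value)
FRAME_MS = 10         # frame length: one hundredth of a second

# Hypothetical "acoustic model": expected zero crossings per frame for each label.
PHONEME_TEMPLATES = {"S": 20, "OW": 4}

# Hypothetical "language model": known words for a phoneme sequence, with
# made-up probabilities. "so" and "sew" sound the same, so only the
# statistics can choose between them.
LEXICON = {("S", "OW"): {"so": 0.7, "sew": 0.3}}


def digitize(freq_hz, duration_ms):
    """Step 1 stand-in: a pure tone sampled into a list of numbers."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def split_into_frames(samples):
    """Step 2a: cut the digital signal into fixed-length frames."""
    size = SAMPLE_RATE * FRAME_MS // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def zero_crossings(frame):
    """A crude per-frame feature: how often the signal changes sign."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def to_phonemes(frames):
    """Step 2b: label each frame with the nearest template, merging repeats."""
    labels = []
    for frame in frames:
        count = zero_crossings(frame)
        label = min(PHONEME_TEMPLATES, key=lambda p: abs(PHONEME_TEMPLATES[p] - count))
        if not labels or labels[-1] != label:
            labels.append(label)
    return labels


def transcribe(samples):
    """Steps 3 and 4: pick the most probable known word and return it as text."""
    phonemes = tuple(to_phonemes(split_into_frames(samples)))
    candidates = LEXICON.get(phonemes, {})
    return max(candidates, key=candidates.get) if candidates else ""


# A high tone followed by a low tone stands in for the sounds "S" and "OW".
signal = digitize(1000, 30) + digitize(200, 30)
print(transcribe(signal))
```

The point of the toy lexicon is the third step: the sound sequence alone cannot separate two words that are pronounced alike, so the final choice comes from how likely each known word is.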

Natural Language Recognition or Natural Language Processing (NLP)

 

NLP is the ability of a computer to understand human language as we do. How does a computer understand language? In very simplified terms, we could break the process down into the following steps [3]:

  • Classification of individual words and phrases.
  • Extraction of the grammatical information of each one of them.
  • Detection of the functions of each of the words (subject, verb, adjectives, etc.).
  • Interpretation of the full or partial meaning of sentences.
  • Understanding the context of sentences and their relationships.
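
The steps listed above can be walked through with a small rule-based sketch. It is only an illustration under stated assumptions: the mini-lexicon, the function names and the rule "a pronoun refers to the previous subject" are all hypothetical, and real NLP systems rely on trained models rather than hand-written rules like these.

```python
import re

# Hypothetical mini-lexicon: word -> grammatical category (part of speech).
LEXICON = {
    "maria": "NOUN", "doctor": "NOUN", "she": "PRON",
    "the": "DET", "calls": "VERB", "visits": "VERB",
}


def tokenize(sentence):
    """Split a sentence into individual lower-case words."""
    return re.findall(r"[a-z']+", sentence.lower())


def tag(tokens):
    """Attach grammatical information to each word ("UNK" if unknown)."""
    return [(token, LEXICON.get(token, "UNK")) for token in tokens]


def roles(tagged):
    """Detect the function of the words: subject, verb and object."""
    verb_idx = next((i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None)
    if verb_idx is None:
        return {"subject": None, "verb": None, "object": None}
    nominal = ("NOUN", "PRON")
    subject = next((w for w, pos in tagged[:verb_idx] if pos in nominal), None)
    obj = next((w for w, pos in tagged[verb_idx + 1:] if pos in nominal), None)
    return {"subject": subject, "verb": tagged[verb_idx][0], "object": obj}


def interpret(sentences):
    """Interpret each sentence, using earlier sentences as context."""
    last_subject = None
    meanings = []
    for sentence in sentences:
        meaning = roles(tag(tokenize(sentence)))
        if meaning["subject"] == "she" and last_subject:
            # Crude context rule: the pronoun points back to the last subject.
            meaning["subject"] = last_subject
        elif meaning["subject"]:
            last_subject = meaning["subject"]
        meanings.append(meaning)
    return meanings


for meaning in interpret(["Maria calls the doctor.", "She visits the doctor."]):
    print(meaning)
```

In the second sentence the subject "she" is resolved to "maria" only because the first sentence was read before it, which is the kind of relationship between sentences the last step refers to.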

Image of a connected device with voice recognition.

Large technology companies have seen the potential of combining these two branches of artificial intelligence, and countless applications that make the most of STT and NLP have appeared in different fields. Moreover, it could be said that they have found a market niche in what we call "age tech": more than 60% of the population aged 55 and over is connected to the internet, according to a report by the Mapfre Foundation.

Virtual assistants are among the most widespread applications and are already part of many people's daily lives. Examples include Siri (created by Apple), Alexa (Amazon's star product) and Cortana (the voice assistant of Microsoft's operating systems). Virtual assistants can help remove technological barriers for older users, because they let them interact with a device by voice to perform everyday tasks. For example, they can call family members with a voice command, without needing to type in the phone number or look it up in the phone book: "Alexa, call the medical centre" or "Siri, call Maria".
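
Once STT has turned a spoken command into text, something has to work out what was asked. The sketch below shows one very simple way to do that for the two example commands. It does not describe how Alexa or Siri work internally: the function name, the pattern, the contact book and its placeholder numbers are all assumptions made for illustration.

```python
import re

WAKE_WORDS = ("alexa", "siri")

# Hypothetical contact book with placeholder numbers.
CONTACTS = {"maria": "555-0101", "medical centre": "555-0102"}

# "<wake word>, call [the] <contact>"
CALL_PATTERN = re.compile(r"\s*(\w+),\s*call\s+(?:the\s+)?(.+?)\s*$", re.IGNORECASE)


def parse_command(text):
    """Return a call request for a recognised command, or None otherwise."""
    match = CALL_PATTERN.match(text)
    if not match or match.group(1).lower() not in WAKE_WORDS:
        return None
    contact = match.group(2).lower()
    # The number is None when the contact is not in the contact book.
    return {"intent": "call", "contact": contact, "number": CONTACTS.get(contact)}


print(parse_command("Alexa, call the medical centre"))
print(parse_command("Siri, call Maria"))
```

A fixed pattern like this breaks as soon as the wording changes ("please ring Maria"), which is why real assistants combine STT with the kind of NLP described earlier instead of matching exact phrases.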

Another particularly useful feature for older users is the ability to save calendar reminders for medical appointments and for taking medication. Other functions are also very helpful for people with reduced mobility or reduced vision:

  • Controlling lights and heating in the house
  • Searching the internet for topics of interest
  • Setting reminders about important dates and events
  • Narrating an audiobook
  • Keeping up to date with the news
  • Checking today's weather
  • Controlling music playback

By making technology more enjoyable, simple and functional to use, STT and NLP are showing that they can help improve quality of life, producing applications and tools that are accessible to everyone and, above all, of practical use to society as a whole.

Bibliography

[1] ¿Qué es Natural Language Processing?

[2] What is Speech to Text? - Transcription Beginner's Guide - AWS

[3] NLP explained - What is Natural Language Processing? - MoreThanDigital

Under the framework of: Programa Operativo Cooperación Transfronteriza España-Portugal
Sponsors: Fundación General de la Universidad de Salamanca; Fundación del Consejo Superior de Investigaciones Científicas; Direção Geral da Saúde - Portugal; Universidad del Algarve - Portugal