Using NVIDIA Speech AI and Rasa to Create a Multi-Use Chatbot

Contributor

Sam Armstrong


Introduction

A challenge for our healthcare facilities, as for many others across the country, is understanding the needs of patients. A patient who has experienced trauma may find it difficult to talk to a doctor face to face. This is where NVIDIA Omniverse can help doctors obtain vital information they would otherwise miss: chatting with a non-threatening animated avatar (shown in this NVIDIA Audio2Face demo) may help a patient, especially a child, speak openly.

Another emerging use case is tracking blood glucose readings for diabetic patients. In this workflow, a text message would remind the patient that it is time to take a glucose reading and guide them to a web-based chat interface (using NVIDIA Riva and Rasa as text processors and response generators). The chat would walk the patient through an interactive questionnaire to record vital statistics and analyze parameters over time.
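The questionnaire step can be pictured as a short script that asks a fixed list of questions and then summarizes readings over time. This is a minimal sketch, not the authors' implementation: the question wording, the `GlucoseLog` class, `run_questionnaire`, and the 70 to 180 mg/dL target range are all illustrative assumptions, and in the described system Rasa, not this script, would drive the dialogue.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean

# Hypothetical questions; in the described system Rasa would ask these.
QUESTIONS = [
    ("reading", "What is your blood glucose reading in mg/dL?"),
    ("fasting", "Was this reading taken while fasting? (yes/no)"),
]

# Illustrative target range only; the real range should come from the care team.
TARGET_LOW, TARGET_HIGH = 70.0, 180.0


@dataclass
class GlucoseLog:
    # Each entry is (timestamp, reading in mg/dL, fasting flag).
    readings: list = field(default_factory=list)

    def record(self, answers: dict, when: datetime) -> None:
        """Store one completed questionnaire."""
        value = float(answers["reading"])
        fasting = answers["fasting"].strip().lower().startswith("y")
        self.readings.append((when, value, fasting))

    def summary(self) -> dict:
        """Average reading and share of readings inside the target range."""
        values = [v for _, v, _ in self.readings]
        in_range = [v for v in values if TARGET_LOW <= v <= TARGET_HIGH]
        return {
            "count": len(values),
            "mean": mean(values) if values else None,
            "fraction_in_range": len(in_range) / len(values) if values else None,
        }


def run_questionnaire(ask) -> dict:
    """Ask each question in order via `ask(prompt) -> str` and collect answers."""
    return {key: ask(prompt) for key, prompt in QUESTIONS}
```

Passing `ask` in as a function keeps the questionnaire logic independent of the channel, so the same code could be fed by a chat front end, a test script, or `input`.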

Technology Overview

Three applications make this pipeline work: two from NVIDIA and one from Rasa Technologies, Inc.

The first is NVIDIA Audio2Face, which uses AI to create natural facial movement from either live or recorded audio input (Fig. 1). The application comes with three out-of-the-box avatars (face models). If needed, a custom avatar mesh can be created with standard modelling tools (such as Blender or Unreal Engine) and then imported. Once set up, Audio2Face can work with Riva (described below) to add a visual and aural element to a chat application.
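Because Audio2Face can be driven by recorded audio, one simple hand-off is to save synthesized speech as a WAV file that the Audio2Face audio player can load. The sketch below is an assumption about the glue code, not part of either product: it wraps raw 16-bit mono PCM samples (a format TTS engines commonly return) in a WAV header using Python's standard `wave` module. The function names and the 22,050 Hz default sample rate are illustrative choices, and `sine_tone_pcm` merely stands in for real TTS output.

```python
import io
import math
import struct
import wave


def pcm16_to_wav(pcm: bytes, sample_rate_hz: int = 22050, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM samples in a WAV container."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length must be a whole number of 16-bit frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


def sine_tone_pcm(seconds: float = 0.5, freq_hz: float = 440.0,
                  sample_rate_hz: int = 22050) -> bytes:
    """Generate a sine tone as 16-bit PCM, standing in for real TTS output."""
    n = int(seconds * sample_rate_hz)
    samples = (int(12000 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)
```

To produce a file, write the returned bytes to disk, for example `open("response.wav", "wb").write(pcm16_to_wav(pcm))`, where the file name is hypothetical.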

Fig. 1. Audio2Face example from NVIDIA

The next component of the chatbot application is NVIDIA Riva. Riva combines text-to-speech (TTS), natural language processing (NLP), and automatic speech recognition (ASR) into a single API. For our purposes, we use ASR to convert the user's audio recording into text, which is then sent to Rasa (described below). After we receive a text response from Rasa, we use TTS to convert the response into an audio file that is pushed to Audio2Face for visual and aural output. More info about Riva can be found here.
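The round trip described above (ASR, then Rasa, then TTS, then Audio2Face) can be sketched as one function with each stage passed in as a callable. This is a hedged outline rather than the authors' code: `Pipeline`, `handle_turn`, and the stage signatures are hypothetical. In a real deployment, `transcribe` and `synthesize` would wrap the Riva client, `respond` would call Rasa, and `send_to_avatar` would push audio to Audio2Face; keeping them as plain callables lets the control flow run on its own with stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    transcribe: Callable[[bytes], str]          # ASR: user audio -> text (e.g. Riva)
    respond: Callable[[str, str], List[str]]    # dialogue: (sender_id, text) -> replies (e.g. Rasa)
    synthesize: Callable[[str], bytes]          # TTS: reply text -> audio (e.g. Riva)
    send_to_avatar: Callable[[bytes], None]     # animate the avatar (e.g. Audio2Face)

    def handle_turn(self, sender_id: str, user_audio: bytes) -> List[str]:
        """Run one user turn through every stage and return the bot's replies."""
        text = self.transcribe(user_audio).strip()
        if not text:
            return []  # nothing recognized: skip the dialogue and avatar stages
        replies = self.respond(sender_id, text)
        for reply in replies:
            self.send_to_avatar(self.synthesize(reply))
        return replies
```

Injecting the stages this way also makes it easy to swap one service for another, or to test the dialogue logic without speech hardware.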

NVIDIA is planning to add native language translation to this application (shown in the video below). Once this is available, we can move forward with building a user-friendly application around this technology.

The last and most integral component of our chatbot application is Rasa. Rasa is an open-source conversational AI framework that is fully customizable: a Rasa model can ask questions, reply, or simply talk, based on a domain defined before the model is trained. In our case, the domains are diabetes management and doctor-patient conversations. An example of a chat interaction for diabetes management is shown in the video below. More information about Rasa can be found here.
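Rasa can exchange messages with the rest of the pipeline over HTTP. Assuming the built-in REST channel is enabled (a `rest:` entry in `credentials.yml`) and the server runs on the default port 5005, each user message is POSTed as JSON with `sender` and `message` fields to `/webhooks/rest/webhook`, and the response is a list of bot messages, each of which may carry a `text` field. The helper below is a sketch under those assumptions; the function names and the injectable `post` argument are hypothetical, added so the logic can be tried without a running server.

```python
import json
import urllib.request
from typing import Callable, List

RASA_URL = "http://localhost:5005/webhooks/rest/webhook"  # assumed default host and port


def http_post_json(url: str, payload: dict) -> list:
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def ask_rasa(sender_id: str, text: str,
             post: Callable[[str, dict], list] = http_post_json,
             url: str = RASA_URL) -> List[str]:
    """Send one user message to Rasa and return the text parts of its replies."""
    messages = post(url, {"sender": sender_id, "message": text})
    # Replies may also contain images or buttons; keep only text, which TTS can speak.
    return [m["text"] for m in messages if "text" in m]
```

Using a stable `sender` ID per patient matters here, because Rasa keys its conversation state on the sender, so a multi-step questionnaire stays in the right dialogue.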