How to create the smartest multilingual Virtual Assistant using AWS and ChatGPT

Last week ChatGPT was released and everyone has been trying amazing things with it. I also started playing with it and wanted to see how well it would integrate with the AI services from AWS, and the results are AWSome!

In this post I will explain step by step how I created this project so you can also do it!

Best of all, you don’t need to be an AI expert to create this!

I will assume you already know what ChatGPT is and have an AWS account to experiment with. In case you don’t know what ChatGPT is, please check here what ChatGPT is and how to try it yourself.

The full code for this project can be found here.

Steps of the project

I have divided this project into 8 steps:

  1. Record an audio and save it in WAV format
  2. Upload the audio file to Amazon S3
  3. Transcribe and detect the language of the audio saved in S3 using Amazon Transcribe
  4. Amazon Transcribe saves the transcript in Amazon S3
  5. Send the transcription to ChatGPT
  6. Receive the text answer from ChatGPT and remove code chunks
  7. Convert the text to audio using the language detected in step 3 using Amazon Polly and download the audio in MP3 format
  8. Reproduce the audio file

Before we start, we need to define the general parameters used by the code: the S3 bucket name, the AWS Region, the access keys and the ChatGPT session token. You will create these values in the following steps and then replace the placeholders with them.

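A minimal sketch of such a parameters section, assuming the values are read from environment variables instead of being hard-coded. The `Settings` class, the `load_settings` helper, the environment variable names and the default Region are all hypothetical choices for illustration, not the names used in the original code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """General parameters shared by every step of the assistant."""
    bucket_name: str
    region: str
    aws_access_key_id: str
    aws_secret_access_key: str
    chatgpt_session_token: str


def load_settings(env=os.environ) -> Settings:
    """Read the parameters from environment variables (hypothetical names)."""
    return Settings(
        bucket_name=env.get("ASSISTANT_BUCKET_NAME", ""),
        region=env.get("ASSISTANT_AWS_REGION", "eu-west-1"),  # placeholder default
        aws_access_key_id=env.get("ASSISTANT_ACCESS_KEY_ID", ""),
        aws_secret_access_key=env.get("ASSISTANT_SECRET_ACCESS_KEY", ""),
        chatgpt_session_token=env.get("ASSISTANT_SESSION_TOKEN", ""),
    )
```

Keeping the secrets out of the source file makes it harder to share them by accident when publishing the code.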

1. Record an audio and save it in WAV format

First, we need to record the audio in which we ask the question we want ChatGPT to answer. For that we will use the sounddevice package. Make sure that the correct microphone is selected as the default input device in your OS.
In this case, the recording lasts 4 seconds. To increase or decrease this time, just modify the value of the duration parameter.
The script saves the audio in a folder called audio in the current working directory. If this folder doesn’t exist, the script creates it using the os module.

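The recording step could look roughly like the sketch below. It assumes 16-bit mono audio at 44.1 kHz and writes the WAV file with the standard library `wave` module; the function names, the file name `question.wav` and the sample rate are my own choices, and the original script may have used a different WAV writer. The sounddevice package also needs NumPy installed for `sd.rec`.

```python
import os
import wave


def save_wav(pcm_bytes: bytes, path: str, sample_rate: int = 44100,
             channels: int = 1) -> str:
    """Write raw 16-bit PCM samples to a WAV file, creating the folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return path


def record_audio(path: str = os.path.join("audio", "question.wav"),
                 duration: float = 4.0, sample_rate: int = 44100) -> str:
    """Record `duration` seconds from the default microphone and save them as WAV."""
    import sounddevice as sd  # third-party: pip install sounddevice numpy

    frames = int(duration * sample_rate)
    recording = sd.rec(frames, samplerate=sample_rate, channels=1, dtype="int16")
    sd.wait()  # block until the recording has finished
    return save_wav(recording.tobytes(), path, sample_rate)
```

Calling `record_audio()` records 4 seconds and returns the path of the saved file, which is what the next step uploads.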

2. Upload the audio file to Amazon S3

In this step, first we need to create an Amazon S3 Bucket. For that we go to the AWS Console and search for the service Amazon S3. Then click on Create bucket.

We need to enter a name for our bucket (bucket names must be unique across all AWS accounts in all the AWS Regions) and select the AWS Region.

We can leave the rest of the parameters at their default values. Finally, click on Create bucket at the bottom of the page.

In the parameters section from the beginning, we need to replace the bucket name and Region values with the bucket name and the Region we just selected.

The next step is to create a new user that we will use to access this S3 bucket using boto3. Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.

To create the new user, we search for IAM on the AWS Console. Then click on Users on the left menu under Access management.

Click on Add users on the top-right corner. We need to provide a user name and then click on the checkbox of Access key – Programmatic access.

Then click on Next: Permissions. Here click on Attach existing policies directly and then on Create policy.

Here I would like to mention that we could just select the policy called AmazonS3FullAccess and it would work, but that goes against the principle of least privilege. Instead, we will only grant access to the bucket we created before.

On the Create policy page click on Choose a service and search for S3 and click on it. Then on Actions click the options:

  • ListBucket
  • GetObject
  • DeleteObject
  • PutObject

On Resources click on Specific and then on bucket click Add ARN, put the bucket name we created before and click on Add. On object click also on Add ARN and put the bucket name created before and on Object name click the checkbox Any.
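For reference, the policy built with these clicks should be roughly equivalent to the JSON document produced by the sketch below. The `build_bucket_policy` helper and the example bucket name are hypothetical, and the console's visual editor may group the statements differently; the point is that ListBucket applies to the bucket ARN while the object actions apply to the objects inside it.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Least-privilege policy limited to a single bucket and its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Read, write and delete are object-level actions ("Any" object).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(json.dumps(build_bucket_policy("my-assistant-bucket"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the Create policy page as an alternative to the visual editor.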

Then click on Next: Tags and Next: Review. Finally, give the new policy a name and click on Create policy.

Once the policy has been created, go back to the creation of the user page and search for the new policy created. In case it doesn’t appear, click on the refresh button.

Then click on Next: Tags and Next: Review. Finally, check that everything is correct and click on Create user.

On the next page we will get the Access key ID and the Secret access key. Make sure to save them (especially the secret access key) and don’t share them. In the parameters section from the beginning, we need to replace the access key ID and secret access key values with the ones we just obtained.

With that we have a user with permissions to write into the S3 bucket created before.

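With the bucket and the user in place, the upload itself is a short boto3 call. The sketch below is one way to write it; the helper names and the choice of using the file name as the object key are assumptions of mine, not necessarily what the original code does.

```python
import os


def make_s3_client(region: str, access_key_id: str, secret_access_key: str):
    """Create an S3 client for the IAM user created above."""
    import boto3  # third-party: pip install boto3

    return boto3.client(
        "s3",
        region_name=region,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )


def upload_audio(s3_client, local_path: str, bucket_name: str, key: str = None) -> str:
    """Upload the recorded file and return its S3 URI for Amazon Transcribe."""
    key = key or os.path.basename(local_path)  # default: reuse the file name
    s3_client.upload_file(local_path, bucket_name, key)
    return f"s3://{bucket_name}/{key}"
```

The returned `s3://` URI is the value Amazon Transcribe expects as the media location in the next step.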

3-4. Transcribe and detect the language of the audio saved in S3 using Amazon Transcribe

Amazon Transcribe is an AWS Artificial Intelligence (AI) service that makes it easy for you to convert speech to text. It uses Automatic Speech Recognition (ASR) technology, and you can use it for a variety of business applications, including transcription of voice-based customer service calls, generation of subtitles for audio/video content, and (text-based) content analysis of audio/video content.

To be able to use Amazon Transcribe with the IAM user created in the previous step, we need to grant that user access via an IAM policy.

For that we need to go to IAM in the AWS Console, click on Users on the left menu and then click on the user created before. Click on Add permissions and then Attach existing policies directly. Search for AmazonTranscribe and click the checkbox of AmazonTranscribeFullAccess.

Click on Next: Review and Add permissions.

At this point this user should have 2 attached policies: the S3 policy we created in step 2 and AmazonTranscribeFullAccess.

After adding this extra permission you don’t need to modify or update the access key ID or the secret access key.

In the Python code for this step we use Amazon Transcribe via the boto3 package to transcribe the voice recorded in the audio to text. Amazon Transcribe also detects the language spoken in the audio.

You can read all the details about TranscribeService in the boto3 documentation.

The transcription is saved in a JSON file in Amazon S3. You can either choose to save your transcript in your own Amazon S3 bucket, or have Amazon Transcribe use a secure default bucket. In my case, I chose the default option, which stores the transcript in an Amazon S3 bucket owned by the service. If we choose the default option, the transcript is deleted when the job expires (90 days). If we want to keep the transcript past this expiration date, we must download it.

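A sketch of how steps 3 and 4 can be written with boto3, assuming a client created with `boto3.client("transcribe", ...)` using the same credentials as before. The function names, the polling interval and the injected `fetch` parameter are my own choices for illustration. Note that transcription job names must be unique within the account, so in practice you may want to add a timestamp to `job_name`.

```python
import json
import time
import urllib.request


def fetch_json(url: str) -> dict:
    """Download and parse the transcript JSON from the URI returned by the job."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def transcribe_audio(transcribe_client, job_name: str, media_uri: str,
                     fetch=fetch_json, poll_seconds: float = 2.0):
    """Start a job with language identification and return (text, language_code)."""
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat="wav",
        IdentifyLanguage=True,  # let Amazon Transcribe detect the language
    )
    # No output bucket is given, so the transcript goes to the default
    # service-managed bucket and is exposed through TranscriptFileUri.
    while True:
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            break
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)

    result = fetch(job["Transcript"]["TranscriptFileUri"])
    text = result["results"]["transcripts"][0]["transcript"]
    return text, job["LanguageCode"]
```

The language code returned here (for example en-US) is what step 7 uses to pick the voice.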

5. Send the transcription to ChatGPT

Once we have received the transcript from Amazon Transcribe, we need to send it to ChatGPT. For that, I am using the revChatGPT package. To use this package we need to authenticate to ChatGPT, which can be done using a username and password or using the session_token. In my case, because I am using the Google OAuth authentication method, I will use the session_token.

To get the session token we need to log in to ChatGPT and then press F12 or right-click and select Inspect. Then open the Application tab and, on the left menu, look for Cookies. Select the website https://chat.openai.com, find the cookie named __Secure-next-auth.session-token and copy its value.

In the parameters section from the beginning, we need to replace the session token value with the value of the cookie you just copied.

In case you want to use the email and password as an authentication method you can check the steps on how to do it here.

Once this is done, we should be able to connect to ChatGPT using Python.

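The sketch below shows the general shape of this step. Everything specific to revChatGPT in it is an assumption based on how I recall early releases of the package: the import path `revChatGPT.revChatGPT`, the `Chatbot(config, conversation_id=None)` constructor, `refresh_session()`, `get_chat_response(prompt, output="text")` and the `"message"` key of the response. The package is unofficial and its interface has changed between versions, so check its README for the version you install.

```python
def make_chatbot(session_token: str):
    """Create a chatbot authenticated with the session token (assumed API)."""
    # Assumption: import path and constructor of an early revChatGPT release.
    from revChatGPT.revChatGPT import Chatbot  # third-party: pip install revChatGPT

    chatbot = Chatbot({"session_token": session_token}, conversation_id=None)
    chatbot.refresh_session()  # assumption: exchanges the cookie for an access token
    return chatbot


def ask_chatgpt(chatbot, prompt: str) -> str:
    """Send the transcript to ChatGPT and return the text of the answer."""
    # Assumption: the response is a dict whose "message" key holds the answer.
    response = chatbot.get_chat_response(prompt, output="text")
    return response["message"]
```

Keeping the call behind a small `ask_chatgpt` function means only this function needs to change if the package interface differs.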

6. Receive the text answer from ChatGPT and remove code chunks

The answer we get from ChatGPT may contain one or more chunks of code. In this case, I apply a regex to remove those chunks, since they don’t make sense when read aloud.

Here you can also add your own rules on how to filter or clean the answer from ChatGPT.

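A minimal sketch of such a cleaning function, assuming the code chunks arrive as Markdown blocks delimited by triple-backtick fences. The exact regex used in the original code may differ, and the function name is my own.

```python
import re

FENCE = "`" * 3  # the triple-backtick delimiter of a Markdown code block

# Non-greedy match from an opening fence to the next closing fence;
# DOTALL lets "." span the newlines inside the block.
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def remove_code_chunks(answer: str) -> str:
    """Drop fenced code blocks so only the prose is sent to text-to-speech."""
    without_code = CODE_BLOCK.sub("", answer)
    # Collapse the runs of blank lines left behind by the removed blocks.
    collapsed = re.sub(r"\n\s*\n+", "\n\n", without_code)
    return collapsed.strip()
```

Any extra rules, such as removing URLs or list markers, can be added as further `re.sub` calls in the same function.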

7. Convert the text to audio using the language detected in step 3 using Amazon Polly and download the audio in MP3 format

Amazon Polly uses deep learning technologies to synthesize natural-sounding human speech, so we can convert text to speech.

After cleaning the answer from ChatGPT we are ready to send it to Amazon Polly.

To be able to use Amazon Polly with the user created before we need to provide access to it using a policy like we did in the previous step with Amazon Transcribe.

For that we need to go to IAM in the AWS Console, click on Users on the left menu and then click on the user created before. Then click on Add permissions and then Attach existing policies directly. Search for AmazonPolly and click the checkbox of AmazonPollyFullAccess.

Click on Next: Review and Add permissions.

At this point this user should have 3 attached policies: the S3 policy we created in step 2, AmazonTranscribeFullAccess and AmazonPollyFullAccess.

Amazon Polly supports multiple languages and voices of different genders. In this case, the code I provide predefines 3 languages: English, Spanish and Catalan. Also, note that each language can have several variants depending on the country. For example, for English we have en-US, en-GB, en-IN and others.

The full list of available languages and variants is available here.

After sending the text to Amazon Polly, we will receive a stream containing the synthesized speech.

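The sketch below shows one way to map the detected language to a Polly voice and save the returned stream as MP3, assuming a client created with `boto3.client("polly", ...)`. The voice choices (Joanna, Lucia, Arlet), the use of the neural engine, the fallback to English and the helper names are my own assumptions; voice and engine availability depends on the Region, so adjust the table to what your Region offers.

```python
# Hypothetical voice table keyed by the language prefix returned by Transcribe.
VOICES = {
    "en": {"VoiceId": "Joanna", "LanguageCode": "en-US", "Engine": "neural"},
    "es": {"VoiceId": "Lucia", "LanguageCode": "es-ES", "Engine": "neural"},
    "ca": {"VoiceId": "Arlet", "LanguageCode": "ca-ES", "Engine": "neural"},
}


def pick_voice(language_code: str, default: str = "en") -> dict:
    """Choose a voice from the language detected in step 3 (e.g. 'es-ES')."""
    prefix = language_code.split("-")[0].lower()
    return VOICES.get(prefix, VOICES[default])


def text_to_mp3(polly_client, text: str, language_code: str,
                output_file: str = "answer.mp3") -> str:
    """Synthesize `text` with Amazon Polly and write the audio stream to disk."""
    voice = pick_voice(language_code)
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        **voice,
    )
    with open(output_file, "wb") as mp3_file:
        mp3_file.write(response["AudioStream"].read())
    return output_file
```

Matching on the language prefix means that any English variant detected by Amazon Transcribe, such as en-GB, is spoken with the single English voice in the table.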

8. Reproduce the audio file

Finally, we just need to play the audio result from Amazon Polly.

Depending on your OS and where you run the code, playback may not work. In my case, running the function speak_script(output_file) from the Terminal on macOS works. If you are using a notebook such as Jupyter Notebook, use the function speak_notebook(output_file) instead.

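The two function names come from the text above, but the bodies in this sketch are my own guess at a reasonable implementation. It assumes the `afplay` command that ships with macOS, the `mpg123` player on Linux (which has to be installed separately) and IPython's `Audio` widget inside notebooks. The `player_command` helper is hypothetical.

```python
import platform
import subprocess


def player_command(output_file: str, system: str = None) -> list:
    """Return the command-line player invocation for the current OS."""
    system = system or platform.system()
    if system == "Darwin":
        return ["afplay", output_file]  # ships with macOS
    if system == "Linux":
        return ["mpg123", output_file]  # assumption: mpg123 is installed
    raise RuntimeError(f"No command-line player configured for {system}")


def speak_script(output_file: str) -> None:
    """Play the MP3 when running as a script from a terminal."""
    subprocess.run(player_command(output_file), check=True)


def speak_notebook(output_file: str) -> None:
    """Play the MP3 inside a Jupyter notebook cell."""
    from IPython.display import Audio, display  # available inside Jupyter

    display(Audio(filename=output_file, autoplay=True))
```

On other systems, extend `player_command` with whatever player is available there.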

Example output

If we followed all the previous steps, we should be ready to start playing with our new multilingual virtual assistant. To show you what the output looks like, I recorded myself asking “What is Amazon Web Services?”. The transcript generated by Amazon Transcribe matches the question exactly, and it is followed by the answer provided by ChatGPT.

I hope you enjoy it as much as I did when I was building and playing with these services. I think these state-of-the-art technologies have a lot of potential, and when we use all of them together the results are AWSome!

If you have any questions, suggestions or comments, please feel free to add them in the comments or contact me directly! 🙂

Source: https://dev.to/aws-builders/how-to-create-the-smartest-multilingual-virtual-assistant-using-aws-and-chatgpt-4i5k
