GPT-Powered chatbot over the phone - Try it, and see how it was built


ChatGPT has sent the internet into a frenzy. For developers, it’s just the tip of the iceberg. OpenAI’s API allows us to leverage the power of the GPT Models in as many ways as we can imagine.

Today, I hacked together a cutting-edge AI chatbot that you can call from any phone, anywhere in the world. What’s more, you can try it yourself! See the end of the article for more details.

🗺 Architecture

Amazon Connect is a powerful solution that lets us set up a phone number, build contact flows, support call centre agents, and everything in-between. It’s not exactly a leader in this space, but its smooth integration with other AWS services makes it a breeze for this scenario.

Amazon Lex is AWS’s natural language conversational AI service. With Amazon Connect, it seamlessly leverages Amazon Transcribe to understand what is being said (speech-to-text), and Amazon Polly to provide the verbal response (text-to-speech). We aren’t really using the natural language powers of Lex, but it has another use for us: it handles the conversion between speech and text on either side of our Lambda function.

AWS Lambda handles the call to OpenAI’s API. The function itself is quite simple, although the supporting Lambda Layer was significantly larger. Since OpenAI’s API costs money, I’ve leveraged AWS Secrets Manager to keep my API key secure. To make debugging less nightmarish (we’ll see why!), Amazon CloudWatch provides a simple way to store my basic log files, and Amazon API Gateway gives me a way to retrieve them!
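As a rough illustration of the Secrets Manager step, a Lambda function might fetch the key along the lines below. This is a sketch under assumptions, not the project's actual code: the secret name `openai/api-key` and the JSON field `api_key` are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
import json


def get_openai_api_key(secrets_client, secret_id="openai/api-key"):
    """Fetch the OpenAI API key from AWS Secrets Manager.

    secrets_client: a boto3 Secrets Manager client, for example
        boto3.client("secretsmanager").
    secret_id: hypothetical secret name, chosen for this sketch.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = response["SecretString"]

    # Secrets are often stored as JSON key/value pairs; otherwise
    # treat the stored value as the key itself.
    try:
        parsed = json.loads(secret)
    except json.JSONDecodeError:
        return secret
    if isinstance(parsed, dict):
        return parsed["api_key"]  # hypothetical field name
    return secret
```

Calling this once at module scope, rather than inside the handler, would let warm invocations reuse the key. The function's execution role also needs permission for `secretsmanager:GetSecretValue` on that secret.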

🛠 Deep Dive into Services

Knowing the architecture is great, but it doesn’t help build everything out in detail. Let’s dive into some of those services in more depth:

☎️ Amazon Connect

Contact Flow Diagram in Amazon Connect

Connect is a powerful service, but it can feel counter-intuitive if you’ve never managed a virtual contact centre before (lucky you!). Like QuickSight and a few other services, Amazon Connect has its own control panels outside the AWS Management Console.

First, you need to set up your Amazon Connect instance in your chosen region. You’ll need to set up a DID number (Direct In-dial number; fancy word for your inbound phone number) for a country of your choice, but make sure it’s available in your region on the Connect Pricing page, and that you meet any necessary regulations.

Building your Contact Flow is where the cool stuff happens. You can use the editor to build out a workflow that meets your needs in a huge number of ways, including logic, prompts, and an ability to trigger Lambda functions, making it hugely extensible!

Amazon Lex UI showing the bot used by this project

💬 Amazon Lex

Lex (v2) is an immensely powerful service for building natural language experiences. The name isn’t a coincidence either, since it also powers Amazon Alexa. But we aren’t using any of those features; it’s just a convenient way of handling the processing between text and speech with our Lambda function! Lex developers, look away now!

After we create our bot, we need to configure it. Normally we would create multiple intents, with their respective utterances. Instead, I created a single intent (I called it campingInTents, because childish humour) with a random utterance (pancakes; I was hungry). Neither mattered in this case, since both the created intent and the default FallbackIntent are going to be doing the same thing.

After enabling the Lambda code hooks for initialization, and configuring the bot to use my Lambda function, it will always behave the same, sending the message through to OpenAI, regardless of what’s been said. This is an incredibly hacky way of doing it, and I’m sure there are many better ways. But I’m a hacky dev, building a hacky solution today.
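To make the "always behaves the same" idea concrete, here is a minimal sketch of the Lex V2 side of such a function. It assumes the standard Lex V2 Lambda event and response shapes; the helper names are hypothetical, and closing the intent with `Close` is one possible choice rather than necessarily what this bot does.

```python
def get_transcript(event):
    """Return what the caller said, ignoring which intent Lex matched."""
    return event.get("inputTranscript", "")


def build_lex_response(event, reply_text):
    """Wrap reply_text in a Lex V2 code-hook response that closes the intent."""
    # Echo back whichever intent Lex picked (campingInTents or
    # FallbackIntent); the reply does not depend on it.
    intent = dict(event["sessionState"]["intent"])
    intent["state"] = "Fulfilled"
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": intent,
        },
        "messages": [
            {"contentType": "PlainText", "content": reply_text},
        ],
    }
```

Because the transcript is read straight from `inputTranscript`, the matched intent name never influences the reply, which is why the utterance configured in Lex is irrelevant.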

Screenshot of AWS Lambda

🧠 AWS Lambda

Lambda functions are best when kept simple. This one sends the request to OpenAI, outputs some information for logging, and passes the information back to Lex. More info about the code down below.

Since it relies on an external API call, it does take a while to execute (1.285 seconds on average), which is very inefficient. Despite throwing Lambda Power Tuning at the problem, it’s purely an external dependency issue, and can’t be resolved by adding resources. You could make it more cost-effective by using AWS Step Functions to handle the OpenAI API call asynchronously, but I didn’t worry about that for this project.

The openai-python library makes interacting with the API a breeze, although at 112MB fully installed, including requests, numpy, and pandas, it’s pretty beefy. Consolidating it into a Lambda Layer made it simple and easy to manage. numpy didn’t appreciate running in Lambda after being installed on my Windows machine, so you may need to specify the right target platform (Lambda runs on Linux) before building the layer.
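If the size of the layer is a concern, one alternative (not what this project does) is to call OpenAI's HTTP API using only the Python standard library. A minimal sketch, assuming the legacy `/v1/completions` endpoint; the model name, the `max_tokens` value, and the function names are illustrative choices rather than details from the project.

```python
import json
import urllib.request

COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key, prompt,
                             model="gpt-3.5-turbo-instruct",  # assumed model
                             max_tokens=150):                 # assumed limit
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Pull the generated text out of a completions response body."""
    data = json.loads(response_body)
    return data["choices"][0]["text"].strip()


def ask_openai(api_key, prompt, timeout=20):
    """Send the prompt and return the model's reply as plain text."""
    request = build_completion_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

This trades the convenience of the library for a dependency-free function, which also sidesteps the platform-specific numpy build.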

After some incredibly nonsensical replies to my queries, I added some basic logging which is output to CloudWatch Logs. The reason is that Amazon Transcribe still leaves a lot to be desired, especially with my accent. So, I wanted to capture what Transcribe thought I’d said, leading to some interesting insights. This has to be my favourite:

What I said:
What’s the key difference between Alexander the Great, and Julius Caesar?

OpenAI’s Response:
The key difference between Alexander the Great and Reseda is that Alexander the Great was a Macedonian king and conqueror who reigned from 336-323 BC, while Reseda is a city in Los Angeles, California.

What Transcribe thought I said:
What’s the key difference between alexander great enjoy reseda
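The kind of logging that surfaces an exchange like this can be sketched in a few lines. Anything a Lambda function prints to standard output ends up in CloudWatch Logs, so a single JSON line per call is enough; the function name and the field names `transcript` and `reply` are hypothetical.

```python
import json


def log_exchange(transcript, reply):
    """Print one JSON log line per call; Lambda forwards stdout to CloudWatch Logs."""
    line = json.dumps(
        {"transcript": transcript, "reply": reply},
        ensure_ascii=False,
    )
    print(line)
    return line
```

Logging the transcript next to the reply makes it easy to tell a bad answer from a bad transcription.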


⏱ How much coding?

For a fully-fledged chatbot which comes close to passing the Turing Test, powered by some of the most incredible AI models in the modern age, that runs on the world’s most ubiquitous communication medium, surely there’d be some pretty significant coding required?

Just 49 lines of code. That’s all of the Python code, including comments, which I had to write for Lambda to make this work, all available on GitHub. Everything else was configuring the existing services. In fact, it can be compacted down into fewer than 10 lines of code, though that would be substantially harder to read.

Modern solutions built around cloud computing, microservices, and APIs mean you can leverage a surprising amount of power by simply tying solutions together. Reading the code, there’s virtually no custom logic involved, except for the prompt. Everything else is arguably just glue code.

On the pricing dimension, it costs all of 9.7 cents per invocation, which is pretty impressive.

Photo of a person holding a phone

📞 Try it for yourself!

Here we go, the US phone number that hosts the chatbot. OpenAI is expensive, so this will disappear eventually when it either runs out of credits or gets excessively misused. Give it a try, but be reasonable!

+1 (445) 202-5802

It does take several seconds for the API to return the response, so don’t be surprised if it takes a little while.

Stay safe over the holiday season, and best wishes towards 2023!