ChatGPT, a powerful AI language model by OpenAI, has been garnering significant attention recently. Its potential use cases span various industries and applications. In this blog post, we will explore capabilites that intergrating ChatGPT with Asterisk, a popular open-source PBX and telephony platform, can enable.
Call Routing with ChatGPT
One of the ways to use ChatGPT with Asterisk is to enhance call routing by determining the intent of the caller. This can eliminate the need for large IVR menus and make it easier for both the caller and the business to connect to the right people. ChatGPT can analyze the intent of the caller by listening to keywords and the context of the call. With the right prompt, ChatGPT can be highly effective in this role. All you need is a simple script that listens to the callers prompt and a directory of queues of people.
Another application of ChatGPT in conjunction with Asterisk is call summarization. By processing the transcript of the recording, ChatGPT can identify the most important points of the call and summarize them. This feature is incredibly useful for those who do not have time to listen to full recordings or want to quickly search for the right recording to read the full transcript. With minimal effort you could create a script that does this for you after every call, and e-mails it to you.
ChatGPT API for Conversations
These functionalities were already possible with the OpenAI Text Completion API. However, with the release of the ChatGPT API, we can now set up conversations and talk to it instead of typing. The older text completion API was not good enough for this case, because it can easily lose context. The ChatGPT API provides a way to send system prompts (for setting up the converstation), as well as user prompts.
Implementing ChatGPT with Asterisk: AGI Script
To implement these features, we need to write an AGI (Asterisk Gateway Interface) script. AGI is an older method of interfacing between Asterisk and internal data sources, but it is simple to write for. Python libraries, such as pyst2 and openai, are available to simplify the process.
The implementation requires converting speech to text, connecting to OpenAI’s ChatGPT API, and converting the answer back to speech. This article provides an outline of how to achieve this, along with links to sample code. Note that this approach has some downsides and is primarily a proof of concept.
Speech-to-Text Conversion: Using Whisper
The first step, converting speech to text, can be done using Whisper, OpenAI’s automatic speech recognition system. You can run it locally or use their API. Experimenting with Whisper locally requires a fast GPU for quick file processing, which is crucial since the caller is waiting for a response.
In our script, we start a recording and feed it to the Whisper API, but local processing is also possible and would be relativly easy to use with this script. In fact, that’s what I started experimenting with.
Interacting with the ChatGPT API
Once we have the text, we call the ChatGPT API using the prompt from Whisper. Although the prompt may include some errors, ChatGPT can still provide a coherent answer. We set up ChatGPT with the system prompt “You are a helpful assistant on a phone call.” Everything that follows is between the caller and ChatGPT.
After receiving a text response from ChatGPT, we must communicate it to the caller using text-to-speech (TTS) conversion. In our proof of concept, we used Google TTS because it is quick, does not require authentication, and has a Python library available.
This is fine for our proof of concept. However, to make the experience better, we can create a more engaging and enjoyable experience for callers while talking to ChatGPT.
Limitations and Future Improvements
There are some downsides to this approach. The main issue is the waiting time between the caller finishing their prompt and receiving a response. The entire process, from sending the recording to OpenAPI to receiving a sound file from Google TTS, can be lengthy, depending on the prompt and ChatGPT’s response. All the while the caller is waiting in silence.
Another concern is the quality of the text-to-speech conversion using Google TTS. The voice quality may be less than ideal, and it can only process a small amount of text, leading to awkward breaks in sentences and unnatural intonation.
To address these concerns, consider the following improvements:
- Use a more natural TTS engine, such as Microsoft Azure Cognitive Services, Amazon Polly or a custom solution like ElevenLabs or Resemble AI.
- Improve processing speed by exploring real-time speech processing. This could significantly reduce waiting times for callers.
While this proof of concept uses AGI scripts, a potentially better approach may be to create a Stasis application to address these limitations.
Try it out!
If you want to speak to ChatGPT for yourself, just call +31532401205 (Dutch)
To try it out for yourself, check out the GitHub repo containing the code and necessary Asterisk configuration.
Integrating ChatGPT with Asterisk can greatly enhance call management capabilities, from routing calls more efficiently to summarizing conversations for quick reference. By considering the limitations and potential improvements, businesses can explore innovative ways to harness the power of AI in their telephony systems.
What does this mean for Speakup? We will be continuing experimenting with this technology. However we also still have a long way to go before we can release a product that uses this technology. We need to address some challenges such as scalability and data privacy before we can release a product that we truly believe in.