How “programmable” voice builds the future

Written by: Erik Orbons, 07 Jan. 2021

Communication solutions of the future are often built rather than bought, to meet an increasing demand for flexibility. In this post we explore some of the ways in which a Telecom Operator such as Speakup can leverage (REST) APIs to keep up with these developments and provide a smooth experience for the builders. The goal is to ensure that anyone, anywhere, can keep enjoying the benefits of a carrier-grade global communication network.

The story so far

Communication is everywhere, in many forms. While it may not be the oldest form of communication, speech has been around for quite a while. Over the years, starting with the invention of the landline telephone, technology has helped us talk to each other over ever larger distances. Fast forward a few more years, invent the internet along the way, and we see high-grade IP-based voice communication networks spanning the entire globe. Do you have a subscription to a telephony service? Chances are high that you can call almost any person or business in the world that also has one. This is no small feat, considering that the number of telephony subscriptions worldwide exceeds the reach of the largest social media network[1] by a factor of almost three[2]. The global telephony network is spread over many, many vendors, each doing their part: a truly distributed communication network. It has other perks too. In many countries you are free to choose a vendor rather than being tied to one massive company that controls your communication, and in many cases you can switch vendors while holding on to “your” number. Additionally, Telephone Companies (Telcos) help make sure that emergency and other critical services are always one call away. Telcos around the world, such as Speakup, work hard to ensure that their customers stay connected, and they will keep doing so in the future!

Communication? Everywhere!

Society, as well as the technology that supports it, is ever evolving, and over time those who fail to keep up are left behind. Nowadays voice communication technology (often accompanied by video) is present in more and more products outside the realm of the “classical” telecommunication industry. Various chat applications have added a voice option, and voice communication is being integrated into digital workspaces, collaboration platforms and many other systems. Furthermore, we see more creativity and a higher demand for flexibility in the way voice technology is integrated into the business, at small and large companies alike. Prime examples include Uber and MS Teams.

Complete standard solutions often don’t exist, or they come packaged with a lot of functionality that will never be used and seems to be there just to confuse you. You may also want to integrate with functionality that is not even within the domain of telecommunications. To bridge this gap we need to think smaller: communication solutions of the future are built, or otherwise assembled, by integrating smaller components, often through creative use of APIs (REST, GraphQL, you name it).

Now, building such solutions isn’t what your typical Telco is particularly good at, nor should it be. You’re building something for a specific business, with its own needs, requirements and ideas about how communication should fit in and what technology should be used. It is time to look at communication technology as part of a larger whole that can be built by any developer working in any domain! As the next sections explain, I strongly believe that the Telco can and will play an important role in this development!

Build, rather than buy

We’re talking development, right? So let’s say you’re building, rather than buying, your next innovative customer care solution. It is tightly integrated with your business, it targets a large audience, and you feel that spoken communication is an essential part of it. At this point you may need a few things:

  1. Access to the global telephony network, as this is still the best way to ensure that you’re reachable by the largest possible audience. You will need one or more telephone numbers that can be dialed!
  2. Dropped or failing calls, bad audio quality, or the inability to call you at all will dissatisfy your customer. You need an infrastructure that can handle these calls reliably and that is resilient to outages.
  3. You will want to integrate. That means you need to know when someone is calling and have control over the call itself: you may want to play a friendly greeting, look up whether this person or number has called before, check how busy it is and report the estimated wait time (or encourage a different mode of communication if the lines are really busy), start recording the call, or get fancier and use speech recognition technology to answer simple queries without human intervention. You need to know what’s happening with each call and you need to have programmatic control over it.
  4. At some point you set up a conversation with a human agent. You may want to let the agent know that someone is waiting, who’s calling, what the history of that person is, what the call is about (you’ve used speech technology to ask that before, right?). The agent will need the ability to answer the call, perhaps using a button on the same screen that was used to report all that information?
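Requirement 3 is where most of the custom logic ends up. The following is a minimal Python sketch of that first decision step, under clearly stated assumptions: the `CallContext` fields, the step labels and the 10-minute busy threshold are all hypothetical. The wait-time estimate is a deliberately naive formula (callers waiting × average handle time ÷ available agents) that your own application would compute. It is not something a voice API provides.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CallContext:
    # Hypothetical inputs gathered by your own application when a call comes in.
    caller_number: str
    callers_waiting: int        # callers already in line ahead of this one
    agents_available: int       # agents currently taking calls
    avg_handle_seconds: float   # average length of an agent conversation
    known_callers: Set[str] = field(default_factory=set)  # numbers seen before


def estimated_wait_seconds(ctx: CallContext) -> float:
    # Naive estimate: the available agents work through the line in parallel.
    if ctx.agents_available <= 0:
        return float("inf")
    return ctx.callers_waiting * ctx.avg_handle_seconds / ctx.agents_available


def plan_call(ctx: CallContext, busy_threshold_seconds: float = 600.0) -> List[str]:
    """Decide the first steps for an inbound call (hypothetical step labels)."""
    steps = ["greet_returning_caller" if ctx.caller_number in ctx.known_callers
             else "greet_new_caller"]
    wait = estimated_wait_seconds(ctx)
    if wait > busy_threshold_seconds:
        # Lines are really busy: suggest another mode of communication instead.
        steps.append("suggest_chat_or_callback")
    else:
        steps.append(f"announce_wait:{round(wait)}s")
        steps.append("queue_for_agent")  # requirement 4: hand over to a human agent
    return steps
```

Each step label stands for something your application would then do through the voice platform, such as playing an audio file or connecting the caller to an agent.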

Let’s do this!

You’re not in the telecommunication business, so you rely on your friendly neighborhood Telco to set you up with the numbers and connectivity you need. You can now also rely on their “carrier grade” network and resilience. Great! We’ve tackled challenges 1 and 2!

If you look at the classical Telco offering, you may now be dealing with something called a “SIP Trunk”: a connection used to send and receive voice (or video) calls over the Session Initiation Protocol (SIP). This is exactly how many VoIP telephone systems worldwide are connected. In fact, Speakup will set you up with such a trunk in mere minutes so you can enjoy connectivity with anyone, anywhere in the world! Or can you? Now you just set up a PBX or SIP server on your end, figure out how to connect your agents’ phones to it, make sure it’s resilient, integrate it with your own software so you can control the call flow (to tackle 3 and 4), deal with several quirks along the way and bang! You’re good to go! Minutes just turned into weeks or months, and you’re probably already out there looking for a SaaS offering that doesn’t require you to become an expert in managing a PBX, so you can focus on creating value instead.

Reinvent the Telco

That’s a shame, really. At Speakup we have the knowledge and the infrastructure to set up, and reliably host, those pesky SIP servers for you. We can provide programmatic access to them. And all the while, we’ve already gotten most of the voice-related challenges you’ll face out of the way. What we need is a different, more developer-friendly way for you to access this infrastructure. To accomplish just that, we’re currently developing a set of REST APIs, combined with a touch of webhooks and finished off with just a pinch of websockets. Together they allow you to programmatically access low-level functionality of the Speakup voice network without having to host any components yourself. You access this functionality in a way that many developers will recognize and understand, without the need to specialize.
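Because the plan is plain REST, every operation maps to an ordinary HTTP request carrying JSON. The sketch below uses Python's standard library to build, but not send, such requests. The base URL, paths, JSON field names and bearer-token scheme are hypothetical placeholders, not Speakup's actual specification, which lives in the API documentation.

```python
import json
import urllib.request
from typing import List, Optional

# Hypothetical base URL; the real endpoint is defined in the API documentation.
BASE_URL = "https://voice.example.com/v1"


def build_request(method: str, path: str, body: Optional[dict] = None,
                  token: str = "YOUR_API_TOKEN") -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def create_channel(destination: str) -> urllib.request.Request:
    # Hypothetical: open an outbound voice channel to a number or endpoint.
    return build_request("POST", "/channels", {"to": destination})


def play_audio(channel_id: str, audio_url: str) -> urllib.request.Request:
    # Hypothetical: play an audio file you host on an existing channel.
    return build_request("POST", f"/channels/{channel_id}/play", {"url": audio_url})


def bridge(channel_ids: List[str], two_way: bool = True) -> urllib.request.Request:
    # Hypothetical: link channels so that audio flows between them.
    return build_request("POST", "/bridges", {"channels": channel_ids, "twoWay": two_way})


def hang_up(channel_id: str) -> urllib.request.Request:
    # Hypothetical: end a voice channel.
    return build_request("DELETE", f"/channels/{channel_id}")
```

Sending a request would be a matter of passing it to `urllib.request.urlopen`. Keeping construction separate from sending makes the request shapes easy to inspect and test.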

We believe in freedom and flexibility, so the first iteration of our APIs allows access at the lowest possible level while still being convenient to use. These APIs also stay true to the core of what voice communication is: we don’t want to piggyback any additional functionality or make assumptions about how a company should use voice. That means there is no concept of users, and you’ll also not (yet) find queues, IVR menus or conference boxes in this iteration (although these can be built on top).

What we’re aiming at:

  • The ability to create outbound, or answer inbound, “voice channels” through a REST call: a voice connection between the Speakup infrastructure and an external endpoint, for example a physical phone or a call to or from the global telephone network.
  • REST calls to control a voice channel, for example: hang up, pause, play an audio file that you provide.
  • Functionality to “bridge”, or link, existing voice channels together in any combination you require, either one-way or two-way, again using a simple REST call. Once channels are bridged, audio flows from one channel to the others in the bridge, allowing participants to hear each other. Use this to set up point-to-point calls, or perhaps a conference call with many participants.
  • We provide a webhook or websocket connection that you can use to “listen for” events that are happening on our side, for example: new voice channel, voice channel ended because the external party hung up, an outbound voice channel has been answered or rejected. These events are important because you’ll never have full control over what happens to a voice channel: there’s always a party involved that is external to the system.
  • In addition to performing audio playback, we also allow you to “listen in” on the audio through the REST interface by instructing the voice API to send a live feed of a voice channel to an HTTP endpoint of your choosing. You control when the feed starts or ends through the API. This is a powerful feature that allows you to do your own call recording or even perform transcriptions of the audio using speech-to-text technology.
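Because the external party can act at any moment, much of an integration consists of reacting to events. The following is a minimal sketch of a webhook event dispatcher. It assumes JSON payloads with hypothetical `type`, `channel_id` and `bridge_with` fields and made-up event names such as `channel.created`, and it leaves out the HTTP server that receives the webhook POST. Each handler returns its follow-up actions (answer, play, bridge) as tuples instead of calling the API directly, which keeps the logic easy to test.

```python
import json
from typing import Callable, Dict, List, Tuple

# An action such as ("answer", "ch-1"); a real service would turn each into a REST call.
Action = Tuple[str, ...]


def on_channel_created(event: dict) -> List[Action]:
    # New inbound voice channel: answer it and play a greeting (file name is hypothetical).
    ch = event["channel_id"]
    return [("answer", ch), ("play", ch, "greeting.wav")]


def on_channel_answered(event: dict) -> List[Action]:
    # An outbound channel (e.g. to an agent) was picked up: bridge it to the waiting caller.
    return [("bridge", event["bridge_with"], event["channel_id"])]


def on_channel_rejected(event: dict) -> List[Action]:
    # The outbound channel was rejected: inform the waiting caller instead.
    return [("play", event["bridge_with"], "agents_busy.wav")]


def on_channel_ended(event: dict) -> List[Action]:
    # The external party hung up: release whatever state we kept for this channel.
    return [("cleanup", event["channel_id"])]


HANDLERS: Dict[str, Callable[[dict], List[Action]]] = {
    "channel.created": on_channel_created,
    "channel.answered": on_channel_answered,
    "channel.rejected": on_channel_rejected,
    "channel.ended": on_channel_ended,
}


def dispatch(raw_body: bytes) -> List[Action]:
    """Parse a webhook body and return the follow-up actions for its event type."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return []  # unknown or uninteresting event types are ignored
    return handler(event)
```

The same dispatcher would work unchanged if the events arrived over a websocket instead of a webhook, since only the transport differs.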

Although these are simple building blocks, we believe they lie at the very core of what our voice network entails, which gives you maximum flexibility. Of course, sometimes you don’t need all that flexibility and just want to quickly set up a queue or IVR menu as part of your application. For these situations we envision a set of higher-level APIs layered on top in the future. Or perhaps you build these yourself? The possibilities are endless!
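To illustrate how a higher-level feature can be assembled from these primitives, here is a sketch of a first-come, first-served call queue. It only decides which caller channel to bridge to which agent channel. The class and method names are hypothetical. In a real application the returned pair would be passed to the platform's bridge operation, and the methods would be driven by voice channel events.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class CallQueue:
    """A first-come, first-served queue built from channel IDs and a bridge decision."""

    def __init__(self) -> None:
        self.waiting: Deque[str] = deque()      # caller channel IDs, oldest first
        self.idle_agents: Deque[str] = deque()  # agent channel IDs, longest idle first

    def caller_arrived(self, channel_id: str) -> Optional[Tuple[str, str]]:
        self.waiting.append(channel_id)
        return self._match()

    def agent_ready(self, channel_id: str) -> Optional[Tuple[str, str]]:
        self.idle_agents.append(channel_id)
        return self._match()

    def caller_hung_up(self, channel_id: str) -> None:
        # The external party left while waiting: drop them from the line.
        try:
            self.waiting.remove(channel_id)
        except ValueError:
            pass  # not waiting (already bridged or unknown)

    def _match(self) -> Optional[Tuple[str, str]]:
        # Pair the longest-waiting caller with the longest-idle agent.
        # The (caller, agent) pair is what you would hand to a bridge call.
        if self.waiting and self.idle_agents:
            return (self.waiting.popleft(), self.idle_agents.popleft())
        return None
```

A production queue would also need music on hold, timeouts and persistence. The point here is only that a “queue” reduces to bookkeeping plus a bridge call.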

Building the future, one step at a time

At this time we’re happily experimenting away with small projects, both within and outside the company, that use these APIs to build solutions that would not easily have been possible with previous offerings in the Speakup portfolio. All the while, we are learning from feedback and improving and stabilizing the APIs over time. If you have any thoughts, ideas or opportunities of your own surrounding this topic, we would love to hear about them! In future posts we will dive deeper into the technical side of things, looking at some of the things that have been built and, more specifically, how they were built. For now I leave you with a reference to the documentation and specification of the current development version of our voice API so you can have a look for yourself!