APIs are like waiters Part II

By Mark Wharton

Let’s start with a brief recap. Using the analogy of ordering fish and chips, I covered basic client-server applications and started to touch on how things could get more complicated when the interaction got beyond just “Cod and chips twice” - “here you are”. I went through ways that the client could become more independent of the server and be released to do things when the server isn’t responding. You wouldn’t want your website to stop working while the server got some results for you, would you? (How we all hate the “twirly gif of death”.)

What I want to talk about in "part deux" is server-to-server interactions, where there isn’t really a clear distinction between who is the client and who is the server. Just because you initiated the conversation doesn’t mean that you are the client. Imagine responding to a push advert for an offer of fish and chips by phoning the chippy. They initiated the conversation, but they are the server. So who are you in this scenario? The whole client/server relationship breaks down. Let’s just talk about autonomous agents working in collaboration with others, where the objective is to communicate whilst still working asynchronously, so that the interaction between the two agents doesn’t block the activity of either.

Let’s start where we left off, but with a slightly modern twist. Let’s add home delivery to the mix.

Yet more fish and chips. Other dinner options are available.

This updated chip-shop conversation is analogous to the use of webhooks. A webhook is just an endpoint on a web server with a unique address (a Uniform Resource Locator or URL). Making up an imaginary one, it might look like this:

https://markwharton.com/deliver/food

This allows the “server” (the chip shop) to do an HTTP POST to you with a payload (the fish and chips).

Let that sink in a bit. We’ve always had an API on the chip shop to which I can POST my order, but now I have an API that I can get the chip shop to call (deliver to my address aka POST the webhook).
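To make the doorbell a bit more concrete, here is a minimal sketch of both sides of a webhook using only the Python standard library. The /deliver/food path comes from the imaginary URL above; everything else (the received list, start_webhook, deliver, the JSON payload shape) is made up for illustration and is not how any particular chip shop, or framework, does it.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

received = []  # hypothetical in-memory record of what has been delivered


class DeliveryHandler(BaseHTTPRequestHandler):
    """The customer's side: answers the door when something is POSTed."""

    def do_POST(self):
        if self.path != "/deliver/food":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        received.append(payload)
        self.send_response(204)  # accepted, nothing to send back
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def start_webhook(port=0):
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), DeliveryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def deliver(url, order):
    """The chip shop's side: POST the payload to the customer's webhook."""
    body = json.dumps(order).encode("utf-8")
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status
```

Calling start_webhook() and then deliver() against the address it is listening on should leave the order sitting in received. Note that the roles have swapped: the party we have been calling the "server" is the one making the HTTP request.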

Client/Server vs Agent/Agent (or Server/Server)


Interesting… (Your opinion may differ.) This webhook URL might point to an endpoint on a server you host, or, in the modern serverless world, it might front a Lambda function hosted by AWS (identified by its “ARN”, or Amazon Resource Name), where they do all the hosting for you and just make sure your code is called when the function’s URL is hit. (Other hosting services are available.)

Sorry, not sorry.

Are we all good, now? Well, no. Not really. We’ve all had this real-world experience. Order something online. They say they’re going to deliver between 2 and 5pm, you go out at 4:55 for two minutes and get back to a “We tried to deliver your package” note. In our “home delivery” example above, that’s the equivalent of not answering the doorbell when the delivery arrives. What is the server going to do now? They’re blocked - which breaks our “doesn’t block the activity of either” rule. They could retry, as some parcel delivery services do, or they could dump it in your front garden, take a photo of it and whizz off to their next destination - which some others do. Not so great in the case of fish and chips.
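The "retry" option might look something like the sketch below. The function name, the attempt limit and the doubling delay are my own assumptions rather than anything a real delivery firm (or library) prescribes. The thing to notice is that the caller is stuck inside the loop while it waits, which is exactly the blocking we were trying to avoid.

```python
import time


def deliver_with_retries(attempt_delivery, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call attempt_delivery() until it returns True or attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    The sleep function is a parameter so the waiting can be swapped out.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_delivery():
            return attempt
        if attempt < max_attempts:
            # Nobody answered the door: wait twice as long before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

If every attempt fails, the caller still has to decide what to do with the parcel, so the "not in" logic has only been postponed, not removed.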

As the example above shows, the agents are going to need a lot of code to deal with the “not in” scenarios, where the other agent doesn’t do what you expect. Now’s the time for the message queue cavalry to come to our rescue.

As we discussed in the previous article, a lot of these API call/response activities can be done via message queues. Imagine the chip shop scenario being done over SMS, for example. SMS is a terrible example as sometimes the messages seem to go to Mars and never get delivered. This is why most message queues offer a “guaranteed delivery” mode where you are sure the recipient has received your message. These message queues act a bit like the post office in that they store your message until they can get an acknowledged delivery. Using a third party like this removes all the queueing and retrying logic from your program whilst enabling asynchronous working.
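Here is a toy, in-memory sketch of the post office idea: the queue holds on to a message until the recipient acknowledges it. The class and method names (AckQueue, send, receive, ack, redeliver_unacked) are invented for this illustration. Real message brokers have their own APIs and persist messages to disk, which this does not attempt.

```python
from collections import deque


class AckQueue:
    """A message stays stored until the recipient acknowledges it."""

    def __init__(self):
        self._pending = deque()  # messages waiting to be handed over
        self._in_flight = {}     # handed over but not yet acknowledged
        self._next_id = 0

    def send(self, body):
        """Sender hands over the message and carries on; returns a message id."""
        self._next_id += 1
        self._pending.append((self._next_id, body))
        return self._next_id

    def receive(self):
        """Hand the oldest pending message to the recipient, or None if empty."""
        if not self._pending:
            return None
        message_id, body = self._pending.popleft()
        self._in_flight[message_id] = body
        return message_id, body

    def ack(self, message_id):
        """Recipient confirms receipt; only now is the message forgotten."""
        self._in_flight.pop(message_id, None)

    def redeliver_unacked(self):
        """Put unacknowledged messages back on the queue, e.g. after a timeout."""
        for message_id, body in sorted(self._in_flight.items()):
            self._pending.append((message_id, body))
        self._in_flight.clear()
```

The sender never waits for the recipient, and the recipient can collect messages whenever it is ready. A side effect of this design is that a message can arrive more than once if the acknowledgement goes missing, so the recipient has to cope with duplicates.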

We’re in quite a good place, now. The two agents can communicate with each other asynchronously, neither waiting for the other to be ready or blocking the other, leaving the complexity of message delivery to the third-party delivery system.

But… are we truly autonomous? The message queue intermediary is all well and good but we still need to discover the services that get us what we want (fish and chips) and to know what format the messages/API calls need to take. When I say “we” here, I don’t mean we humans, I mean we autonomous agents, i.e. computers and their programming (as well as we humans).

While I was writing this article I looked at the Deliveroo and UberEats interfaces and saw them with new eyes. (Other delivery services are available.) In our analogy that APIs are like waiters, Deliveroo/UberEats are API gateways. They’re one-stop shops for finding “waiters” (i.e. delivery people) that will get you food. These aggregators demand that restaurants sign up to their terms and conditions in exchange for advertising their wares and organising the delivery bit. This contract is similar to the way that APIs link to an API gateway. In that model, your program calls the gateway, the gateway translates the call into the underlying call to the individual API and does the re-fangling to return you the response.
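As a rough sketch of that translation step, imagine a gateway that offers one common "order" call and keeps a small adapter for each restaurant. Everything here is hypothetical: the Gateway class, the two pretend restaurant functions and their argument shapes are made up, and have nothing to do with how Deliveroo or UberEats actually work.

```python
class Gateway:
    """Routes one common call to whichever backend registered for the restaurant."""

    def __init__(self):
        self._backends = {}

    def register(self, restaurant, place_order):
        """place_order adapts the common (item, quantity) call to the backend's own API."""
        self._backends[restaurant] = place_order

    def order(self, restaurant, item, quantity=1):
        try:
            backend = self._backends[restaurant]
        except KeyError:
            raise LookupError(f"no restaurant registered as {restaurant!r}") from None
        return backend(item, quantity)


# Two hypothetical restaurants with different native interfaces.
def chippy_api(order_line):
    return f"chippy accepted: {order_line}"


def curry_house_api(dish, count, spicy=False):
    return {"dish": dish, "count": count, "spicy": spicy}


gateway = Gateway()
# Each adapter does the "re-fangling" between the common call and the native one.
gateway.register("chippy", lambda item, qty: chippy_api(f"{qty} x {item}"))
gateway.register("curry house", lambda item, qty: curry_house_api(item, qty))
```

The caller only ever learns the gateway's interface, but somebody still had to write an adapter by hand for every restaurant that signed up.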

This would all be great if there was a “Deliveroo for things” or an “UberEats for assets”, but there isn’t. Of course there are API gateways, but they only solve the problem of aggregating a load of APIs. They don’t solve the two fundamental problems of APIs: they are service-focussed, not asset-focussed, and they are rarely discoverable by computers. To illustrate this point, any code written for UberEats won't work for Deliveroo. Although the semantics are the same at the high level (i.e. order food), the implementations are different and the semantics are embedded in the structure of the API. A trivial example of why things might not be compatible:

/<restaurant>/order/<item>

- or -

/order/<restaurant>/<item>

(Embedding the word (text string) “order” in the API is embedding the semantics in the implementation. Not to mention what sequence of tokens you use…)
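To see the incompatibility in code, here is a small sketch with one parser per layout. The paths are written with forward slashes, as URL paths usually are, and the function names and the returned dictionary shape are my own invention. Both parsers recover exactly the same meaning, yet neither can read the other's path.

```python
def parse_restaurant_first(path):
    """Layout A: /<restaurant>/order/<item>"""
    restaurant, verb, item = path.strip("/").split("/")
    if verb != "order":
        raise ValueError(f"not an order path: {path}")
    return {"restaurant": restaurant, "item": item}


def parse_order_first(path):
    """Layout B: /order/<restaurant>/<item>"""
    verb, restaurant, item = path.strip("/").split("/")
    if verb != "order":
        raise ValueError(f"not an order path: {path}")
    return {"restaurant": restaurant, "item": item}
```

The knowledge that the second token (or the first) means "this is an order" lives only in the code, which is why a computer can't work out for itself how to talk to an API it hasn't seen before.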

It’s taken me a long time to get round to my favourite topics: Digital Twins and Semantics. If we re-think this problem using digital twins, i.e. a virtual version of the restaurant, or even the waiter, a lot of the API problems go away. You’re talking directly to the twin, which should know everything about the restaurant and encapsulate both the static and the dynamic data (think address, phone number and hygiene rating as well as current occupancy, power consumption and oxygen levels in the air). "Talking directly to the twin" treats the twin like a "nano-service" to that asset. The semantic part is about describing all the data that the twin has so that you know (and especially computers can know) what you’re getting. Is that oxygen level measured in percentage, parts per million or milligrams per cubic metre?
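A very rough sketch of what "data that describes itself" could look like is below. The Reading and RestaurantTwin classes, their field names and the unit strings are all assumptions for illustration. A real digital twin would more likely describe its data with a shared vocabulary or ontology than with plain strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reading:
    """A value that carries its own description, so nobody has to guess the unit."""
    value: float
    unit: str      # e.g. "percent" or "ppm"
    quantity: str  # what is being measured


@dataclass
class RestaurantTwin:
    name: str
    static: dict = field(default_factory=dict)   # address, phone number, hygiene rating
    dynamic: dict = field(default_factory=dict)  # occupancy, oxygen level and so on

    def describe(self, key):
        reading = self.dynamic[key]
        return f"{reading.quantity}: {reading.value} {reading.unit}"


def oxygen_in_ppm(reading):
    """Convert an oxygen reading to parts per million by volume."""
    if reading.unit == "ppm":
        return reading.value
    if reading.unit == "percent":
        return reading.value * 10_000  # 1% is 10,000 parts per million
    raise ValueError(f"unknown unit: {reading.unit}")
```

Because the unit travels with the value, a program receiving the reading can convert it, or refuse it, without a human reading the documentation first.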

Again, we’ve come a long way. Now we have server-to-server communication, API gateways and message queuing systems with guaranteed delivery. That’s quite an achievement, but… we still haven’t reached my goal of autonomous interoperability.
