This is a write-up of a talk and a hackathon that we did at Cloud Foundry Summit Silicon Valley. You can view my related talk, “99 Problems But a Container Ain’t One”, and the code (along with a demo video) here. It’s about making Cloud Foundry “serverless.”
So: Let’s talk about the overlap between Cloud Foundry and Serverless.
“BUT CLOUD FOUNDRY IS A PAAS! IT’S NOT SERVERLESS.”
I’d still like to talk about Serverless — and then let’s come back to Cloud Foundry.
What is Serverless About, Really?
Serverless is a few things. It’s a pricing model, it’s functions-as-a-service, it’s being able to “just push code” and not care about containers and scaling and orchestration and service discovery (and so on). And some of those things are more valuable than others.
But let’s go back a bit further. Fundamentally, building successful distributed systems almost always comes down to managing state. State is bad.
State is Bad
Why is state bad? Because state is where all the hard bits live! Once your app is stateful you can’t scale it by just creating another copy. Once it’s stateful you can’t just upgrade it by standing up a new version. You can’t recover from failure by just throwing the broken instance away and replacing it with a new one. State needs to be backed up, it needs to be persisted, it needs to be carefully protected and looked after.
State is the original sin that turns cattle into pets. State is hard.
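To make that concrete, here is a minimal sketch of the difference. Everything in it is hypothetical and for illustration only: StatefulCounter, ExternalStore and stateless_handle are made-up names, and the store is just an in-memory stand-in for whatever database or service a platform might provide.

```python
class StatefulCounter:
    """Keeps its count in process memory, so every copy has its own count."""

    def __init__(self):
        self.count = 0

    def handle(self):
        self.count += 1
        return self.count


class ExternalStore:
    """Hypothetical stand-in for a platform-provided service (e.g. a database)."""

    def __init__(self):
        self._data = {}

    def increment(self, key):
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def stateless_handle(store):
    """Holds no state of its own, so any instance can serve any request."""
    return store.increment("hits")


# Two copies of the stateful app disagree about the count.
a, b = StatefulCounter(), StatefulCounter()
stateful_results = [a.handle(), b.handle(), a.handle()]

# Any number of stateless "instances" share one store and agree.
store = ExternalStore()
stateless_results = [stateless_handle(store) for _ in range(3)]

print(stateful_results)   # [1, 1, 2]
print(stateless_results)  # [1, 2, 3]
```

Scaling the stateful version means reconciling those diverging counts; scaling the stateless version just means running more copies of the function, while the hard part lives in the store.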
The Magic Answer
So, what’s the answer? It seems too easy, but here it is: avoid state. Make as much of your system as possible stateless, and punt the hard bits — the stateful bits — to someone else.
This is the core magic behind Map/Reduce (give me a map function and a reduce function and I’ll give you an entire battle-tested distributed data processing pipeline), it’s the core magic behind Serverless (give me some stateless functions and I’ll give you an entire event-driven, fault-tolerant distributed fabric), and it’s the core magic behind Cloud Foundry (give me a stateless web app and I’ll give you an endlessly scalable, health-managed platform-as-a-service to run it on).
Push the problem of state away and the stateless bit suddenly becomes easy (obviously!).
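As a rough illustration of the Map/Reduce bargain, here is a toy sketch. The run_map_reduce function is a hypothetical, single-process stand-in for the platform (a real one would distribute the work and handle failures); the point is only that the user supplies two stateless functions and nothing else.

```python
from collections import defaultdict
from functools import reduce


def run_map_reduce(records, map_fn, reduce_fn):
    """Toy stand-in for the platform: group mapped values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}


# The user's entire contribution: two stateless functions.
def count_words(line):
    return [(word.lower(), 1) for word in line.split()]


def add(a, b):
    return a + b


word_counts = run_map_reduce(["state is bad", "state is hard"], count_words, add)
print(word_counts)  # {'state': 2, 'is': 2, 'bad': 1, 'hard': 1}
```

Because count_words and add keep no state between calls, the platform is free to run them anywhere, in any number of copies, and to rerun them after a failure.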
But isn’t this just hiding the problem? We’ll still have to deal with the stateful stuff somewhere, right? Yes. Right. But it turns out that punting state to the platform means we deal with it once, in a standard way, instead of dealing with it again for every app. By restricting the platform’s UX to map and reduce jobs, or to stateless functions, or to stateless web apps, the platform lets most of its users simply not worry about orchestration, packaging, health management, fault tolerance, scaling or resilience: really about anything except their code.
Of course we’ll still need to set up, manage and monitor the databases, message queues and other persistent services those stateless things use. But we (and by “we” I mean platform users: developers) don’t pay that cost for every single app; we pay it only for the custom services someone isn’t already providing. And often, in a large organization or a large cloud, that cost is paid by one team of SREs dedicated to that particular system, supporting large numbers of simple stateless apps whose developers and operators don’t need to worry about anything other than code.
PaaS vs Serverless
If the fundamental trick of distributed systems is getting rid of state, and both PaaS and Serverless are based on this trick, then why is Serverless different from PaaS? They both have the same basic idea, but Serverless addresses a set of use cases that PaaS does not: small event-driven functions.
It turns out Serverless is not actually just Serverless. Serverless is Serverless (you don’t see or care about servers, the platform manages that), it’s Containerless (forget about containers and packaging too), it’s Orchestratorless (you don’t need to think about how to scale your functions or make the system fault-tolerant) and it’s also App-less (you don’t need to push the whole app, you push functions and tell the system how to knit them together).
PaaS is Serverless, it’s Containerless, it’s Orchestratorless, but it’s not App-less. PaaS says “Push your app, we’ll do the rest. But push your app.”
PaaS and Serverless differ in another way: triggers. In PaaS we have stateless code which we can invoke and scale up and down in response to requests to a shared routing tier. In Serverless, we have stateless code which we can invoke and scale up and down in response to events on a shared message fabric. In both cases, it’s stateless code and a shared platform, and in both cases, the persistent stuff (aka the hard bit!) happens in services available to the code. In Cloud Foundry’s case, this would be the various services available in the huge service marketplace provided by most Cloud Foundry implementations.
Back to the PaaS
So: back to PaaS. The obvious question is what can PaaS learn from Serverless, and vice versa? I would argue that Serverless validates a core assumption behind CF (and PaaS in general): many developers want to just push (stateless) code and delegate the hard stuff to a platform, and this is possible. PaaS has been focused on the web app use case, so in Cloud Foundry, after pushing code, you use the “cf map-route” command to bind a route to it in the shared router, and you use the Cloud Foundry Service Catalog to access a whole marketplace of services for dealing with persistent state.
But could we expand the use cases for which we can use Cloud Foundry, without complicating the UX too much, by also supporting other triggers? How about if we added a “cf map-event” command so you could say “invoke this code when this trigger happens”? We could also supply some simple libraries to allow apps and services in the platform to trigger these events.
Suddenly, with “cf map-route” and “cf map-event”, you’d have the best of both worlds. You’d have all the power of Cloud Foundry’s buildpacks for converting code to containers (including for converting small function snippets into runnable cloud functions), Docker image support for directly pushing pre-packaged functions, the huge ecosystem of services available in the Cloud Foundry Service Catalog for dealing with state, all the access control and the org and space model, and — of course — the Diego scheduler technology behind the scenes which already knows how to do fault tolerance and resilience and scaling for stateless apps.
And, even better, since the majority of things most people write are actually web apps or web services (and since that’s what people are already used to writing), you can start by just pushing your app to the platform (cf push, cf map-route) and then, as you find it useful, pull out individual functions and “map-event” them so they can be invoked and scaled independently. A real joined-up solution to a large number of use cases, all within the Cloud Foundry ecosystem.
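To give a feel for the idea, here is a small in-memory sketch of one platform offering both kinds of trigger. It is purely illustrative and is not how Cloud Foundry or the proof of concept is implemented: ToyPlatform, its methods, the "/upload" route and the "image.uploaded" event name are all assumptions invented for this example.

```python
class ToyPlatform:
    """Hypothetical in-memory model of a platform with two kinds of trigger."""

    def __init__(self):
        self.code = {}    # name -> callable, like "cf push"
        self.routes = {}  # route -> name, like "cf map-route"
        self.events = {}  # event -> [names], like the proposed "cf map-event"

    def push(self, name, fn):
        self.code[name] = fn

    def map_route(self, name, route):
        self.routes[route] = name

    def map_event(self, name, event):
        self.events.setdefault(event, []).append(name)

    def request(self, route, payload):
        """Trigger 1: a request arriving at the shared routing tier."""
        return self.code[self.routes[route]](payload)

    def emit(self, event, payload):
        """Trigger 2: an event arriving on the shared message fabric."""
        return [self.code[name](payload) for name in self.events.get(event, [])]


def web_app(image_name):
    return f"uploaded {image_name}"


def make_thumbnail(image_name):
    # A function pulled out of the app so it can be invoked independently.
    return f"thumb-{image_name}"


platform = ToyPlatform()
platform.push("web", web_app)
platform.map_route("web", "/upload")
platform.push("thumbnailer", make_thumbnail)
platform.map_event("thumbnailer", "image.uploaded")

response = platform.request("/upload", "cat.png")
thumbnails = platform.emit("image.uploaded", "cat.png")
print(response)    # uploaded cat.png
print(thumbnails)  # ['thumb-cat.png']
```

In both cases the pushed code is the same kind of thing, a stateless callable; only the trigger that invokes it differs.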
How Can I Get This?
Well, that’s the bad news.
Right now, this is a proof of concept we have at https://teddyking.github.io/hottopic-release/ — not a real thing you can use today. We’re interested in opinions and feedback and contributions but I wouldn’t yet recommend using this in production. Still, it’s an interesting direction, right?
I’d love to hear your feedback on this. You can find me on Twitter at @doctor_julz.