Episode #52: The Past, Present, and Future of Serverless with Tim Wagner

June 8, 2020 • 68 minutes

In this episode, Jeremy chats with Tim Wagner about the history behind AWS Lambda, why the stateless versus stateful debate rages on, how to use serverless as a supercomputer, what innovations are still needed, and so much more.

About Tim Wagner

Tim Wagner is known for starting the serverless movement with the original business plan for AWS Lambda, and served as general manager for three of their central serverless offerings: Lambda, API Gateway, and the Serverless Application Repository. After AWS, Tim helped lead another bleeding-edge movement, driving forward blockchain innovation as the VP of Engineering at the digital currency exchange platform Coinbase. Tim is currently working on a new stealth startup, Vendia, with more information to come on June 26th.

Watch this episode on YouTube: https://youtu.be/M6I0ay5R884

Transcript

Jeremy: Hi everyone. I'm Jeremy Daly and this is Serverless Chats. Today I'm chatting with Tim Wagner. Hey Tim. Thanks for being here.

Tim: My pleasure. Thanks so much for having me.

Jeremy: So you have a lot of history. There's a lot of stuff that we're going to get into today, but right now you are the CEO and the cofounder of Vendia. So I'd love it if you could tell the listeners a little bit about your background, your history, and then what Vendia is all about.

Tim: Sure, sure. So last few jobs here. I mean, I started what eventually became AWS Lambda at AWS. Joined there back in 2012, we launched that in 2014. And that taught me a ton, not just about how to run a business in the cloud, but also about how you build these massive horizontally scalable cloud services. Then I spent some time down here in San Francisco at Coinbase, a US-based cryptocurrency exchange. And I learned a lot about a different kind of scale, which is how you run these massively scaled ledgers that can hold really important information, for example like somebody's bank account. And then Vendia is in some sense kind of the combination of these two things.

I took everything that I've learned over the last seven years and my cofounder Shruthi Rao and I have brought that together to create a business to help companies break down some of the data silo and information exchange problems that they've got today. So we're still in stealth mode for a few more weeks, but I can tell you a couple of things about it. For one, when I sold AWS Lambda, customers were always excited about the product, but they also always had two concerns. First, it was an inherently proprietary technology specific to AWS. And then secondly, while it was this awesome solution for compute, it didn't kind of come preset for data solutions or a solution for state. And so with Vendia, we're trying to reimagine how companies can go serverless and then at the same time solve some of the biggest, baddest challenges they've got around data silos and vendor lock-in. By the way, speaking of serverless, Vendia's also proudly server and container free.

Jeremy: Awesome. So that's awesome first of all, and I'm excited for Vendia. I really am interested. Anything that you do is just gold. So I think that this is going to be pretty exciting and I can't wait for it to come out. But what I'd really like to do today since I have you, I mean, for all intents and purposes and I think you always say this lovingly, but you're really the father of serverless, right? I mean, Lambda is what kicked off this whole thing. And I know that there were other companies that did this sort of FaaS-type thing, but not anywhere near the scale that Lambda did. And I would love to hear that story. As a fan of serverless, as a fan of AWS Lambda, could we go back to the beginning and just maybe give me a little, some insights into how this all started?

Tim: So a little bit of the Lambda origin story, huh?

Jeremy: Yes. Please.

Tim: Yeah. So we roll back the clock. It's 2012, I get hired into AWS and it's my first day there. And my boss Alyssa Henry, who at that time is running all of storage, so S3, EBS, like the whole storage division for AWS sits me down at lunch and says, "Okay, Tim, so here's the deal. We heard from customers that they love S3. It's simple, it's easy to use. It's a different kind of way of thinking about the cloud. They love all of that, but it's just a storage solution, right? There's no way to ... Let's say you store an image, there is no way to make a thumbnail of it. You pull out a compressed file, there's no easy way to decompress it on the fly plus the other million things developers might want to do with the stuff that they're storing in here.

So they've told us this in customer advisory meetings and one on ones, see if you can do something with that. Okay. I'm busy, got to run. Good luck." So this is day one for me at AWS. This is literally my very first conversation coming out of the sort of the onboarding and signing up all the paperwork. So I'm like, "Okay, grow a business in the cloud. Make it easy and think about S3 as a kind of inspiration." And it's funny because a lot of people think that Lambda grew out of EC2 and it's obviously a natural extension of thinking about compute in the cloud, but it really came out of the S3 organization. And it was this kind of kissing cousin to the idea of making storage super simple. Back then S3 basically did PUT, GET and LIST. That was it.

And so the idea is what is the ... this is sort of the remit that we had. What is PUT, GET, LIST for compute? What does that ... What if you could just say run or what became invoke in the cloud and you could make a service like that? So we got started. We did, I think as Amazon is famous for doing, we worked back from customers. I did just dozens and dozens of calls with some of the folks who were some of the biggest and frankly some of the smallest AWS customers at the time. And we asked them, "How would you like this to work? What would you want it to do?" And we went through lots of, as anything finding product market fit, the false starts. At one point we thought maybe this is like a scripting service. It should be a scripting language. We could call it Amazon simple scripting service. And then we realized the acronym maybe didn't work the best for that.

So from domain specific imagery stuff to scripting, to finally landing on, no, really the challenge here is make compute simple. Then we realized we were onto something when we realized that the first million developers using AWS are not the ... They're not the next 10 million developers. We had to make the cloud as easy for someone who does applications and business logic as it is for someone with a PhD in distributed systems. And that's when we realized like there was some there, there. And so we got excited about that. We came up with this idea for event hookup and we were kind of off to the races.

Jeremy: Awesome. So I love that. And now obviously you mentioned product market fit, so there's no way you got this thing right on the first shot. Right. You must've had to go through a million different iterations. So what did you get right and what did you get wrong?

Tim: Yeah. It is funny like you think where's the crystal ball clear and where was it maybe a little bit muddy here? I think one of the things we got right and I say this without ego, I mean, because this was a lot of us working hard on this was the event piece of this. We realized that there's a lot that you can do to make asynchronous event generation and handling really easy. It's a super powerful paradigm. And if you look at the stuff in Lambda that's probably been the most, some of the first things that a company adopts around things like cron jobs and simple events coming out of S3 and also sort of where Lambda's got a lot of its scale and initial success, a lot of it has been around those asynchronous and event handling mechanisms.

And obviously AWS has continued to double down on that with services like EventBridge that make that even easier to do. So I think then on the what did we get right, easier way to compute events. The idea of making it multi-lingual not tying it to a single language or necessarily a single paradigm. So making it as broad as possible. Things that we got wrong. Well, I've told this story before. I remember sitting in Andy Jassy's conference room. And for those of you who haven't ever worked at AWS, Andy's conference room was called "the chop." So different conversation about why it's called the chop, but it's called the chop. And so when you talk about going to the chop, it's this big thing. Andy Jassy, all his directs are there. It's this high pressure environment.

And I remember sitting in the chop and the guy who was running sales at the time asks me, so he was like, "I got to sell this crap you're about to make here dude. So I got to know what's it good for and what's it not good for? Tell me something a customer will never do with it." And I'm like, "Oh Adam, no one's ever going to use this for video transcoding. That kind of dense compute, we'll never get that. We'll never have that with Lambda." So of course A Cloud Guru has been up on stage at Serverlessconf talking about how fantastic Lambda is for doing video transcoding.

One of the, in fact, the fastest known algorithm for video transcoding beating Google and all other kind of in practice mechanisms is based on Lambda. Some great research out of UCSD and other places. And so this is a good example of getting it wrong, where we thought it was going to do one thing and in fact, the developers showed us that it could be so much more and really just a much, much broader set of use cases than we had ever imagined.

Jeremy: Yeah. And I know you've been away from AWS for a while now, but during those early years of Lambda, were there ... I mean, obviously you're rolling this thing out. There are people adopting it, the adoption curve has been somewhat slow. I mean, I think it's sped up now, but like were there missed opportunities early on? Are there things you could have done better you think that maybe would have sped up that adoption?

Tim: Yeah. It's a great question. And one of the hard things to balance, I mean, certainly I'm encountering this again with Vendia is how do you blend the top-down and the bottom-up, right? Your fastest path to revenue is picking a few very large enterprises and trying to sell them something and your best path over the long haul to a broad successful adoption in anything IT or developer related is to get millions and millions of developers to love an experience. And so of course the best of all is when you can do both of these things, but that takes time. It takes time and energy. And I know one of the things we wrestled with in our first couple of years here was how do you balance those trade offs?

Every minute you spend on evangelism and developer education and docs and ease of use features is a minute that you're not spending helping a John Deere or a Nike or a Nordstrom or somebody else become incredibly successful at making their business soar. And so that trade off was tricky. And I think in our first couple of years we had some missteps there in terms of trying to figure out how to blend those kinds of activities and it took a little while. I mean, the other practical reality is that the more innovative something is, the more different it is. And the more different it is, the harder it is to get people to understand it, adopt it, integrate it. And I think you're still seeing some of that. Containers are a small step away from ... they're baby servers, right? So they're an incremental and organic step away from what people were already doing.

With serverless and more broadly with managed services, we were asking people to forget everything they've learned about the cloud and to some degree about backend software development and start all over again. And some of the things we screwed up. We were slow to even just adopt the word. And so we had what I now call the Voldemort problem. We had a thing we couldn't name. So people would say, "What do you use Lambda for?" And now we would say to build serverless applications, right. But at the time we were trying to say, "Well, to build applications that use events to do stuff which is simple, but it's cool." Hence the Voldemort problem. And so once we allowed ourselves to start using the word, and I'm not going to defend serverless as the best term, but at least it is a term. I'll tell you one of the hardest things to do as a business owner is sell something that you're not allowed to actually name. So I learned my lesson with that. Won't repeat that particular mistake in the future.

Jeremy: Right. Yeah, no, definitely. So I guess maybe a question I have for you too, and this is something now that you're away from AWS maybe you can answer this. I get what you said. You have to sort of focus on these enterprise customers, right? The enterprise customers are important. They're the ones who pay the bills. But that broader adoption, that sort of ground swell, right. The developers figuring out a better way to do something, that's why all these frameworks, that's why all these JavaScript frameworks become so popular because you get all these developers using them. I mean, is that something like with Lambda early on, were you really pushing that towards just enterprise customers? Or was that something where you thought this could be like a ground up approach?

Tim: Yeah, it's a great question. And I think one of the things that we at AWS at the time really dragged our feet on and to ill effect was coming up with some of these frameworks. And look, great kudos to the Serverless Framework guys and Austin and others there for even stepping in and doing that. There are tools now, I mean, Stackery has done a great job of making serverless I think consumable by the enterprise, something that we kind of missed. Look, AWS is fantastic at focusing on things like the availability, right? The nines of the service, latency, jitter. These key kind of golden, what people would call the golden metrics of a service.

It lives and breathes that. One of the most important things you do in the course of a week at AWS is you go to the ops meeting. And the ops meeting is where you show your dirty laundry, you learn from your mistakes. You reveal your metrics to your colleagues, right? You hold yourself accountable and these are the things that you focus on. And all of that's amazing. But there's no equivalent of, there's no ease of use meeting every week at AWS. Right. And so the idea of helping developers be productive, of making things simple and making them consumable. When you started with things that were just infrastructure, that wasn't really necessary. But as AWS moved up the stack into these managed services, it's had to learn that that's actually a big piece of the equation.

Something that Microsoft has known for years, right? And it's sort of done a great job in actually helping developers not just know of something or keeping something up and running, but helping them actually figure out how to use it. And so that's a systemic learning for AWS as a whole. And I'll certainly say like in terms of being vocally self critical, I didn't get that right either at first. And so we waited way too long to do things like SAM. We didn't put enough wood behind some of those arrows. And so I think we kind of left the community to sort it out. And you still see the aftereffects of that. I still talk to people who say, "I don't know how to deploy it. It doesn't really fit into my CICD pipeline. It seems simple to run, but it's not simple to kind of build and test and operate in the same way that other things are." And so the fact that somebody could find Kubernetes easier to deploy than a Lambda is-

Jeremy: It's kind of scary.

Tim: It's unfortunate and a bit of an indictment that the tooling and especially some of the kind of the broader enterprise usage patterns weren't first and foremost in our thinking when we brought this to market originally.

Jeremy: Yeah, I totally agree. Because I mean, I think that's one of the things that has been the biggest complaint that I hear is just this lack of I guess coordination or organization where you can deploy it with the Serverless Framework, which is great. You can deploy with Stackery now, but back before it was like CloudFormation. It was using Terraform. I mean, even when Serverless Framework came around, that made it a lot easier, but there are still people who write blog posts about how they write this custom deployment script that generates a CloudFormation template or something like that, or uploads them manually and triggers an API. I mean, and certainly those are all valid ways to do it. It just seems like there wasn't a way that was put into place early on that would have been really helpful to build off of that, as opposed to like you said letting the community kind of figure it out on its own.

Tim: Yeah. And some of this was a learning curve for us at AWS too, right? In terms of understanding. Because if you thought of a Lambda as something that you hooked up to an S3 bucket, then maybe it didn't need a whole kind of development paradigm or CICD pipeline mechanism or application construction framework around it. And it quickly grew to be obviously so much more than that. And I think had we known how far and how fast it was going to go back in the day, we would have given more credence to the idea that we need our own ... we need a framework here and we need client side support for this.

We also had a little bit of that AWS-itis where you're like look, if the service is great, people will do whatever they need to do. And we didn't realize, we didn't think hard enough about the fact that, hey, if that's hard or even if there just isn't a simple way of doing it, it's going to actually make the service difficult to consume because it's not the least common denominator plugin piece of infrastructure here. It's something that is very, very different in that regard.

Jeremy: Alright. Missed opportunities aside, the past is the past. What we've gotten to now is an absolutely amazing ecosystem that allows people to build applications without thinking too much about the infrastructure. You still got to think about it a little bit, but for the most part, all of these amazing tools. So where are we now? Like where, you said what? It was 2014 when it was in preview, went live in 2015, right? So it's been over five years. Where are we now with serverless?

Tim: Yeah, I call it the ... I always say like we're in the terrible teens now. It's far enough along, it's no longer an infant. It's obviously become something that millions of developers are using, that the majority of the Fortune 500 have some kind of serverless technology or solution in place from some cloud vendor or another. So obviously in that sense, it's been a remarkably successful introduction of a net new technology and paradigm. On the flip side of that, look, you can see the shape of the adult that it's going to become, but it's not adult in all ways yet. So I'll take an example here, a DTCC, fantastic example. So the US financial system has a lot of safeguards in place, and one of them is that it has to be possible if something happens on the East Coast, if there's let's say a flood or something in New York that you can keep the stock exchange and other kinds of key financial capabilities up and running.

And so that means you have to have capacity that you know you can get to in another region. This is the kind of thing that's really tricky to do and the way this would have been done kind of formerly with servers is you just point to them. You're like, "Okay, look, we got a thousand servers. They're sitting in us-west-2, we're good to go. They're RIs or DIs or something on AWS or the equivalent on another cloud." But we couldn't really do that with the Lambdas, right? There was no way to say, "Well, these are your Lambdas, right?" Because Lambdas aren't capacity. And so things like the provisioned capacity feature now, it's not just a response to developer needs. It's also the kind of thing that makes enterprises able to deliver these regulatory compliant capabilities.

And so it's a great example of what I call kind of serverless growing up. All of a sudden, key economic and financial mechanisms that power, not just the US but the global economy can run on Lambda. And that's a huge step forward and very, very different from where we were back in 2014 when we released this as a kind of simple scripting mechanism to thumbnail images coming into S3. So really, really, really real game changers there.

Jeremy: Yeah. And I think that you're right about the enterprises showing up, right? Like finally you see more stories. I mean, it's just like Liberty Mutual and Lego. And I mean, just so many of these stories now that are fascinating of them like rapidly moving to only serverless or as much serverless as they possibly can. So the other thing though I think that's interesting, and this is something that was sparked. It was sort of like almost like an arms race or a space race where it's like who can develop serverless better or do more serverless things? And you got a lot of the big ones in there.

So you've got Microsoft obviously doing it, and you've got IBM taking over Apache OpenWhisk and you've got Google in there. But then you have all these other like sort of fringe edge providers, like the Cloudflares and the Fastlys. So this has created a whole new sort of ecosystem. So what are your thoughts on like how is that driving maybe the complexity or maybe the, I guess, the confusion or the adoption? I don't know what the right way to say that is, but it's like the wild West.

Tim: Yeah. Or the terrible teens. Right. I mean, it's tricky. I mean, I actually think a lot of those things are positive. It's one of things I've said before like Google for example is doing a really bang up job of thinking about the customer use cases. And if you kind of position them, the different cloud providers have taken very different tacks here. AWS, we asked the question, "If we just let go of everything that we've ever done or ever known in the cloud and made something completely new, how should it work? What would it do? How could it best serve developer needs?" Right. And that's an interesting question to go answer. I think Google has asked a very different question. They've said, "What is the kind of minimal risk, maximum insurance solution we could give people that adds some value over where they are today?"

And you see things like Cloud Run, which is a relatively narrow technology, but an insanely useful one. They've taken this really important use case of building a stateless front end and they've gone out there and nailed it. And in some ways it's thematic with what they've done with App Engine, with Anthos and others and Knative. They've said, "How do you get some additional value in the world that you're already in today?" And that world might be on-prem for example, or it might be an existing monolithic application or it might be a container. And so they haven't stepped as ... They haven't gone nearly as far or stepped nearly as aggressively as AWS has, but arguably they're giving a lot of people a lot of value, even though it's perhaps not as far away from that.

And then you get folks like Cloudflare who I think are, obviously they're building what they do best here, but it's amazing. They've taken this challenge of how do you do compute on the edge, even if it's a stripped down modest kind of compute. But making it almost ubiquitous so that literally kind of in line with every HTTP call. It's like if every HTTP call in the world could be scripted, what would that look like? And Cloudflare is doing a fantastic job of making that a reality. I'm envious. We wanted to do that with Lambda@Edge and I would say we never quite got there with that product for all that it does some super useful things, but it's not that kind of ubiquitous inline everywhere on every edge cell that Cloudflare has created.

And I'm really impressed by what those folks have done. In fact, AWS if you're listening, think about this for your edge devices. You don't want to run a thousand EC2s in every Verizon pod sitting up on the street and on the street corner, what I want is something that works like Cloudflare's. So I think they're serving a useful role as challengers here and for customers who have that particular need, just like with the Google Cloud Run, I think it's a really nice product.

Jeremy: Yeah. So another thing I think that we're, or where we're at a point with Lambda and with serverless in general is we've got all these frameworks, we've got a lot of tools now. AWS has built a whole bunch of tools that help with deployment and things like that, but you're still either using APIs or in many cases configuration files. So that undifferentiated heavy lifting, a lot of that's gone. I don't have to write my own queue anymore. I don't have to manage my own database, but I still got to connect those. And the only way to do that is with configuration and often YAML files. Right. So is that still a friction that you see hindering or slowing down innovation or is it something that you think that just needs to be abstracted away at some point?

Tim: Some of this is definitely a consequence. It's the square peg in a round hole kind of challenge, right? The irony of course is that serverless was supposed to make the cloud easier to use, but when you take an existing tool and you try to reapply that, sometimes it can actually make things harder. And a lot of the CICD mechanisms out there, things like CloudFormation, were designed to take a bunch of servers, configure them in a particular way and put them in an environment that would allow them to run and get something done. That's a very different problem from saying, "I want to build an application out of fully managed components, and I want to wire it up, ensure that it has least privilege, make it ephemeral so I can stand it up and tear it down again."

I'll just give a little anecdote. In building, so Vendia is not just serverless in its kind of runtime, it's also serverless in its CICD deployment. So what I've done is I've taken the AWS CDK. For those who haven't used that, think CloudFormation, but like turned into Python or JavaScript/Node form. So you can programmatically construct things. And so I essentially wrapped a compiler around it, which makes it really easy for me to stand up our code base and create as many different test cases or production deployments in parallel as I need. It's great in that I was able to accomplish that. As one guy here, I could do a prototype of something that would normally have taken a team of 10 people to do, thanks to tools like the CDK. The flip side of that, like I had to go build a compiler around the CDK to make this really easy to use.

And even with all of that technology, this took a lot of work and a lot of energy. So I think there are folks here. I'll put a plug in for Stackery. I think they've done a fantastic job of helping people find a very different and much easier way of constructing a serverless application than starting with CloudFormation, even if CloudFormation kind of sits in the background of that, just as it does for the CDK and others. But treats it more like the assembly language of the cloud. And I think Reed Hastings said this best back at AWS re:Invent in like 2011 or 2012, where we're at the assembly language level. I think like with some of these tools, we've come up to maybe the C level. We're not quite up to Python and any other language here yet, we're getting there.

Jeremy: Well, you mentioned CICD and it's funny because I have this running in my newsletter that I write every week that it essentially always calls out like it's another week, another custom CICD process for serverless. Because it seems like every time I see a CICD process, it is written differently. And I know AWS has now put into place, obviously they have CodeBuild and CodePipeline. They've added more features to that. You have the Amplify Console, which will do CICD for you. Plus they do like these bootstrap templates now with Lambda applications where you can set those up. But that's the thing too, like just getting through that process, implementing CICD in serverless is just, it's not easy.

Tim: It's not. And look, I think we kind of went from let a thousand flowers bloom to the problem of tyranny of choice here, right? Where just as you say, just keeping up with the set of ... in the space of options can be problematic. And I know, when I was at Coinbase for example, we went back and forth on this. We ended up writing some of our own custom stuff to make this work. At Vendia, I essentially, having tried with some of my serverless networking pieces to use off the shelf stuff, I ended up doing my own custom CICD and I feel the pain because it's challenging. The flip side of this is if we get it right, it's also amazing. And this is the piece that I also want people to like have this takeaway here, right?

It's not just that whacking this into sort of old school tools is difficult, but part of the reason you see people experimenting is because of the incredible potential. The ability to run a thousand different tests. To literally stand up your production infrastructure, not a stage, not a stripped-down developer workflow, not something that runs in some kind of wacky emulation mode on my local machine. But to literally in real time run a thousand different tests in production, in a real production environment, in the cloud, at scale, with all of the capabilities that they will actually have in production and then just as easily tear them all down five minutes later is unbelievable. Right? You never do that with servers. It's way too expensive. It's way too hard. You'd have a team of 200 working on this for years, right? Nobody does that.

And that is the great opportunity of managed services. It's also the thing that is furthest away from what the existing tool sets are capable of giving us. And I think one of the reasons you see people experimenting here. It's not just that they want CICD and deployment and testing to match the simplicity of the underlying services, it's also that the incredible opportunity here isn't fully exposed by a CloudFormation or some of the other tools that are out there. I'd say the CDK makes it possible. You can now do things like put a for loop around your deployments, which lets you do incredible stuff, but only if you're willing to write the code for it. And so I think that's where we are with this. The kind of that golden future Nirvana where all of this is not just possible but easy, we haven't quite gotten to that yet.
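[Editor's sketch] The "for loop around your deployments" idea can be illustrated without the CDK itself. The Python below is only a stand-in, not CDK API: it builds plain CloudFormation-style template dictionaries in a loop, one per isolated test environment. The function names, the environment naming scheme, and the choice of a single SQS queue as the resource are all assumptions made for illustration.

```python
import json


def make_test_stack(env_name: str) -> dict:
    """Build a minimal CloudFormation-style template for one isolated environment.

    The single queue resource is a hypothetical example of "the application".
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Ephemeral test environment {env_name}",
        "Resources": {
            "JobQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {"QueueName": f"{env_name}-jobs"},
            }
        },
    }


def synthesize_environments(count: int) -> dict:
    """The 'for loop around your deployments': one template per environment."""
    return {
        f"test-env-{i}": make_test_stack(f"test-env-{i}") for i in range(count)
    }


templates = synthesize_environments(3)
for name, template in templates.items():
    # A real pipeline would hand each template to a deployment tool here.
    print(name, json.dumps(template["Resources"]["JobQueue"]["Properties"]))
```

Because each environment is just data produced by code, standing up a thousand of them is a change to one number, and tearing them down is a loop over the same names.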

Jeremy: Right. Yeah. And I love some of the next generation tools too that are like sort of working on this stuff. And I mean, I'd like to say they're getting it right, but I still am not quite sure what right is yet either. That's something which is part of the sort of the conversation. And actually speaking of that, I'd love to move on to this thing that I don't know if we've got it right or we've got it wrong. But that's this idea of state in Lambda functions or state in serverless or I guess any type of FaaS. So state versus stateless. I love stateless because I feel like it gives me a lot of control. I know it's like pure functions with a functional programming language. Right. Just feel like you're not bound by the state or your applications don't get confused by that state. And so I really do like the stateless aspect of a Lambda function. On the other hand, lots of applications need state. So where are we with this?

Tim: Yeah. I mean, I think in some ways this is sort of the, I call this the great philosophical debate of our time. Right. And look, we talked about the origin story. The original concept was motivated by 12 factor app design and so forth, was separation of concerns. Let S3 be an amazing storage service, let Lambda be an amazing compute service, like let Dynamo be an awesome NoSQL database. And then there's this cool thing called the internet and network cables, right? Wire it all up and put it together and it'll be awesome. And I think in some cases that is exactly how it plays out. I mean, if you've got a relatively simple use case where you want to store something in S3, you trigger Lambda function, operate on that thing, maybe put it back again, rock on, few lines of code, almost hard to imagine how you make that thing much simpler.
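The "few lines of code" case Tim describes (an object lands in S3, a function is triggered, operates on it, and writes a result back) might look roughly like the sketch below. The `InMemoryStore` class and `make_handler` helper are hypothetical stand-ins so the example runs without AWS; only the nested `Records` / `s3` / `bucket` / `object` layout follows the shape of an S3 event notification.

```python
class InMemoryStore:
    """Hypothetical stand-in for an object store client such as S3."""

    def __init__(self):
        self.objects = {}

    def get(self, bucket, key):
        return self.objects[(bucket, key)]

    def put(self, bucket, key, body):
        self.objects[(bucket, key)] = body


def make_handler(store):
    """Build a handler that upper-cases each uploaded object."""

    def handler(event, context=None):
        written = []
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = store.get(bucket, key)
            # Write under a separate prefix so the output is easy to tell
            # apart from the input (and easy to exclude from the trigger).
            out_key = f"processed/{key}"
            store.put(bucket, out_key, body.upper())
            written.append(out_key)
        return {"written": written}

    return handler


store = InMemoryStore()
store.put("demo-bucket", "notes.txt", b"hello")
handler = make_handler(store)
event = {
    "Records": [
        {"s3": {"bucket": {"name": "demo-bucket"}, "object": {"key": "notes.txt"}}}
    ]
}
result = handler(event)
```

The handler keeps nothing between calls: everything it needs arrives in the event or lives in the store, which is the separation of concerns described above.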

I mean, CI/CD aside because we've discussed that, right? I think we've squeezed out about as much of the cost, complexity and so forth of that and we've transferred as much of the operational hassle back to AWS on that as it's probably possible to do. So I think that part's all great. Where this gets tricky though is, as you say, lots of applications have state. Sometimes that's macro state, sometimes it's micro state. One of the big things that's tough about using Lambda with Kinesis, for all the hard work that has gone into making that integration soar, is that it's still the case that there's no state there, which means there's no affinity, which means you can't do something simple like just add up the values on a particular channel within that broader data stream because you never know which Lambda function is going to get it.

And so you end up doing things like copying it into Dynamo and then back out again or something, which is pretty strange. Or taking one persistence mechanism and then copying it to another mechanism, rendering it to disk, going back again. It's a huge waste of money, of time, of opportunity to do something that could obviously be done simpler. And that's a good place where state is meaningful, it matters. And we obviously haven't quite nailed the way that that gets put together. I also think it's just confusing. The other thing about state and serverless is, look, I actually had this conversation with a developer and this guy came up to me and said, "I looked at Lambda. It looks simple. It looks really cool, but my code has variables and I store stuff in those variables. So I don't think I can use it because they say it's stateless."

So look, we can chuckle at that a little bit, but it is a good example of folks who have a hard time understanding what stateless means here. Because obviously it's got memory, it's got disk, right. It can hook up to things like Dynamo. Making that easy and approachable is something that I don't think we got quite right. And SAM has tried to make that easier. And obviously there's a lot of education out there on serverless design patterns. But one of the things that is really tricky is it's still not easy to hook up Redis to Lambda. The standard mechanism of storing durable but not persistent state. I mean, the thing that everybody uses to build like every B2C application out there, right. And it is one of the hardest things to do with a Lambda.

And so I think that's a good example, and this is not specific to Lambda. Azure Functions, Google Functions, all the same. So here's a good example where I think the challenge to the cloud service providers is make the practical kinds of state easy to do, whether that's Redis integration or conventional file system integration. Like I love S3, but sometimes you really just want a Linux file system hooked up. And we still don't ... EFS from AWS for example, it's like you have an infinite disk drive and then in Lambda you have an infinite computer, but then it's like there's a wall in between them. The cable hasn't quite connected on the floor there. So I think there are some of those pieces that, were we to kind of get them together, would give developers just a phenomenally better, easier, more tractable way to handle some of the problems that they have of writing practical applications. And we can call that state.
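To make the Kinesis affinity point above concrete, here is a small simulation, not real Lambda or Kinesis code. The `Instance` class, the round-robin routing, and the dictionary standing in for DynamoDB or Redis are all assumptions made for illustration. It shows why a running total kept in a function's own memory comes out wrong when records for one channel can land on any instance, and why people end up pushing that state into an external store.

```python
from collections import defaultdict


class Instance:
    """One hypothetical function instance with its own private memory."""

    def __init__(self, shared_store):
        self.local_totals = defaultdict(int)  # private to this instance
        self.shared_store = shared_store      # stand-in for DynamoDB or Redis

    def handle(self, record):
        channel, value = record["channel"], record["value"]
        self.local_totals[channel] += value   # sees only this instance's records
        # In a real external store this would need to be an atomic increment.
        self.shared_store[channel] += value


shared = defaultdict(int)
instances = [Instance(shared), Instance(shared)]
records = [{"channel": "a", "value": v} for v in (1, 2, 3, 4)]

# No affinity: any instance may receive any record (round-robin here).
for i, record in enumerate(records):
    instances[i % 2].handle(record)

print(dict(instances[0].local_totals), dict(instances[1].local_totals), dict(shared))
```

Each instance ends up with only a partial sum for channel "a" (4 and 6), while the shared store holds the full total of 10. That extra round trip to the store is the cost Tim is pointing at.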

Jeremy: Right. Yeah. I mean, and I think one of the things that I love about serverless and the fact that it has been stateless, and you're right, stateless in this sort of context does not mean that there is no state at all. I mean, obviously you can use variables and all that stuff. And it's very simple. Like you said, call DynamoDB, rehydrate some object or whatever it is that you're working on. I mean, all of that is very possible in there. And I like the fact that that kind of forces you to think a different way about how you build your applications because it also helps when you start thinking about scale. Because I think a lot of people who build stateful applications are not thinking about scale.

So there are still going to be people that do that. And as much as I would love to see us just change the mindset to say anything that you build in a stateful manner you could probably build in a stateless manner, there are still going to be people who are going to want to do that stateful stuff. So does serverless, if it doesn't get there, if it doesn't add that statefulness, is that going to hinder it from becoming sort of the default paradigm for building applications?

Tim: I have these two strong reactions to that statement, right? One of them is I would say in some ways the most successful thing Lambda has done is to challenge thinking, right? To get people to say, do you really need a server stood up, turned on taking 20 minutes to fire up with a bazillion libraries on it and then you have to keep that thing alive and in perfect condition for its entire life cycle in order to get something done in terms of a practical enterprise application? And challenging that assumption is one of the most exciting, important and successful things that I think Lambda and other serverless offerings have accomplished in our industry. The flip side to this is to be useful, sometimes you have to be practical. And it's equally true that you can't walk up to an enterprise and say, "All right, step one, let's throw all your stuff away and then step two, you're not going to get past step one."

It's funny, we talk about greenfields, brownfields, it's all brown in the enterprise. Even if you write a net new Lambda function, it's running against existing storage, existing data, existing APIs, whatever that is. Nothing is ever completely de novo. And so I think to be successful and be as adopted as possible in the long run, serverless offerings are also going to have to be flexible. And I think you see this with things like provisioned capacity. I mean, when I was at Lambda still, we had long painful debates about is this the right thing to do? And for understandable reasons, because it is less stateless. It took the ... it's obviously optional. We don't force anyone to use it. But by doing it, it makes Lambda look more like a conventional, well, server container, conventional application approach because there is this piece that is a little bit stateful now.

And I think the arc here is for the serverless offerings to not lose their way, to find this kind of middle ground that is useful enough to the enterprises that still challenges assumptions that gets people to write stuff in a way that is better than what came before and doesn't pander completely to just make it feel like a server. But is also practical and helps enterprises get their job done instead of just telling them that ... because just sermonizing to them is also not the right way to do it.

Jeremy: Right. Yeah. And I also wonder too, I mean you mentioned Cloud Run earlier, which I think Cloud Run is an engineering marvel. I mean, it's probably not that complex, but it really is... I really like what they did there to make, to basically take something that wasn't serverless, a container and then give it those characteristics. And I feel like you have that bleeding back and forth between those. And obviously you've got Fargate with AWS and they call it serverless containers and that, I don't know how I feel about that. Right. Because does it put you ... If we blur the lines too much, does that lose? I mean, do we redefine what serverless is? Does that even matter? I mean, what are your thoughts on that?

Tim: Well, look, and I say this with love because I want things like Lambda and serverless, sort of fully serverless apps to be super successful. But if all that we accomplished was to help the cloud providers make things like containers have fewer infrastructure artifacts, fewer things to have to set up and configure, less painful maintenance and deployment and security overhead so people could get the jobs done faster, that would still have been a success. And I think to the extent that we can also sort of challenge the dominant paradigm as it were and get developers to build applications that are easy, fast and fun, all the better. So I think it's all good.

I think it's also the case that some of these are interim states and some of these are end states. And one of the ways I've talked to people about this in the past is when you write an application, maybe you write it on a server and you think, okay, well, at some point I'm going to containerize that. And then you containerize it, and at some point you think, you know, I should really be running this on whatever, Google Cloud Run or perhaps Fargate for my application and my cloud choice because I'm doing too much low level. Like I'm still responsible for keeping that underlying piece of infrastructure alive and running, and that's just kind of a waste of my time and energy. AWS or Google or Microsoft can do that better than I can.

So there's a sense of progression. But once you've built something out of a set of managed services, you're done. That's the sort of the end of that state machine, right? It's kind of the final, final. It's the end game. And this is something that I think is going to take us a while to get to. I mean, we will have as an industry, you always iterate organically. That's why, I mean, not everybody's even in the cloud yet today. So you can see those trend lines and you can see where that's going. And I think realistically as providers, the CSPs are going to have to do both.

They're going to have to provide people the organic incremental steps that help them do something a little better than they did yesterday. And they have to work on this thing, which is: what is the ultimate end game if you could get everything to be the way that you wanted? And those two things are going to run in parallel, which is why like it's never an either or. It's not like one's going to win, one's going to lose in the same way that VMs are still around and will be for certainly for our lifetimes.

Jeremy: So you mentioned this terrible teens idea and obviously innovation in serverless is not done. There is a lot more that we can do. We have to figure out this blurred line, we have to figure out maybe networking and some of these things. So you have a blog post that you put together a couple of weeks ago, which I thought was great. It was like a 2020 re:Invent wishlist for AWS Lambda or for serverless. And we've talked about a couple of these things, but there's a whole bunch of them and I'll put the link in the show notes because I encourage people to go and look at this if only to figure out what is missing. Because I think a lot of people don't even know what's missing until they read your post and they'll be like, "Wow, I didn't know you couldn't do that or that it wasn't part of it."

But I'd like to go through a couple of these because I think maybe the more interesting ones, at least interesting to me, because I think that these sort of strike a chord at least with me in terms of things that you know you need or we know we need in order for, like you said earlier, like this to become just the way we build applications. So the first one was this idea of doing one millisecond duration granularity. What's that about?

Tim: Yeah. So look, to understand this one you have to also remember the context too. When we were putting Lambda together, so say 2012, 2013, you couldn't have an EC2 instance on for less than an hour. And most companies, most of the time were still provisioning on-prem servers that they had to go buy and stand up. And it was usually a month lead time to get new hardware racked and stacked and stood up. And so the idea of 200 milliseconds, 100 milliseconds, I mean, that just blew people's minds. It was just astonishing. In a world where you can run an EC2 instance or a Fargate instance or something else for as little as a minute of duration, however, and where people are using languages like Go that might do something useful in just a handful of single digit milliseconds.

And frankly also as hardware continues, I mean, the whole sort of efficiency curve there has slowed down a lot. But still even from Lambda's inception to where it is today, the hardware that it runs on has gotten much faster, networks have gotten faster. And so the idea of running maybe an application that takes two or three milliseconds per call, and then spending a hundred milliseconds starts to look like exactly the sort of waste that Lambda was designed to get rid of. One of the most successful things about Lambda and serverless in general is that it collapses this cost structure. Enterprises famously, analysts will tell you, run at maybe 10% utilization. So they radically overspend on the amount of hardware capacity that they need and serverless helps them get rid of that.

But there are still these other forms of waste in the system and this is a good one, right? Where it just is impossible for something that is very, very fast, and it opens up a new set of ... If you can do that, it also opens up a new set of applications. Because if you have something especially that fits on a front end that maybe takes two or three milliseconds to run, you're probably not going to do that on a Lambda today. With the improvements that the Lambda team has made with bringing down the latency, running on the new Firecracker architecture, it is also possible now to write these very low latency applications. And I think part and parcel of that is being billed fairly for those low-latency applications.
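The arithmetic behind that complaint is easy to sketch. The `billed_ms` helper below is hypothetical and ignores memory size and price per unit; it only rounds a run time up to the next billing increment, comparing the 100 millisecond granularity discussed here with the 1 millisecond granularity on the wishlist.

```python
import math


def billed_ms(duration_ms, granularity_ms):
    """Round a run time up to the next billing increment (at least one)."""
    increments = max(1, math.ceil(duration_ms / granularity_ms))
    return increments * granularity_ms


for duration in (3, 40, 250):
    coarse = billed_ms(duration, 100)  # billed in 100 ms steps
    fine = billed_ms(duration, 1)      # billed in 1 ms steps
    print(f"{duration} ms run: billed {coarse} ms vs {fine} ms "
          f"({coarse / fine:.1f}x)")
```

For the two or three millisecond function Tim mentions, the coarse model bills roughly thirty to fifty times the time actually used, while for a 250 millisecond function the difference shrinks to 300 versus 250. The waste is concentrated in exactly the very fast workloads.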

Jeremy: Right. And I totally agree, because the last time I was in Seattle and I was talking to one of the Lambda PMs. Like, what would you like to see? I said, "Lower units of billing for it because a hundred milliseconds is just crazy." Now, when you think about it, it never used to be like you said, but now it just seems crazy. I mean, even if you did, it had to be at least 50 milliseconds at least that, and then it was per millisecond after that or something like that. I mean, even that would be better than what it currently is. Again, a hundred milliseconds is still not very much time. So it's still pretty amazing, but I'm totally with you on that. The other one, this is funny because I know you have some history with this, is EFS integration with Lambda because Fargate has it now.

Tim: Yeah. Look, and I say this like I'm not part of the team anymore. I have no special insight into this, but I can certainly speak to the need for it. One of the things that's really challenging is especially in a world where people want to try to do things like ML training or running some of those ML outcomes off larger data sets. Think of some of these applications where you do want to use lots of Lambdas to process data in parallel. So you might want to work on a large dataset. As we start to expose Lambda to data scientists, and there's a whole other conversation about how you do that. Eric Jonas has written this fantastic paper, Distributed Computing for the 99%, all about how that should be a focus and a priority.

As you start to move in that direction though, you've got to make it really easy for the compute to line up with these large data sets. And one of the challenges with the S3 model is that you've got to pull them all into memory, or into Lambda, and then kind of shove them all back out again. And so this is where I think if we can get a ... it's going to be the ultimate compute meets the ultimate storage here, right? Like you put it together and you get, it is the mainframe of our day. And I don't mean that as a slur. I mean that as the highest possible compliment. It is the processing engine that emulates what a supercomputer can do, but only if you can plug these pieces together. So I'm really excited for this. I think it opens up a lot of applications and frankly, it makes some of the stuff that's just really hard to do today, like managing that small amount of /tmp space effectively, easier. Those problems, if they don't go away, at least get a whole lot easier.

Jeremy: Yes, definitely. Yeah, I think the use cases with that are, it just opens up a whole new world of things that you can do. Because I know I've worked with large files and with S3, you're always streaming data. And then you can't, like you said, it's hard to ... just that little bit of state would be nice in those sorts of computing situations. So another one, and this is funny because this is again personal to me. I give a lot of talks. And one of my new ones is how to fail with serverless, which is all about the failure modes in the cloud. And one of the things I find is that I end up introducing this term to people. And this is a term that, I think, if you're a computer scientist or maybe have a more traditional background, you would know.

But I think there's a lot of people, especially front end developers, people getting into serverless, that aren't quite as technical on that level, and nothing against their level of technicality, who don't know this word idempotency. A lot of people, I think, don't know what that means and also don't realize the impact of it when you are using events in serverless applications. And one of the things on your list was this idea of idempotency protection. So I'd love it if you could explain what your solution to that is.

Tim: Yeah. This is a ... Look, if there is an unfortunate garden path in Lambda, this is probably it. Because it gives you the illusion, and it is mostly true, and the mostly of course is the devastating part of this. It's mostly true that when you call the Lambda, it runs exactly once. And so take something simple, like let's say you're using this Lambda to implement a bank account, right? So you're going to go off and you're going to compute and add interest to somebody's account, or you're going to go do a transfer with that Lambda function. So you call it once, it runs once, it does its thing once, everything looks good. The problem is the actual semantics of Lambda are at least once, which means every once in a while, not all that often, but not zero times either, it'll run more than once.

So maybe it'll run two times or even three, which means you'll end up adding more money to that bank account than you expected. And while not everybody's using Lambda to manage a bank account, it turns out that just lots of things in the enterprise are important to do exactly once. Not maybe a couple of times, not occasionally a couple of times, but exactly once. And this is a good example where I would say by focusing a whole lot on the service metrics and dynamics, it's possible to sort of lose sight of some of the things that developers really need. And this was sort of ... this was something I feel like certainly I got wrong in not giving people a simple solution for this earlier on.

Because as someone who has run a team at Coinbase trying to adopt serverless, someone who has been trying to build a business around it, I can tell you that some of these things not being there are incredibly challenging for adoption purposes. And one of the most common of them is just you want your code to run once. It seems like a really simple ask and it turns out that building that in practice is a bit of a mess. You've got to stand up, in addition to your Lambda function, a full-powered Step Function with a couple of nodes in it that you can go and run a task, and then it'll give you the exactly once. And it's not that it's impossible, it's that it's expensive, clunky and time consuming. And you require every one of the 10 million developers you want to go use this to go figure it out on their own. And that's really painful, right?

It's something that should be as simple as like go and check a box here. So this is one of my big asks for AWS to think about is yes, it's possible to solve. No, it's not easy. Please, please, please, please give us this check box that says make it run the way I've kind of always wanted it to run. And we've all heard about the CAP theorem. We know that's not easy to do. It's okay if it costs a little more, but it's something that would really make Lambda just dramatically simpler to use for virtually all of us who try to get something done with it on a day to day basis.
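One common do-it-yourself workaround for at-least-once delivery, short of the Step Functions approach Tim mentions, is to record an idempotency key before applying a side effect. The sketch below is not how Lambda or any AWS service implements this. `KeyStore`, `put_if_absent`, and the `request_id` field are hypothetical names, and the in-memory set stands in for a database that supports conditional writes.

```python
class KeyStore:
    """In-memory stand-in for a store with conditional writes (hypothetical)."""

    def __init__(self):
        self._seen = set()

    def put_if_absent(self, key):
        """Return True only for the first caller that records this key."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def make_handler(seen, account):
    def handler(event):
        # Assumes the caller attaches a unique ID to each logical request.
        if not seen.put_if_absent(event["request_id"]):
            return "duplicate-skipped"
        # Caveat: in a real system the key write and the balance update
        # would need to commit together, or a crash between them loses the update.
        account["balance"] += event["amount"]
        return "applied"

    return handler


account = {"balance": 100}
handler = make_handler(KeyStore(), account)
interest = {"request_id": "interest-2020-06", "amount": 5}

first = handler(interest)   # normal delivery
second = handler(interest)  # the platform delivers the same event again
```

Even this toy version shows why Tim calls the do-it-yourself route clunky: every team has to pick a key, find a store with conditional writes, and reason about partial failure on their own.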

Jeremy: Right. Yeah. And I think that's one of those things. There are some of those patterns where the developer is left to figure them out on their own. Right. So you're putting in something to try to battle that idempotent operation, and it can be quite a headache. So we talked about serverless Redis a bit, and I'm sure we could talk even more about that. But the last thing that I wanted to talk about on your list was serverless networking. And so you've done a lot of work on this and like a lot of work on this, which is pretty cool in terms of being able to have running Lambda functions communicate with one another. And this is something, again, I feel like we could talk about this more, but the ability for those running functions to communicate with one another, that is what gets us to this idea of the Lambda supercomputer, right?

Tim: Yeah, so a little bit of context on this. Some of the most exciting research and innovation that's happening in the space right now is happening in academia. And you've seen these, we touched on this earlier for the video transcoding work that's gone on, on top of Lambda. There are some researchers here, Eric Jonas, Sadjad Fouladi, Johann Schleier-Smith, Vikram Sreekanti, who are doing just incredible, insightful work into building just massively scaled systems that are often combining the state and the compute together and doing really interesting, massive data parallel applications on top of a serverless architecture. So that's the good news. The bad news, it's a struggle. It's really hard. And if you ask yourself this question, like could you go rebuild some of the big infrastructure solutions of today like MongoDB? Could you go recreate MongoDB or Aurora DB on top of Lambda? And the answer is probably not.

And so what is making this hard for researchers? What makes this hard for somebody who might want to construct an infrastructure style service on top of a serverless base? And the answer is complicated as it always is, but in some of these, it's a few missing pieces, right? It's the fact that you can't do cross calls with Lambda. So that's the serverless networking piece of this, right? If I've got two Lambda functions, I can use them to call out to other services, but services can't call into them. And we talked a little bit about the origin stories earlier. One of the reasons we did this in Lambda was to keep people from using it as a conventional web server, because that was a failure pattern, right?

We would trick them into thinking that that state was there when it wasn't. So we turned that off. But in turning it off, we made it impossible to build some of the ... use some of the standard techniques that you use to build high speed, data dense multi computer parallel applications, right? And so all these things data scientists want to come and do now get way harder. So serverless networking was an attempt to solve some of that by doing NAT punching and some of these other techniques down at the low level, so that Lambdas could actually communicate and take advantage of the high bandwidth network that sits between them. And that's just one of the several things that you need to do.

To really make this serverless supercomputer a reality, you need not just a distributed networking solution, you need low latency as in single millisecond style choreography. You need high-speed key value stores like we talked about with the serverless Redis. You need a way to hook this up to immutable inputs and outputs. So there's a whole set of things that you have to do there so that you can ultimately build something like say CRISPR or a MongoDB on top of this, with the kind of outcome that you could get if you were to grab a bunch of servers. And doing this unlocks a whole new set of people and a whole new set of applications that can start running serverlessly. So I remain incredibly excited about this. I think we've seen enough evidence in the research community to suggest it's all possible.

I think we've seen enough evidence and direction from the cloud providers to suggest it is all doable, but there's still a long road ahead to make all of that possible. So I wanted to help out with that, hence some of the open source stuff that I've done with the serverless networking piece. But really that is just one of some of these foundational elements and we really kind of need to get them all in place to make this happen.

Jeremy: Right. So is that something you're going to keep working on or is it something where you think like, you've proven this is a valuable or viable thing and now the cloud providers just need to go and run with it?

Tim: No. I mean, it's probably a yes and a yes, right? Like I continue having some great conversations with folks who are working on this in the research community, continue sort of working as time and energy permits on some of the open source parts there, and look forward to some exciting collaborations with others on thinking through some of these challenging problems, like the choreography and the key value stores. So with Vendia, I've chosen to put my energy into a commercial enterprise that's helping to solve a slightly different set of problems. I think this space is going to be one where honestly, and my great hope here is that this is also part of where open source works for serverless. Obviously open source is not going to mean that we pull Lambda out of AWS or we pull Azure Functions out of Azure.

It's going to be that people can create these frameworks and these mechanisms that help them get incredible new things done in the cloud. And that's where I think you can see the university research and the research community coming together with the cloud providers, coming together with this growing ecosystem around serverless to produce something that is amazing. So I think that's probably the best role for me, and that is not to try to be the commercializer of those pieces, but to help be a human choreographer of some of that work and energy.

Jeremy: Yeah. And I mean, speaking of open source too, I mean, think about Kubernetes, right? Kubernetes has taken the world by storm because obviously containers are the, I guess the standard that most people are now considering to be cloud native, even though there's plenty of people doing stuff with serverless. So is that something where you see something like Kubernetes ... I mean, obviously it's going to be around for a while because so many people have started to adopt it. But is that something where you think that's going to continue to gain steam or are we going to see serverless and maybe some more open source serverless options kind of take over for that?

Tim: Yeah. This is one of the things that Kubernetes got right. And I would say like everything's a mix, right? It's complicated in a lot of ways, but it's also open and portable, which is a key requirement for a lot of enterprise use cases. And I think that is part of the direction that serverless needs to move in. Because one of the key buying objections, anytime I would talk to a customer, they'd always be like, "Wow, this is just, I'm like a kid in a candy shop. I love all of this. On the other hand, I'm afraid I'm going to get cavities in the form of vendor lock-in here. So help me out with that. What do I do about it?"

And one of the things that has helped give Kubernetes momentum is the fact that that question has an obvious answer in a way that today Google Cloud Run and a Lambda and an Azure function don't have an equally simple answer. So stay tuned for more from Vendia, perhaps on some of those topics. But I also think this is a place where the open source community has to come together and think about what's the right way to make this work. It's not going to be trying to run to ... It's not going to be trying to emulate the services like Lambda on a bunch of individual machines. Right. And you can see some of the challenges of doing it.

I can tell you for example having been at Coinbase and watched distributed ledgers go that one of the big difficulties for them is that everybody's running this stuff on kind of stock hardware. It doesn't use the best and brightest of the cloud, and it's the least common denominator. You wonder why Ethereum is slow? Well, that's because you can run it on a laptop. Imagine running Amazon S3, literally S3, like for everybody on your laptop. And that's kind of why Ethereum is running at the pace it is. So there's a lot to do there. I think it's not going to look like ... Open source solutions for serverless will not look like the Kubernetes model, but it is still a missing piece. And I think if we could get there, we'd also create a collaboration forum that people could lock and latch onto in a way that is never going to be quite as well developed if it has to be a single cloud provider running the show.

Jeremy: Right. Yeah. I totally agree. All right. So speaking of Vendia, I know you can't tell us a ton because you're still in stealth mode-

Tim: A few more weeks.

Jeremy: But you said you built it all serverlessly and you mentioned a little bit about using the CDK and some of that stuff, but any success stories of building it serverlessly?

Tim: Well, proudly serverless, one of the nice things about that is you can do a lot incredibly quickly, right? And here's a good story of developer productivity and progress, because at the moment, and we're hiring by the way, you are looking at the developer team for Vendia. So there aren't like a dozen people behind me furiously typing away, right? This is me and my spare time on nights and weekends primarily. But think about, just pick one thing here, like regional build-outs. So you scroll back to when I started at AWS in 2012, building a new region for let's say a service like S3, six months on a good ... if you're lucky, right?

Because hundreds of people, everything from surveyors and electricians, hundreds of vendors, supply chain in the thousands. You've got to go get this thing stood up, filled with servers, filled with racks. Fast forward to Lambda. So now circa 2016 let's say. So build a new region in six weeks with a dozen engineers who are able to use things like EC2 and take advantage of the cloud. Fast forward to Vendia. I launched not just one region, but regions all over the world, a large subset of them, all the ones in which the services I needed were available in about six minutes, because all I had to do was list the names, type the names into the CDK and wrap a for loop around it and I was done.

So you go like one guy, six minutes, launching a production service at scale worldwide with essentially three lines of code. Now that's an amazing, amazing example of the kind of productivity success that you can get out of serverless and a well-matched set of tools like the AWS CDK. And I think that's kind of the story here. It's getting rid of undifferentiated heavy lifting, but it's also this idea of capital efficient value creation, which is really what we're all about.
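The "list the names and wrap a for loop around it" idea can be sketched without any AWS dependency. The `StackSpec` and `plan_regions` names below are hypothetical, and the region list is only an example. In an actual AWS CDK app the loop body would presumably construct one stack per region with that region set in its environment, rather than building these plain records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackSpec:
    """Hypothetical description of one regional deployment."""
    stack_id: str
    region: str


def plan_regions(service, regions):
    """Produce one deployment spec per region: the 'for loop' in question."""
    return [StackSpec(f"{service}-{region}", region) for region in regions]


# Example region names only; a real list would be whichever regions
# offer every service the application depends on.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]

for spec in plan_regions("demo-service", REGIONS):
    print(f"deploy {spec.stack_id} to {spec.region}")
```

Adding a region is then a one-line change to the list, which is the productivity point of the story: the per-region work is code, not a build-out.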

Jeremy: It's amazing. Well listen Tim, thank you so much for one, speaking to me and taking the time today, but also for serverless. I mean, this is my livelihood. This is the livelihood of a lot of people that I know. What you and your team did at AWS in those early days was just absolutely incredible. And it's just sparked this thing that I think has completely changed the way people build applications. I know it's changed the way I do. And I look forward to everything that happens after this, including all this new stuff you're coming out with, with Vendia and that sort of stuff. So again, thank you. If people want to find out more about you, more about Vendia, all the stuff you're working on, how do they do that?

Tim: So we've got a website stood up. It's a coming soon website at vendia.net. Tune in. We come out of stealth mode on June 26th. I'll be doing the keynote at the AWS Serverless Community Day for Australia and New Zealand at that time. And that's also when we'll stand up more of our ... kind of take the wrappers off as it were and tell the world what we're all about here. So can't wait to tell that story and have a broader conversation about it.

Jeremy: Awesome. And you are of course on Twitter, @timallenwagner. Your blog is on Medium, @timawagner, and of course LinkedIn and all that stuff. So we will put all that into the show notes. Thanks again, Tim.

Tim: My pleasure. Thanks so much for having me, Jeremy.

This episode is sponsored by Dynobase and Datadog.