Episode #18: Pushing the Limits of Lambda with Michael Hart (Part 1)

October 14, 2019 • 48 minutes

Jeremy chats with Michael Hart about the inner workings of AWS Lambda, the hows and whys of Custom Runtimes & Layers, Docker Lambda, serverless CI and so much more! This is PART 1 of a two-part conversation.

About Michael Hart

Michael has been fascinated with serverless, and managed services more generally, since the early days of AWS because he’s passionate about eliminating developer pain. He loves the power that serverless gives developers by reducing the number of moving parts they need to know and think about. He has written libraries like dynalite and kinesalite to help developers test by replicating AWS services locally. He enjoys pushing AWS Lambda to its limits. He wrote lambci, a continuous integration service that runs entirely on Lambda, and docker-lambda, which he maintains and updates regularly and which has gone on to become the underpinning of AWS SAM Local (now the AWS SAM CLI).


Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly, and you're listening to Serverless Chats. This week, I'm chatting with Michael Hart. Hey, Michael. Thanks for joining me.

Michael: G’day, Jeremy, mate. How’s it going? You having a good day? Is everything going alright so far?

Jeremy: I love the Australian. I love the Australian accent. You don't actually talk like that, but that was...

Michael: I don't know what you're talking about. I talk like this all the time. Yeah, I do. I do wonder. I feel like if I did speak like that all the time, people might find me charming, but I don't think they'd have a clue what I was saying.

Jeremy: Exactly. Yeah. No, I actually thought Australians spoke English until I met a bunch of Australians, and I said I don't know if that's English, but anyway, so it's awesome to have you here. You’re the VP of Research Engineering at Bustle. You're also an AWS serverless hero. Why don't you tell the listeners a little about your background, what you do, and what's going on at Bustle?

Michael: Sure. So a little bit of my background: I have started a couple of companies, co-founded a couple of companies, been CTO before, in Australia. Then I moved to New York, did a bit of consulting and then joined Bustle as the VP of Research Engineering. So I, you know, do a bunch of interesting research things there. Bustle is a digital media company. We have a bunch of sites, mainly targeted at sort of millennial women, although we've recently been expanding that market.

Jeremy: Awesome.

Michael: And we have, just in the last year or two, I think we've sort of acquired or started about nine other sites, so yeah, growing.

Jeremy: And you guys are using serverless up and down your entire stack.

Michael: Yes, serverless across the board. Yeah. Been pretty early on that, yeah.

Jeremy: Awesome. Alright, so I have had a number of conversations with you. We were out in Seattle. We were out in New York City the other day. We've had a ton of conversations about serverless and Lambda and all these things that it can do. I would have recorded the conversations, but usually we're in a bar drinking Old Fashioneds or just being, you know, whatever, and the audio quality wouldn't be that good. So anyways, I want to talk to you about all these cool things that you do with Lambda functions because I have talked to tons of people and I capture use cases in my newsletter every single week and, you know, they're interesting things, but I don't think I've met anybody who has pushed Lambda to the limits like you have. And, I mean, not just like one thing, like multiple things. So I want to get into all of that stuff. But just maybe we could start by talking about, you know, in case people don't know, what is it— The Lambda function itself, it's actually an execution environment. There's an Amazon Linux runtime underneath there, or operating system underneath there. So you know this inside and out. And this will become abundantly clear that you know probably more about this than some of the AWS engineers as we go through this, but just let's start with that. What is a Lambda function? What is it made up of?

Michael: Sure. Yeah, so you're absolutely right. The environment that your function is running in is sitting on Amazon Linux. Until very recently, until the Node.js 10 runtime, that was all Amazon Linux 1, which is getting pretty old now. And then the runtimes themselves would sit on top of that, just in a directory in the operating system, and each runtime would have, you know, whether it's running on Python, then the Python binary and all the libraries, and if it’s Node, then the Node binary and other libraries. So that'd sort of be the only difference between those two runtimes; the underlying operating system’s still the same. I mean, and these are launched very quickly. Now it's on Firecracker, which is Amazon’s sort of new VM-type technology that provides isolation. But essentially, you know, these isolated environments spin up very quickly and they're running an operating system that runs a runtime that then invokes your function, which is also sitting on the file system.

Jeremy: Alright. So let's get into a little bit more than the details though, I mean, in terms of things that are installed and ready to go. I mean, it's more than just the runtime, right? I mean, there's other libraries, and other things...

Michael: No, you're absolutely right. So, unlike — I'm trying to think of a good example — unlike if anyone's played with Cloudflare Workers or something like that, that's just running JavaScript. There's no sort of file system that you have access to or anything like that. It's just sort of a JavaScript environment with all of the things that you would have access to if you were in the browser, for example; most of, or a lot of, those sorts of APIs. In Lambda, there is a file system in there. It's a Linux operating system running a process and then including your JavaScript file or Python file and running that. So as well as your file and the runtime itself, there are, you know, a bunch of base operating system binaries sitting there that you can access as well, and Amazon’s pretty cagey about what guarantees they give you about what binaries will be there. They say, “You know, if you want to compile something native that your Lambda uses, compile it for an Amazon Linux environment,” and that's pretty vague because obviously, you could have an Amazon Linux environment with a whole bunch of binaries or dynamic libraries installed, or you could have an incredibly stripped-back operating system that has nothing installed. So in that case, you'd need to sort of bring those binaries yourself into your Lambda. So a good example is if your Lambda function wanted to call out to Bash: do you just assume that Bash is there on the operating system? That's probably a pretty fair assumption. Bash is on most Linux operating systems, or certainly the larger ones. So that might be a fine assumption. But then another example might be Perl. You know, maybe your function does something a little bit exotic. Maybe it's doing some cool image manipulation or video manipulation that it needs to call out to a Perl script for. Do you assume that Perl is in the Lambda environment, or do you bundle it yourself or include it in a layer or something like that? So, yeah, those are the sort of questions I think that you need to think about, but because a bunch of binaries do exist in the operating system and you can kind of see them there in your Lambda, it's quite tempting to use them.
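To make that concrete, here is a minimal sketch (not from the episode) of a handler that checks whether a binary is actually present before relying on it. The paths are just conventional Linux locations; /opt/bin is where layer contents end up.

```javascript
// Probe for OS binaries before shelling out to them. The binaries and paths
// checked here are illustrative, not a guarantee of what Lambda provides.
const fs = require('fs');

const findBinary = (name, dirs = ['/bin', '/usr/bin', '/usr/local/bin', '/opt/bin']) =>
  dirs.map((d) => `${d}/${name}`).find((p) => fs.existsSync(p));

exports.handler = async () => {
  const bash = findBinary('bash'); // usually a fair assumption
  const perl = findBinary('perl'); // don't assume; bundle it or ship it in a layer
  console.log({ bash, perl });
  return { bash: bash || 'not found', perl: perl || 'not found' };
};
```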

Jeremy: Yeah, and so I think that's actually something that's really, really interesting, because when I first started using serverless and Lambda functions, it was basically, the idea was okay, great. I've got a snippet of code. I upload it, and it can access the database or it's going to, you know, call an API, right? It's going to use some of these basic functions that are built into the runtimes. But then, you know, every once in a while you have that image manipulation thing. You know, so you have some sort of resizing an image or, you know, and there's basic binaries out there, and this used to be really, really hard, to sort of compile those, and then you have to package them. And it was such a pain to do it. Obviously, there’s Lambda layers now, and we can talk about those in a few minutes. But so let's talk about — before we get into that stuff — let's talk about the Node 10.x runtime stuff, right?

Michael: Right, right.

Jeremy: So this was sort of a huge — this was kind of a big leap when they implemented this.

Michael: It was, and they didn't announce it as such, not from memory, anyway. They were basically just like, “Okay, we're now supporting Node 10.” But then when I looked it up, I was like, oh, hang on a sec[ond]. This is actually quite different to all of the other runtimes up until this point. Even the custom runtimes that they had introduced used the same sort of base operating system, whereas Node.js 10 was running on Amazon Linux 2. So all the other runtimes run on Amazon Linux 1 and Node.js 10 runs on Amazon Linux 2, which, you know, is a lot more modern and has a more modern kernel and more modern binaries available for it, and, you know, if anyone's using EC2 and using Amazon Linux, that's exactly what they will have been using for quite a while now. So it was running Amazon Linux 2, and it was running an incredibly slimmed down — or it runs an incredibly slimmed down version of it. The entire OS is something like around 90 MB, whereas Amazon Linux 1 — I'm gonna pull a number completely out of nowhere, but I want to say it's 500 or 600 MB or something like that. What you'd consider maybe a fairly standard OS installation. Whereas Node.js 10 is running on an OS that doesn't even have the “find” command. There are some very, very basic commands that they've just removed completely, which, when you think about it, kind of makes sense, because these Lambdas, you want them to spin up as quickly as possible. And if you are only running your JS code or your Python code, and you're not using any native dependencies or anything like that, you're not relying on any binaries, or you've compiled your own Go binary or something like that, you don't want there to be any extra fluff, and you want the little VM instance that it’s running on to spin up as quickly as possible. So it makes sense that you'd want the smallest OS possible. And this is true for anyone who's using Docker, you know, they know this: the smaller your image is, the smaller your container is, the quicker it starts. So it completely makes sense. But it's certainly a little bit of a departure from what they had done before. Thankfully, you know, they didn't sort of do it under the feet of anyone on the existing runtimes. It was a completely new runtime, so no one's code was going to break. But there are definitely a bunch of functions out there that you can't just move from, say, Node.js 8 to Node.js 10, from one runtime to another, if you were relying on certain behaviors or, you know, expecting certain binaries or dynamic libraries to exist there.

Jeremy: Yeah. I mean, there's been a bunch of changes with sort of how callbacks work in some of that other....

Michael: Right. So that's actually on top of the OS itself. Yeah, so that was another departure - you're right - as well as the underlying operating system and how tiny it was. The runtime that sits on top of that, which is really only just a JavaScript file or two that gets executed, then loads your code and runs it — that had been completely rewritten. The runtime that exists on Node.js 8 and Node.js 6 and Node.js 4 was all pretty — the code’s basically the same. It's the same Node.js code that loads your handler function and then runs it. The code that was in the Node.js 10 runtime was quite different. Firstly, it's using the same mechanism that custom runtimes use, which is sort of an HTTP client method, as opposed to the other runtimes that were using some sort of native binding to speak to the Lambda supervisor. So it was written in a completely different way.
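For reference, this is roughly what that HTTP mechanism looks like. A custom runtime ships a bootstrap program that long-polls the Lambda Runtime API for the next event and posts the result back. The sketch below is a bare-bones Node version, not the actual AWS runtime code, and error handling and most of the context object are omitted.

```javascript
// Minimal custom-runtime-style loop against the documented Lambda Runtime API.
const http = require('http');

const API = process.env.AWS_LAMBDA_RUNTIME_API; // host:port injected by Lambda
const [file, fn] = process.env._HANDLER.split('.');
const handler = require(`${process.env.LAMBDA_TASK_ROOT}/${file}`)[fn];

const request = (method, path, body) =>
  new Promise((resolve) => {
    const req = http.request(`http://${API}/2018-06-01/runtime/${path}`, { method }, (res) => {
      let data = '';
      res.on('data', (c) => (data += c));
      res.on('end', () => resolve({ headers: res.headers, body: data }));
    });
    if (body) req.write(body);
    req.end();
  });

(async () => {
  while (true) {
    // Long-poll for the next invocation, run the handler, post the result back.
    const next = await request('GET', 'invocation/next');
    const requestId = next.headers['lambda-runtime-aws-request-id'];
    const result = await handler(JSON.parse(next.body), { awsRequestId: requestId });
    await request('POST', `invocation/${requestId}/response`, JSON.stringify(result));
  }
})();
```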

Jeremy: And in terms of some of the other breaking changes — so I know there was a bunch of complaints about the logging stuff, right?

Michael: Right. Yeah. So, as I do, I kind of love to look under the hood of these sorts of things. So I had a look at the code that was written for the Node.js runtime. And it was not good, I would say. Just as a Node.js developer who's been developing for many years, it was code, but you know, you get an intuitive feel for whether someone kind of knows what they're doing or not. And it looked as though the code had been written by people who had never written Node code before. They were doing things where they were completely ignoring callbacks. They were completely — it was not how you would write asynchronous code in Node. I did a blog post, made some complaints about it, and other people had seen changes as well that, you know, weren't necessarily directly a result of the quality of the code, but were certainly a result of some of the decisions they made. Errors were being swallowed at the top level and not being printed out as they were on other runtimes. And, as you say, the logging had changed. They had added another field in the logging. So any of your existing logging, as is, or anything like that wouldn't work anymore. They — and this is still true — strip out any newlines in your logs and replace them with carriage returns. And that could potentially be lossy, depending on how complex the thing that you're logging is, but it meant that, you know, a log that previously would be multiple lines is now all on the one line. And these sorts of things I think would be fine if they had done them from the very start, but because they hadn't and they changed them in the new runtime, I think people upgrading to that runtime, you know, have had to maybe just deal with things a little bit differently.

Jeremy: Well, I will say anytime you see a Node script that uses double quotes instead of single quotes, you know that’s suspect, right?

Michael: Yeah, single quotes for life.

Jeremy: Exactly. Alright, but the thing is, you're right. When they first came out, there were a lot of things you wrote about. You had a great blog post about that whole thing. But right now, I mean, I've been using it in my latest project, and I've been pretty happy with it. It is fast and stable. I think it's a huge improvement over what was there before.

Michael: 100% agree. They went in. They made a bunch of changes. I mean, the runtimes actually change every now and then. I notice this because, you know, I have this Docker Lambda project that we might chat about later, so I'm constantly sort of checking for changes that will have happened. And they actually change the runtimes every now and then. Sometimes it’s security patches, but sometimes it's just that they'll add a little bit extra, or clearly they've added something in because there's been an edge case or a bug. You can kind of see exactly what it was with the code that's changed. With the Node.js 10 runtime, yeah, they completely rewrote all of the code, and it's much better now, and I agree with you. I think I would recommend anyone use it. You know, be aware if you're upgrading from the 8 runtime to the 10 runtime, there are some minor changes with logging. But aside from that, I think everything's pretty rock solid.

Jeremy: So one of the things you mentioned with the new runtime is that Amazon Linux 2 strips out all kinds of these basic commands, and we generally don't need them. We might not need to do, like, a disk usage command within a Lambda function, so we don't need that. We don't need “find,” but we do need some things, and maybe we need a different runtime. So custom runtimes and Lambda layers are two things that were introduced back, you know, almost a year ago now, which is kind of crazy that that much time has gone by.

Michael: Yeah, wow.

Jeremy: But so I still feel like the custom runtime stuff, you know, there's a lot of blog posts about this too, like, why would you use custom runtimes? And I was thinking about it myself, and I'm kind of like, yeah. I mean, you know, why would I use my own Node custom runtime? And I know you have a version 12 custom runtime that you've built for it. But I think you have a really good perspective on this because it sort of changed the way I look at it. What do you think about custom runtimes?

Michael: Yeah, so I actually agree with the general advice that, you know, one of the reasons that you choose Lambda is because so many things are managed for you, and you don't need to worry about so many things, and there is an issue if you choose a custom runtime: that is an extra thing that you now need to think about in terms of patching — at least the runtime, anyway. The OS under the runtime will still continue to get security patches and kernel updates and all that sort of thing without you knowing about it. You know, it'll happen one day, and suddenly your new Lambdas, whenever they cold start, will be running on a, you know, a patched OS. So you don't need to worry about that with custom runtimes, but you do need to worry about, of course, upgrading the language that the runtime's running on itself, and if that happens to be a dynamic language… I mean, I think plenty of people [are] using custom runtimes for languages that just aren't supported on Lambda, and that's an excellent use case. You know, if you want to be using Swift or something like that, or Rust or whatever it is, then I think that's a very valid use case for a custom runtime, because you have no other choice other than to use an officially supported language. But yeah, like, why would you want to choose a custom runtime if you're running Node, when there are Node runtimes available? And I think there are a few reasons for this — not many, but a few. One is if you want to include some things that, across your organization, you know that you're gonna need and that everyone's gonna need. Then you can sort of bundle them in the runtime itself. I don't know, you might have some wrapper around your function that every function needs to have and that every function needs to invoke when it first starts up, and you don't want the authors of these functions to worry about that, or to also have that extra step of adding a layer, because arguably a layer could do that. But having the custom runtime there means that you're in control of that boot-up process. You might need to authenticate with something before the function runs, or there might just be a bunch of things that you want to manage across your organization, and providing a custom runtime might be the easiest way to do that. And then there's just, I don't know, if you're the sort of person that likes staying up-to-date. I mean, I know that I can release my Node custom runtimes quicker than Amazon can. So when I'm using my custom runtimes, I know I'm actually less out of date than everyone else who's using Node 10, because the last time I checked, Node 10 was still a few patch versions, if not a couple of minor versions, out from the most current Node. So, yeah, if you're the sort of person that is happy managing your own runtime — because that's not quite the same ask as managing an entire operating system or…

Jeremy: Installing Kubernetes.

Michael: Yeah, yeah, yeah, yeah. It's not quite that intense. It's literally just, okay, yeah, I know how to get the latest Node binary and turn it into a...

Jeremy: But I actually thought, though, that the — and we can talk about layers too, because actually — well, let's talk about layers and we'll go back to your custom runtime use case. Because in my most recent project I started, I'm like, I'm gonna use layers, right, because everybody says I should use layers and so forth. And then I ran into this problem of, alright, well, where do I deploy my layer to? Where do I package that? Is that in a separate service? I'm sharing that across multiple services. What's the service discovery on that? There's no semantic versioning on it, and everything's just adding a number. And then I found out, hey, I could put it in a SAR app and I can version that and whatever. And then I realized I can just put “github:” and then the path in my NPM, I mean in my package.json file, and I could just install it right from my own repo, with semantic versioning, right? I can put tags and things like that, and I was just thinking, like, you know what, for the purpose of this, this is just going to be easier. And the reason why I was even doing that was because I have something that works for specifically formatting logging a certain way. I have an authentication component that will take in the authentication headers or the custom authorizer headers, and it sort of transforms that into a common object that all of my scripts can deal with. I've got an EventBridge emitter function that allows me to easily send notifications to EventBridge and things like that. And rather than writing that for every single, you know, service that — or writing those scripts for every service that I have — I just created those, have those as shared services, and I pull those in when they're installed. There's no dependencies, basically. And your use case, or what you were talking about with the custom runtimes, was actually kind of interesting, because I'm thinking to myself: if all of my services are using Node and I have five or six different sort of common packages that I want, like, why not have those sort of built into the runtime, as opposed to having to make sure that I installed them as Node dependencies and have access to things like that? And that way I could just say, if there's a major update to these individual dependencies, I could just update the runtime and use the next version of the runtime. And to me, that seems like it's a lot easier to manage that one thing. Now again, you're still managing your runtime, but for a larger organization that wanted to enforce certain security policies and certain logging requirements or whatever it was, I think it’s probably a really slick way to do it.
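For anyone who hasn't used it, the package.json trick Jeremy describes looks something like this (the package and org names here are made up). npm resolves the "github:" specifier, and the "#" ref can point at a tag, so you get tag-based versioning even though it isn't a registry package:

```json
{
  "dependencies": {
    "log-formatter": "github:my-org/log-formatter#v1.2.0",
    "auth-context": "github:my-org/auth-context#v2.0.1",
    "eventbridge-emitter": "github:my-org/eventbridge-emitter#v1.0.3"
  }
}
```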

Michael: Yeah, 100% agree. I think a classic example of that would also be, you know, the AWS SDK.

Jeremy: Yeah, absolutely.

Michael: You know, so many people use that. Of course, you can use the one that exists in the Lambda itself. But even AWS recommends…

Jeremy: Suggests you don’t.

Michael: ...Recommends against that. It's very convenient that it's there, because that means you can write a one-line Lambda that does amazing things, but...

Jeremy: Especially if you're forced to write code through the console for some of those…

Michael: Under duress.

Jeremy: Under duress on a live video stream. Yes.

Michael: Yeah. I don't know why you’d want to do that or put yourself in that situation. But yes, for things like that, it is very handy to have it there in the Node runtime. But of course, yes, managing it, especially, you know, when a new service comes out and you want to use it from your Lambda. Well, guess what? Hey, it's not supported by the AWS SDK version that's installed in Lambda just yet, so you're gonna have to wait or you bundle it yourself. And yeah, a custom runtime would be one example of doing that. Again, with a custom runtime, it does mean that you would need to understand how custom runtimes work. As soon as you understand that, I would say there's no extra onus on you to keep it up to date or anything like that. There's nothing particularly fancy that you need to do. And of course, there are a number of custom runtimes out there that you could just extend yourself and add your own packages to. But yeah, the layer thing. I agree with you on that. I think it's a real pity it doesn't have semantic versioning. And it was really interesting to find out that, yeah, you could do it via SAR if you want, but it's almost like a hack, in a way — a hack of SAR, of, like, creating a whole application that is really just a layer. You get some semantic versioning and things like that, and the layer can be across all regions, which is kind of nice, because it can be a little bit of a pain if you manage your own sort of open source or commercial layer that you want other people to use; it can be a pain to replicate that across all regions. But one thing that you should consider, with not using layers, is deploy time. Now, your dependencies might be nice and small, which is great, but, you know, if they start getting up into the multiple MB, if you have something in a layer, you don't actually need to deploy it each time. And yeah, I think it was maybe even Brian Leroux who made that lightbulb switch in my head, because when layers first came out, I was like, “Okay, this is kind of neat,” but, you know, assuming you know how to manage dependencies, what's the point? There's no real point to this, because I think I was hoping when they were announced that I was like, “Oh, great, this is gonna give you extra package size.”

Jeremy: Which it didn't.

Michael: Give you extra space. It's maybe going to make cold starts quicker because, oh, they're gonna be able to bake the layer in with the base image and have that all ready to go. That could still be an optimization that they might make, but my understanding is they don't, you know. I mean, I guess they'd have to bake, as it were, many, many different combinations of people's runtimes. But yeah, there is this advantage — so you don't get any other advantages, really, but there is this advantage of deployment time, which is where you don't need to deploy the layer again and all the code that sits in it. And if you have dozens of MBs, that could make your deployment a lot quicker.

Jeremy: Yeah, I definitely think — I mean, I think the public layer stuff, especially some of that stuff that Gojko has done with FFmpeg and some of those others, or NumPy or some of those other things, those are great, because nobody wants to compile those things down. Although you might have an alternative to that that we'll talk about in a minute. But, you know, it's just easier for someone to say, you know, I don't want to have to run a local Docker and compile it down and then do that all myself and package my own layer and then manage that layer and so forth. And I think some of those sort of things are relatively safe. If I need to pull in, you know, FFmpeg or something like that, pulling that in via a layer is so much easier than trying to manage that myself.

Michael: Right, and I think that's a good point: perhaps people who are new to Lambda might not actually realize that, oh, your Lambda is going to be running in a different environment, probably, than what you're developing on.

Jeremy: Yes.

Michael: You're probably not developing on Amazon Linux. You might be developing on macOS or something like that, and you're used to being able to just sort of, you know, maybe brew install something, or npm install something, and have it ready to go for your application. And I think this trips people up a lot when they first go to start using Lambdas: they don't realize that the package that they've installed, you know, that's sitting in their node_modules that they then zip up and fire off to Lambda, well, that was natively compiled for macOS, which is what they’re developing on. That's why it works locally, but they've bundled it up, they've sent it to their Lambda, and hey, it doesn't work anymore, because it was a native binary compiled for macOS. And so there's that step of going, “Oh, hang on. Okay, I messed up. I have to do something about this. What can I do?” And the answer, really, from Lambda - it's not a great one - you know, it's like, “Oh, well, you need to make sure that your binary is compiled for an Amazon Linux environment.” And it’s like, did you just tell me to go “F” myself? Like, what does that mean? Does that just mean I can find a Linux binary somewhere and use that? Maybe, you know, sometimes you can. But what does a Linux binary mean? What is it depending on? What is it assuming about the environment that it's running in? And I think people, especially newcomers, end up having to go down a rabbit hole that they really didn't want to go down, and they were just like, “I just wanted to install this dependency.”

Jeremy: Alright, so that's a perfect segue, because now we should talk about Docker Lambda, which is essentially a Docker image of that Lambda runtime environment that you can use to test things locally, compile binaries. You can do all kinds of things in there. Why don’t you tell everybody a little bit about that?

Michael: Yeah, 100% right. It's basically a Docker image, or a set of Docker images, that are trying to be the most faithful reproduction of the live Lambda production environment as possible. They have all the exact same files, or at least the ones that you have access to. There are some, obviously, that are only accessible by root that I can't replicate. But aside from that, it's the exact same file system and, as much as possible, the same sort of permissions, because in the Lambda environment, you only have write access to /tmp. You don't actually have write access to the directory that your code is in, and that can trip people up as well. So I created it because I was trying to do some pretty fancy stuff in Lambda, and I was just getting frustrated with this. You know, the idea of, oh, I have to spin up an EC2 instance with Amazon Linux and then compile my stuff there and then copy that over to my local machine and then zip that up or, you know, deploy it from EC2. It was getting very painful when, you know, you could maybe do it once, but then the cycle of development is really slow. So I was like, well, I wonder how hard it would be to at least try and, you know, get a similar sort of environment to Lambda just running locally using Docker. It's a sort of perfect use case for Docker, you know, because Lambda is kind of like containers. And the way that I ended up doing it was essentially by running a Lambda which tars, you know, creates a tarball of the entire file system and copies that over to S3, and then I pull that down. And you know, there's a Docker command that you can use to create an image from a tarball of an entire file system. So, sort of as much of the file system as I can grab, I grab that, put it in a tarball and then create a Docker image from that. And then during that process, as I was doing that, I realized, “Oh, hang on. All of these runtimes, they're actually running the same operating system. It’s just one directory that's different — you know, /var/runtime. That's where the runtime-specific code lives. That's where the Node.js 8 code or the Python 3.6 code or whatever lives. The rest of the operating system’s exactly the same. So that's kind of cool. I can create a base image and then each runtime can just — I just need to dump the /var/runtime directory.” And there's a couple of others now, there’s /var/lang, /var/rapid I think, is one of the new ones. There's a couple of directories that change within each runtime, but everything else is the same, so that makes it easy to create separate images for each runtime as well. And then what I did on top of that was — so that's great. That gives you an image that's a replication of the environment, and you can use that to then compile stuff and that sort of thing. But I thought, well, hang on. I could also use this to mock out running a Lambda, and I could actually use the runtime code itself, because it's sitting there, so I could use the exact same code that requires your index.js and looks for the handler and then executes the handler with all that. And, you know, on Node, for example, it overrides console.log and adds a bunch of fields so that whenever you console.log, it actually outputs a bunch of extra fields, and it’s the runtime that's doing that patching. So I was like, well, instead of trying to replicate that, I'll just let that code do that, you know? 
And then the only thing I do need to mock out is, obviously, whenever it tries to talk to the Lambda parent process, you know, talk back to the supervisor and say, “Hey, here, send these logs off to CloudWatch Logs.” I'll just intercept that and we'll output them to the console and that sort of thing. So, yeah, thus Docker Lambda was born, and I think a lot of people started using it. They found it really useful. It's a little bit overkill if you are testing your Lambda locally and you don't have any native dependencies and it is just a simple function. It's not really necessary to go to this level of replication. You can very easily write unit tests that just test your functionality without requiring a whole Docker environment. But if you are doing anything like writing to the filesystem or anything like that, it sort of gives you this extra level of parity with the real environment. Yes. So I wrote that. I remember chatting about it, at the very first ServerlessConf in 2016, with Tim Wagner, who is the sort of father of Lambda. He created Lambda. He was at AWS at the time. I was chatting with him about it and a bunch of other things, like, oh, will Lambda ever be able to run Docker and things like that. I said, “So I've been experimenting with this idea of creating this thing, and, you know, it'll allow people to test.” He was like, “Okay, yeah. It sounds OK, but I don't know. I think people would probably just want to test in the cloud, really. Like, I think that's where we'll put our focus, and we'll just make it easy for people to test in the cloud. I don't think they are really going to want to run Docker on their local machine as part of a testing environment or anything like that.” Or more just, you know, he was like, “No, that's probably just not an effort that I think we want to pursue at Amazon or anything like that.”
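As a rough illustration of the tarball approach Michael describes above (a simplified sketch, not the actual docker-lambda build scripts; the bucket and key names are made up), the "dump the filesystem" side could look something like this:

```javascript
// A function that tars up as much of its own filesystem as it can read and
// ships the archive to S3, so it can be turned into a Docker image locally.
const { execSync } = require('child_process');
const fs = require('fs');
const AWS = require('aws-sdk'); // available in the Node runtimes without bundling

const s3 = new AWS.S3();

exports.handler = async () => {
  // /tmp is the only writable path, so write the archive there and skip the
  // virtual filesystems (plus anything tar can't read as the sandbox user).
  execSync(
    'tar -czf /tmp/rootfs.tgz --exclude=/tmp --exclude=/proc --exclude=/sys --exclude=/dev / 2>/dev/null || true'
  );
  await s3
    .upload({
      Bucket: 'my-rootfs-dumps', // hypothetical bucket
      Key: 'nodejs10.x.tgz',     // hypothetical key
      Body: fs.createReadStream('/tmp/rootfs.tgz'),
    })
    .promise();
  return 'uploaded';
};

// Locally, you'd then pull the tarball down and turn it into a base image with
// something like: docker import nodejs10.x.tgz my-lambda-base:nodejs10.x
```

The published docker-lambda images themselves live on Docker Hub under lambci/lambda, one tag per runtime.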

Jeremy: And then...

Michael: Which I think, you know, is okay. That's a valid response, and I could buy it if you could literally deploy everything, you know, deploy your entire CloudFormation stack in milliseconds and have a really fast cycle and also be able to, like, have free development accounts and things like that. Then I could buy that you could use the cloud as a testing environment. I still think that's something that Amazon could aim for, to make that process much easier, because what better way than, you know, testing in the environment where your production's actually going to run? But anyways, I created it, and then, yeah, a good year later, the AWS SAM CLI team reached out to me, and they were like, “Hey, we noticed your Docker Lambda project, and we're thinking about writing a local testing utility in the SAM CLI. Maybe we could use that.” And I was like, “Yeah, great. Go for it.”

Jeremy: So when you said people are using it, you meant a billion-dollar company was like, “Right, we’re going to borrow this, if you don't mind.”

Michael: A trillion-dollar company.

Jeremy: I’m sorry. Did I say billion? Trillion-dollar company, yes.

Michael: So now, long story short, if anyone’s using SAM locally, they’re spinning up the Docker Lambda containers, basically.

Jeremy: Awesome. Alright, so that was one cool thing that you did, which was sort of just duplicating the environment — not “just” duplicating — but you did all this work, right? But knowing all those internals, you then took this further and you started doing these other crazy things with it. And you launched this thing called lambci, basically an entire serverless build system for CI, for continuous integration. So tell us about that.

Michael: Right. So actually, lambci preceded Docker Lambda. I wrote Docker Lambda as a utility as I was writing lambci. I was like, well, hang on, with Lambda you can suddenly spin up all these instances incredibly quickly. What's one of the most painful parts of sort of development, sitting there twiddling your thumbs? And that's waiting for a CI build to finish. And I was like, “Well, hang on, if we could get CI running in Lambda, that’d be incredibly fast.” Not only that, but at the time — because this is back in 2016, maybe even 2015, when I was first writing it — there were no sort of pay-per-usage CI systems. You basically paid per month. You know, you'd be like, okay, yes, I want access to one build server or maybe four build servers or something like that. You’d pay per month. There was, as far as I knew anyway, no way to get per-request pricing or per-build pricing or something like that, which, you know, especially in the serverless world, where that thinking maybe wasn't as common — it seems obvious now — but it wasn't as common back then that you'd only want to pay for the resources that you use. And I think people would do things like they might be running Jenkins, and then at night, they'd shut down their build cluster, and in the morning, they’d start it back up again. You know, you could do cost-saving things like that. But I was like, well, hang on, we can just spin up a build in milliseconds and have it shut down again. You're only charged for the time that it’s building. So that's one advantage. And then the other advantage is, well, you’re not just limited to maybe four concurrent builds, because, you know, I was CTO of a company that had not many developers, but 16 developers. And once you've got 16 developers, if you're really doing CI/CD and you're pushing stuff out all the time and you want, you know, everyone to be able to have their own staging environment and all this sort of thing, you're pushing the CI service pretty hard. And the worst thing is for people to be sitting there stuck in a queue waiting for someone else’s build to finish so that they can then access the CI server. With Lambda, obviously, I mean, you could get to that point, but you'd have to be doing thousands and thousands of concurrent builds basically before you started stepping on someone else's toes. So, yeah, I thought, look, this would be great. It's running Linux. Yes, there are some limitations, but I'm sure there's a whole bunch of things that I could get running. So I started exploring what you could do, and it's very straightforward to do very basic sort of Node testing, because it's just running Node files. But then I started pushing it more. I’m like, what if you wanted to do an npm install and you had native dependencies? How are those native dependencies going to build in Lambda? I was like, okay, I need to get GCC compiled on Lambda, you know, and that was quite an adventure, but I did it. And so, you know, then you could do npm install and have native dependencies, because GCC’s running on Lambda, compiling your files. And initially I think the timeout limit was five minutes, and then they expanded it to 15 minutes, and that opened things up, because people were using lambci and running into this limitation. 
But then it became 15 minutes, and that actually, there are people, obviously, that have builds that run for longer than 15 minutes, but not many, whereas there are a lot of people that have builds that run longer than five minutes. So that opened up a big bunch of use cases there. There were some restrictions early on, I think, about TCP sockets and things like that that stopped people from opening local servers that they might want to run integration tests against. You know, you create a local Express server that you then test all your HTTP routes against and that sort of thing. You can now do that very easily in Lambda. So as time's gone on, there have been more use cases that you can kind of do, and that you can then use lambci for, basically.
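A toy sketch of that pattern (not lambci's code; the port and route are arbitrary): start a server on a local port inside the function, hit it over HTTP, and assert on the response. An Express app works the same way; plain Node http just keeps the sketch dependency-free.

```javascript
// In-process "integration test" against a locally bound server inside Lambda.
const http = require('http');

exports.handler = async () => {
  const server = http
    .createServer((req, res) => res.end(JSON.stringify({ ok: true, path: req.url })))
    .listen(8080, '127.0.0.1');

  // Exercise the route the same way an external client would.
  const body = await new Promise((resolve, reject) => {
    http.get('http://127.0.0.1:8080/health', (res) => {
      let data = '';
      res.on('data', (c) => (data += c));
      res.on('end', () => resolve(data));
    }).on('error', reject);
  });

  server.close();
  if (JSON.parse(body).ok !== true) throw new Error('integration test failed');
  return 'integration test passed';
};
```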

Jeremy: So the other thing that I’m just thinking about too, like, you can do other tests. Like, can you run, like, Selenium tests? And, like, headless browsers?

Michael: You can do headless, yeah. So, you know, some awesome people have experimented, I guess, just as I have, with getting some crazy things compiled and running on Lambda, including TensorFlow, but also including Chromium. So you can run a headless Chromium in Lambda. You can do a whole bunch of UI testing. You can run Lighthouse. You can actually use it for all sorts of interesting things. So you could be doing that as well.

Jeremy: And part of the reason why this is so cool is because, as you mentioned, if you have a Jenkins build server or something like that, it only has so much processing capacity, and it runs a lot of these tests serially. I didn’t know if I was going to be able to say that word. I’ve got to channel Peter Sbarski here and say “parallelized.” So you can parallelize these jobs with lambci, and you had a post the other day on Twitter where you said you took build times down from, like, seven minutes to, like, 15 seconds or something like that?

Michael: Yeah, yeah, yeah, that's right. So we use this, or at least have been experimenting with using this, at Bustle for a while now. And you know, I came on two years ago. There was already an existing CI system in place, and we do some pretty hairy things. We talk to Elasticsearch and we talk to Redis, and we do that in our tests as well. So we need the CI system to be running Elasticsearch and running Redis. And you could do this in Travis and Circle CI. You can have these services set up, you know, a local Elasticsearch running, and a local Redis. But you can do it in Lambda as well. You can, as part of your build process, download Elasticsearch. Certainly in the Amazon Linux 1 runtimes, Java is sitting there, so you can download Elasticsearch and you can run it and have it exposed locally. Redis, even easier. You know, it's a very small binary and you can run that. Just expose local ports, and then your unit/integration tests can talk to that locally. So we do that at Bustle, and that's back-end testing; front-end testing is a similar sort of thing. But you know, you're often running tests, unit tests, but you’re also running linting and you’re running formatting. And you know, there's a whole bunch of things that you can do in parallel. People might be parallelizing these already in their CI systems if they've got a couple of concurrent servers available to them. But with Lambda, you can really go crazy parallelizing it, so I've got it running so that it tests every nth test. With most test runners these days, you can pass a list of files to it instead of just passing a directory, so you can just use the find utility and an awk command or whatever to go and grab every fifth file or every tenth file, or every fiftieth file, pass that to your test runner, and each Lambda can be doing every fiftieth file. And you can run 50 Lambdas in parallel and boom, your tests suddenly run 50 times faster. Any job like that that you can do in parallel, that you don't need some sort of serial bunch of steps for, yeah, is a perfect use case for — I mean, is a perfect use case for any parallel system. It’s just that Lambda is incredibly good at that, and it's gonna be a lot cheaper to do on Lambda than it is to be buying, you know, renting 50 concurrent servers from a traditional CI.
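A rough sketch of the fan-out Michael describes, with made-up names: shard the test files so worker i of N runs every Nth file, then invoke one Lambda per shard in parallel. The test directory, the use of Jest, and the function name are all assumptions for illustration, not lambci internals.

```javascript
const { execSync } = require('child_process');
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda();
const SHARDS = 50;

// Inside each build Lambda: pick this worker's slice of the test files and run them.
function runShard(shardIndex) {
  const files = execSync('find test -name "*.test.js"') // assumes tests live under test/
    .toString()
    .trim()
    .split('\n')
    .filter((_, i) => i % SHARDS === shardIndex); // every Nth file for this shard
  execSync(`npx jest ${files.join(' ')}`, { stdio: 'inherit' }); // assumes Jest
}

// On the coordinating side: fire off all 50 shards at once.
async function runAllShards() {
  await Promise.all(
    Array.from({ length: SHARDS }, (_, i) =>
      lambda
        .invoke({
          FunctionName: 'ci-test-runner', // hypothetical function name
          Payload: JSON.stringify({ shardIndex: i }),
        })
        .promise()
    )
  );
}

module.exports = { runShard, runAllShards };
```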

Jeremy: That's awesome. Alright, so now we're going to go to the next level stuff.

Michael: Right.

Jeremy: So if you've been...

Michael: That's not next level enough for you.

Jeremy: Well, that's what I’m saying. If you made it this far, I hate to tell you what we just talked about was kid's stuff. Alright, we're going to the next level. Alright, so you have been working on a new project called Yumda.

Michael: Right.

Jeremy: Tell us about this because this blows my mind...

ON THE NEXT EPISODE, I CONTINUE MY CHAT WITH MICHAEL HART...