October 21, 2019 • 39 minutes
Jeremy continues his talk with Michael Hart about pushing the limits of Lambda. They discuss Michael's new "yumda" project, how to use Lambda for machine learning hyperparameter optimization, and whether or not Lambdas should call Lambdas!
This is PART 2 of my conversation with Michael Hart. View PART 1.
Michael has been fascinated with serverless, and managed services more generally, since the early days of AWS because he’s passionate about eliminating developer pain. He loves the power that serverless gives developers by reducing the number of moving parts they need to know and think about. He has written libraries like dynalite and kinesalite to help developers test by replicating AWS services locally. He enjoys pushing AWS Lambda to its limits. He wrote a continuous integration service that runs entirely on Lambda and docker-lambda, which he maintains and updates regularly, and has gone on to become the underpinning of AWS SAM Local (now AWS SAM CLI).
Jeremy: Alright, so now we're going to go to the next level stuff, right? So if you’ve been...
Michael: That's not next level enough for you.
Jeremy: Well, that's what I’m saying. If you made it this far, I hate to tell you what we just talked about was kid's stuff, right? We're going to the next level. Alright. So you have been working on a new project called Yumda, right? Tell us about this. Because this thing — this blows my mind.
Michael: Right. So this was basically born out of the realization that people have traditionally struggled to get things compiled — native binaries or anything like that — for Lambda. For example, if you do want to write a CI system like lambci, then you will need some sort of git binary or a git library. But I would suggest using the git binary, because libgit is just not there with all the features. But, you know, you'll need that git binary running on your Lambda so you can do a git clone of the repo that you're then going to run your CI tests on, and getting that on Amazon Linux 1 was hard enough. Getting it on Amazon Linux 2 is much harder, because there are so many fewer dependencies that exist there. Amazon Linux 1 already has curl on it. You know, if you're on Node.js 8, you could just shell out to curl, and git has curl as a dependency. So if you were compiling git for the older runtimes, you didn't need to worry about curl or anything like that. You just needed to worry about git. On Amazon Linux 2, you don't have curl; you don't have some really, really basic system libraries. So if you want to get git running on Amazon Linux 2, you need to pull in a lot of stuff yourself. And I got to thinking, well, what would be the best way to provide, you know, a bunch of pre-built packages out of the box? Yes, you could use layers. And I think layers are a great idea for very high-level packages — very, very large binaries that have a huge tree of dependencies — or certain utilities. But it's impractical to be creating a layer for every single dependency that your native binary's going to use. You don't want to be creating one layer for libcurl, and another layer for libssh, and another layer for this. Firstly, you're limited to five layers that you can currently use in your Lambda, so you'd need to be squashing them together anyway.
And secondly, as layers stand at the moment, there's no particularly good discovery around them. It's nothing like doing an `npm install` or a `yum install` or something like that.
Jeremy: Well, and I also think that if you install five layers, a lot of those might be sharing dependencies under the hood as well, and then you might be installing those twice or three times. I don't know if they would...
Michael: Right, right, or they could be clashing.
Jeremy: But anyway, sorry.
Michael: No, no. So that's another consideration. So I thought, well, ideally, what people want to do — and this is certainly what people do in the container world — if you're writing a Docker container, you know, and you need native dependencies, one of the first steps in your Dockerfile is a `yum install` of whatever dependency you need. And that'll go and pull all the sub-dependencies, and then that will be installed in your Docker container. And then you can, you know, run your app from there knowing that this stuff exists. We don't have anything like that for Lambda, so I thought, well, I want to run `yum install` essentially and have all those packages — all those Amazon Linux 2 packages that are there — you know, why couldn't I just get them and install them for Lambda? And the reason that you can't do that is when you run a `yum install`, it installs into the system directories; it installs software in /usr/bin, or /usr/lib64 if it's a dynamic library. And you can't install to those places on Lambda. You can only install to — if you're using layers — /opt, so /opt/bin is in the path and /opt/lib is in the library path, which is where dynamic libraries get loaded from. So you need to make sure that your binaries and your dynamic libraries sit in those paths. That's where they'll be unzipped to, essentially, when your layer is mounted — or /var/task if you've bundled them up with your Lambda function. So you need to make sure that the binaries that you're shipping and the dynamic libraries that you're shipping are okay living in those paths, and there are a lot of binaries and libraries out there that aren't. You can't just copy them from /usr/bin to /opt/bin, because something's been compiled into that binary that assumes it's living in /usr/bin. There are a bunch that you can just move around, and that is a good first test. You may as well try it out — see if you can move a library from here to there, or see if you can move a binary from here to there.
But there might just be something down the track while you're using it, where it's suddenly like, hey, I can't find this file. Or maybe it's depending on a configuration file in a path that's been hard-coded as well, and you can't get your configuration file to that path because it's not writeable by you. So what I did was I took the Amazon Linux RPMs — and, you know, all these RPMs are open source; you can get the source RPMs. RPM is this sort of Red Hat package manager format for what a native package looks like on Red Hat Linux and all of its various children, including Amazon Linux, which sort of stemmed from Red Hat. So RPMs are what `yum install` will use to install. So I pulled all these RPMs down, and then I just re-compiled them — instead of /usr being the path they were compiled for, I compiled them for /opt. So then, you know, I had all these packages that I had re-compiled, and then I created just a little Docker container that has yum on it that is configured to install these RPMs in the right place — because if you ever do want to `yum install` a package into a non-system directory, you have to provide a bunch of configuration to let it know that you're doing that. So I sort of pre-configured all that, and configured it to talk to the yum repo that I had set up and that sort of thing. So basically, I've, you know, created a little Docker container where you can just do `yum install git` and it will pull down git and all of its dependencies — everything that's been compiled for an /opt environment. And it'll install it all in a directory of your choosing, which you could then zip up and create a layer from, basically. Or you could also bundle it into your Lambda if you wanted to as well. But typically, I think people will want to create layers.
Jeremy: But the idea would be that if you wanted git and SSH and a couple of these other things, you could compile all those — or combine all those — into one layer, right? So you just have one layer. And you have some limitations there, you know, like package sizes. So obviously you couldn't install the moon, right? You need to be a little bit…
Michael: Yeah, yeah, you're still limited. At least currently, I think the limit is 250 MB or something, as the total package size that you can have. So, yes, you're still limited to that.
Jeremy: So basically, there's those two sides. So you have your own sort of YUM repo that you’ve built around this. And then...
Michael: Yeah, yeah, no, you're absolutely right. So it's a yum repo, which is literally just where all the packages live. You know, they're up on the web, in S3 somewhere. So there's that part of it — all the re-compiled packages that live up there — and then the other part is, okay, that's fine, they live up there, but how do you tell yum to A) use that as the repo to pull the packages down, and B) make sure they install into /opt — which, in Docker, you do by mounting a local directory into a directory in the container. You know, I'm tossing around the idea of turning this into a CLI, so that you wouldn't need to worry about the `docker run` command. But it's a pretty basic command. It's basically just `docker run` yumda, `yum install`, you know, and then the packages, and it'll discover all the dependencies that it needs and make sure that they're all there, and all installed in the right places as well.
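For readers who want a concrete picture of the workflow Michael is describing, here is a rough Python sketch that builds the `docker run` command and zips the result into a layer. The image tag (`lambci/yumda:2`), the `/lambda/opt` mount point, and the zip step are assumptions based on the project's later published examples, not details stated in the conversation — check the yumda README for the exact invocation.

```python
import subprocess

def yumda_install_cmd(packages, layer_dir, image="lambci/yumda:2"):
    """Build the `docker run` command that installs yum packages
    (re-compiled for /opt) into a local directory.
    NOTE: the image name and mount point are assumptions -- verify
    against the yumda project's own docs."""
    return [
        "docker", "run", "--rm",
        "-v", f"{layer_dir}:/lambda/opt",  # installed files land here
        image,
        "yum", "install", "-y", *packages,
    ]

def build_layer(packages, layer_dir="layer"):
    # Install git (and all of its /opt-compiled dependencies) into
    # ./layer, then zip it up ready to publish as a Lambda layer.
    subprocess.run(yumda_install_cmd(packages, layer_dir), check=True)
    subprocess.run(["zip", "-yr", "../layer.zip", "."],
                   cwd=layer_dir, check=True)
```

The resulting `layer.zip` is what you would hand to `aws lambda publish-layer-version` (or bundle straight into the function, as Michael notes).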
Jeremy: And sort of the long term goal here would be to open up that repo, right? So other people could compile and put it in there. But you've got a really good start. How many packages do you have already?
Michael: Yeah, so I've compiled 868 packages so far, because the process is relatively easy to convert. Basically, the way RPM packages are compiled, they use a spec file, which I guess is akin to a Dockerfile or something like that, if you're in that land. A spec file says, okay, how's this package going to be compiled, and it tries to use variables for things like, what's the top-level directory, you know, and things like that. And a lot of spec files, the way they've been written is pretty good, and they don't have any hard-coded paths or anything like that. So as long as you re-compile and you pass in the right variables, it'll just work without even needing to modify any of the source, basically — any of the source code. There are a couple of packages which have just made these hard-coded assumptions that they're going to be installed in /usr, because that's where everyone installs everything, and so, you know, I needed to modify some things there. But I think the long-term goal would be, okay, yeah, release this repo and the packages and then give instructions about how people could create their own repo. Because I did toss around the idea of: is there another package manager that would be better for this? Would it make sense to `npm install`, you know, native dependencies? And you certainly could do that, but it would require rewriting a bunch of stuff. There would be some advantages in that, you know, other people could then just `npm publish` and that sort of thing. Yum, you know, and RPM — they're much older systems. It's not quite as easy as that to just publish your own package; you need to kind of host your own yum repo. But, yeah, so I'd provide instructions about how you'd want to do that, and then obviously try to accept as many sort of pull requests and ideas for, hey, I want this package, and that sort of thing. I think that's the reason I haven't released this yet.
It’s because I keep thinking about, do I really want this to be on my plate? What's the best way to sort of tell people, “Look, I'm happy maintaining a certain, you know, number of packages and obviously some core things that you might need,” but I don't know if I want to be brew — or, you know, Homebrew — or something like that.
Jeremy: But listen, the thing about this, though, is that I don't think people realize how powerful this is. I mean, just think about the ability to quickly and easily compile a layer that has git or SSH. A lot of the packages that you've done play into a use case like CI/CD. But at the same time, you were telling me about some of these other ones you have. I mean, you did like GraphicsMagick for image manipulation...
Michael: Yeah, exactly. So there’s image manipulation. There’s sound conversion that you might want to do. There’s video conversion. All of these sorts of things are going to require native binaries, basically, because they’re difficult things to do. PDF rendering, things like that. And Gojko, who you mentioned earlier, he's done a lot of exploration on this, and, you know, I remember on Twitter watching him bang his head against the wall about just how hard it is to figure out how to compile some of these things for that restricted environment. So this would really allow you to, you know, pull in a lot of these pieces that you might need.
Jeremy: But you've got runtimes compiled. You've got, like, Apache and MySQL. So you could literally use a serverless...
Michael: I’ve got MySQL, postgres...
Jeremy: ...So you could use a Lambda function to spin up and test — do integration testing on your LAMP stack project. It’s kind of wild.
Michael: Yeah. I’ve got PHP and Python compiled and a whole bunch of things. Yeah, you could run MySQL on Lambda, and I actually think it wouldn't be as crazy as it sounds. You know, a lot of these systems have options for, hey, just start with an in-memory database or something like that, because people do do integration testing with these servers.
Jeremy: Crazy. Alright, so listen, we've been talking for a very, very long time, but there's another thing that is even more — I don't know if this is higher level than what we just talked about. You wrote an article earlier this year called Massively Parallel Hyperparameter Optimization on AWS Lambda. And you were using this ASHA technique from some paper that you read. And now I’m going to tell you — this is above my head, too. So everything you're saying — I was reading the article, glassy-eyed, not sure where I was, you know. But this is just really, really cool. And so maybe for my benefit and maybe for some of the listeners, you could explain what you did as if I was a five year old.
Michael: Sure. So, basically, there's a machine learning tool out there called FastText, which is for categorizing text. You know, spam is a classic example of this: this text is spam; this text isn’t spam. We do a lot of that sort of thing at Bustle. You know, we might use it to aid us in saying, “Okay, this article belongs in this particular category or vertical. This article belongs in this one.” And typically, you often have a bunch of training data. So here’s data where we've had humans come along and manually label this stuff, and they might have done, you know, if you're lucky, a few thousand articles like this — but then we've got 300,000 articles that we need classified. It would just be incredibly tedious to get people to try and classify them all. Machine learning's great at this, so let's do that. But you need to train a machine learning model to do that, and machine learning models are quite finicky, in that what works on one data set might not work on another. You might have to tune certain parameters about the way that the machine learning algorithm runs. In this case, there’s a binary that runs the machine learning algorithm, and you pass it a bunch of parameters that affect how good it's going to be on a particular data set. So you essentially need to tune it. That process is called hyperparameter tuning. It's the idea of, okay, I want to adjust the parameters to this algorithm so that it suits our data set the best. And there are a number of ways of doing this. You can try to do an exhaustive search of all of the combinations of parameters that you can find, but in practice, that ends up being a pretty bad approach. Yes, if you wait long enough for all of the combinations to have been tried, then you'll have a good idea of what was a good combination. But it could just take a really long time.
And often with these things, it’s a little bit like programming. When you start a job running like that, that might run for hours and hours and hours, it might only be five hours in or a few days in that you realize, “Oh, hang on. I messed something up. Ah God.”
Jeremy: Everybody knows that feeling.
Michael: You know, or maybe you have this lightbulb go off and you go, “Wait. I could have included this extra data in there, and I think that would give it extra accuracy. Okay, scrap everything that I’ve done. I'm going to rerun the experiment.” So there's that sort of thing as well. So there are techniques out there — and you can use SageMaker to do this as well; it will sort of try and autotune your hyperparameters for you. But it typically takes hours, if not days, to run these sorts of jobs, because, you know, they're spinning up big instances and they're often using techniques that are fairly serial in nature. So they need to wait for a couple of different parameter combinations to have been tried before they say, “Okay, maybe if I move in this direction, it will be a better set of parameters,” and it could just take a very long time. But there are some algorithms out there, the simplest of which, of course, would be a random search. So just randomize all the parameters, try that and see how you go, and then randomize them again and try that. Now, that's a perfect case for a parallel search, because you could just start up thousands of searches, all with completely different random parameters, and they'll, you know, take roughly the same time to finish their training jobs, and then you come back and you just pick the one that had the best accuracy. That actually works surprisingly well. And in the world of hyperparameter tuning, there are a lot of algorithms that really struggle to beat random search as a baseline. A bit like in the drug world, where beating the placebo is really hard — it's similar to that with hyperparameter tuning. But this particular algorithm uses random search but does it in a few phases. You know, in the first phase, it'll spin up thousands with different random parameters.
Then the next phase, it might, you know, cut that in half — do a bit of a binary search — and say, “Okay, the half that were the best, let's test them again and change them slightly.” That's a very simple way of thinking of the algorithm. And I just thought it was a perfect use case for Lambda, because I was like, you know, I'm sitting here testing all these different sorts of parameters on my local laptop, and, you know, I feel like I'm just in the dark. I'd love to just be able to do this 10,000 times and have it come back to me and say, “Hey, here's the best combination of parameters.” So I got FastText — this machine learning binary — compiled for Lambda, you know, using my docker-lambda. That was pretty easy. And then it's just: create a Lambda function that calls out to the binary. There are some limitations. Obviously, you only have 500 MB of disk space, so if you were needing to train on data that's bigger than that, you just couldn't do it at the moment — or at least you couldn't train on all of the data in a single Lambda. You’d need to come up with a clever technique to split that up. But for the data I was training on, 500 MB is quite a lot for text. I think I was training on 50,000 articles or something like that. So that was fine — I wasn't going to run into any limitations there. And so I could just do a test run, and I would invoke each Lambda with a different set of parameters. And then, you know, you use just a coordination process to — once the first batch had finished — use this ASHA algorithm to figure out which sets of parameters to keep and then try and manipulate. But, you know, I was getting results even within the first five or 10 seconds, because I was spinning up 3,000 Lambdas in parallel, you know, and that's 3,000 experiments being run in parallel.
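The coordination process Michael describes — a big random first wave, then repeatedly keeping the best performers — can be sketched roughly like this. This is a heavily simplified, synchronous stand-in for ASHA (the real algorithm promotes trials asynchronously with growing training budgets), and the parameter names are illustrative FastText-style flags, not his actual search space. In his setup, each `evaluate` call would be one Lambda invocation running FastText, and the 3,000 first-round calls would go out concurrently rather than in this serial loop.

```python
import random

def random_params(rng):
    # One random hyperparameter combination. Names are loosely modeled
    # on FastText's flags, purely for illustration.
    return {
        "lr": rng.uniform(0.01, 1.0),
        "epoch": rng.randint(5, 50),
        "wordNgrams": rng.randint(1, 3),
    }

def successive_halving(evaluate, n=3000, rounds=3, keep=0.5, seed=42):
    """Simplified ASHA-style search: generate n random configs (each
    `evaluate` call standing in for a parallel Lambda invocation),
    then repeatedly keep the best fraction and re-test the survivors
    (in the real algorithm, with a larger training budget each round)."""
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(n)]
    for _ in range(rounds):
        scored = sorted(population, key=evaluate, reverse=True)
        population = scored[: max(1, int(len(scored) * keep))]
    return population[0]  # best surviving config
```

The only point here is the shape of the phases — wide random search, then keep-the-best-half — which is what makes the workload embarrassingly parallel and such a good fit for Lambda.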
You're going to get quite good results, unless the hyperparameter space you’re trying to search — you know, the number of parameters — is incredibly large. 3,000 covers a fair bit of that space, so already within the first launch, you're getting very good results. And then, you know, if you're successively narrowing down that search space — yeah, within 30 seconds, I got basically a state-of-the-art result, because I then went back and benchmarked it on some of the data sets that they use in the papers. And, yeah, I was getting state-of-the-art results within 30 seconds, whereas I tried the same thing on SageMaker, and, you know, within half an hour it still hadn't returned a result that was anywhere near as good as what I got on Lambda. So I think for things like this — and there are plenty of other examples that you could imagine in the AI/ML worlds. Reinforcement learning is another perfect example. Like game playing — anything where you're trying to tune an algorithm and spin up many, many instances and run many games in parallel, or many environments in parallel. Anything like this, I think Lambda is a great use case for it. And there are some limitations at the moment, but, yeah, I'm hopeful that AWS will, you know, just bump them up a little bit — increase them a little bit — and then Lambda will become a supercomputer.
Jeremy: I was just going to say, though — that's the use case, right? I mean, that's where we're getting to that promise of Lambda and parallel computing being that supercomputer. And I'll just say, I mean, I've spent a lot of time working on some NLP stuff, and then using the output of NLP to do some multi-class classification stuff with Bayesian methods, and, honestly, it works well. It works really, really well. But the stuff you're talking about is just — it's insane, and the fact that you're pushing the limits is pretty crazy. So alright, so again, we've been talking for a very long time. I’ve got one more thing to ask you, because you and I agree on this, and I want you to share your real-world use case here, because people ask this question all the time. Some people think it’s an anti-pattern: Lambdas calling Lambdas. Reasons why you would do this? And you have some very good reasons for that, right?
Michael: It's true. So I'll start off with, I think, why most people suggest it's not a best practice to call a Lambda from another Lambda. And I think that's A) they’re specifically talking about a synchronous use case — when you're using the request-response mode of invoking a Lambda. So you're invoking the Lambda, you're waiting for it to finish, and then you're using the output of it to then do something yourself. So, you know, there are some obvious caveats with that. One is, well, if the timeout of the Lambda you're calling is greater than your timeout and it runs for a really long time, you might time out before the other Lambda comes back. So that's one thing to think about. And then the other thing to think about, of course, is if you are doing things at massive concurrency, then you might be getting close to your limits, and if you don't have sort of per-function concurrency set up or anything like that, you might be getting yourself into a situation where, you know, you're chewing through your concurrency, basically, because you're chaining your Lambdas and you're calling them like that. And actually, Joe Anderson just pointed out on Twitter another reason why people say this: because perhaps they think that people have split up their functionality into lots of different functions, and they're trying to compose them in a way that would really be better just composed in a single Lambda itself.
Michael: I 100% agree with that. If you're literally just trying to call a method from another class, don't turn that class into another function — unless there's an incredibly good reason for it to be living separately, you know, like it's managed by another team or something. Then I think that's a good use case. So I think that's why people say that. I think, of course, the simple comeback is: well, hang on, Lambda is just another API. Are you saying that I shouldn't call any API from my Lambda? And people might go, “Well, no, you can call other APIs. Maybe just don't call Lambda from Lambda. It's an anti-pattern.” And you go, “Well, hang on. If I call other APIs, what if that API’s backed by Lambda?” You know? What's the logic? You need to give people a reason, I think, for why you're saying this is a bad practice. There are certainly use cases, I think, where asynchronous patterns work well, and I think this is true for any microservices, regardless of whether you're using Lambda or not. If you're needing to wait 10, 20 seconds for an API to get back to you and this is a request that a user’s waiting on, there's probably a better way to architect your app, you know. And that's where SNS or SQS, or, you know, using some sort of messaging system is probably a good idea — having an architecture in place where you can return early to the user and then go back and poll. But you can invoke Lambdas asynchronously, you know. Like the idea of putting a message on an SNS topic, and then when that SNS message gets picked up by a Lambda, it does something — I would say, well, think about what you're actually getting from that on top of just the Lambda asynchronously invoking the other Lambda, because you can do that. You could, you know, just call it using the event invocation style, and that API call will return within milliseconds. It'll invoke the other Lambda, which will go and do its thing, and it will return within milliseconds. Now...
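The "event invocation style" Michael mentions looks something like this with boto3 — setting `InvocationType='Event'` on the Invoke API makes the call return as soon as the event is queued, without waiting for the target function to run. The function name here is a placeholder, and the injectable `client` parameter is just to make the sketch testable without AWS credentials.

```python
import json

def fire_and_forget(function_name, payload, client=None):
    """Invoke another Lambda asynchronously. With InvocationType='Event',
    the Invoke API queues the event and returns immediately (HTTP 202)
    rather than waiting for the target function to finish."""
    if client is None:
        import boto3  # only needed when no client is injected
        client = boto3.client("lambda")
    resp = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # "RequestResponse" would be synchronous
        Payload=json.dumps(payload).encode(),
    )
    return resp["StatusCode"]  # 202 means accepted for async execution
```

This is the baseline to weigh SNS or SQS against: those add retries, fan-out, and dead-letter handling, but the bare async invoke is already non-blocking on its own.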
Jeremy: It bypasses all that API Gateway stuff.
Michael: Right. It bypasses all the API Gateway stuff. It's incredibly low latency. It's very fast, assuming that you haven't run into a cold start or something like that. But yeah, it's probably as fast as calling SNS — it is, you know, in a lot of cases. So I think it's perfectly valid for that. I'm actually the sort of person that prefers to have much less architecture. I think queues and notification systems and things like that can be really useful. And especially — well, you get a dead letter queue for free, and it’s something that you can go revisit later. That's a good use case, I think, you know, for having queues, or if you need to throttle things — and there are plenty of reasons, obviously, for having these things in place. I just prefer to start without them. And I think you'd be surprised at how far along you can get without needing some sort of intermediary, and it probably saves you a lot of headache. And look, maybe if it fails, you log it, and then you have an alert on your logs — like, you know, it's not as though there aren't patterns for dealing with this.
Jeremy: Yeah, and I mean, just my quick two cents on this is: I do this all the time. You usually don't want a synchronous request — where a customer goes through API Gateway, hits a Lambda function, and then that Lambda function has to call another Lambda function to get something and then return it back — to have to wait. Although, I will say, when you build other services — you might have a user service, you might have an article service, whatever it is — one of those services does need to grab some data in order to denormalize it into its own service, and that's pretty fast. And if you set low timeouts, you know, so you say, “Listen, if this doesn't respond within three seconds, then I'm just going to fall back. I'm going to send it to an SQS queue, or I'm going to log it somehow, and fall back to the customer.” Or say to the customer, “Hey, we got it. Alright, we'll deal with this later.” I think that's a perfectly good use case, because you're just making an HTTP connection, like when you're calling Stripe or calling…
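The pattern Jeremy describes — a synchronous call with a short timeout and a queue as the fallback — can be sketched roughly like this. The `invoke_fn` and `enqueue_fn` hooks stand in for the real AWS calls (e.g. a boto3 Lambda invoke and `sqs.send_message`); they are placeholders, not anything from his actual code.

```python
def call_with_fallback(invoke_fn, enqueue_fn, payload):
    """Try the synchronous Lambda-to-Lambda call (configured with a
    short timeout); if it fails or times out, queue the work instead
    and acknowledge the caller with an "accepted" response."""
    try:
        return {"status": "ok", "data": invoke_fn(payload)}
    except Exception:
        enqueue_fn(payload)  # "we got it, we'll deal with it later"
        return {"status": "accepted"}
```

With boto3, the three-second cutoff would come from the client configuration rather than this function — something like `botocore.config.Config(read_timeout=3, retries={'max_attempts': 0})` — though it's worth double-checking those parameter names against the botocore docs for your version.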
Michael: Right. There's nothing special about Lambda in this respect.
Michael: It's like, this is just sort of best practices if you were calling any API — or if you're writing any API — that if you're waiting for many, many, many seconds, then you might want to deal with that. And those are the sorts of use cases where I think, okay, fine, that's perhaps not a good practice. Actually, you asked me — we use this at Bustle. So we have a Lambda that renders our frontend HTML code. It's a Preact app. It does server-side rendering of the HTML, and it delivers to the, you know, to the browser via API Gateway and a CDN and things like that. But it calls our other Lambda directly, which is a GraphQL backend. It calls that to pull in the data that it needs to render the HTML page. Now, in the browser, it also will call that GraphQL backend, but it will do it via API Gateway, because it's coming from the browser, so it needs to make an authenticated HTTP request into the function. But when you're in the Lambda world, well, that Lambda can just call the GraphQL Lambda directly, and that goes to Redis and Elasticsearch or wherever it needs to pull the data and sends it back. And we just make sure we have the timeouts tuned such that — you know, I mean, it responds within milliseconds. It’s not even a thing we would really run into.
Jeremy: But if it didn't, I mean, you build in that resiliency, right? You just figure out: what do I do if it does fail? You know, the happy path on these things — which is 99.9999% of the time, or whatever it is — usually is going to give you a low enough latency that these things don't matter. And then the other thing I would say, too, is, oftentimes, when you send a request that is asynchronous, that asynchronous Lambda function that's running has a little bit more flexibility. You can wait a little bit longer if you have to make some synchronous calls from it. You know, once it's disconnected from the frontend, who cares if it takes a little bit of extra time? I mean, there's some tuning you might want to do there from a cost standpoint, but if you need to pull data from four different APIs in order to compile some object that gets saved — so then it's denormalized and accessible by a customer with a single call to DynamoDB — do that. It's a good way to do it, in my opinion anyway, and I do it all the time and never have problems.
Jeremy: And this is the last thing I'll drop in here is the fact that even if it's not cost-optimized, I mean, unless you go from like zero to a billion invocations on day one, the cost is going to be low enough that you can experiment with this stuff, and eventually get to a point where you know you'll get it tuned the way you need to.
Michael: Right. I agree. And typically the sort of cost tuning that you're doing is going to be nothing compared with the time cost or the people costs or whatever, you know, it is that you would have spent in the extra development trying to do something in a particular way. Probably.
Jeremy: Like having a $180,000-a-year engineer spend five weeks to figure out how you can save $50 on a Lambda function.
Michael: Yeah, nope. You end up doing these calculations in your head all the time as a CTO, and it's like, yeah, forget it. I mean, this is the CI argument as well. CI seems to be this last bastion where we're still actually willing to accept multiple minutes of something happening. That's like — what? No. This stuff should be immediate. We shouldn’t be accepting anything less.
Jeremy: Alright, Well, listen, this has been awesome. Honestly, this should be a 500-level course at re:Invent. They don't even have 500-level courses I don't think, but seriously, this was great. And the stuff that you're doing, obviously, AWS Serverless Hero and all these cool things you're doing this Yumda thing. I think it's going to be a huge game changer. It's going to open up a whole bunch of use cases for people that, you know, right now might be compiling these things to Docker or using an EC2 instance. So how can people find out more about you so they can stay up to date with all the stuff you're working on?
Jeremy: And that's H-A-R-T.
Jeremy: Awesome. Alright, I'm going to get all that into the show notes. Thanks again, Michael.
Michael: Thanks, Jeremy. It was great, as always.